On Wed, Jul 14, 2010 at 12:19 AM, Graham Dumpleton <
graham.dumple...@gmail.com> wrote:

> >> * I (re)propose we eliminate SCRIPT_NAME and PATH_INFO and replace them
> >> exclusively with encoded versions (that represent the original request
> >> URI).  We use Latin1 encoding, but it should be ASCII anyway, like most
> >> of the headers.
> BTW, it should be highlighted whether this change is relevant to
> Python 3 but like some of the other things you relegated as out of
> scope, purely a wish list item.

Certainly; most headers and metadata are pretty much constrained to ASCII, and
any use of non-ASCII is... at least peculiar, and presumably
application-specific.  For instance, there's no reason you'd have anything
but ASCII in Cache-Control.  The one place encoded information happens
regularly in headers (that I know of) is Cookie.  The request URI path is
generally ASCII, but SCRIPT_NAME and PATH_INFO *aren't* the request URI
path, they are URL-decoded versions of the request URI path.  And they are
usually encoded in UTF8... but not every byte sequence is valid UTF8, so
decoding them can fail or lose data (though we could define that they must
be decoded with surrogateescape).  And while they are usually UTF8, they are
sometimes no
valid encoding at all, because anyone can assemble any set of characters
they want and web browsers will accept it.
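
A minimal sketch of the decoding problem described above, using only the
Python 3 standard library.  The path value is a made-up example, not taken
from any real request; it mixes a valid UTF8 sequence (%C3%A9) with a byte
(%FF) that can never appear in valid UTF8.

```python
from urllib.parse import unquote_to_bytes

# Hypothetical percent-encoded path as it might appear in a request URI.
raw_path = "/caf%C3%A9/%FF"

# URL-unquoting yields raw bytes; nothing guarantees they are valid UTF8.
decoded = unquote_to_bytes(raw_path)

# A strict UTF8 decode fails on the stray 0xFF byte.
try:
    decoded.decode("utf-8")
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False

# surrogateescape maps each undecodable byte to a lone surrogate, so the
# original bytes can be recovered by encoding with the same error handler.
escaped = decoded.decode("utf-8", "surrogateescape")
round_trip = escaped.encode("utf-8", "surrogateescape")
```

The resulting string contains a lone surrogate, so it cannot be written out
with a strict encoder; whoever consumes it has to know to use
surrogateescape as well.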

By avoiding URL-unquoting of these values, we can also stick to Latin1 and
get something reasonable.  It's not very attractive to me that we take
something that is probably *not* Latin1, and may reasonably not be ASCII,
and decode it as Latin1.
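
To illustrate the difference, here is a rough sketch (again with a made-up
path, standard library only) comparing a Latin1 decode of the still-quoted
value with a Latin1 decode of the unquoted bytes.  The variable names are
illustrative and are not proposed environ keys.

```python
from urllib.parse import unquote_to_bytes

# Hypothetical path bytes exactly as sent on the request line.
raw = b"/caf%C3%A9"

# Still percent-encoded: the bytes are plain ASCII, so a Latin1 decode is
# lossless and gives the same text an ASCII decode would.
quoted_text = raw.decode("latin-1")

# Unquoted first: the bytes are really UTF8, so a Latin1 decode produces
# mojibake rather than the intended text.
unquoted = unquote_to_bytes(quoted_text)
mojibake = unquoted.decode("latin-1")

# The application has to undo the Latin1 step to get the intended text.
recovered = mojibake.encode("latin-1").decode("utf-8")
```

With the quoted form, the Latin1 string is already what was on the wire;
with the unquoted form, every consumer needs the extra encode/decode step
and has to guess the real encoding.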

Ian Bicking  |  http://blog.ianbicking.org
Web-SIG mailing list
Web SIG: http://www.python.org/sigs/web-sig
