Graham wrote:
> Armin is fast asleep now, so my shift.
Heh. It's a multiple-man job keeping up with this monster thread!
> The URLs don't break.
Not in themselves. Just the language of the PEP implies that to fix them
up would contravene the spec:
>> The application MUST use [the encoding guess for PATH_INFO] to decode
>> the ``'QUERY_STRING'`` as well.
This isn't appropriate even as a SHOULD: the guessed encoding for
PATH_INFO is very likely to be wrong, in particular for cases where the
path was purely ASCII.
The application (or a library/framework acting on its behalf) should be
allowed to decode QUERY_STRING using whatever encoding it is expecting.
Disallowing anything other than utf-8 (and iso-8859-1 in a very
unreliable way) makes it impossible to have queries in any other
encoding at all and still comply with the spec, which is undesirable.
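To make that concrete, here is a minimal sketch of an application decoding QUERY_STRING with the encoding it expects rather than with the PATH_INFO guess. It assumes QUERY_STRING arrives as a percent-encoded native string; `decode_query` and the Shift_JIS form are hypothetical, chosen only for illustration.

```python
from urllib.parse import parse_qs, quote


def decode_query(environ, encoding):
    """Parse QUERY_STRING using the encoding the application expects.

    Hypothetical helper: the encoding comes from the application's own
    knowledge of its forms, not from any server-side guess.
    """
    query = environ.get("QUERY_STRING", "")
    return parse_qs(query, encoding=encoding, errors="replace")


# Build a query the way a browser submitting a Shift_JIS form would.
# "\u65e5\u672c" is the Japanese word for "Japan".
word = "\u65e5\u672c"
environ = {"QUERY_STRING": "q=" + quote(word.encode("shift_jis"))}

as_expected = decode_query(environ, "shift_jis")
as_guessed = decode_query(environ, "utf-8")
```

With the application's own encoding the value round-trips; decoding the same query as utf-8 yields replacement characters instead.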
If this sentence is removed, and `wsgi.uri_encoding` is guaranteed to be
one of:
a. definitive and reliable, or
b. missing/None
I'm pretty much happy. What I don't want is that half the future-WSGI
servers/gateways decide they have to provide *some* value for
`wsgi.uri_encoding` even if they're not quite sure if it's the right
one. Then we're back to square one.
> if it is known that an application or some subset of
> URLs will always be receiving a request as non UTF-8, then it should
> employ code in those cases to always transcode it to the required
> encoding.
Yep, agreed. I think the PEP should clarify that; at the moment it is
saying that a transcode is something you should only do for the
iso-8859-1 case, but if you actually followed that advice you'd get
highly inconsistent results. Perhaps we're at cross-purposes as to what
exactly constitutes 'middleware'...
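A rough sketch of what "always transcode" could look like, assuming the server decodes the path as utf-8 first, falls back to iso-8859-1, and reports which one it used. `simulate_server_decode` is a hypothetical stand-in for that server behaviour, and `transcode` is an illustrative helper, not anything the PEP defines.

```python
def simulate_server_decode(raw):
    """Hypothetical model of the server: try utf-8, fall back to iso-8859-1."""
    try:
        return raw.decode("utf-8"), "utf-8"
    except UnicodeDecodeError:
        return raw.decode("iso-8859-1"), "iso-8859-1"


def transcode(value, reported_encoding, expected_encoding):
    """Recover the original bytes, then decode them as the app expects.

    Both a strict utf-8 decode and an iso-8859-1 decode are lossless, so
    re-encoding with the reported encoding gives back the exact bytes
    that came off the wire, whichever guess the server made.
    """
    raw = value.encode(reported_encoding)
    return raw.decode(expected_encoding, "replace")


# Three windows-1251 paths: one pure ASCII, one that is not valid utf-8,
# and one whose bytes happen to be valid utf-8 as well.
samples = [b"/index", b"/\xef\xf0\xe8\xe2\xe5\xf2", b"/\xc3\xa9"]
results = []
for raw in samples:
    text, reported = simulate_server_decode(raw)
    results.append((reported, transcode(text, reported, "cp1251")))
```

The third sample is the awkward one: the server reports utf-8 for it, so an application that transcoded only in the iso-8859-1 case would treat it differently from the second sample, even though both were sent as windows-1251.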
> The other fallback is that a specific WSGI server could elect to
> provide an option to not use 'UTF-8' as the first choice for decoding
I really, *really* hope this does not happen. That just brings us more
deployment heartaches.
> Whether surrogateescape gives a better solution I have no idea at this
> point
Yeah... I'm highly suspicious of surrogateescape in a web context and
personally my code will be deliberately filtering all such characters
out. I can see it being a possible way to smuggle unwanted sequences
(such as overlongs) through filters, potentially causing endless
security problems. But we'll see...
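For illustration, a minimal sketch of the kind of filtering I mean, assuming the string was produced by Python's `surrogateescape` error handler, which maps each undecodable byte to a lone surrogate in the range U+DC80 to U+DCFF. The helper names are hypothetical.

```python
def is_escaped_byte(ch):
    """True if ch is a lone surrogate produced by surrogateescape."""
    return "\udc80" <= ch <= "\udcff"


def has_escaped_bytes(text):
    """Report whether any undecodable bytes are hiding in text."""
    return any(is_escaped_byte(ch) for ch in text)


def strip_escaped_bytes(text):
    """Drop every surrogate-escaped byte from text."""
    return "".join(ch for ch in text if not is_escaped_byte(ch))


# b"\xc0\xaf" is an overlong encoding of "/". A strict utf-8 decoder
# rejects it, but surrogateescape lets both bytes through as surrogates.
wire = b"/a\xc0\xafb"
smuggled = wire.decode("utf-8", "surrogateescape")

# Re-encoding restores the overlong sequence byte for byte, which is
# how it could slip past a filter that only inspects the decoded string.
restored = smuggled.encode("utf-8", "surrogateescape")
cleaned = strip_escaped_bytes(smuggled)
```

Rejecting the request outright when `has_escaped_bytes` is true may be the safer choice than silently stripping; which is right depends on the application.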
--
And Clover
mailto:a...@doxdesk.com
http://www.doxdesk.com/