Graham wrote:

> Armin is fast asleep now, so my shift.

Heh. It's a multiple-man job keeping up with this monster thread!

> The URLs don't break.

Not in themselves. It's just that the language of the PEP implies that fixing them up would contravene the spec:

>> The application MUST use [the encoding guess for PATH_INFO] to decode
>> the ``'QUERY_STRING'`` as well.

This isn't appropriate even as a SHOULD: the guessed encoding for PATH_INFO is very likely to be wrong, in particular for cases where the path was purely ASCII.
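To illustrate the ASCII case, here is a minimal sketch. The `guess_encoding` helper is hypothetical: it only assumes a gateway that tries UTF-8 first and falls back to ISO-8859-1, which may not match what any real server does.

```python
# Hypothetical helper: guess an encoding the way a gateway might,
# trying UTF-8 first and falling back to ISO-8859-1.
def guess_encoding(raw: bytes) -> str:
    try:
        raw.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        return "iso-8859-1"


path = b"/search"                            # purely ASCII path
query_value = "café".encode("iso-8859-1")    # value sent by a Latin-1 form

# ASCII is valid UTF-8, so the path tells us nothing: the guess is "utf-8".
path_guess = guess_encoding(path)

# Reusing the path's guess for the query fails, although the path decoded fine.
try:
    query_value.decode(path_guess)
    query_ok = True
except UnicodeDecodeError:
    query_ok = False
```

The guess for the path is technically "correct" here, yet it carries no information about the encoding of the query.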

The application (or a library/framework acting on its behalf) should be allowed to decode QUERY_STRING using whatever encoding it is expecting. Disallowing anything other than utf-8 (and iso-8859-1, in a very unreliable way) makes it impossible to have queries in any other encoding and still comply with the spec, which is undesirable.
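As a rough sketch of what "whatever encoding it is expecting" could look like, assuming a Python 3 application that knows its forms submit Shift_JIS (the query string below is made up for the example):

```python
from urllib.parse import parse_qs, quote

# Assumed example: a form value submitted in Shift_JIS and
# percent-encoded on the wire.
query_string = "q=" + quote("日本".encode("shift_jis"))

# The application decodes with the encoding it expects...
expected = parse_qs(query_string, encoding="shift_jis")

# ...whereas forcing UTF-8 can only produce replacement characters.
forced_utf8 = parse_qs(query_string, encoding="utf-8", errors="replace")
```

Only the application knows which of the two calls is the right one; nothing in the request itself says so.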

If this sentence is removed, and `wsgi.uri_encoding` is guaranteed to be one of:

  a. definitive and reliable, or
  b. missing/None

I'm pretty much happy. What I don't want is that half the future-WSGI servers/gateways decide they have to provide *some* value for `wsgi.uri_encoding` even if they're not quite sure if it's the right one. Then we're back to square one.

> if it is known that an application or some subset of
> URLs will always be receiving a request as non UTF-8, then it should
> employ code in those cases to always transcode it to the required
> encoding.

Yep, agreed. I think the PEP should clarify that; at the moment it is saying that a transcode is something you should only do for the iso-8859-1 case, but if you actually followed that advice you'd get highly inconsistent results. Perhaps we're at cross-purposes as to what exactly constitutes 'middleware'...
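A minimal sketch of such a transcode, assuming the application knows both the encoding the server decoded with and the one it actually wants. The `transcode` function is a hypothetical helper, not something the PEP defines:

```python
def transcode(value: str, decoded_as: str, wanted: str) -> str:
    """Undo the server's decode and redo it with the encoding the app expects."""
    return value.encode(decoded_as).decode(wanted)


# Case 1: UTF-8 bytes that were decoded as ISO-8859-1.
raw = "/caf\u00e9".encode("utf-8")
as_latin1 = raw.decode("iso-8859-1")          # mojibake
fixed = transcode(as_latin1, "iso-8859-1", "utf-8")

# Case 2: ISO-8859-1 bytes that happen to be valid UTF-8, so a
# "UTF-8 first" guess succeeds with the wrong result.
latin1_raw = "\u00c3\u00a9".encode("iso-8859-1")   # b'\xc3\xa9'
as_utf8 = latin1_raw.decode("utf-8")               # wrongly becomes one character
restored = transcode(as_utf8, "utf-8", "iso-8859-1")
```

The second case is why transcoding only when the server reports iso-8859-1 gives inconsistent results: the UTF-8 guess can succeed and still be wrong.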

> The other fallback is that a specific WSGI server could elect to
> provide an option to not use 'UTF-8' as the first choice for decoding

I really, *really* hope this does not happen. That just brings us more deployment heartaches.

> Whether surrogateescape gives a better solution I have no idea at this
> point

Yeah... I'm highly suspicious of surrogateescape in a web context and personally my code will be deliberately filtering all such characters out. I can see it being a possible way to smuggle unwanted sequences (such as overlongs) through filters, potentially causing endless security problems. But we'll see...
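To make the worry concrete, here is a small sketch of an overlong sequence passing through a surrogateescape decode, plus one possible way to filter the escaped characters out. `strip_escaped` is a hypothetical helper written for this example:

```python
def strip_escaped(text: str) -> str:
    """Drop the lone surrogates (U+DC80..U+DCFF) that surrogateescape
    produces for undecodable bytes."""
    return "".join(ch for ch in text if not 0xDC80 <= ord(ch) <= 0xDCFF)


overlong_slash = b"\xc0\xaf"   # overlong (invalid) UTF-8 encoding of "/"
smuggled = (b"a" + overlong_slash + b"b").decode("utf-8", "surrogateescape")

# A filter looking for "/" in the decoded string sees nothing suspicious...
has_slash = "/" in smuggled

# ...but re-encoding restores the original bytes for whatever consumes them next.
round_tripped = smuggled.encode("utf-8", "surrogateescape")

# Filtering the escaped characters removes the smuggled bytes entirely.
cleaned = strip_escaped(smuggled)
```

Whether dropping the characters or rejecting the request outright is the better policy is a separate question; the sketch only shows that the bytes survive the round trip.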

--
And Clover
mailto:a...@doxdesk.com
http://www.doxdesk.com/

_______________________________________________
Web-SIG mailing list
Web-SIG@python.org
Web SIG: http://www.python.org/sigs/web-sig