> A middleware might re-decode the values if the `wsgi.uri_encoding` is
> `iso-8859-1` and only then.

This seems like a mistake. If the middleware knows iso-8859-7 is in use, it would need to transcode the charset regardless of whether the initially-submitted bytes were a valid UTF-8 sequence or not. Otherwise the application would break when fed with e.g. Greek words that happened to encode to valid UTF-8 bytes.
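
To make that concrete, here is a small Python sketch. The `sniff_decode` helper is hypothetical: it stands in for a "try UTF-8, otherwise fall back" sniffer of the kind the draft describes and is not taken from the draft itself.

```python
def sniff_decode(raw, fallback="iso-8859-1"):
    """Hypothetical sniffer: try UTF-8 first, fall back only on failure."""
    try:
        return raw.decode("utf-8"), "utf-8"
    except UnicodeDecodeError:
        return raw.decode(fallback), fallback

# Two Greek capitals (gamma, alpha with tonos) as ISO-8859-7 bytes.
greek = "\u0393\u0386"
raw = greek.encode("iso-8859-7")        # the two bytes 0xC3 0xB6

# 0xC3 0xB6 is also a valid UTF-8 sequence, so the sniffer reports
# "utf-8" and hands back U+00F6 instead of the Greek text.
text, charset = sniff_decode(raw)

# A middleware that re-decodes only when the fallback was used never
# fires here. One that knows the real charset has to transcode always:
fixed = text.encode(charset).decode("iso-8859-7")
```

Only the unconditional transcode at the end gets the Greek text back.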

> The application MUST use this value to decode the ``'QUERY_STRING'``
> as well.

This will break all use of non-UTF-8 encodings in QUERY_STRING whenever the path part of the URL does not contain non-UTF-8 sequences. That includes the very common case where the path part contains only ASCII.

    http://greek.example.com/myscript.cgi?x=%C2

will fail, as the given UTF-8 sniffer only looks at the path part to determine what encoding to use for both the path part and the query string. I don't think WSGI should mandate any particular decoding of the QUERY_STRING.
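
A rough Python sketch of that failure, assuming a sniffer that inspects only the path bytes (`sniff_uri_encoding` is a made-up name for illustration, not the draft's code):

```python
from urllib.parse import parse_qs, unquote_to_bytes

def sniff_uri_encoding(raw_path):
    """Hypothetical sniffer: looks at the path bytes only."""
    try:
        raw_path.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        return "iso-8859-1"

path_bytes = unquote_to_bytes("/myscript.cgi")
query_string = "x=%C2"                  # 0xC2 is capital beta in ISO-8859-7

# The path is pure ASCII, so the sniffer settles on UTF-8.
encoding = sniff_uri_encoding(path_bytes)

# Decoding the query string with the sniffed encoding fails outright:
# a lone 0xC2 byte is not a complete UTF-8 sequence.
try:
    sniffed = parse_qs(query_string, encoding=encoding, errors="strict")
except UnicodeDecodeError:
    sniffed = None

# What the Greek site actually meant.
intended = parse_qs(query_string, encoding="iso-8859-7")
```

With `errors="replace"` (the `parse_qs` default) there is no exception, but the value silently becomes U+FFFD, which is no better.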

To be honest, I'm still uncomfortable with any use of Unicode strings in WSGI. But if we're going to do it, I'd go for consistency. Treating the decoding of the URL specially is a nasty hack that is only there because the CGI spec stupidly requires %-decoding to be done on PATH_INFO and SCRIPT_NAME.

So why not go with (the long-ago suggested) optional variables like 'wsgi.real_path_info' that, if present, are the original strings before %-decoding? Now it doesn't greatly matter what string types and encodings we pick, because everything will be ASCII anyway. It also solves the %2F problem.
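
As a sketch of the %2F point, assuming a server that supplies such a variable (the `environ` values and the `path_segments` helper below are invented for illustration):

```python
from urllib.parse import unquote

def path_segments(raw_path, encoding="utf-8"):
    """Split the still-%-encoded path first, then decode each segment."""
    return [unquote(seg, encoding=encoding)
            for seg in raw_path.split("/") if seg]

environ = {
    # Already %-decoded, as the CGI spec requires.
    "PATH_INFO": "/files/a/b.txt",
    # Hypothetical raw variable: pure ASCII, escapes intact.
    "wsgi.real_path_info": "/files/a%2Fb.txt",
}

# From PATH_INFO the escaped slash can't be told apart from a separator.
from_cgi = [seg for seg in environ["PATH_INFO"].split("/") if seg]

# From the raw variable the segment boundary survives.
from_raw = path_segments(environ["wsgi.real_path_info"])
```

Here `from_cgi` has three segments and `from_raw` has two, the second being `a/b.txt`. The application also gets to pick the encoding per segment, since nothing has been decoded on its behalf.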

If those variables are not present (typically for CGI environments that cannot provide them), the application/framework *may* try to recover non-ASCII characters from PATH_INFO/QUERY_STRING, with undefined results. This is the broken-but-sometimes-rescuable status quo for CGI: by the time Python reads non-ASCII characters out of the environment they may already have been mangled by up to two conversion processes.
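
One way a framework might implement that preference order, as a minimal sketch. The `path_source` helper and its status labels are invented here; the only name taken from the proposal is 'wsgi.real_path_info'.

```python
def path_source(environ):
    """Return (value, status), preferring the raw variable when present.

    status is "raw" for the still-%-encoded original, "ascii" for a
    decoded PATH_INFO that is safe as it stands, or "undefined" when
    PATH_INFO holds non-ASCII characters of unknown provenance.
    """
    raw = environ.get("wsgi.real_path_info")
    if raw is not None:
        return raw, "raw"
    path = environ.get("PATH_INFO", "")
    try:
        path.encode("ascii")
    except UnicodeEncodeError:
        # May already have been through two conversions; guessing an
        # encoding from here on is strictly best effort.
        return path, "undefined"
    return path, "ascii"
```

The point of the status flag is that the caller can tell when it is on solid ground and when it is guessing, rather than having a guess baked into the spec.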

--
And Clover
mailto:a...@doxdesk.com
http://www.doxdesk.com/
