At 09:43 AM 1/7/2011 -0500, James Y Knight wrote:
On Jan 7, 2011, at 6:51 AM, Victor Stinner wrote:
> I don't understand why you are attached to this horrible hack
> (bytes-in-unicode). It introduces more work and more confusion than
> using raw bytes unchanged.
>
> It doesn't work and so something has to be changed.

It's gross but it does work. This has been discussed ad nauseam on web-sig over a period of years.

I'd like to reiterate that it is only even a potential issue for the PATH_INFO/SCRIPT_NAME keys. Those two keys are required to have been urldecoded already, into byte-data in some encoding. All the other keys (including the ones from os.environ) are either *properly* decoded as ISO-8859-1 or are just ASCII (possibly still urlencoded, so the app needs to urldecode and decode into a string with the correct encoding).
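
To make the round trip concrete, here is a minimal sketch of what an app might do with these keys. The helper names (wsgi_str_to_text, text_to_wsgi_str) and the choice of UTF-8 as the "correct" encoding are assumptions for illustration; they are not part of any WSGI API.

```python
from urllib.parse import unquote


def text_to_wsgi_str(text, encoding="utf-8"):
    """Simulate what a server does: real bytes exposed as a latin-1 str.

    Hypothetical helper, only here to build a sample environ.
    """
    return text.encode(encoding).decode("latin-1")


def wsgi_str_to_text(value, encoding="utf-8"):
    """Recover real text from a bytes-in-unicode string.

    Encoding to latin-1 is lossless here because every code point in the
    string is in the range 0-255, so we get the original bytes back.
    The target encoding (UTF-8 by default) is an assumption of this sketch.
    """
    return value.encode("latin-1").decode(encoding)


# PATH_INFO is already urldecoded, so it may carry non-ASCII bytes.
environ = {
    "PATH_INFO": text_to_wsgi_str("/caf\u00e9"),
    "QUERY_STRING": "name=caf%C3%A9",  # plain ASCII, still urlencoded
}

path = wsgi_str_to_text(environ["PATH_INFO"])

# For the still-urlencoded keys, urldecode and decode in one step.
query = unquote(environ["QUERY_STRING"], encoding="utf-8")
```

Only the PATH_INFO line needs the encode/decode dance; QUERY_STRING never contains anything but ASCII until the app unquotes it.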

Right. Also, it should be mentioned that none of this would be necessary if we could've gotten a "bytes of a known encoding" type. If you look back to the last big Python-Dev discussion on bytes/unicode and stdlib API breakage, this was the holdup for getting a sane WSGI spec.

Since we couldn't change the language to fix the problem (due to the moratorium), we had to use this less-pleasant way of dealing with things, in order to get a final WSGI spec for Python 3.

(If anybody is wondering about the specifics of the language change that was needed: it would be a "bytes with known encoding" type that, when combined in any polymorphic operation with a unicode string, would produce bytes-with-encoding output, and would raise an error if the resulting value could not be encoded in the target encoding. Then we would simply do all WSGI header operations with this type, using latin-1 as the target encoding.)
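
A rough sketch of how such a type might behave, covering concatenation only. No such type exists in Python; the class name EncodedBytes and everything about its interface are made up for illustration, and the rule for mixing two different encodings is left out because it was never specified here.

```python
class EncodedBytes:
    """Hypothetical 'bytes with known encoding' value (not a real Python type)."""

    def __init__(self, data, encoding="latin-1"):
        self.data = bytes(data)
        self.encoding = encoding

    def _to_bytes(self, other):
        # A unicode string is encoded into our target encoding; this is
        # where the error is raised if the text does not fit.
        if isinstance(other, str):
            return other.encode(self.encoding)
        # Same-encoding values combine directly (assumption of this sketch).
        if isinstance(other, EncodedBytes) and other.encoding == self.encoding:
            return other.data
        return None

    def __add__(self, other):
        piece = self._to_bytes(other)
        if piece is None:
            return NotImplemented
        return EncodedBytes(self.data + piece, self.encoding)

    def __radd__(self, other):
        piece = self._to_bytes(other)
        if piece is None:
            return NotImplemented
        return EncodedBytes(piece + self.data, self.encoding)

    def __str__(self):
        return self.data.decode(self.encoding)


# Mixing with str yields bytes-with-encoding, not str.
header = EncodedBytes(b"text/html; charset=") + "utf-8"
prefixed = "Content-Type: " + header

# Text that cannot be encoded in the target encoding is rejected.
try:
    header + "\u20ac"  # the euro sign is not representable in latin-1
    rejected = False
except UnicodeEncodeError:
    rejected = True
```

The point is that the result of a mixed operation stays in the bytes-with-encoding world, and bad data fails at the point of combination instead of later on the wire.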

