At 09:43 AM 1/7/2011 -0500, James Y Knight wrote:
> On Jan 7, 2011, at 6:51 AM, Victor Stinner wrote:
> > I don't understand why you are attached to this horrible hack
> > (bytes-in-unicode). It introduces more work and more confusion than
> > using raw bytes unchanged.
> >
> > It doesn't work and so something has to be changed.
>
> It's gross but it does work. This has been discussed ad nauseam on
> web-sig over a period of years.
>
> I'd like to reiterate that it is only even a potential issue for the
> PATH_INFO/SCRIPT_NAME keys. Those two keys are required to have been
> urldecoded already, into byte-data in some encoding. For all the
> other keys (including the ones from os.environ), they are either
> *properly* decoded in 8859-1 or are just ascii (possibly still
> urlencoded, so the app needs to urldecode and decode into a string
> with the correct encoding).

Right. Also, it should be mentioned that none of this would be
necessary if we could've gotten a "bytes of a known encoding"
type. If you look back to the last big Python-Dev discussion on
bytes/unicode and stdlib API breakage, this was the holdup for
getting a sane WSGI spec.

Since we couldn't change the language to fix the problem (due to the
moratorium), we had to use this less-pleasant way of dealing with
things, in order to get a final WSGI spec for Python 3.
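
For anyone who hasn't seen the workaround in practice, here is a minimal
sketch of the bytes-in-unicode round trip for PATH_INFO. The helper names
are made up for illustration, and UTF-8 is only an assumed application
encoding; nothing in the spec guarantees the path is UTF-8.

```python
def text_to_wsgi_path(text, encoding="utf-8"):
    """What a server effectively does: real bytes smuggled into a str.

    Hypothetical helper; the encoding default is an assumption.
    """
    # Encode to the real byte sequence, then map each byte to the code
    # point with the same value (that is what latin-1 decoding does).
    return text.encode(encoding).decode("latin-1")


def wsgi_path_to_text(environ, encoding="utf-8"):
    """What an application does to get the real text back."""
    raw = environ.get("PATH_INFO", "")
    # latin-1 encoding reverses the mapping and yields the original bytes.
    path_bytes = raw.encode("latin-1")
    # Only now is the application's guess about the encoding applied.
    return path_bytes.decode(encoding)


environ = {"PATH_INFO": text_to_wsgi_path("/caf\u00e9")}
print(repr(environ["PATH_INFO"]))   # the smuggled form, not human-readable
print(wsgi_path_to_text(environ))   # the path as the user typed it
```

The latin-1 step never loses information, which is why the hack works at
all: every byte value 0-255 has exactly one code point to map to.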

(If anybody is wondering about the specifics of the language change
that was needed, it'd be having a "bytes with known encoding" type,
that when combined in any polymorphic operation with a unicode
string, would result in bytes-with-encoding output, and would raise
an error if the resulting value could not be encoded in the target
encoding. Then we would simply do all WSGI header operations with
this type, using latin-1 as the target encoding.)
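
To make the idea concrete, here is a rough pure-Python approximation of
such a type. It is purely hypothetical: the class name EncodedBytes and
its methods are invented for this sketch, it only covers concatenation,
and a real version would have needed language-level support.

```python
class EncodedBytes:
    """Hypothetical 'bytes with known encoding' value (illustration only)."""

    def __init__(self, data, encoding="latin-1"):
        if isinstance(data, str):
            # Raises UnicodeEncodeError if the text does not fit.
            data = data.encode(encoding)
        self.data = bytes(data)
        self.encoding = encoding

    def _coerce(self, other):
        """Turn the other operand into bytes in *our* encoding."""
        if isinstance(other, EncodedBytes):
            return other.data.decode(other.encoding).encode(self.encoding)
        if isinstance(other, str):
            return other.encode(self.encoding)
        return None

    def __add__(self, other):
        coerced = self._coerce(other)
        if coerced is None:
            return NotImplemented
        # Mixed operations yield bytes-with-encoding, never plain str.
        return EncodedBytes(self.data + coerced, self.encoding)

    def __radd__(self, other):
        coerced = self._coerce(other)
        if coerced is None:
            return NotImplemented
        return EncodedBytes(coerced + self.data, self.encoding)

    def __str__(self):
        return self.data.decode(self.encoding)

    def __repr__(self):
        return "EncodedBytes(%r, %r)" % (self.data, self.encoding)


header = EncodedBytes("text/html; charset=") + "utf-8"
print(repr(header))

try:
    EncodedBytes("price: ") + "\u20ac5"   # euro sign is not in latin-1
except UnicodeEncodeError as exc:
    print("rejected:", exc.reason)
```

The point of the design is that the error shows up at the moment a
non-encodable string is mixed in, rather than later when the header is
finally written to the wire.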