At 10:37 AM 5/8/2009 -0700, Robert Brewer wrote:
It also explicitly states that "HTTP does not directly support Unicode,
and neither does this interface. All encoding/decoding must be handled
by the application; all strings passed to or from the server must be
standard Python BYTE STRINGS (emphasis mine), not Unicode objects. The
result of using a Unicode object where a string object is required, is
undefined."

It also says what the interpretation is when 'str' is a unicode string type.

PEP 333 is difficult to interpret because it uses the name "str"
synonymously with the concept "byte string", which Python 3000 defies. I
believe the intent was to differentiate unicode from bytes, not elevate
whatever type happens to be called "str" on your Python du jour. It was
and is a mistake to standardize on type names ("str") across platforms
and not on type behavior ("byte string").

Ironically, 'str' is what's consistent in type behavior; the bytes type doesn't supply the same operations.
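
To illustrate the kind of difference in question (a sketch of Python 3 behavior, not necessarily the specific operations meant above): indexing a 'str' gives back a one-character 'str', while indexing 'bytes' gives back an int, and the two types refuse to mix. The names below are made up for the example.

```python
text = "abc"      # py3k 'str' (unicode)
data = b"abc"     # py3k 'bytes'

first_char = text[0]   # a length-1 str
first_byte = data[0]   # an int, not a length-1 bytes object


def can_concatenate(a, b):
    """Return True if a + b works, False if the types refuse to mix."""
    try:
        a + b
    except TypeError:
        return False
    return True
```

Code written against Python 2 'str' that indexes or concatenates will keep working on py3k 'str', but not necessarily on 'bytes'.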


If Python3 WSGI apps emit unicode strings (py3k type 'str'), you're
effectively saying the server will always call
"chunk.encode('latin-1')". That negates any benefit of using unicode as
the type for the response. That's not "supporting unicode"; that's using
unicode exactly as if it were an opaque byte string. That seems silly
to me when there is a perfectly useful byte string type.

Compatibility sometimes demands we do silly things. Personally, I think it's kind of silly that Python 3 files return incompatible data types depending on what mode you open them in, but there's not a whole lot we can do about that.

Meanwhile, existing WSGI code ported to Python 3 is going to yield strings until/unless manually converted; AFAIK 2to3 has no way to automatically detect WSGI-ness and convert your strings to bytes.


I don't see any benefit to that.

There isn't any benefit to doing it by *hand*. However, backward compatibility demands that servers *accept* such strings, as they may be generated by legacy apps.

That's why the Python 3 WSGI amendments say servers MUST accept this, even though applications SHOULD supply bytes.

That is, for new code, we do want bytes. What we don't want, ever, is unicode characters above code point 255 (U+00FF) in any unicode strings sent as part of the response body.
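
A minimal sketch of what that could look like on the server side, assuming a hypothetical helper (the name to_wire_bytes is made up here and is not part of any WSGI spec or server): bytes pass through untouched, legacy 'str' chunks are encoded as latin-1, and anything above code point 255 fails loudly.

```python
def to_wire_bytes(chunk):
    """Return the bytes a server would write for one response-body chunk.

    Hypothetical helper for illustration only.
    """
    if isinstance(chunk, bytes):
        # New-style apps: already bytes, nothing to do.
        return chunk
    if isinstance(chunk, str):
        # Legacy apps: latin-1 maps code points 0-255 one-to-one onto
        # byte values, and raises UnicodeEncodeError for anything higher.
        return chunk.encode('latin-1')
    raise TypeError("response chunks must be bytes or str, not %s"
                    % type(chunk).__name__)
```

With this, to_wire_bytes("caf\xe9") produces the single byte 0xE9 for the last character, while to_wire_bytes("\u20ac") raises UnicodeEncodeError, so the server never has to guess at an encoding.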

_______________________________________________
Web-SIG mailing list
Web-SIG@python.org
Web SIG: http://www.python.org/sigs/web-sig