P.J. Eby wrote:
> At 08:07 AM 5/8/2009 -0700, Robert Brewer wrote:
>> I decided that that single type should be byte strings because I want
>> WSGI middleware and applications to be able to choose what encoding
>> their output is. Passing unicode to the server would require some
>> out-of-band method of telling the server which encoding to use per
>> response, which seemed unacceptable.
>
> I find the above baffling, since PEP 333 explicitly states that
> when using unicode types, they're not actually supposed to *be*
> unicode -- they're just bytes decoded with latin-1.
It also explicitly states that "HTTP does not directly support Unicode, and
neither does this interface. All encoding/decoding must be handled by the
application; all strings passed to or from the server must be standard Python
BYTE STRINGS (emphasis mine), not Unicode objects. The result of using a
Unicode object where a string object is required, is undefined."

PEP 333 is difficult to interpret because it uses the name "str" synonymously
with the concept "byte string", which Python 3000 defies. I believe the intent
was to differentiate unicode from bytes, not to elevate whatever type happens
to be called "str" on your Python du jour. It was and is a mistake to
standardize on type names ("str") across platforms and not on type behavior
("byte string").

If Python 3 WSGI apps emit unicode strings (py3k type 'str'), you're
effectively saying the server will always call "chunk.encode('latin-1')".
That negates any benefit of using unicode as the type for the response.
That's not "supporting unicode"; that's using unicode exactly as if it were
an opaque byte string. That seems silly to me when there is a perfectly
useful byte string type.

> So, the server doesn't need to know "what encoding to use" -- it's
> latin-1, plain and simple. (And it's an error for an application to
> produce a unicode string that can't be encoded as latin-1.)
>
> To be even more specific: an application that produces strings can
> "choose what encoding to use" by encoding in it, then decoding those
> bytes via latin-1. (This is more or less what Jython and IronPython
> users are doing already, I believe.)

That may make sense for Jython and IronPython if they truly do not have a
usable byte string type. But it doesn't make as much sense for Python 3,
which does have a usable byte string type.

My way:

    App                                  Server
    ---                                  ------
    bchunk = uchunk.encode('utf-8')
    yield bchunk
                                         write(bchunk)

Your way:

    App                                  Server
    ---                                  ------
    bchunk = uchunk.encode('utf-8')
    uchunk = bchunk.decode('latin-1')
    yield uchunk
                                         bchunk = uchunk.encode('latin-1')
                                         write(bchunk)

I don't see any benefit to that.


Robert Brewer
fuman...@aminus.org