[Cory Benfield]
> Folks, just a reminder: RFC 2616 is dead. RFC 7230 says that *newly defined* header
> fields should limit their field values to US-ASCII, but older header fields are a
> crapshoot (though it notes that “in practice, most” header field values use US-ASCII).
>
> Regardless, it seems to me that the correct method of communicating field values would have been byte strings.
I think it's worth pointing out that the original intention behind specifying ISO-8859-1 encoding was that the request components *would* be presented to the application as bytes.

WSGI was designed to work on Python 2, where bytes and strings were stored in the same datatype. In CPython's UCS-2 (narrow build) representation, where every character takes two bytes, only the lower byte would contain a value if the character was from the ISO-8859-1 character set. Moreover, encoding and decoding such "byte strings" to and from ISO-8859-1 would not change any values, because each byte 0x00 to 0xFF maps to the code point with the same number. In other words, ISO-8859-1 was chosen because encoding and decoding with it is an identity transform. The same considerations applied to Jython 2.x (which uses UTF-16) and IronPython 2.x (also UTF-16, I think), both of which had the same bytes/strings duality problem.

If Python 2.x had had a bytes type, then that's what would have been used. This would also have made it more explicit that it is the application's job to decode the bytes from whatever encoding it thinks is appropriate (i.e. essentially whatever it has guessed, in the real world). The WSGI server's job is to give the original bytes from the request to the WSGI application *unchanged*.

The concluding message in the original discussion of encodings is here, if anyone is interested:

https://mail.python.org/pipermail/web-sig/2004-September/000860.html

Alan.
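For readers on Python 3, where bytes and str are separate types, here is a minimal sketch of the identity-transform property described above. The helper names `bytes_to_native` and `native_to_bytes` are hypothetical, chosen for illustration only; they show a server carrying raw request bytes inside a str via ISO-8859-1, and the application recovering the original bytes and choosing its own encoding (UTF-8 is assumed here purely as an example). This is the same "bytes carried in a str" idea that PEP 3333 later used for WSGI environ values on Python 3.

```python
# Sketch: ISO-8859-1 (latin-1) as a lossless carrier for raw bytes inside a str.
# Assumes Python 3; the helper names below are hypothetical, not a WSGI API.


def bytes_to_native(raw: bytes) -> str:
    """Server side: map each byte 0x00..0xFF to the code point with the same value."""
    return raw.decode("iso-8859-1")


def native_to_bytes(native: str) -> bytes:
    """Application side: recover the original request bytes unchanged."""
    return native.encode("iso-8859-1")


# Every possible byte value survives the round trip untouched.
all_bytes = bytes(range(256))
assert native_to_bytes(bytes_to_native(all_bytes)) == all_bytes

# Each byte becomes the code point with the same number (identity mapping).
assert [ord(c) for c in bytes_to_native(all_bytes)] == list(range(256))

# The server does not guess; the application decides the real encoding.
raw_path = "/caf\u00e9".encode("utf-8")    # bytes as received: b'/caf\xc3\xa9'
environ_value = bytes_to_native(raw_path)  # str holding '/caf\xc3\xa9' (looks like mojibake)
decoded_path = native_to_bytes(environ_value).decode("utf-8")  # application's guess: UTF-8
assert decoded_path == "/caf\u00e9"
```

The design point is that the latin-1 step never fails and never alters data, so the server can hand over anything it received; only the final `.decode(...)` in the application embodies an actual guess about the encoding.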
_______________________________________________
Web-SIG mailing list
Web-SIG@python.org