[Cory Benfield]
> Folks, just a reminder: RFC 2616 is dead. RFC 7230 says that *newly
> defined* header fields should limit their field values to US-ASCII, but
> older header fields are a crapshoot (though it notes that “in practice,
> most” header field values use US-ASCII).
> Regardless, it seems to me that the correct method of communicating field
> values would have been byte strings.

I think it's worth pointing out that the original intention of specifying
iso-8859-1 encoding was that the request components *would* be presented to
the application as bytes.

WSGI was designed to work on Python 2, where bytes and strings were stored
in the same datatype. In CPython's UCS-2 encoding, where every character
takes two bytes, only the lower byte would contain a value if the character
was from the iso-8859-1 character set. Moreover, encoding and decoding such
"byte strings" with iso-8859-1 would not change any values, i.e. iso-8859-1
was chosen because encoding to and decoding from it is an identity transform.
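
As a rough illustration of that identity property, here is a small sketch in
Python 3 syntax (not the Python 2 of the time). The names raw, text and
utf8_is_lossless are made up for this example:

```python
# Every possible byte value, 0x00 through 0xFF.
raw = bytes(range(256))

# Decoding as iso-8859-1 maps each byte to the code point with the same
# number, so no byte value can fail to decode.
text = raw.decode("iso-8859-1")
assert [ord(ch) for ch in text] == list(raw)

# Encoding the text again gives back exactly the original bytes.
roundtrip = text.encode("iso-8859-1")
assert roundtrip == raw


def utf8_is_lossless(data):
    """Return True if data survives a decode/encode round trip as UTF-8."""
    try:
        return data.decode("utf-8").encode("utf-8") == data
    except UnicodeDecodeError:
        return False
```

For contrast, utf8_is_lossless(raw) is False, because arbitrary bytes are not
always valid UTF-8. That is why an encoding like UTF-8 could not serve as the
transparent carrier here.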

The same considerations applied to Jython 2.x (which uses UTF-16) and
IronPython 2.x (also UTF-16, I think), both of which had the same
bytes/strings duality problem.

If Python 2.x had had a separate bytes type, then that's what would have
been used.

This would also have made it more explicit that it is the application's job
to decode the bytes using whatever encoding it thinks is appropriate (i.e.
essentially what it has guessed, in the real world). The WSGI server's job
is to give the original bytes from the request to the WSGI application.
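
A minimal sketch of what that looks like from the application side, assuming
a Python 3 server that hands over header values as iso-8859-1 decoded
strings. The helpers header_bytes and decode_header, the HTTP_X_EXAMPLE key
and the UTF-8 guess are all hypothetical, chosen only for illustration:

```python
def header_bytes(value):
    """Recover the bytes the client sent from an iso-8859-1 decoded string."""
    return value.encode("iso-8859-1")


def decode_header(value, encoding="utf-8"):
    """Re-decode a header value using the application's guessed encoding."""
    # errors="replace" keeps a wrong guess from raising; the application
    # may prefer a stricter policy.
    return header_bytes(value).decode(encoding, errors="replace")


# Simulate the server side: the client sent UTF-8 bytes, and the server
# exposes them to the application as an iso-8859-1 decoded string.
raw = "caf\u00e9".encode("utf-8")
environ = {"HTTP_X_EXAMPLE": raw.decode("iso-8859-1")}

# Application side: the original bytes are fully recoverable, and the
# application applies its own guess about the real encoding.
assert header_bytes(environ["HTTP_X_EXAMPLE"]) == raw
decoded = decode_header(environ["HTTP_X_EXAMPLE"])
```

Here decoded ends up as the intended text, while the raw environ value reads
as mojibake until the application re-decodes it.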

The concluding message in the original discussion of encodings is here, if
anyone is interested.


Web-SIG mailing list
Web SIG: http://www.python.org/sigs/web-sig
