> The odd thing here is that RFC 2047 (MIME) seems to be about encoding
> non-ASCII character sets in ASCII. So the spec is kind of odd here.
> The actual bytes on the wire seem to be ASCII, but they may have an
> interpretation where those ASCII bytes represent a non-ASCII string.
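A minimal sketch of what "ASCII on the wire, non-ASCII in interpretation" looks like in practice. It uses the standard library's `email.header` module to turn a non-ASCII string into an RFC 2047 encoded-word and back. The sample name is an arbitrary illustration, and whether the Q or B form is chosen is left to the library:

```python
from email.header import Header, decode_header, make_header

# What the application sees: a str (characters), including a non-ASCII one.
name = "Müller"

# RFC 2047 encoded-word of the form =?utf-8?q?...?= or =?utf-8?b?...?=.
# The result is pure ASCII text.
encoded = Header(name, "utf-8").encode()

# What goes on the wire: plain ASCII bytes.
wire_bytes = encoded.encode("ascii")

# The receiver decodes the ASCII bytes and then interprets the encoded-word
# to recover the original non-ASCII string.
decoded = str(make_header(decode_header(wire_bytes.decode("ascii"))))
```

The bytes themselves never leave the ASCII range. Only the interpretation step (`decode_header`) recovers the non-ASCII text.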
HTTP is fairly confused about the use of non-ASCII characters in headers. For example, RFC 2617 specifies that, for Basic authentication, userid and password are *TEXT (excluding ":" in the userid); it then says that user-pass is base64-encoded. It never says what the charset of the userid or password should be.

People now read that as saying: it's TEXT, so you need to encode it according to RFC 2047 before using it in a header. That requires the userid first to be MIME-encoded (with the Q encoding, say, or B), and the result then to be base64-encoded again before it is transmitted. Neither web browsers nor web servers implement that correctly today.

In short, the intention seems to be that HTTP headers are strict ASCII on the wire, with non-ASCII text encoded using MIME header encoding. A library implementing that in Python should certainly use bytes on the network (stream) side and strings on the application side. Even though the format is human-readable, the protocol is byte-oriented, not character-oriented.

Regards,
Martin
_______________________________________________
Python-3000 mailing list
http://mail.python.org/mailman/listinfo/python-3000
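A rough sketch of the RFC 2047 reading of Basic authentication described above, with str at the application boundary and bytes at the network boundary. Both function names are hypothetical, and the double encoding (encoded-word first, then base64) follows that interpretation, not what browsers or servers are known to do:

```python
import base64
from email.header import Header, decode_header, make_header


def _to_mime_word(s: str) -> str:
    # Leave pure-ASCII text untouched; wrap anything else in an RFC 2047
    # encoded-word. maxlinelen=0 disables header folding.
    return s if s.isascii() else Header(s, "utf-8").encode(maxlinelen=0)


def _from_mime_word(s: str) -> str:
    # Interpret any encoded-words back into a Python str.
    return str(make_header(decode_header(s)))


def basic_credentials_rfc2047(userid: str, password: str) -> bytes:
    """Hypothetical helper: str in (application side), bytes out (wire side)."""
    user_pass = _to_mime_word(userid) + ":" + _to_mime_word(password)
    # user_pass is pure ASCII now, so the ASCII codec cannot fail here.
    token = base64.b64encode(user_pass.encode("ascii"))
    return b"Basic " + token


def parse_basic_rfc2047(header_value: bytes) -> tuple[str, str]:
    """Hypothetical helper: bytes in (wire side), str out (application side)."""
    scheme, _, token = header_value.partition(b" ")
    if scheme.lower() != b"basic":
        raise ValueError("not a Basic credentials value")
    user_pass = base64.b64decode(token).decode("ascii")
    # The userid may not contain ":", so split at the first one.
    userid, _, password = user_pass.partition(":")
    return _from_mime_word(userid), _from_mime_word(password)


value = basic_credentials_rfc2047("Müller", "geheim:1")
```

For ASCII-only credentials this reduces to ordinary Basic authentication. For non-ASCII ones, the header value stays pure ASCII bytes at the cost of a second layer of encoding.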
