>I would find
>
>    message[b'Subject'] = b'Hello'
>
>to be totally gross.
>
>While RFC Email is all ASCII, except if 8bit transfer is legal, there
>are internal encoding provided that permit the expression of Unicode in
>nearly any component of the email, except for header identifiers. But
>there are never Unicode characters in the transfer, as they always get
>encoded (there can be UTF-8 byte sequences, of course, if 8bit transfer
>is legal; if it is not, then even UTF-8 byte sequences must be further
>encoded).
>
>Depending on the level of email interface, there should be no interface
>that cannot be expressed in terms of Unicode, plus an encoding to use
>for the associated data. Even 8-bit binary can be translated into a
>sequence of Unicode codepoints with the same numeric value, for example.

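For concreteness, the two mechanisms mentioned above can be sketched
roughly as follows. This is only an illustration, assuming Python 3 and
the standard email.header module. The sample subject line is made up,
and latin-1 is my choice of codec for the "same numeric value" mapping,
not something the quoted text specifies.

```python
from email.header import decode_header, make_header

# Arbitrary 8-bit data, including bytes that are not valid UTF-8.
raw = bytes(range(256))

# latin-1 maps each byte 0..255 to the code point with the same numeric
# value, so decoding cannot fail and encoding restores the same bytes.
as_text = raw.decode("latin-1")
code_points = [ord(ch) for ch in as_text]
round_tripped = as_text.encode("latin-1")

# A made-up RFC 2047 encoded-word: pure ASCII on the wire, Unicode once
# decoded. decode_header() yields (bytes, charset) pairs and
# make_header() joins them into a Header that str() renders as text.
wire_subject = "=?iso-8859-1?q?caf=E9?="
wire_bytes = wire_subject.encode("ascii")  # would raise if not ASCII
subject = str(make_header(decode_header(wire_subject)))
```

Here code_points equals list(raw) and round_tripped equals raw, so the
mapping is lossless for any byte string.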
One significant problem is that the email module is intended to be able
to work with malformed e-mail without mangling it too badly. The
malformed e-mail should also make a round-trip through the email module
without being further mangled.

I think this requires the underlying processing to be entirely
bytes-based, but that doesn't preclude layers on top that parse the
charset hints.

The rules about encoding are strict, but not always followed. For
instance, the headers *must* be ASCII (the header body can, however, be
encoded - see RFC 2047). Spammers often ignore this, and you might be
inclined to say "stuff 'em", but this would make the SpamBayes authors
rather unhappy.

One solution is to provide two sets of classes - the underlying
bytes-based one, and another unicode-based one, built on top of the
bytes classes, that implements the same API, but that may fail due to
encoding errors.

--
Andrew McNamara, Senior Developer, Object Craft
http://www.object-craft.com.au/
_______________________________________________
Python-3000 mailing list
Python-3000@python.org
http://mail.python.org/mailman/listinfo/python-3000