On Tue, 29 Jun 2010 13:54:09 -0400, Steve Holden <st...@holdenweb.com> wrote:
> A.M. Kuchling wrote:
> > But should mailboxes really be opened in a UTF-8 encoding, or should
> > they be treated as 7-bit text? I'll have to think about this.
>
> Neither! You can't open them as 7-bit text, because real-world email
> does contain bytes whose ordinal value exceeds 127. You can't open them
> using a text encoding because theoretically there might be ASCII headers
> that indicate that parts of the content are in specific character sets
> or encodings.
>
> If only we had a data structure that easily allowed us to manipulate
> 8-bit characters ...

email6 *will* handle this use case. When it exists :)

But note that it is *not* just a matter of easily handling 8-bit
characters. There are a whole bunch of algorithms needed for
interpreting that 7-bit and 8-bit data. All the info is there in the
email headers, but being able to do string operations on 8-bit byte
strings doesn't get you the answers you need by itself.

It really is the case that the Python 3 bytes/unicode split forces us
to redo most of the algorithms so that they handle bytes and text
*correctly*. This isn't a trivial undertaking, but the end result will
be well worth it.

--
R. David Murray    www.bitdance.com
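
A minimal sketch of the point made in this thread, assuming the
standard-library `email` package as it exists in current Python 3 (not
email6 as it was planned when this was written). The raw message and
its addresses are made up for illustration. It shows that the raw bytes
cannot be decoded as either ASCII or UTF-8, and that the charset needed
to turn the body into text has to be read from the headers:

```python
from email import message_from_bytes

# Hypothetical raw message: ASCII headers, 8-bit ISO-8859-1 body.
raw = (
    b"From: sender@example.com\r\n"
    b"To: rcpt@example.com\r\n"
    b"Content-Type: text/plain; charset=\"iso-8859-1\"\r\n"
    b"Content-Transfer-Encoding: 8bit\r\n"
    b"\r\n"
    b"caf\xe9\r\n"
)

# Treating the whole message as 7-bit text or as UTF-8 both fail,
# because the byte 0xE9 is valid in neither encoding here.
failed = []
for codec in ("ascii", "utf-8"):
    try:
        raw.decode(codec)
    except UnicodeDecodeError:
        failed.append(codec)

# Parse from bytes, then let the headers say how to decode the body.
msg = message_from_bytes(raw)
charset = msg.get_content_charset()        # taken from Content-Type
body_bytes = msg.get_payload(decode=True)  # undoes the transfer encoding
body = body_bytes.decode(charset)
```

Having a bytes type is only the first step here: the decoding decision
comes from interpreting `Content-Type` and `Content-Transfer-Encoding`,
which is the kind of algorithm the reply says has to be redone for
bytes and text.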