Glenn Linderman writes:

> There is no reference to the word emacs or types in any of the messages
> you've posted in this thread, maybe you are referring to another thread
> somewhere? Sorry, I'm new to this party, but I have read the whole
> thread... unless my mail reader has missed part of it.
I'm sorry, you are right; the relevant message was never sent. Here it is. I've looked it over briefly and it seems intelligible, but from your point of view it may now seem out of context.

Glenn Linderman writes:

> This is where you use the Latin-1 conversion. Don't throw an error
> when it doesn't conform, but don't go to heroic efforts to provide
> bytes alternatives... just convert the bytes to Unicode, and the
> way the mail RFCs are written, and the types of encodings used, it
> is mostly readable. And if it isn't encoded, it is even more
> readable.

This is what XEmacs/Mule does. It's a PITA for everybody (except the Mule implementers, whose lives are dramatically simplified by punting this way). For one thing, what's readable to a human being may be death to a subprogram that expects valid MIME. GNU Emacs is even worse: it does provide both a bytes-like type and a unicode-like type, but then it turns around and provides a way to "cast" unicodes to bytes and vice versa, thus exposing the implementation in an unclean (and often buggy) way.

> And so how much is it a problem? What are the effects of the problem?

In Emacs, the problem is that strings that are punted get concatenated with strings that are properly decoded, and when reencoding is attempted, you get garbage or a coding error. Since Mule discarded the type (punt vs. decode) information, the app loses; there's no way to recover. The apps most at risk are things like MUAs (which Emacs does well), web browsers (which it doesn't), and even AUCTeX (a mode for handling LaTeX documents; TeX is not Unicode-aware, so its error messages are frequently truncated in the middle of a UTF-8 character). These apps go to great lengths to keep track of what is valid and what is not, and they don't always succeed. I think Emacs should be doing this for them, somehow (and I'm an XEmacs implementer, not an MUA implementer!). The situation in Python will be strongly analogous, I believe.
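To make the failure mode concrete in Python terms, here is a minimal sketch. It assumes a hypothetical client helper, `decode_or_punt`, that decodes with the declared charset and silently falls back to Latin-1 when that fails, as in the quoted proposal; the helper and the sample names are illustrative only.

```python
def decode_or_punt(raw: bytes, charset: str) -> str:
    """Hypothetical helper: decode with the declared charset, else punt via Latin-1."""
    try:
        return raw.decode(charset)
    except UnicodeDecodeError:
        # Latin-1 maps bytes 0x00-0xFF one-to-one onto U+0000-U+00FF, so this
        # never fails; the fact that the string was punted is simply lost.
        return raw.decode("latin-1")


good_raw = "Łukasz".encode("utf-8")        # well-formed UTF-8
broken_raw = "Jürgen".encode("utf-8")[:2]  # b"J\xc3": cut off mid-character

good = decode_or_punt(good_raw, "utf-8")      # properly decoded
broken = decode_or_punt(broken_raw, "utf-8")  # punted via Latin-1

# Both results are plain str, so nothing stops them from being mixed.
addressees = good + ", " + broken

# Re-encoding as UTF-8 "succeeds", but the punted part no longer has its
# original bytes (garbage) ...
reencoded = addressees.encode("utf-8")
print(reencoded)

# ... and re-encoding as Latin-1 (to undo the punt) fails on the decoded part,
# because "Ł" has no Latin-1 byte (a coding error).
try:
    addressees.encode("latin-1")
except UnicodeEncodeError as exc:
    print("coding error:", exc.reason)
```

Neither re-encoding gives back the original bytes of both headers, and once the strings are joined there is no type information left to tell the two parts apart.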
> I'm not suggesting making it worse than what it already is, in
> bytes form; just to translate the bytes to Unicode codepoints so
> that they can be returned on a Unicode interface.

Which *does* make it worse, unless you enforce a type difference so that punted strings can't be mixed with decoded strings without effort. That type difference may as well be bytes vs. Unicode as some subclass of Unicode vs. Unicode.

"Why would you mix strings?" Well, for one example, there are multiple address headers which get collected into an addressee list for the purpose of constructing a reply. If one of the headers is broken and another is not, you get mixed mode. The same thing can happen with multilingual message bodies: they get split into a multipart with different charsets for different parts, and if one part is broken but another is not, you get mixed mode.

> So they'll use the Unicode API for text, and the bytes APIs for binary
> attachments, because that is what is natural.

Well, as I see it, there won't be bytes APIs for text. The APIs will return Unicode text if they succeed, and raise an error if not. If the error is caught, the offending object will be available as bytes.

> If improperly encoded messages are received, and appropriate
> transliterations are made so that the bytes get converted (default code
> page) or passed through (Latin-1 transformation), then the data may be
> somewhat garbled for characters in the non-ASCII subset. But that is
> not different than the handling done by any 8-bit email client, nor, I
> suspect (a little uncertainty here) different than the handling done by
> Python < 3.0 mail libraries.

Which is exactly how we got to this point. Experience with GNU Mailman and other such applications indicates that the implementation in the existing Python email module needs work. Barry Warsaw and others who have tried to work on it say that it's not that easy, and that the API may need to change to accommodate needed changes in the implementation.
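Here is a minimal sketch of the two ways to get that type difference, using the addressee-list example. `header_text`, `collect_addressees`, and `PuntedStr` are hypothetical names, not part of the email package; the only real API relied on is that `UnicodeDecodeError` exposes the input it was decoding as its `object` attribute, so a caller that catches the error still has the offending bytes.

```python
from typing import List, Union


def header_text(raw: bytes, charset: str) -> str:
    """Hypothetical strict API: return decoded text or raise UnicodeDecodeError."""
    return raw.decode(charset)


def collect_addressees(headers: List[bytes], charset: str) -> List[Union[str, bytes]]:
    """Keep each header as str if it decodes, otherwise as its original bytes."""
    result: List[Union[str, bytes]] = []
    for raw in headers:
        try:
            result.append(header_text(raw, charset))
        except UnicodeDecodeError as exc:
            # The offending input is still available, unmodified, as bytes.
            result.append(exc.object)
    return result


class PuntedStr(str):
    """Hypothetical marker type for text recovered from bytes via Latin-1.

    Concatenation with '+' is only allowed between PuntedStr values.
    """

    def __add__(self, other):
        if not isinstance(other, PuntedStr):
            raise TypeError("cannot mix punted text with decoded text")
        return PuntedStr(str.__add__(self, other))

    def __radd__(self, other):
        # Called for `decoded + punted`, because PuntedStr subclasses str.
        if not isinstance(other, PuntedStr):
            raise TypeError("cannot mix punted text with decoded text")
        return PuntedStr(str.__add__(other, self))


headers = ["Łukasz <l@example.org>".encode("utf-8"), b"J\xc3 <j@example.org>"]
parts = collect_addressees(headers, "utf-8")  # [str, bytes]

try:
    reply_to = ", ".join(parts)  # str and bytes: mixed mode fails loudly
except TypeError as exc:
    print("mixed mode detected:", exc)
```

Note that the subclass only guards `+`: `str.join`, `%` formatting, and f-strings would still mix a `PuntedStr` with decoded text silently, whereas with bytes the `join` above raises `TypeError`. That is one argument for letting the type difference simply be bytes vs. Unicode.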
_______________________________________________ Python-3000 mailing list Python-3000@python.org http://mail.python.org/mailman/listinfo/python-3000 Unsubscribe: http://mail.python.org/mailman/options/python-3000/archive%40mail-archive.com