Stephen J. Turnbull wrote:

> the kind of "text" for which Unicode was designed is normally produced
> and consumed by people, who wll pt up w/ ll knds f nnsns. Base64
> decoders will not put up with the same kinds of nonsense that people
> will.

The Python compiler won't put up with that sort of nonsense either. Would you consider that makes Python source code binary data rather than text, and that it's inappropriate to represent it using a unicode string?

> You're basically assuming that the person who implements the code that
> processes a Unicode string is the same person who implemented the code
> that converts a binary object into base64 and inserts it into a
> string.

No, I'm assuming the user of base64 knows the characteristics of the channel he's using. You can only use base64 if you know the channel promises not to munge the particular characters that base64 uses. If you don't know that, you shouldn't be trying to send base64 through that channel.

> In most environments, it should be possible to hide bytes<->unicode
> codecs almost all the time,

But it *is* hidden in the situation I'm talking about, because all the Unicode encoding/decoding takes place inside the implementation of the text channel, which I'm taking as a given.

> I don't think it's a good idea to gratuitously introduce
> wire protocols as unicode codecs,

I am *not* saying that base64 is a unicode codec! If that's what you thought I was saying, it's no wonder we're confusing each other.

It's just a transformation from bytes to text. I'm only calling it unicode because all text will be unicode in Py3k. In py2.x it could just as well be a str -- but a str interpreted as text, not binary.

> What do you think the email module does?
> Assuming conforming MIME messages

But I'm not assuming mime in the first place. If I have a mail interface that will accept chunks of binary data and encode them as a mime message for me, then I don't need to use base64 in the first place.

The only time I need to use something like base64 is when I have something that will only accept text. In Py3k, "accepts text" is going to mean "takes a character string as input", where "character string" is a distinct type from "binary data".
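
A rough sketch of that situation, written against Python 3 as it was eventually released. The `send_text` function is hypothetical and stands in for any channel that only accepts character strings. Note that the released `base64.b64encode` returns bytes, so the caller has to add an explicit ASCII decode to get a character string:

```python
import base64


def send_text(message):
    """Hypothetical text-only channel: accepts str, rejects bytes."""
    if not isinstance(message, str):
        raise TypeError("text channel accepts only character strings")
    # A real channel would transmit the text; this sketch just hands it back.
    return message


payload = bytes([0, 255, 16, 32])  # arbitrary binary data

# b64encode returns bytes in released Python 3, so decode to get a str
# that the text channel will accept.
encoded_text = base64.b64encode(payload).decode("ascii")

received = send_text(encoded_text)

# The receiver recovers the original bytes from the text it was given.
assert base64.b64decode(received) == payload
```

Passing the raw result of `b64encode` straight to `send_text` would raise `TypeError` in this sketch, which is the extra step at issue.
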
So having base64 produce anything other than a character string would be awkward and inconvenient.

I phrased that paragraph carefully to avoid using the word "unicode" anywhere. Does that make it clearer what I'm getting at?

--
Greg