Stephen J. Turnbull wrote:

> I gave you one, MIME processing in email
If implementing a MIME packer is really the only use case for base64, then it
might as well be removed from the standard library, since 99.99999% of all
programmers will never touch it. Those that do will need to have boned up on
the subject of encoding until it's coming out their ears, so they'll know what
they're doing in any case. And they'll be quite competent to write their own
base64 encoder that works however they want it to.

I don't have any real-life use cases for base64 that a non-MIME-implementer
might come across, so all I can do is imagine what shape such a use case might
have. When I do that, I come up with what I've already described.

The programmer wants to send arbitrary data over a channel that only accepts
text. He doesn't know, and doesn't want to have to know, how the channel
encodes that text -- it might be ASCII or EBCDIC or Morse code, it shouldn't
matter. If his Python base64 encoder produces a Python character string, and
his Python channel interface accepts a Python character string, he doesn't
have to know.

> I think it's your turn. Give me a use case where it matters
> practically that the output of the base64 codec be Python unicode
> characters rather than 8-bit ASCII characters.

I'd be perfectly happy with ASCII characters, but in Py3k, the most natural
place to keep ASCII characters will be in character strings, not byte arrays.

> Everything you have written so far is based on
> defending your maintained assumption that because Python implements
> text processing via the unicode type, everything that is described as
> a "character" must be coerced to that type.

I'm not just blindly assuming that because the RFC happens to use the word
"character". I'm also looking at how it uses that word in an effort to
understand what it means. It *doesn't* specify what bit patterns are to be
used to represent the characters. It *does* mention two "character sets",
namely ASCII and EBCDIC, with the implication that the characters it is
talking about could be taken as being members of either of those sets. Since
the Unicode character set is a superset of the ASCII character set, it
doesn't seem unreasonable that they could also be thought of as Unicode
characters.

> I don't really see a downside, except for the occasional double
> conversion ASCII -> unicode -> UTF-16, as is allowed (but not
> mandated) in XML's use of base64. What downside do you see?

It appears that everything you see as an upside, I see as a downside, and
vice versa. We appear to be mutually upside-down. :-)

XML is another example. Inside a Python program, the most natural way to
represent an XML document is as a character string. Your way, embedding
base64 in it would require converting the bytes produced by the base64
encoder into a character string in some way, taking into account the assumed
ASCII encoding of said bytes. My way, you just use the result directly, with
no coding involved at all (there's a sketch of the difference below).

--
Greg Ewing, Computer Science Dept, +--------------------------------------+
University of Canterbury,          | Carpe post meridiam!                 |
Christchurch, New Zealand          | (I'm not a morning person.)          |
[EMAIL PROTECTED]                  +--------------------------------------+
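
To make the difference concrete, here is a minimal sketch. It assumes Py3k
semantics in which base64.b64encode takes bytes and returns bytes;
str_b64encode is a hypothetical character-string-returning encoder of the
kind argued for above, not an existing function:

    import base64

    payload = b"\x00\x01\xfe arbitrary binary data"

    # Bytes-returning encoder: the caller must decode the ASCII bytes
    # before they can be embedded in a character string such as an
    # XML document.
    encoded = base64.b64encode(payload)                # bytes
    xml = "<data>%s</data>" % encoded.decode("ascii")  # explicit decode step

    # Hypothetical str-returning encoder: the result is already a
    # character string and drops straight into the XML, with no
    # knowledge of any underlying byte encoding required.
    def str_b64encode(data):
        return base64.b64encode(data).decode("ascii")

    xml = "<data>%s</data>" % str_b64encode(payload)

The characters are identical either way; the dispute is only over which type
the standard library should hand back.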