>>>>> "Greg" == Greg Ewing <[EMAIL PROTECTED]> writes:
Greg> Stephen J. Turnbull wrote:

>> I gave you one, MIME processing in email

Greg> If implementing a mime packer is really the only use case
Greg> for base64, then it might as well be removed from the
Greg> standard library, since 99.99999% of all programmers will
Greg> never touch it. I don't have any real-life use cases for
Greg> base64 that a non-mime-implementer might come across, so all
Greg> I can do is imagine what shape such a use case might have.

I guess we don't have much to talk about, then.

>> Give me a use case where it matters practically that the output
>> of the base64 codec be Python unicode characters rather than
>> 8-bit ASCII characters.

Greg> I'd be perfectly happy with ascii characters, but in Py3k,
Greg> the most natural place to keep ascii characters will be in
Greg> character strings, not byte arrays.

Natural != practical.

Anyway, I disagree, and I've lived with the problems that come with an
environment that mixes objects with various underlying semantics into
a single "text stream" for a decade and a half. That doesn't make me
authoritative, but as we agree to disagree, I hope you'll keep in mind
that someone with real-world experience that is somewhat relevant[1]
to the issue doesn't find that natural at all.

Greg> Since the Unicode character set is a superset of the ASCII
Greg> character set, it doesn't seem unreasonable that they could
Greg> also be thought of as Unicode characters.

I agree. However, as soon as I go past that intuition to thinking
about what that implies for _operations_ on the base64 string, it
begins to seem unreasonable, unnatural, and downright dangerous. The
base64 string is a representation of an object that doesn't have text
semantics. Nor do base64 strings have text semantics: they can't even
be concatenated as text (the pad character '=' is typically a syntax
error in a profile of base64, except as terminal padding).
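A minimal sketch of the padding problem, using the base64 module as it
exists in Python 3 today. The helper name concat_b64 and the sample
data are made up for illustration; this is one way to show the point,
not a proposal for the codec API.

```python
import base64
import binascii


def concat_b64(first: bytes, second: bytes) -> bytes:
    """Hypothetical helper: base64 of the concatenated underlying data."""
    raw = (base64.b64decode(first, validate=True)
           + base64.b64decode(second, validate=True))
    return base64.b64encode(raw)


enc_a = base64.b64encode(b"hello")    # 5 bytes, so the encoding ends in '='
enc_b = base64.b64encode(b"world!")   # 6 bytes, so no padding is needed

# Text-style concatenation leaves the '=' pad in the middle of the string.
naive = enc_a + enc_b

# Decoding, joining the raw bytes, and re-encoding gives a valid encoding.
proper = concat_b64(enc_a, enc_b)

# A strict decoder rejects the naively concatenated string.
try:
    base64.b64decode(naive, validate=True)
    naive_accepted = True
except binascii.Error:
    naive_accepted = False

print(naive)
print(proper)
print(naive_accepted)
```

The two results differ, and only the second one round-trips to the
joined data under strict decoding.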
So if you wish to concatenate the underlying objects, the base64
strings must be decoded, concatenated, and re-encoded in the general
case.

IMO it's not worth preserving the very superficial coincidence of
"character representation" in the face of such semantics.

I think the fact that favoring the coincidence of representation
leads you to also deprecate the very natural use of the codec API to
implement and understand base64 is indicative of a deep problem with
the idea of implementing base64 as bytes->unicode.

Footnotes:
[1] That "somewhat" is intended literally; my specialty is working
with codecs for humans in Emacs, but I've also worked with more
abstract codecs such as base64 in contexts like email, in both LISP
and Python.

-- 
School of Systems and Information Engineering http://turnbull.sk.tsukuba.ac.jp
University of Tsukuba Tennodai 1-1-1 Tsukuba 305-8573 JAPAN
Ask not how you can "do" free software business;
ask what your business can "do for" free software.
_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev