>>>>> "Ron" == Ron Adam <[EMAIL PROTECTED]> writes:
Ron> We could call it transform or translate if needed.

You're still losing the directionality, which is my primary objection
to "recode".  The absence of directionality is precisely why "recode"
is used in that sense for i18n work.

There really isn't a good reason that I can see to use anything other
than the pair "encode" and "decode".  In monolingual environments,
once _all_ human-readable text (specifically including Python programs
and console I/O) is automatically mapped to a Python (unicode) string,
most programmers will never need to think about it as long as Python
(the project) very very strongly encourages that all Python programs
be written in UTF-8 if there's any chance the program will be reused
in a locale other than the one where it was written.  (Alternatively
you can depend on PEP 263 coding cookies.)  Then the user (or the
Python interpreter) just changes console and file I/O codecs to the
encoding in use in that locale, and everything just works.

So the remaining uses of "encode" and "decode" are for advanced users
and specialists: people using stuff like base64 or gzip, and those who
need to use unicode codecs explicitly.

I could be wrong about the possibility to get rid of explicit unicode
codec use in monolingual environments, but I hope that we can at least
try to achieve that.

>> Unlikely.  Errors like
>> "A string".encode("base64").encode("base64")
>> are all too easy to commit in practice.

Ron> Yes,... and wouldn't the above just result in a copy so it
Ron> wouldn't be an out right error.

No, you either get the following:

A string. -> QSBzdHJpbmcu -> UVNCemRISnBibWN1

or you might get an error if base64 is defined as bytes->unicode.

Ron> * Given that the string type gains a __codec__ attribute
Ron>   to handle automatic decoding when needed.  (is there a reason
Ron>   not to?)
Ron>     str(object[,codec][,error]) -> string coded with codec
Ron>     unicode(object[,error]) -> unicode
Ron>     bytes(object) -> bytes

str == unicode in Py3k, so this is a non-starter.  What do you want to
say?

Ron> * a recode() method is used for transformations that
Ron>   *do_not* change the current codec.

I'm not sure what you mean by the "current codec".  If it's attached
to an "encoded object", it should be the codec needed to decode the
object.  And it should be allowed to be a "codec stack".  So suppose
you start with a unicode object "obj".  Then

>>> bytes = bytes (obj, 'utf-8')         # implicit .encode()
>>> print bytes.codec
['utf-8']
>>> wire = bytes.encode ('base64')       # with apologies to Greg E.
>>> print wire.codec
['base64', 'utf-8']
>>> obj2 = wire.decode ('gzip')
CodecMatchException
>>> obj2 = wire.decode (wire.codec)
>>> print obj == obj2
True
>>> print obj2.codec
[]

or maybe None for the last.

I think this would be very nice as a basis for improving the email
module (for one), but I don't really think it belongs in Python core.

Ron> That may be why it wasn't done this way to start. (?)

I suspect the real reason is that Marc-Andre had the generalized codec
in mind from Day 0, and your proposal only works with duck-typing if
codecs always have a well-defined signature with two different types
for the argument and return of the "constructor".

-- 
School of Systems and Information Engineering http://turnbull.sk.tsukuba.ac.jp
University of Tsukuba                    Tennodai 1-1-1 Tsukuba 305-8573 JAPAN
        Ask not how you can "do" free software business;
            ask what your business can "do for" free software.