Jim Jewett writes:

> Maybe I'm missing something, but it seems to me that there are only a
> few logical combinations;

There are lots of logical combinations, but most of them fall into
"general transform", is that what you mean?

> if the below is wrong, maybe that is one
> reason unicode seems more complex than it should.
>
> Encoding:  str -> ByteString
>     (staticmethod) BytesString.encode(my_string, encoding=?)
>     ==
>     my_string.encode(encoding=?)
>
> Decoding:  ByteString -> str
>     my_bytes.decode(encoding=?)
>     ==
>     (staticmethod) str.decode(my_bytes, encoding=?)

+1

> General Transforming:
>     # Why insist on type-preservation?
>     # Why even make these methods?
>     my_string.transform(fn) == fn(my_string)
>     my_bytes.transform(fn) == fn(my_bytes)

Make them methods if they are "like" codecs, by which I mean something
like (more or less) invertible stream-oriented transformations.  Eg,
my_bytes.gzip()  Pretty weak, though.

> Transcoding:  ByteString -> ByteString
>     # If you care how it is represented, it is no longer unicode;
>     # it is a specific (ByteString) representation
>     mybytes.recode(old_encoding=?, new_encoding)
>
>     # Can the old encoding often be inferred?
>     # Or should it always be written because of EIBTI?

(1) I agree this is the obvious connotation of "transcode" in the
    codec context.

(2) This usage is too special to deserve treatment at this level,
    especially since for most purposes
    my_bytes.decode(old_encoding).encode(new_encoding) will be
    perfectly sufficient.

(3) old_encoding should not be inferred as part of .decode() or
    .recode(), as such inference is unreliable and domain-specific
    heuristics often lead to great improvements.  A separate
    method/function should be used.
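
The encode, decode, and transcode cases discussed above can be sketched as follows. This is only an illustration, assuming Python 3's built-in str.encode and bytes.decode plus the standard gzip module; the helper names recode and guess_encoding are hypothetical and are not part of any proposal in this thread. The guessing heuristic is deliberately naive and lives in its own function, in the spirit of point (3).

```python
import gzip


def recode(data: bytes, old_encoding: str, new_encoding: str) -> bytes:
    """Hypothetical helper: transcode by decoding to str, then re-encoding."""
    # old_encoding is a required argument; nothing is inferred here.
    return data.decode(old_encoding).encode(new_encoding)


def guess_encoding(data: bytes, candidates=("utf-8", "latin-1")) -> str:
    """Hypothetical, naive heuristic kept separate from decode/recode.

    Returns the first candidate that decodes the data without error.
    A real application would substitute domain-specific knowledge.
    """
    for name in candidates:
        try:
            data.decode(name)
        except UnicodeDecodeError:
            continue
        return name
    raise ValueError("no candidate encoding fits the data")


text = "naïve café"

# Encoding: str -> bytes, and decoding: bytes -> str
utf8_bytes = text.encode("utf-8")
assert utf8_bytes.decode("utf-8") == text

# Transcoding: bytes -> bytes, with the old encoding stated explicitly
latin1_bytes = recode(utf8_bytes, "utf-8", "latin-1")
assert latin1_bytes == text.encode("latin-1")

# General transform: an ordinary function call, no method required
packed = gzip.compress(utf8_bytes)
assert gzip.decompress(packed) == utf8_bytes
```

Note that guess_encoding can only rule out candidates that fail to decode. Since latin-1 accepts any byte sequence, a successful guess is not evidence that the guess is right, which is one reason to keep inference out of .decode() itself.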