On 9/13/06, "Martin v. Löwis" <[EMAIL PROTECTED]> wrote:
> > Should encoding be an attribute of the string?
> No. A Python string is a sequence of Unicode characters.
> Even if it was created by converting from some other encoding,
> that original encoding gets lost when doing the conversion
> (just like integers don't remember which base they were originally
> represented in).

Theoretically, it is a sequence of code points.

Today, in Python 2.x, these are always represented by a specific (wide, fixed-width) concrete encoding, chosen at compile time. This is required so long as outside code can access the data buffer directly. It would no longer be required if all access were through unicode methods. (And it would probably make sense to have a "get-me-the-buffer-in-this-encoding" method.)

Several people seem to want more efficient representations when possible. Several people seem to want UTF-8, which makes sense if the rest of the system is UTF-8, but complicates the implementation.

Simply not encoding/decoding until required would save quite a bit of time and space -- but then the object would need some way of indicating which encoding it is in.

-jJ
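
For illustration only, the deferred-decoding idea above (do not decode until required, and have the object indicate which encoding it is in) might look roughly like the sketch below. The class name LazyText and the method as_bytes are hypothetical stand-ins, not an existing or proposed Python API; as_bytes plays the role of the "get-me-the-buffer-in-this-encoding" method.

```python
import codecs


class LazyText:
    """Hypothetical text object that keeps its original bytes and encoding."""

    def __init__(self, raw: bytes, encoding: str):
        self._raw = raw            # undecoded bytes, exactly as received
        self.encoding = encoding   # the tag saying which encoding _raw is in
        self._decoded = None       # filled in on first real use

    def _text(self) -> str:
        # Decode only when code points are actually needed.
        if self._decoded is None:
            self._decoded = self._raw.decode(self.encoding)
        return self._decoded

    def __len__(self) -> int:
        return len(self._text())

    def __getitem__(self, index):
        return self._text()[index]

    def __str__(self) -> str:
        return self._text()

    def as_bytes(self, encoding: str) -> bytes:
        """Stand-in for a "get-me-the-buffer-in-this-encoding" method."""
        # codecs.lookup normalizes aliases such as "UTF8" and "utf-8".
        same = codecs.lookup(encoding).name == codecs.lookup(self.encoding).name
        if same:
            return self._raw       # no decode/encode round trip at all
        return self._text().encode(encoding)


name = LazyText("Löwis".encode("utf-8"), "utf-8")
utf8_buffer = name.as_bytes("UTF8")       # served from the stored bytes
latin1_buffer = name.as_bytes("latin-1")  # forces a decode, then re-encodes
```

In this sketch, a UTF-8 system that only ever asks for UTF-8 never pays for a conversion; the cost is the extra encoding tag on every object and a decode on the first indexing or length operation.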
