Antoine Pitrou wrote:
>> There are many design alternatives:
>
> Wouldn't it be simpler to use:
> - one-byte representation if every character <= 0xFF
> - two-byte representation if every character <= 0xFFFF
> - four-byte representation otherwise

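For concreteness, the rule quoted above could be sketched roughly as follows. This is only an illustration in Python, not how any implementation does it; the names `pick_width` and `pack` are hypothetical, and the little-endian layout is an assumption made for the example.

```python
def pick_width(text: str) -> int:
    """Bytes per character under the quoted three-way rule (illustrative)."""
    # The whole string must be scanned: one wide character widens everything.
    highest = max(map(ord, text), default=0)
    if highest <= 0xFF:
        return 1
    if highest <= 0xFFFF:
        return 2
    return 4


def pack(text: str) -> bytes:
    """Store every character at the single width chosen for the string."""
    width = pick_width(text)
    # Little-endian byte order is an assumption of this sketch.
    return b"".join(ord(ch).to_bytes(width, "little") for ch in text)
```

Note that a single character above 0xFF doubles the storage for the whole string, since all characters share one width.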
As I said: there are many alternatives. This one has the disadvantage of requiring a copy every time you pass the string to a Win32 function (which expects UTF-16). Whether or not this is a significant disadvantage, I don't know.

In any case, a multi-representation implementation has the disadvantage of making the C API more difficult to use, in particular for writing codecs. On encoding, it is difficult to fetch the individual characters that you need for the lookup table; on decoding, it is difficult to know in advance which representation to use (unless you know there is an upper bound on the decoded character ordinals).

Regards,
Martin
_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
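The decoding difficulty and the UTF-16 copy can be illustrated with a small Python sketch. It is not taken from any real codec: `decode_utf8_compact` and `to_utf16_copy` are hypothetical names, the little-endian layout is an assumption, and Python's built-in UTF-8 decoder stands in for the codec's own first pass.

```python
from typing import Tuple


def decode_utf8_compact(data: bytes) -> Tuple[int, bytes]:
    """Decode UTF-8 into a (width, buffer) pair; illustrative only."""
    # Pass 1: decode everything. The width cannot be fixed earlier,
    # because a later character may exceed the bound seen so far.
    code_points = [ord(ch) for ch in data.decode("utf-8")]
    highest = max(code_points, default=0)
    width = 1 if highest <= 0xFF else 2 if highest <= 0xFFFF else 4
    # Pass 2: write the buffer at the width chosen after the scan.
    buf = b"".join(cp.to_bytes(width, "little") for cp in code_points)
    return width, buf


def to_utf16_copy(width: int, buf: bytes) -> bytes:
    """Build the UTF-16 copy a UTF-16 API would need (sketch)."""
    # Reading a character depends on the width, which is the extra
    # bookkeeping every consumer of the buffer has to carry.
    chars = (
        chr(int.from_bytes(buf[i:i + width], "little"))
        for i in range(0, len(buf), width)
    )
    return "".join(chars).encode("utf-16-le")
```

A codec that knows an upper bound in advance (a Latin-1 decoder can never exceed 0xFF, for example) could skip the first pass, which is the exception mentioned in the parenthesis above.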