Antoine Pitrou wrote:
>>There are many design alternatives:
> 
> Wouldn't it be simpler to use:
> - one-byte representation if every character <= 0xFF
> - two-byte representation if every character <= 0xFFFF
> - four-byte representation otherwise

As I said: there are many alternatives. This one has the
disadvantage of requiring a copy every time you pass the string
to a Win32 function (which expects UTF-16).

Whether or not this is a significant disadvantage, I don't know.
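
To make the copy concrete, here is a minimal sketch in Python. It only
models the buffers; as_one_byte and as_win32_wide are hypothetical
helper names, not part of any existing API, and a real implementation
would do this in C.

```python
def as_one_byte(text):
    # Compact storage proposed in the quoted scheme: one byte per
    # character, valid only if every character is <= 0xFF.
    return text.encode("latin-1")


def as_win32_wide(one_byte_buf):
    # Hypothetical conversion step before a Win32 call: every character
    # is widened to a 16-bit code unit, which allocates a new buffer.
    return one_byte_buf.decode("latin-1").encode("utf-16-le")


compact = as_one_byte("h\xe9llo")
wide = as_win32_wide(compact)
print(len(compact), len(wide))
```

The widened buffer is a fresh object twice the size of the compact one,
and it has to be rebuilt on each call unless it is cached somewhere.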

In any case, a multi-representation implementation has the
disadvantage of making the C API more difficult to use, in
particular for writing codecs. On encoding, it is difficult
to fetch the individual characters which you need for the
lookup table; on decoding, it is difficult to know in advance
what representation to use (unless you know there is an upper
bound on the decoded character ordinals).
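
The decoding problem can be sketched as follows, again in Python and
only as an illustration. pick_width is a hypothetical name; it applies
the thresholds from the quoted proposal and has to inspect every
character before it can answer.

```python
def pick_width(text):
    """Return bytes per character under the quoted three-way scheme."""
    # The whole string must be scanned: a single wide character
    # anywhere forces the wider representation for all of it.
    highest = max(map(ord, text), default=0)
    if highest <= 0xFF:
        return 1
    if highest <= 0xFFFF:
        return 2
    return 4


samples = ["abc", "ab\u20ac", "ab\U0001d11e"]
widths = [pick_width(s) for s in samples]
print(widths)
```

All three samples share the prefix "ab", yet the width is decided by the
last character. A decoder that allocates its output before reading the
whole input therefore has to either scan twice or guess and reallocate.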

Regards,
Martin