Re: [Python-Dev] thoughts on the bytes/string discussion

Ronald Oussoren Tue, 06 Jul 2010 07:54:02 -0700

On 27 Jun, 2010, at 11:48, Greg Ewing wrote:

> Stefan Behnel wrote:
>> Greg Ewing, 26.06.2010 09:58:
>>> Would there be any sanity in having an option to compile
>>> Python with UTF-8 as the internal string representation?
>> It would break Py_UNICODE, because the internal size of a unicode character 
>> would no longer be fixed.
> 
> It's not fixed anyway with the 2-char build -- some
> characters are represented using a pair of surrogates.


It is for practical purposes not even fixed in 4-char builds. In 4-char builds 
every Unicode code point corresponds to one item in a Python unicode string, 
but a base character with combining characters is still a sequence of 
characters and should IMHO almost always be treated as a single object. As an 
example, given s="be\N{COMBINING DIAERESIS}", s[:2] and s[2:] are both almost 
certainly semantically invalid: the first strips the diaeresis from the "e", 
and the second is a lone combining mark.
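A small illustration of both points, written as a sketch for a current Python 3 interpreter. Since Python 3.3 (PEP 393) there are no narrow builds, so utf16_units() only emulates the number of items a 2-char build would have stored. naive_clusters() is a hypothetical helper that groups a base character with the combining marks after it using unicodedata.combining(); it is a simplification, not the full grapheme cluster algorithm of Unicode UAX #29.

```python
import unicodedata


def utf16_units(s: str) -> int:
    """Count UTF-16 code units, i.e. what a narrow (2-char) build stored."""
    return len(s.encode("utf-16-le")) // 2


def naive_clusters(s: str) -> list[str]:
    """Hypothetical helper: attach combining marks to the preceding base char.

    Only the canonical combining class is consulted; this is NOT the
    complete UAX #29 grapheme cluster segmentation.
    """
    clusters: list[str] = []
    for ch in s:
        if clusters and unicodedata.combining(ch):
            clusters[-1] += ch  # combining mark: extend the current cluster
        else:
            clusters.append(ch)  # base character: start a new cluster
    return clusters


clef = "\N{MUSICAL SYMBOL G CLEF}"  # U+1D11E, outside the BMP
s = "be\N{COMBINING DIAERESIS}"

print(len(clef), utf16_units(clef))  # 1 2  (a surrogate pair in UTF-16)
print(len(s))                        # 3 code points for 2 visible characters
print(ascii(s[:2]), ascii(s[2:]))    # 'be' '\u0308'  (the mark is split off)
print(len(naive_clusters(s)))        # 2
print(len(unicodedata.normalize("NFC", s)))  # 2, "e" + diaeresis composes
```

NFC normalization happens to help here because a precomposed "ë" exists, but many combining sequences have no precomposed form, so it does not remove the need to treat such sequences as a unit.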

Ronald


_______________________________________________
Python-Dev mailing list
[email protected]
http://mail.python.org/mailman/listinfo/python-dev
