On Wed, Jul 2, 2008 at 11:22 AM, Jeroen Ruigrok van der Werven
<[EMAIL PROTECTED]> wrote:
> -On [20080702 19:42], Guido van Rossum ([EMAIL PROTECTED]) wrote:
>>Yes. At least in the sense that \Uxxxxxxxx gets translated to a
>>surrogate pair, and that the UTF-8 codec supports surrogate pairs in
>>both directions. It's been like this for a long time. What else would
>>you expect from UTF-16 support?
>
> Well, unless I misunderstand things, a Python 3 compiled with the default
> Unicode option gives this:
>
>>>> len("\N{MUSICAL SYMBOL G CLEF}")
> 2
>
> Whereas a Python 3 with --with-wide-unicode gives:
>
>
>>>> len("\N{MUSICAL SYMBOL G CLEF}")
> 1
>
> This, of course, causes problems with splitting, finding, and so on.

Understood.

> So that
> means that a Python 3 with only 2 byte Unicode support is not to be
> used/recommended for Unicode outside of the BMP.

I disagree. Instead, I would say that such code needs to be aware of surrogates.

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)
_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com

Reply via email to