Re: [Python-Dev] UCS2/UCS4 default

Jeroen Ruigrok van der Werven Thu, 03 Jul 2008 10:35:53 -0700

-On [20080703 19:21], Adam Olsen ([EMAIL PROTECTED]) wrote:
>On Thu, Jul 3, 2008 at 7:57 AM, M.-A. Lemburg <[EMAIL PROTECTED]> wrote:
>> Please remember that lone surrogate pair code points are perfectly
>> valid Unicode code points, nevertheless. Just as a lone combining
>> code point is valid on its own.
>
>That is a big part of these problems.  For all practical purposes, a
>surrogate is like a UTF-8 code unit, and must be handled the same way,
>so why the heck do they confuse everybody by saying "oh, it's a code
>point too!"?


Because surrogate code points are not Unicode scalar values, isolated UTF-16
code units in the range 0xd800-0xdfff are ill-formed. (D91 from Unicode
5.0/5.1, section 3.9)

So, no, it is not a code point too.
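
To make the D91 wording concrete, here is a minimal sketch, assuming a current Python 3 interpreter rather than the 2.x/3.0 builds this thread is about. The helper names is_scalar_value and is_well_formed_utf16 are made up for illustration and are not part of any library. It checks a value against the scalar-value range and lets the built-in utf-16-le codec decide whether a sequence of code units is well-formed.

```python
def is_scalar_value(cp: int) -> bool:
    """Return True if cp is a Unicode scalar value.

    Scalar values are 0..0x10FFFF with the surrogate range
    0xD800..0xDFFF excluded.
    """
    return 0 <= cp <= 0x10FFFF and not (0xD800 <= cp <= 0xDFFF)


def is_well_formed_utf16(data: bytes) -> bool:
    """Return True if data decodes as little-endian UTF-16 without error."""
    try:
        data.decode("utf-16-le")
    except UnicodeDecodeError:
        return False
    return True


# U+1D11E as a proper surrogate pair: code units D834 DD1E.
paired = b"\x34\xd8\x1e\xdd"

# The same high surrogate followed by 'A' instead of a low surrogate.
isolated_high = b"\x34\xd8\x41\x00"

# A low surrogate with nothing in front of it.
isolated_low = b"\x1e\xdd"

for label, units in [("paired", paired),
                     ("isolated high", isolated_high),
                     ("isolated low", isolated_low)]:
    print(label, is_well_formed_utf16(units))
```

The strict codec accepts the pair and rejects both isolated units, which is the behaviour D91 describes for the UTF-16 encoding form.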

-- 
Jeroen Ruigrok van der Werven <asmodai(-at-)in-nomine.org> / asmodai
イェルーン ラウフロック ヴァン デル ウェルヴェン
http://www.in-nomine.org/ | http://www.rangaku.org/ | GPG: 2EAC625B
Als men blijft geloven kan de zwaarste steen niet zinken...