Re: [Python-Dev] UCS2/UCS4 default

Adam Olsen Thu, 03 Jul 2008 10:21:47 -0700

On Thu, Jul 3, 2008 at 7:57 AM, M.-A. Lemburg <[EMAIL PROTECTED]> wrote:
> On 2008-07-03 15:21, Jeroen Ruigrok van der Werven wrote:
>>
>> -On [20080703 15:00], M.-A. Lemburg ([EMAIL PROTECTED]) wrote:
>>>
>>> Unicode is full of combining code points - if you break such a sequence,
>>> the output will be just as wrong; regardless of UCS2 vs. UCS4.
>>
>> In my opinion you are confusing two related, but very separate things
>> here.
>> Combining characters have nothing to do with breaking up the encoding of a
>> single code point. Sure enough, if you arbitrarily slice up sequences of
>> code points that include combining characters, your result will indeed
>> look odd.
>>
>> I never said that nor is that the point I am making.
>
> Please remember that lone surrogate pair code points are perfectly
> valid Unicode code points, nevertheless. Just as a lone combining
> code point is valid on its own.
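To make the distinction being argued about concrete, here is a minimal sketch. It assumes a modern CPython 3 (PEP 393 strings, indexed by code point) and emulates a narrow UCS2 build by working on UTF-16 code units; the helper names `utf16_units` and `encodes_cleanly` are hypothetical. A lone combining mark and a lone surrogate are both legal values in a str, but only the combining mark survives a strict encode to UTF-8.

```python
# Minimal sketch, assuming CPython 3.3+ (PEP 393), where str indexes code points.
# A narrow (UCS2) build stored text as UTF-16 code units; we emulate that here.

def utf16_units(s):
    """Hypothetical helper: return the UTF-16 code units of s as a list of ints."""
    data = s.encode("utf-16-le", "surrogatepass")
    return [int.from_bytes(data[i:i + 2], "little") for i in range(0, len(data), 2)]


def encodes_cleanly(s):
    """Hypothetical helper: True if s encodes as strict UTF-8 (lone surrogates do not)."""
    try:
        s.encode("utf-8")
        return True
    except UnicodeEncodeError:
        return False


combining = "e\u0301"        # 'e' + COMBINING ACUTE ACCENT: two code points
astral = "\U0001F600"        # one code point, but two UTF-16 code units

# Cutting a combining sequence gives odd-looking but well-formed text.
cut_combining = combining[1:]                 # just the combining accent
# Cutting between the two halves of a surrogate pair (as indexing by code
# unit on a UCS2 build would) leaves a lone high surrogate.
cut_astral = chr(utf16_units(astral)[0])

print(len(combining), len(utf16_units(astral)))                      # 2 2
print(hex(ord(cut_astral)))                                          # 0xd83d
print(encodes_cleanly(cut_combining), encodes_cleanly(cut_astral))  # True False
```

The point of the comparison: both cuts produce "valid code points", but only the surrogate cut yields a string that cannot be serialized as strict UTF-8.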


That is a big part of these problems.  For all practical purposes, a
surrogate is like a UTF-8 code unit, and must be handled the same way,
so why the heck do they confuse everybody by saying "oh, it's a code
point too!"?
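The analogy with UTF-8 can be sketched as follows, assuming Python 3. The helpers `safe_cut_utf8`, `safe_cut_utf16`, `decodes` and `units_to_bytes` are hypothetical, written only for illustration. A naive cut splits the astral character in both encodings, and the fix is the same in both: back up to the start of the multi-unit sequence.

```python
# Minimal sketch, assuming Python 3: a UTF-16 surrogate is only meaningful as
# part of a pair, just as a UTF-8 lead/continuation byte is only meaningful
# as part of its multi-byte sequence.

def is_utf8_continuation(byte):
    # UTF-8 continuation bytes have the bit pattern 10xxxxxx.
    return byte & 0xC0 == 0x80

def is_low_surrogate(unit):
    # Low (trailing) surrogates occupy 0xDC00..0xDFFF.
    return 0xDC00 <= unit <= 0xDFFF

def safe_cut_utf8(data, n):
    """Hypothetical: truncate UTF-8 bytes to at most n without splitting a sequence."""
    n = min(n, len(data))
    while 0 < n < len(data) and is_utf8_continuation(data[n]):
        n -= 1
    return data[:n]

def safe_cut_utf16(units, n):
    """Hypothetical: truncate UTF-16 code units to at most n without splitting a pair."""
    n = min(n, len(units))
    while 0 < n < len(units) and is_low_surrogate(units[n]):
        n -= 1
    return units[:n]

def units_to_bytes(units):
    """Pack UTF-16 code units back into little-endian bytes."""
    return b"".join(u.to_bytes(2, "little") for u in units)

def decodes(data, encoding):
    """True if data decodes strictly with the given codec."""
    try:
        data.decode(encoding)
        return True
    except UnicodeDecodeError:
        return False

text = "a\U0001F600b"
utf8 = text.encode("utf-8")                      # 1 + 4 + 1 = 6 bytes
raw16 = text.encode("utf-16-le")
utf16 = [int.from_bytes(raw16[i:i + 2], "little")
         for i in range(0, len(raw16), 2)]       # 1 + 2 + 1 = 4 code units

# Naive cuts land inside the astral character in both encodings.
naive8, naive16 = utf8[:3], utf16[:2]
print(decodes(naive8, "utf-8"), decodes(units_to_bytes(naive16), "utf-16-le"))  # False False

# Boundary-aware cuts back up to the start of the sequence.
print(safe_cut_utf8(utf8, 3).decode("utf-8"))                         # a
print(units_to_bytes(safe_cut_utf16(utf16, 2)).decode("utf-16-le"))   # a
```

The two `safe_cut_*` functions differ only in which code units count as "not a boundary", which is the sense in which a surrogate must be handled like a UTF-8 code unit.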


-- 
Adam Olsen, aka Rhamphoryncus
_______________________________________________
Python-Dev mailing list
[email protected]
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
