Paul Prescod wrote:
> On 9/25/06, Jim Jewett <[EMAIL PROTECTED]> wrote:
> 
>> As David Hopwood pointed out, to be fully correct, you already have to
>> create a custom function even with bmp characters, because of
>> decomposed characters.  (Example:  Representing a c-cedilla as a c and
>> a combining cedilla, rather than as a single code point.)  Separating
>> those two would be wrong.  Counting them as two characters for slicing
>> purposes would usually be wrong.
> 
> Even 32-bit representations are permitted to use surrogate pairs; it
> just doesn't often make sense.
> 
> There is at least one big difference between surrogate pairs and decomposed
> characters. The user can typically normalize away decompositions.

That depends on which script they're using. For some scripts, they can't.
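
As a rough illustration of the difference, here is a minimal sketch using
Python's standard unicodedata module. The helper name nfc_len and the two
sample strings are my own choices for the example, not anything from the
thread. The c-cedilla from Jim's example composes to a single code point
under NFC. A Devanagari consonant followed by a dependent vowel sign has no
precomposed form, so normalization leaves it as two code points.

```python
import unicodedata


def nfc_len(s):
    """Return the number of code points in s after NFC normalization.

    Hypothetical helper, introduced only for this illustration.
    """
    return len(unicodedata.normalize("NFC", s))


# 'c' followed by COMBINING CEDILLA (U+0327): a precomposed form exists.
c_cedilla = "c\u0327"

# DEVANAGARI LETTER KA (U+0915) followed by DEVANAGARI VOWEL SIGN I (U+093F):
# Unicode defines no single precomposed code point for this pair.
ka_i = "\u0915\u093f"

print(nfc_len(c_cedilla))  # composes to U+00E7
print(nfc_len(ka_i))       # stays as two code points
```

So code that slices or counts by code point can still split a
user-perceived character in the second case, even after normalizing.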

-- 
David Hopwood <[EMAIL PROTECTED]>


_______________________________________________
Python-3000 mailing list
[email protected]
http://mail.python.org/mailman/listinfo/python-3000