Paul Prescod wrote:
> There is at least one big difference between surrogate pairs and
> decomposed characters. The user can typically normalize away
> decompositions. How do you normalize away decompositions in a language
> that only supports 16-bit representations?

I don't see the problem: you use UTF-16. All four normalization forms
(NFC, NFD, NFKC, NFKD) can be represented in UTF-16 just fine.
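
As a quick check of that claim, here is a minimal sketch that normalizes
a string to each form and round-trips the result through UTF-16. The
function name utf16_round_trip is made up for this illustration; only
unicodedata.normalize and the utf-16-le codec come from the standard
library.

```python
import unicodedata

FORMS = ("NFC", "NFD", "NFKC", "NFKD")

def utf16_round_trip(text):
    """For each normalization form, report how many 16-bit code units the
    normalized text occupies and whether it survives a UTF-16 round trip."""
    results = {}
    for form in FORMS:
        normalized = unicodedata.normalize(form, text)
        encoded = normalized.encode("utf-16-le")  # two bytes per code unit
        round_trips = encoded.decode("utf-16-le") == normalized
        results[form] = (len(encoded) // 2, round_trips)
    return results

# Sample: a decomposed e-acute, a non-BMP character (U+1D15E) and a ligature.
sample = "e\u0301\U0001D15E\ufb01"
for form, (units, ok) in utf16_round_trip(sample).items():
    print(form, units, ok)
```

The sample deliberately includes a character outside the BMP, so the
encoded result contains surrogate pairs in every form.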

It is somewhat tricky to implement a normalization algorithm in
UTF-16, since you must combine surrogate pairs first in order to
find out what the canonical decomposition of the code point is;
but it's just more code, and no problem in principle.
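
To make the "more code" concrete, here is a rough sketch of normalizing
a sequence of UTF-16 code units. It is not how any particular
implementation does it: the helper names (decode_utf16_units,
encode_utf16_units, normalize_utf16) are invented for this example, and
it leans on unicodedata.normalize for the actual normalization instead
of walking the decomposition tables by hand. The point is only the
order of the steps: combine surrogate pairs into code points, normalize,
then split back into code units.

```python
import unicodedata

def decode_utf16_units(units):
    """Combine surrogate pairs in a list of 16-bit code units into code points.
    Unpaired surrogates are passed through unchanged."""
    code_points = []
    i = 0
    while i < len(units):
        unit = units[i]
        is_high = 0xD800 <= unit <= 0xDBFF
        if is_high and i + 1 < len(units) and 0xDC00 <= units[i + 1] <= 0xDFFF:
            low = units[i + 1]
            code_points.append(0x10000 + ((unit - 0xD800) << 10) + (low - 0xDC00))
            i += 2
        else:
            code_points.append(unit)
            i += 1
    return code_points

def encode_utf16_units(code_points):
    """Split code points above U+FFFF back into surrogate pairs."""
    units = []
    for cp in code_points:
        if cp >= 0x10000:
            cp -= 0x10000
            units.append(0xD800 + (cp >> 10))
            units.append(0xDC00 + (cp & 0x3FF))
        else:
            units.append(cp)
    return units

def normalize_utf16(units, form="NFC"):
    """Normalize a list of UTF-16 code units; the result is again UTF-16."""
    text = "".join(chr(cp) for cp in decode_utf16_units(units))
    normalized = unicodedata.normalize(form, text)
    return encode_utf16_units([ord(ch) for ch in normalized])

# "e" followed by COMBINING ACUTE ACCENT composes to U+00E9 under NFC.
print([hex(u) for u in normalize_utf16([0x0065, 0x0301], "NFC")])
```

The surrogate pair D834 DD5E (U+1D15E) is a case where the pair has to
be combined before the decomposition can be looked up at all, since the
decomposition is defined on the code point and not on either half.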

Regards,
Martin
_______________________________________________
Python-3000 mailing list
Python-3000@python.org
http://mail.python.org/mailman/listinfo/python-3000