Paul Prescod wrote:
> There is at least one big difference between surrogate pairs and
> decomposed characters. The user can typically normalize away
> decompositions. How do you normalize away decompositions in a language
> that only supports 16-bit representations?
I don't see the problem: you use UTF-16. All normal forms (NFC, NFD, NFKC, NFKD) can be represented in UTF-16 just fine.

It is somewhat tricky to implement a normalization algorithm on UTF-16 data, since you must first combine each surrogate pair into a single code point in order to find out what that code point's canonical decomposition is; but that is just more code, not a problem in principle.

Regards,
Martin

_______________________________________________
Python-3000 mailing list
Python-3000@python.org
http://mail.python.org/mailman/listinfo/python-3000
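A minimal sketch of the approach described in the reply, assuming the text is held as a plain list of 16-bit code units (a stand-in for a language whose strings are UTF-16 internally). The helper names `combine_surrogates`, `split_to_surrogates` and `normalize_utf16` are hypothetical; the decomposition lookup itself is delegated to the standard `unicodedata.normalize` function.

```python
import unicodedata


def combine_surrogates(units):
    """Turn UTF-16 code units into code points, joining each
    high/low surrogate pair into one supplementary code point."""
    code_points = []
    i = 0
    while i < len(units):
        u = units[i]
        if (0xD800 <= u <= 0xDBFF and i + 1 < len(units)
                and 0xDC00 <= units[i + 1] <= 0xDFFF):
            low = units[i + 1]
            code_points.append(0x10000 + ((u - 0xD800) << 10) + (low - 0xDC00))
            i += 2
        else:
            # BMP character (or a lone surrogate, passed through unchanged)
            code_points.append(u)
            i += 1
    return code_points


def split_to_surrogates(code_points):
    """Re-encode code points as UTF-16 code units."""
    units = []
    for cp in code_points:
        if cp >= 0x10000:
            cp -= 0x10000
            units.append(0xD800 + (cp >> 10))    # high surrogate
            units.append(0xDC00 + (cp & 0x3FF))  # low surrogate
        else:
            units.append(cp)
    return units


def normalize_utf16(form, units):
    """Normalize UTF-16 code units to the given form (NFC, NFD, NFKC, NFKD)."""
    # Step 1: combine surrogate pairs so each character is one code point.
    text = "".join(chr(cp) for cp in combine_surrogates(units))
    # Step 2: look up decompositions / compositions on whole code points.
    normalized = unicodedata.normalize(form, text)
    # Step 3: the result is again representable as UTF-16.
    return split_to_surrogates([ord(c) for c in normalized])


# U+1D15E MUSICAL SYMBOL HALF NOTE is the surrogate pair D834 DD5E in UTF-16.
half_note = [0xD834, 0xDD5E]
print([hex(u) for u in normalize_utf16("NFD", half_note)])
# prints ['0xd834', '0xdd57', '0xd834', '0xdd65']
```

Neither code unit of U+1D15E identifies the character on its own, so its canonical decomposition (U+1D157 U+1D165) can only be found after the pair is combined; the decomposed result is then split back into four code units, which is still ordinary UTF-16.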