Fredrik Lundh wrote:
> David Hopwood wrote:
>
>> For example, "ö" can be represented either as the precomposed character
>> U+00F6, or as "o" followed by a combining diaeresis (U+006F U+0308).
>
> normalization is a good thing, though:
>
> http://www.w3.org/TR/charmod-norm/
>
> (it would probably be a good idea to turn unicodedata.normalize into a
> method for the new unicode string type).

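To make the quoted example concrete, here is a minimal sketch of the two
representations of "ö" and how unicodedata.normalize maps between them. It
assumes a Python where str is a sequence of Unicode code points (as proposed
for Python 3000); the variable names are illustrative only.

```python
import unicodedata

# Two representations of the same abstract character "ö".
precomposed = "\u00f6"   # LATIN SMALL LETTER O WITH DIAERESIS
decomposed = "o\u0308"   # "o" followed by COMBINING DIAERESIS

# They differ in length and do not compare equal code point by code point.
print(len(precomposed), len(decomposed))   # 1 2
print(precomposed == decomposed)           # False

# Normalization converts one form into the other.
nfc = unicodedata.normalize("NFC", decomposed)    # compose
nfd = unicodedata.normalize("NFD", precomposed)   # decompose
print(nfc == precomposed)                  # True
print(nfd == decomposed)                   # True

# Slicing the decomposed form by code point separates the base letter
# from its combining mark.
head, tail = decomposed[:1], decomposed[1:]
print(head == "o")                         # True
print(unicodedata.combining(tail) != 0)    # True: tail is a combining mark
```

Normalizing to NFC removes the split hazard for "ö" specifically, but only
because a precomposed code point happens to exist for it.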
Normalization is certainly a good thing to support. But that's orthogonal to
my point above -- that some abstract characters are representable by sequences
of more than one code point, which must not be split, and that avoidance of
such splitting automatically also avoids splitting within a code point
representation.

Note that some abstract characters needed for living languages are
representable *only* by combining sequences.

-- 
David Hopwood <[EMAIL PROTECTED]>

_______________________________________________
Python-3000 mailing list
Python-3000@python.org
http://mail.python.org/mailman/listinfo/python-3000