Paul Prescod wrote:
> On 9/25/06, Jim Jewett <[EMAIL PROTECTED]> wrote:
>
>> As David Hopwood pointed out, to be fully correct, you already have to
>> create a custom function even with bmp characters, because of
>> decomposed characters. (Example: Representing a c-cedilla as a c and
>> a combining cedilla, rather than as a single code point.) Separating
>> those two would be wrong. Counting them as two characters for slicing
>> purposes would usually be wrong.
>
> Even 32-bit representations are permitted to use surrogate pairs; it
> just doesn't often make sense.
>
> There is at least one big difference between surrogate pairs and decomposed
> characters. The user can typically normalize away decompositions.

That depends what script they're using. For some scripts, they can't.

-- 
David Hopwood <[EMAIL PROTECTED]>
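
The distinction discussed above can be sketched with Python's standard `unicodedata` module. This is only an illustration, not code from the thread: the sample strings and the helper names `nfc` and `utf16_units` are chosen here for the example, and it assumes a Python 3 `str` that indexes by code point.

```python
import unicodedata


def nfc(s):
    """Return s in Normalization Form C (canonical composition)."""
    return unicodedata.normalize("NFC", s)


def utf16_units(s):
    """Count the 16-bit code units s would occupy in UTF-16."""
    return len(s.encode("utf-16-le")) // 2


# Latin: "c" + U+0327 COMBINING CEDILLA has a precomposed form (U+00E7),
# so normalization folds two code points into one.
c_cedilla = "c\u0327"
print(len(c_cedilla), len(nfc(c_cedilla)))

# Devanagari: U+0915 KA + U+093F VOWEL SIGN I has no precomposed code point,
# so the sequence is still two code points after NFC.
ka_i = "\u0915\u093f"
print(len(ka_i), len(nfc(ka_i)))

# Surrogate pairs are a separate matter: a character outside the BMP is one
# code point, but takes two code units when stored as UTF-16.
g_clef = "\U0001D11E"
print(len(g_clef), utf16_units(g_clef))
```

In the first case normalization removes the problem for slicing; in the second it cannot, so code that must not split a user-perceived character still needs a custom function.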
