David Hopwood <[EMAIL PROTECTED]> writes:

> People do need to realize that *all* Unicode encodings are
> variable-length, in the sense that abstract characters can be
> represented by multiple code points.
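A small illustration of the quoted point. This is a Python 3 sketch using only the standard unicodedata module, and the variable names are made up for the example. The abstract character "é" can be written either as one precomposed code point or as a base letter followed by a combining accent. The two strings compare unequal until they are normalized.

```python
import unicodedata

# Two code point sequences for the same abstract character "é".
precomposed = "\u00e9"   # LATIN SMALL LETTER E WITH ACUTE (1 code point)
decomposed = "e\u0301"   # "e" + COMBINING ACUTE ACCENT (2 code points)

print(len(precomposed), len(decomposed))   # 1 2
print(precomposed == decomposed)           # False
# Canonical composition (NFC) turns the pair into the single code point.
print(unicodedata.normalize("NFC", decomposed) == precomposed)  # True
```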

Unicode algorithms for case mapping, word splitting, collation, etc.
are generally defined in terms of code points. The Unicode Character
Database is keyed by code points, which are the largest practical text
unit with a finite domain.
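A minimal sketch of what "keyed by code points" looks like in practice. It assumes Python 3.9+ and treats the standard unicodedata module as the interface to the character database. describe() is a hypothetical helper written for this example. Each property lookup takes exactly one code point. Case mapping is also defined per code point, even when the result is longer than the input.

```python
import unicodedata

def describe(text: str) -> list[tuple[str, str, str]]:
    """Hypothetical helper: look up each code point of `text` in the
    Unicode Character Database (code point, general category, name)."""
    return [
        (f"U+{ord(cp):04X}", unicodedata.category(cp), unicodedata.name(cp, "<unnamed>"))
        for cp in text  # iterating a Python 3 str yields one code point at a time
    ]

for row in describe("e\u0301\U0001D11E"):
    print(row)
# ('U+0065', 'Ll', 'LATIN SMALL LETTER E')
# ('U+0301', 'Mn', 'COMBINING ACUTE ACCENT')
# ('U+1D11E', 'So', 'MUSICAL SYMBOL G CLEF')

# Full case mapping of one code point can yield several code points.
print("stra\u00dfe".upper())  # STRASSE
```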

Even if there are other units at a higher level, any algorithm which
determines the boundaries of those higher-level units is easier to
implement in terms of code points than in terms of the even lower-level
UTF-x code units.
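A rough sketch of the boundary argument, again in Python 3.9+. simple_clusters() is a hypothetical helper and a deliberate simplification. It only attaches combining marks (general category M*) to the preceding code point, whereas the real grapheme cluster rules in UAX #29 cover many more cases. Working on code points, each step is a single database lookup. On raw UTF-8 bytes or UTF-16 code units, the same loop would first have to reassemble multi-unit sequences before any property could be looked up.

```python
import unicodedata

def simple_clusters(text: str) -> list[str]:
    """Hypothetical helper: group each base code point with the combining
    marks that follow it. Only categories Mn, Mc and Me extend a cluster;
    this is NOT the full UAX #29 grapheme cluster algorithm."""
    clusters: list[str] = []
    for cp in text:
        if clusters and unicodedata.category(cp).startswith("M"):
            clusters[-1] += cp   # combining mark: extend the current cluster
        else:
            clusters.append(cp)  # anything else starts a new cluster
    return clusters

text = "e\u0301a\U0001D11E"  # e + COMBINING ACUTE ACCENT, a, MUSICAL SYMBOL G CLEF

print([len(c) for c in simple_clusters(text)])  # [2, 1, 1]  code points per cluster
print(len(text))                                 # 4  code points
print(len(text.encode("utf-8")))                 # 8  UTF-8 code units (bytes)
print(len(text.encode("utf-16-le")) // 2)        # 5  UTF-16 code units
```

The same 4-code-point string is 8 UTF-8 code units and 5 UTF-16 code units. Boundaries found at the code point level therefore do not line up with indices at either code unit level.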

-- 
   __("<         Marcin Kowalczyk
   \__/       [EMAIL PROTECTED]
    ^^     http://qrnik.knm.org.pl/~qrczak/
_______________________________________________
Python-3000 mailing list
[email protected]
http://mail.python.org/mailman/listinfo/python-3000