Ligatures such as Ĳ and ĳ (Unicode U+0132, U+0133) are considered
acceptable identifier characters unless explicitly tailored out.
(They appear in both ID and XID.)

Do we really want this, or should we assume that the ligature ĳ and
the two-letter sequence ij should be equivalent?  If so, then we need
to enforce this somehow.

To me, this suggests that we should use the NFKD form.  Examples at
http://www.unicode.org/reports/tr15/tr15-28.html show that only the
compatibility forms (NFKD and NFKC) split ﬁ (ligature U+FB01) into its
constituents f and i; the canonical forms (NFD and NFC) leave it
intact.  The "K" (Kompatibility) mapping is needed to merge characters
that are "the same" except for some presentational quirk, such as
being superscripted or half-width.
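
For anyone who wants to check this locally, here is a small sketch
using the standard unicodedata module.  The helper name
normalized_forms is only mine, for illustration; it is not from the
PEP or from any proposed implementation.

```python
import unicodedata

# The four normalization forms defined by UAX #15.
FORMS = ("NFC", "NFD", "NFKC", "NFKD")


def normalized_forms(text):
    """Return a dict mapping each form name to the normalized text.

    Hypothetical helper, written only to compare the forms side by side.
    """
    return {form: unicodedata.normalize(form, text) for form in FORMS}


# U+FB01 (fi ligature), U+0133 and U+0132 (ij / IJ ligatures).
for sample in ("\ufb01", "\u0133", "\u0132"):
    forms = normalized_forms(sample)
    # ascii() makes it obvious whether the ligature survived or was split.
    print("U+%04X" % ord(sample),
          {name: ascii(value) for name, value in forms.items()})
```

With the Unicode data shipped in current Python versions, the
canonical forms return each ligature unchanged, while the
compatibility forms return the separate letters.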

The PEP assumes NFC, but I haven't really understood why, unless that
is required for compatibility with other systems (in which case, it
should be made explicit).

-jJ