Ligatures, such as Ĳ and ĳ (Unicode U+0132, U+0133), are considered acceptable identifier characters unless explicitly tailored out. (They appear in both ID and XID.)
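To make the ligature case concrete, here is a minimal sketch using Python's standard `unicodedata` module. It checks that the ligature is accepted as an identifier character and shows what each of the four normal forms does to it. The names `LIGATURES` and `normal_forms` are made up for illustration, and `str.isidentifier()` is used only as a convenient stand-in for the ID/XID test, not as a statement of what the PEP specifies.

```python
import unicodedata

# Code points discussed in this thread (escaped so the source stays ASCII).
LIGATURES = {
    "IJ": "\u0132",  # LATIN CAPITAL LIGATURE IJ
    "ij": "\u0133",  # LATIN SMALL LIGATURE IJ
    "fi": "\ufb01",  # LATIN SMALL LIGATURE FI
}


def normal_forms(text):
    """Return a dict mapping each normal form name to the normalized text."""
    return {
        form: unicodedata.normalize(form, text)
        for form in ("NFC", "NFD", "NFKC", "NFKD")
    }


for name, ligature in LIGATURES.items():
    # isidentifier() applies the XID_Start / XID_Continue rules.
    print(name, "usable as identifier:", ligature.isidentifier())
    for form, result in normal_forms(ligature).items():
        # ascii() keeps the output readable on any terminal encoding.
        print("   ", form, ascii(result), "equals plain letters:", result == name)
```

Under the canonical forms (NFC, NFD) the ligature stays a single code point, so it would remain distinct from the two-letter spelling; under the compatibility forms (NFKC, NFKD) it becomes the plain letters.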
Do we really want this, or should we assume that ĳ and ij should be equivalent? If so, then we need to enforce this somehow.

To me, this suggests that we should use a compatibility form such as NFKD. The examples at http://www.unicode.org/reports/tr15/tr15-28.html show that only the compatibility forms (NFKD and NFKC) split ﬁ (ligature U+FB01) into its constituents f and i; the canonical forms NFC and NFD leave it intact. A compatibility ("K") form is needed to merge characters that are "the same" except for some presentational quirk, such as being superscripted or half-width.

The PEP assumes NFC, but I haven't really understood why, unless that is required for compatibility with other systems (in which case, it should be made explicit).

-jJ
_______________________________________________
Python-3000 mailing list
Python-3000@python.org
http://mail.python.org/mailman/listinfo/python-3000