> Currently, in Python 2.5, identifiers are specified as starting with
> an upper- or lowercase letter or underscore ('_') with the following
> "characters" of the identifier also optionally being a numerical digit
> ("0"..."9").
>
> This current state seems easy to remember even if felt restrictive by
> many.
>
> Contrariwise, the referenced document "UAX-31" is a bit obscure to me
It's actually very easy. The basic principle will stay: the first
character must be a letter or an underscore, followed by letters,
underscores, and digits. The question really is: what is a letter?
What is an underscore? What is a digit?

> 1) Will this allow me to use, say, a "right-arrow" glyph (if I can
> find one) to start my identifier?

No. A right arrow (such as U+2192, RIGHTWARDS ARROW) is a symbol
(general category Sm: Symbol, Math). See
http://unicode.org/Public/UNIDATA/UCD.html for a list of general
category values, and http://unicode.org/Public/UNIDATA/UnicodeData.txt
for a textual description of all characters.

There is a special case, though: Unicode supports combining
characters, i.e. characters that are not characters on their own but
modify the preceding character, for example to add diacritical marks
to letters. Unicode is very flexible in how these can be applied, so
they can form characters that have no code point of their own. Among
them is U+20D7, COMBINING RIGHT ARROW ABOVE, which is of general
category Mn (Mark, Nonspacing). In PEP 3131, such marks may not appear
as the first character (since they need a base character to modify),
but they may appear as subsequent characters. This allows you to form
identifiers such as v⃗ (which should render as a small letter v with a
vector arrow on top).

> 2) Could an ``ID_Continue`` be used as an ``ID_Start`` if using a RTL
> (reversed or "mirrored") identifier? (Probably not, but I don't know.)

Unicode, and this PEP, always use logical order, not rendering order.
What matters is the order in which the characters appear in the source
code string, so the first character in that order must be an ID_Start
character, however the identifier is displayed. RTL languages do pose
a challenge, in particular since bidirectional algorithms apparently
aren't implemented correctly in many editors.

> 3) Is or will there be a definitive and exhaustive listing (with
> bitmap representations of the glyphs to avoid the font issues) of the
> glyphs that the PEP 3131 would allow in identifiers? (Does this
> question even make sense?)
It makes sense, but it is difficult to implement. The PEP already
links to a non-normative list that is exhaustive for Unicode 4.1.
Future Unicode versions may add characters, so a list that is
exhaustive now might not be exhaustive in the future. The Unicode
Consortium promises stability, meaning that what is an identifier
character now won't be reclassified as a non-identifier character in
the future. The reverse does not hold: as new code points get
assigned, characters that are not identifier characters today may
become identifier characters.

As for the list I generated in HTML: it might be possible to make it
include bitmaps instead of HTML character references, but doing so is
a licensing problem, as you need a license for a font that has all
these characters. If you want to look up a specific character, I
recommend going to the Unicode code charts, at
http://www.unicode.org/charts/

Notice that an HTML page that includes individual bitmaps for all
characters would take *ages* to load.

Regards,
Martin

P.S. For anybody who wants to play with generating visualisations of
the PEP, here are the functions I used:

    import unicodedata

    def isnorm(c):
        # True if c is unchanged by NFC normalization
        return unicodedata.normalize("NFC", c) == c

    def start(c):
        # Can c begin an identifier?
        if not isnorm(c):
            return False
        # All letter categories, including uppercase (Lu), plus Nl
        if unicodedata.category(c) in ('Lu', 'Ll', 'Lt', 'Lm', 'Lo', 'Nl'):
            return True
        if c == u'_':
            return True
        # Other_ID_Start characters in Unicode 4.1
        if c in u"\u2118\u212E\u309B\u309C":
            return True
        return False

    def cont_only(c):
        # Can c follow the first character without being allowed first?
        if not isnorm(c):
            return False
        if unicodedata.category(c) in ('Mn', 'Mc', 'Nd', 'Pc'):
            return True
        # Ethiopic digits one to nine (Other_ID_Continue)
        if 0x1369 <= ord(c) <= 0x1371:
            return True
        return False

    def cont(c):
        # Can c appear anywhere after the first character?
        return start(c) or cont_only(c)

The isnorm() check excludes from the list those characters that change
under NFC. This excludes a few compatibility characters which are
allowed in source code, but become semantically indistinguishable from
their canonical form.

-- 
http://mail.python.org/mailman/listinfo/python-list
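A minimal sketch for trying the examples from this thread, assuming Python 3, where PEP 3131 identifiers are supported and str.isidentifier() applies the language's identifier rules (based on XID_Start/XID_Continue in the current Unicode version, so results may differ in detail from the Unicode 4.1 list above). The helper names describe() and check() are hypothetical, chosen only for this illustration.

```python
import unicodedata

# Characters discussed above.
RIGHT_ARROW = "\u2192"      # RIGHTWARDS ARROW, a math symbol
COMBINING_ARROW = "\u20d7"  # COMBINING RIGHT ARROW ABOVE, a nonspacing mark


def describe(ch):
    """Hypothetical helper: return (Unicode name, general category) of ch."""
    return unicodedata.name(ch), unicodedata.category(ch)


def check(candidates):
    """Hypothetical helper: map each string to str.isidentifier()."""
    return {s: s.isidentifier() for s in candidates}


if __name__ == "__main__":
    # Letter, underscore, digit, and the two arrows.
    for ch in ("v", "_", "7", RIGHT_ARROW, COMBINING_ARROW):
        name, category = describe(ch)
        print(f"U+{ord(ch):04X} {category} {name}")

    samples = [
        "_x1",                  # underscore, letter, digit
        "1x",                   # a digit cannot come first
        RIGHT_ARROW + "x",      # a symbol cannot start an identifier
        "v" + COMBINING_ARROW,  # a mark may follow a base letter
        COMBINING_ARROW + "v",  # but a mark may not come first
    ]
    for s, ok in check(samples).items():
        print(ascii(s), ok)
```

Running it should report U+2192 as Sm and U+20D7 as Mn, and accept v followed by the combining arrow while rejecting strings that start with either arrow or with a digit.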