On Sun, Jun 4, 2017 at 2:48 AM, Steven D'Aprano <st...@pearwood.info> wrote:
> On Sun, Jun 04, 2017 at 02:36:50AM +1000, Steven D'Aprano wrote:
>
>> But Python 3.5 does treat it as an identifier!
>>
>> py> ℘ = 1 # should be a SyntaxError ?
>> py> ℘
>> 1
>>
>> There's a bug here, somewhere, I'm just not sure where...
>
> That appears to be the only Symbol Math character which is accepted as
> an identifier in Python 3.5:
>
> py> import unicodedata
> py> all_unicode = map(chr, range(0x110000))
> py> symbols = [c for c in all_unicode if unicodedata.category(c) == 'Sm']
> py> len(symbols)
> 948
> py> ns = {}
> py> for c in symbols:
> ...     try:
> ...         exec(c + " = 1", ns)
> ...     except SyntaxError:
> ...         pass
> ...     else:
> ...         print(c, unicodedata.name(c))
> ...
> ℘ SCRIPT CAPITAL P
> py>

Curious. And not specific to 3.5 - the exact same thing happens in 3.7.
Here's the full category breakdown:

    import collections
    import unicodedata

    cats = collections.defaultdict(int)
    ns = {}
    for c in map(chr, range(1, 0x110000)):
        try:
            exec(c + " = 1", ns)
        except SyntaxError:
            pass
        except UnicodeEncodeError:
            # Lone surrogates (category Cs) can't be encoded; anything else is unexpected
            if unicodedata.category(c) != "Cs":
                raise
        else:
            cats[unicodedata.category(c)] += 1

    defaultdict(<class 'int'>, {'Po': 1, 'Lu': 1702, 'Pc': 1, 'Ll': 2063,
    'Lo': 112703, 'Lt': 31, 'Lm': 245, 'Nl': 236, 'Mn': 2, 'Sm': 1, 'So': 1})

For reference, as well as the 948 Sm, there are 1690 Mn and 5777 So,
but only these characters from those categories are valid:

    \u1885 Mn MONGOLIAN LETTER ALI GALI BALUDA
    \u1886 Mn MONGOLIAN LETTER ALI GALI THREE BALUDA
    ℘ Sm SCRIPT CAPITAL P
    ℮ So ESTIMATED SYMBOL

U+2118 SCRIPT CAPITAL P and U+212E ESTIMATED SYMBOL are listed in
PropList.txt as Other_ID_Start, so they make sense. But that doesn't
explain the two characters from category Mn. It also doesn't explain
why U+309B and U+309C are *not* valid, despite being declared
Other_ID_Start.

Maybe it's a bug? Maybe 309B and 309C somehow got switched into 1885 and 1886??

ChrisA
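
One possible lead on the U+309B / U+309C question, offered as a guess rather than a confirmed diagnosis: PEP 3131 defines identifiers in terms of XID_Start and XID_Continue, which drop characters whose NFKC normalization is no longer a valid identifier. The sketch below prints the category, the result of str.isidentifier(), and the NFKC form for the characters discussed above. The helper name describe() and the choice of characters are mine, purely for illustration, and the categories reported depend on the Unicode version your Python was built with.

```python
import unicodedata

# Characters mentioned in this thread (selection is illustrative only).
CANDIDATES = ["\u2118", "\u212e", "\u1885", "\u1886", "\u309b", "\u309c"]


def describe(c):
    """Return (code point, category, is-identifier flag, NFKC code points)."""
    nfkc = unicodedata.normalize("NFKC", c)
    return (
        "U+%04X" % ord(c),
        unicodedata.category(c),
        c.isidentifier(),
        ["U+%04X" % ord(x) for x in nfkc],
    )


for c in CANDIDATES:
    print(*describe(c))
```

If the last column for U+309B and U+309C starts with U+0020 (a space), that would be consistent with them being excluded from XID_Start even though PropList.txt lists them as Other_ID_Start. It says nothing yet about why 1885 and 1886 are accepted.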