On Sat, Jun 03, 2017 at 03:51:50PM +0300, Serhiy Storchaka wrote:

> The rule for Python identifiers already is not easy, there is no simple
> regular expression for them, and I'm sure most tools proceeding Python
> sources (even the tokenize module and IDLE) do not handle all Python
> identifier correctly. For example they don't recognize the symbol ℘
> (U+2118, SCRIPT CAPITAL P) as a valid identifier.

They shouldn't, because it isn't a valid identifier: it's a Maths Symbol,
not a letter, same as ∑ √ ∫ ∞ etc.

https://en.wikipedia.org/wiki/Weierstrass_p

py> import unicodedata
py> unicodedata.category('℘')
'Sm'

But Python 3.5 does treat it as an identifier!

py> ℘ = 1  # should be a SyntaxError ?
py> ℘
1

There's a bug here, somewhere, I'm just not sure where...

The PEP for non-ASCII identifiers is quite old now (it was written for
Unicode 4!) but it excludes category 'Sm' in its identifier algorithm:

https://www.python.org/dev/peps/pep-3131/#id16


-- 
Steve
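
For anyone who wants to poke at this themselves, here is a minimal sketch that puts the general category next to what the interpreter itself reports via str.isidentifier(). The helper name describe() is made up for illustration, and the code points are written as escapes only so the snippet doesn't depend on the file's encoding. One thing worth checking against the PEP text: as far as I can tell, its ID_Start definition also admits characters carrying the Other_ID_Start property, and U+2118 appears to be on that list, which might explain the behaviour without any bug.

```python
import unicodedata


def describe(ch):
    """Return (code point, Unicode name, general category, isidentifier)."""
    return (
        "U+%04X" % ord(ch),
        unicodedata.name(ch, "<unnamed>"),
        unicodedata.category(ch),
        ch.isidentifier(),  # what the running interpreter accepts
    )


# U+2118 (the Weierstrass p) followed by the four symbols
# U+2211, U+221A, U+222B and U+221E mentioned above.
for ch in "\u2118\u2211\u221a\u222b\u221e":
    print(describe(ch))
```

On a Python 3 interpreter I would expect all five to report category 'Sm', with only U+2118 coming back True from isidentifier().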