On 04/06/17 00:04, Chris Angelico wrote:
> On Sun, Jun 4, 2017 at 5:02 AM, Thomas Jollans <t...@tjol.eu> wrote:
>> On 03/06/17 20:41, Chris Angelico wrote:
>>> [snip]
>>> For reference, as well as the 948 Sm, there are 1690 Mn and 5777 So,
>>> but only these characters are valid from them:
>>>
>>> \u1885 Mn MONGOLIAN LETTER ALI GALI BALUDA
>>> \u1886 Mn MONGOLIAN LETTER ALI GALI THREE BALUDA
>>> ℘ Sm SCRIPT CAPITAL P
>>> ℮ So ESTIMATED SYMBOL
>>>
>>> 2118 SCRIPT CAPITAL P and 212E ESTIMATED SYMBOL are listed in
>>> PropList.txt as Other_ID_Start, so they make sense. But that doesn't
>>> explain the two characters from category Mn. It also doesn't explain
>>> why U+309B and U+309C are *not* valid, despite being declared
>>> Other_ID_Start. Maybe it's a bug? Maybe 309B and 309C somehow got
>>> switched into 1885 and 1886??
>> \u1885 and \u1886 are categorised as letters (category Lo) by my Python
>> 3.5. (Which makes sense, right?) If your system puts them in category
>> Mn, that's bound to be a bug somewhere.
> rosuav@sikorsky:~$ python3.7 -c "import unicodedata; print(unicodedata.unidata_version, unicodedata.category('\u1885'))"
> 9.0.0 Mn
> rosuav@sikorsky:~$ python3.6 -c "import unicodedata; print(unicodedata.unidata_version, unicodedata.category('\u1885'))"
> 8.0.0 Lo
> rosuav@sikorsky:~$ python3.5 -c "import unicodedata; print(unicodedata.unidata_version, unicodedata.category('\u1885'))"
> 8.0.0 Lo
> rosuav@sikorsky:~$ python3.4 -c "import unicodedata; print(unicodedata.unidata_version, unicodedata.category('\u1885'))"
> 6.3.0 Lo
>
> Is it possible that there's a discrepancy between the Unicode version
> used by the unicodedata module and the one used by the parser?

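One rough way to check this from inside a single interpreter is to print the general category next to what str.isidentifier() says for each code point in question. This is only a sketch: the CANDIDATES list and the describe() helper are names made up for illustration, and it assumes str.isidentifier() applies the same rule as the parser.

```python
import unicodedata

# Code points mentioned in this thread (illustrative list, not exhaustive).
CANDIDATES = ["\u1885", "\u1886", "\u2118", "\u212e", "\u309b", "\u309c"]


def describe(ch):
    """Return (code point label, general category, valid as identifier?)."""
    return (
        "U+{:04X}".format(ord(ch)),
        unicodedata.category(ch),
        ch.isidentifier(),
    )


if __name__ == "__main__":
    # The category column depends on the Unicode version bundled with
    # this interpreter, so print that version first.
    print("unicodedata version:", unicodedata.unidata_version)
    for ch in CANDIDATES:
        print(*describe(ch))
```

If the category column changes between Python versions while the identifier column stays the same, that points at the property data rather than at a version mismatch inside one interpreter.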
It appears to be Unicode policy to keep characters in ID_Start (etc.) even if this no longer fits their character category. So in Unicode 9.0, when U+1885 and U+1886 were recategorised from Lo to Mn, they were added to Other_ID_Start for backwards compatibility (like ℘).

Thomas