Re: [Python-ideas] π = math.pi

Chris Angelico Sat, 03 Jun 2017 15:04:51 -0700

On Sun, Jun 4, 2017 at 5:02 AM, Thomas Jollans <[email protected]> wrote:
> On 03/06/17 20:41, Chris Angelico wrote:
>> [snip]
>> For reference, as well as the 948 Sm, there are 1690 Mn and 5777 So,
>> but only these characters are valid from them:
>>
>> \u1885 Mn MONGOLIAN LETTER ALI GALI BALUDA
>> \u1886 Mn MONGOLIAN LETTER ALI GALI THREE BALUDA
>> ℘ Sm SCRIPT CAPITAL P
>> ℮ So ESTIMATED SYMBOL
>>
>> 2118 SCRIPT CAPITAL P and 212E ESTIMATED SYMBOL are listed in
>> PropList.txt as Other_ID_Start, so they make sense. But that doesn't
>> explain the two characters from category Mn. It also doesn't explain
>> why U+309B and U+309C are *not* valid, despite being declared
>> Other_ID_Start. Maybe it's a bug? Maybe 309B and 309C somehow got
>> switched into 1885 and 1886??
>
> \u1885 and \u1886 are categorised as letters (category Lo) by my Python
> 3.5. (Which makes sense, right?) If your system puts them in category
> Mn, that's bound to be a bug somewhere.


rosuav@sikorsky:~$ python3.7 -c "import unicodedata;
print(unicodedata.unidata_version, unicodedata.category('\u1885'))"
9.0.0 Mn
rosuav@sikorsky:~$ python3.6 -c "import unicodedata;
print(unicodedata.unidata_version, unicodedata.category('\u1885'))"
8.0.0 Lo
rosuav@sikorsky:~$ python3.5 -c "import unicodedata;
print(unicodedata.unidata_version, unicodedata.category('\u1885'))"
8.0.0 Lo
rosuav@sikorsky:~$ python3.4 -c "import unicodedata;
print(unicodedata.unidata_version, unicodedata.category('\u1885'))"
6.3.0 Lo

Is it possible that there's a discrepancy between the Unicode version
used by the unicodedata module and the one used by the parser?
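
One way to check would be to ask both sides about the same characters. This is a minimal sketch, not a definitive test: identifier_report is a made-up helper name, and it assumes that compiling a one-line assignment is a fair proxy for what the parser accepts as an identifier.

```python
import unicodedata


def identifier_report(ch):
    """Compare what unicodedata reports for ch with what the parser accepts.

    Hypothetical helper for illustration; not part of the stdlib.
    """
    try:
        # Ask the real parser: is `<ch> = 1` a valid statement?
        compile(ch + " = 1", "<identifier-check>", "exec")
        parser_accepts = True
    except SyntaxError:
        parser_accepts = False
    return {
        "codepoint": "U+%04X" % ord(ch),
        "category": unicodedata.category(ch),
        "isidentifier": ch.isidentifier(),
        "parser_accepts": parser_accepts,
    }


if __name__ == "__main__":
    print("unidata_version:", unicodedata.unidata_version)
    # The six code points discussed above.
    for ch in "\u1885\u1886\u2118\u212e\u309b\u309c":
        print(identifier_report(ch))
```

If "isidentifier" and "parser_accepts" ever disagree for a character, or the category looks inconsistent with both, that would point at the kind of discrepancy I'm asking about. Run it under each interpreter version to compare.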

ChrisA
_______________________________________________
Python-ideas mailing list
[email protected]
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/
