Steven D'Aprano <steve+pyt...@pearwood.info> added the comment:

I think that analysis is wrong. The Wikipedia page describes the meaning of the 
Unicode Decimal/Digit/Numeric properties:

https://en.wikipedia.org/wiki/Unicode_character_property#Numeric_values_and_types

and the characters you show aren't appropriate for converting to ints:

py> import unicodedata
py> for c in '一二三四五':
...     print(unicodedata.name(c))
...
CJK UNIFIED IDEOGRAPH-4E00
CJK UNIFIED IDEOGRAPH-4E8C
CJK UNIFIED IDEOGRAPH-4E09
CJK UNIFIED IDEOGRAPH-56DB
CJK UNIFIED IDEOGRAPH-4E94

The first one, for example, is translated as "one; a, an; alone"; it is better 
read as the *word* one rather than the numeral 1. (Disclaimer: I am not a 
Chinese speaker and I welcome correction from an expert.)

Likewise U+4E8C, translated as "two; twice".
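
To make the Decimal/Digit/Numeric distinction concrete, here is a minimal 
sketch that looks up all three properties for a single character. The helper 
name numeric_properties is my own invention for illustration; the 
unicodedata.decimal, unicodedata.digit and unicodedata.numeric calls are the 
standard library ones, each given None as the default for "property not set".

```python
import unicodedata

def numeric_properties(ch):
    """Return (decimal, digit, numeric) for one character.

    Each entry is None when the character lacks that property.
    (Hypothetical helper, not part of the stdlib.)
    """
    return (
        unicodedata.decimal(ch, None),
        unicodedata.digit(ch, None),
        unicodedata.numeric(ch, None),
    )

# One character from each category discussed in this thread.
for ch in ("\N{BENGALI DIGIT ONE}", "\N{SUPERSCRIPT TWO}", "\u4e00"):
    print(f"U+{ord(ch):04X}", numeric_properties(ch))
```

On the Python versions I have tried, the Bengali digit has all three 
properties, SUPERSCRIPT TWO has digit and numeric but no decimal value, and 
U+4E00 has only a numeric value.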

The blog post is factually wrong when it claims:

"str.isdigit only returns True for what I said before, strings containing 
solely the digits 0-9."

Bengali digits are a counter-example:

py> s = "\N{BENGALI DIGIT ONE}\N{BENGALI DIGIT TWO}"
py> s.isdigit()
True
py> int(s)
12
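
As a rough sketch of how the three str predicates line up with int(), the 
snippet below runs a few sample strings through each. The try_int helper and 
the sample labels are hypothetical, chosen only for this illustration. My 
understanding is that int() follows isdecimal() rather than isdigit() or 
isnumeric(), but treat that as an observation from these samples, not a 
statement of the full parsing rules (int() also handles signs, whitespace 
and underscores).

```python
def try_int(s):
    """Return int(s), or None if int() rejects the string.

    (Hypothetical helper for illustration only.)
    """
    try:
        return int(s)
    except ValueError:
        return None

samples = {
    "bengali": "\N{BENGALI DIGIT ONE}\N{BENGALI DIGIT TWO}",
    "superscript": "\N{SUPERSCRIPT TWO}",
    "cjk": "\u4e00\u4e8c",
}

for label, s in samples.items():
    # Columns: isdecimal, isdigit, isnumeric, result of int()
    print(label, s.isdecimal(), s.isdigit(), s.isnumeric(), try_int(s))
```

Only the Bengali string should come back with an integer; the other two are 
rejected even though isnumeric() is True for both and isdigit() is True for 
the superscript.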

So I think there's nothing to do here, except perhaps to add a FAQ entry 
about it or improve the docs.

----------
nosy: +steven.daprano

_______________________________________
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue36100>
_______________________________________