Steven D'Aprano <steve+pyt...@pearwood.info> added the comment: I think that analysis is wrong. The Wikipedia page describes the meaning of the Unicode Decimal/Digit/Numeric properties:
https://en.wikipedia.org/wiki/Unicode_character_property#Numeric_values_and_types and the characters you show aren't appropriate for converting to ints: py> for c in '一二三四五': ... print(unicodedata.name(c)) ... CJK UNIFIED IDEOGRAPH-4E00 CJK UNIFIED IDEOGRAPH-4E8C CJK UNIFIED IDEOGRAPH-4E09 CJK UNIFIED IDEOGRAPH-56DB CJK UNIFIED IDEOGRAPH-4E94 The first one, for example, is translated as "one; a, an; alone"; it is better read as the *word* one rather than the numeral 1. (Disclaimer: I am not a Chinese speaker and I welcome correction from an expert.) Likewise U+4E8C, translated as "two; twice". The blog post is factually wrong when it claims: "str.isdigit only returns True for what I said before, strings containing solely the digits 0-9." py> s = "\N{BENGALI DIGIT ONE}\N{BENGALI DIGIT TWO}" py> s.isdigit() True py> int(s) 12 So I think that there's nothing to do here (unless it is perhaps to add a FAQ about it, or improve the docs). ---------- nosy: +steven.daprano _______________________________________ Python tracker <rep...@bugs.python.org> <https://bugs.python.org/issue36100> _______________________________________ _______________________________________________ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com