New submission from Marc-Andre Lemburg <m...@egenix.com>:

The makeunicodedata.py script only patches Unihan numeric data into the table (field 8), but does not update the digit field (field 7).
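The difference between the two fields can be observed through the unicodedata module, where field 7 is exposed as unicodedata.digit() and field 8 as unicodedata.numeric(). The following is only a minimal sketch, assuming Python 3 and a build whose database already carries the patched Unihan numeric values; the helper name describe() is made up for illustration and is not part of any library.

```python
import unicodedata


def describe(ch):
    """Return (decimal, digit, numeric) for ch; None marks a missing property.

    Hypothetical helper for illustration only.
    """
    return (
        unicodedata.decimal(ch, None),  # field 6 of UnicodeData.txt
        unicodedata.digit(ch, None),    # field 7, the one reported as not updated
        unicodedata.numeric(ch, None),  # field 8, the one the script patches
    )


san = "\u4e09"  # the ideograph used for the digit 3 in the report

print("ASCII 3:", describe("3"))
print("U+4E09: ", describe(san))
```

If the numeric value is present while the digit value is missing for U+4E09, that matches the behaviour described in this report: the character carries a number but is not treated as a digit.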
As a result, ideographs used for Chinese digits are not recognized as digits and not evaluated by int(), long() and float():

http://en.wikipedia.org/wiki/Numbers_in_Chinese_culture

>>> unicode('三', 'utf-8')
u'\u4e09'
>>> int(unicode('三', 'utf-8'))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'decimal' codec can't encode character u'\u4e09' in position 0: invalid decimal Unicode string
> <stdin>(1)<module>()
>>> import unicodedata
>>> unicodedata.digit(unicode('三', 'utf-8'))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: not a digit

The code point refers to the digit 3.

----------
components: Unicode
messages: 122786
nosy: lemburg
priority: normal
severity: normal
status: open
title: makeunicodedata.py does not support Unihan digit data
versions: Python 2.7, Python 3.2, Python 3.3

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue10575>
_______________________________________