Alexander Belopolsky <[email protected]> added the comment:
After a bit of svn archeology, it does appear that support for Arabic-Indic
digits was deliberate, at least in the sense that the feature was tested when
the code was first committed. See r15000.
The test migrated from file to file over the last 10 years, but it is still
present in test_float.py:
self.assertEqual(float(b" \u0663.\u0661\u0664 ".decode('raw-unicode-escape')), 3.14)
(It should probably now be rewritten using a string literal.)
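For illustration, the same check could be written in Python 3 with a plain str literal instead of decoding a bytes object. This is a minimal sketch assuming float() keeps the behaviour described here: it accepts Arabic-Indic digits (U+0660..U+0669) and strips surrounding whitespace. The class name ArabicIndicFloatTest is made up for the example and does not come from test_float.py.

```python
import unittest


class ArabicIndicFloatTest(unittest.TestCase):
    # Hypothetical test class; only the assertion mirrors test_float.py.
    def test_arabic_indic_digits(self):
        # U+0663 ARABIC-INDIC DIGIT THREE, U+0661 ONE, U+0664 FOUR.
        # float() strips the surrounding whitespace before parsing.
        self.assertEqual(float(" \u0663.\u0661\u0664 "), 3.14)
        # The same value written with the literal characters.
        self.assertEqual(float("\u0663.\u0661\u0664"), 3.14)
```

Running it with `python -m unittest` would exercise both spellings.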
I am now attaching the patch (issue10557.diff) that fixes the bug without
sacrificing non-ASCII digit support.
If this approach is well-received, I would like to replace all calls to
PyUnicode_EncodeDecimal() with calls to the new
_PyUnicode_EncodeDecimalUTF8() and deprecate the Latin-1-oriented
PyUnicode_EncodeDecimal().
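_PyUnicode_EncodeDecimalUTF8() is C code inside CPython, and the attached patch is what defines its exact behaviour. The Python function below is only a rough sketch of the transformation its name suggests, and encode_decimal_utf8 is a made-up name. Decimal digits from any script become ASCII '0'-'9', whitespace becomes an ASCII space, and every other character is kept and the result encoded as UTF-8 rather than Latin-1. The assumption is that this keeps non-Latin-1 characters intact for the parser's error messages.

```python
import unicodedata


def encode_decimal_utf8(s: str) -> bytes:
    """Hypothetical Python analogue of an "encode decimal as UTF-8" step.

    This is not the C implementation from the patch, only an illustration.
    """
    out = []
    for ch in s:
        if ch.isspace():
            # Any Unicode whitespace is mapped to an ASCII space.
            out.append(" ")
            continue
        value = unicodedata.decimal(ch, None)
        if value is not None:
            # A decimal digit from any script becomes its ASCII digit.
            out.append(str(value))
        else:
            # Everything else is kept unchanged so a parser can reject it
            # and still show the original character in its error message.
            out.append(ch)
    return "".join(out).encode("utf-8")
```

For example, `encode_decimal_utf8(" \u0663.\u0661\u0664 ")` yields `b" 3.14 "`, while a character such as U+00BD VULGAR FRACTION ONE HALF passes through as its UTF-8 bytes.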
For the future, I note that starting with Unicode 6.0.0, the Unicode Consortium
promises that
"""
Characters with the property value Numeric_Type=de (Decimal) only occur in
contiguous ranges of 10 characters, with ascending numeric values from 0 to 9
(Numeric_Value=0..9).
"""
This makes it very easy to check that a numeric string does not mix digits
from different scripts.
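Here is one way such a check could look, as a sketch that relies on the contiguity guarantee quoted above. For any decimal digit ch, the code point of digit zero in its script is ord(ch) - unicodedata.decimal(ch). A string whose digits all come from one script therefore yields a single "zero" code point. The function names are illustrative only.

```python
import unicodedata


def digit_block_zero(ch: str) -> int:
    # Under the Unicode 6.0.0 guarantee, digits 0..9 of a script are
    # contiguous, so digit zero sits at ord(ch) minus ch's decimal value.
    return ord(ch) - unicodedata.decimal(ch)


def uses_single_digit_script(s: str) -> bool:
    """Return True if all decimal digits in s share one 0..9 range."""
    zeros = {
        digit_block_zero(ch)
        for ch in s
        if unicodedata.decimal(ch, None) is not None
    }
    # Zero or one distinct block means the digits are not mixed.
    return len(zeros) <= 1
```

With this, `"\u0663.\u0661\u0664"` and `"3.14"` pass, but `"3.\u0661\u0664"` is rejected because it mixes ASCII and Arabic-Indic digits.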
I still believe that a proper API should require an explicit choice of
language or locale before accepting digits other than 0-9, just as int() does
not accept hexadecimal digits without an explicit base >= 16. But that would
be the subject of a separate feature request.
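Here is a sketch of what such an opt-in might look like. It swaps the language/locale parameter for a simple boolean flag to keep things short. parse_float and allow_non_ascii_digits are hypothetical names, not an existing or proposed CPython API. The int() analogy itself is real: int("ff") raises ValueError, while int("ff", 16) returns 255.

```python
def parse_float(s: str, allow_non_ascii_digits: bool = False) -> float:
    """Hypothetical stricter float parser.

    Non-ASCII decimal digits are rejected unless the caller opts in,
    much as int() rejects hex digits unless a base >= 16 is given.
    """
    if not allow_non_ascii_digits:
        for ch in s:
            # str.isdecimal() is True for decimal digits of any script.
            if ch.isdecimal() and not ("0" <= ch <= "9"):
                raise ValueError(f"non-ASCII digit {ch!r} in {s!r}")
    return float(s)
```

Under this design `parse_float("\u0663.\u0661\u0664")` raises ValueError, and the same string parses to 3.14 only when the caller passes `allow_non_ascii_digits=True`.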
----------
Added file: http://bugs.python.org/file19865/issue10557.diff
_______________________________________
Python tracker <[email protected]>
<http://bugs.python.org/issue10557>
_______________________________________