Matthew Barnett added the comment:
Python takes a long way round when converting strings to int. It does the
following (I'll be talking about Python 3.3 here):
1. In function 'fix_decimal_and_space_to_ascii', the different kinds of spaces
are converted to " " and the different kinds of digits are converted to their
equivalents in the ASCII range;
2. The resulting string is converted to UTF-8;
3. The resulting string is passed to 'PyLong_FromString', which expects a
null-terminated string;
4. If 'PyLong_FromString' is unable to parse the string as an int, it builds an
error message using the string that was passed into it, which it does by
converting that string _back_ into Unicode.
As a result of step 4, the string that's reported as the value in the error
message is _not_ necessarily correct.
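Step 1 can be imitated in pure Python to see why the reported value changes.
This is only a rough sketch, not the C code: the name
'normalize_digits_and_spaces' is made up for illustration, and it assumes that
str.isspace and unicodedata.decimal classify characters the same way the C
helper does.

```python
import unicodedata


def normalize_digits_and_spaces(text):
    """Rough imitation of step 1: map spaces to " " and decimal digits to ASCII."""
    out = []
    for ch in text:
        if ch.isspace():
            # Every kind of whitespace collapses to a plain ASCII space.
            out.append(" ")
            continue
        # unicodedata.decimal returns the digit value, or the default (None)
        # when the character is not a decimal digit.
        digit = unicodedata.decimal(ch, None)
        out.append(ch if digit is None else str(digit))
    return "".join(out)


print(repr(normalize_digits_and_spaces("#\N{ARABIC-INDIC DIGIT ONE}")))  # '#1'
```

The original digit is already gone after this step, so anything built from the
normalized string can no longer show what the caller actually passed in.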
For example:
>>> int("\N{ARABIC-INDIC DIGIT ONE}")
1
>>> int("#\N{ARABIC-INDIC DIGIT ONE}")
Traceback (most recent call last):
  File "<pyshell#1>", line 1, in <module>
    int("#\N{ARABIC-INDIC DIGIT ONE}")
ValueError: invalid literal for int() with base 10: '#1'
It also means that a "\x00" and anything after it is omitted from the reported
value, because a null-terminated string ends at the first null byte:
>>> int("foo\x00bar")
Traceback (most recent call last):
  File "<pyshell#2>", line 1, in <module>
    int("foo\x00bar")
ValueError: invalid literal for int() with base 10: 'foo'
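The truncation can be reproduced without touching the C code. The sketch below
imitates steps 2 to 4 under the assumption that the error path sees only the
bytes before the first null byte; 'value_seen_by_error_message' is a
hypothetical name used only for this illustration.

```python
def value_seen_by_error_message(text):
    """Imitate the UTF-8 round trip through a null-terminated C string."""
    # Step 2: the string is converted to UTF-8.
    raw = text.encode("utf-8")
    # Step 3: a null-terminated string stops at the first b"\x00".
    c_string = raw.split(b"\x00", 1)[0]
    # Step 4: the error path converts those bytes back into Unicode.
    return c_string.decode("utf-8")


print(repr(value_seen_by_error_message("foo\x00bar")))  # 'foo'
```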
As a final point, 'PyLong_FromString' limits the length of the value it
reports in the error message, and the code that does it includes this line:
slen = strlen(orig_str) < 200 ? strlen(orig_str) : 200;
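For comparison, here is a rough Python rendering of what that line computes.
It is a sketch only: 'reported_bytes' and 'MAX_REPORTED' are names invented
for the illustration, and the length is measured once rather than twice. Note
that, as with strlen, the limit counts bytes of the UTF-8 string, not
characters.

```python
MAX_REPORTED = 200  # the limit hard-coded in the quoted C line


def reported_bytes(orig_str: bytes) -> bytes:
    """Return the prefix of orig_str that would be kept for the error message."""
    # Same result as: strlen(orig_str) < 200 ? strlen(orig_str) : 200
    slen = min(len(orig_str), MAX_REPORTED)
    return orig_str[:slen]


print(len(reported_bytes(b"9" * 500)))  # 200
print(reported_bytes(b"abc"))           # b'abc'
```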
----------
nosy: +mrabarnett
_______________________________________
Python tracker <[email protected]>
<http://bugs.python.org/issue16741>
_______________________________________