[issue16741] `int()`, `float()`, etc think python strings are null-terminated
Changes by Serhiy Storchaka storch...@gmail.com:

--
resolution: -> fixed
status: open -> closed

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue16741
___
Python-bugs-list mailing list
Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue16741] `int()`, `float()`, etc think python strings are null-terminated
Roundup Robot added the comment:

New changeset ecc8512b427d by Serhiy Storchaka in branch '3.3':
Issue #16741: Fix an error reporting in int().
http://hg.python.org/cpython/rev/ecc8512b427d

New changeset 4fd48a807812 by Serhiy Storchaka in branch 'default':
Issue #16741: Fix an error reporting in int().
http://hg.python.org/cpython/rev/4fd48a807812

--
nosy: +python-dev
[issue16741] `int()`, `float()`, etc think python strings are null-terminated
Changes by Serhiy Storchaka storch...@gmail.com:

--
resolution: -> fixed
stage: patch review -> committed/rejected
status: open -> closed
[issue16741] `int()`, `float()`, etc think python strings are null-terminated
Serhiy Storchaka added the comment:

There is a test in test_unicode which expects a UnicodeError for int('\ud800'). Now it fails. Should we fix the test or int()?

--
resolution: fixed ->
status: closed -> open
[issue16741] `int()`, `float()`, etc think python strings are null-terminated
Alexander Belopolsky added the comment:

I'd say fix the test. Raising ValueError is correct in this case. UnicodeError was an implementation artifact.

--
nosy: +belopolsky
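The point about ValueError versus UnicodeError can be checked with a short sketch. It relies only on the documented fact that UnicodeError is a subclass of ValueError; the helper name `classify_int_error` is hypothetical and not part of any patch in this thread.

```python
def classify_int_error(text):
    """Return the name of the exception class int(text) raises, or None."""
    try:
        int(text)
    except ValueError as exc:
        # This clause also catches UnicodeError, a ValueError subclass.
        return type(exc).__name__
    return None


# Callers that catch ValueError are unaffected by which of the two
# exception types int() raises for a lone surrogate.
print(issubclass(UnicodeError, ValueError))
print(classify_int_error("\ud800"))
print(classify_int_error("12"))
```

Because of the subclass relationship, a test that asserts ValueError keeps passing either way, while a test that asserts UnicodeError pins down an implementation detail.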
[issue16741] `int()`, `float()`, etc think python strings are null-terminated
Roundup Robot added the comment:

New changeset 7b023134ad83 by Serhiy Storchaka in branch '3.3':
Issue #16741: Remove testing of implementation artifact.
http://hg.python.org/cpython/rev/7b023134ad83

New changeset 1b4772ab420f by Serhiy Storchaka in branch 'default':
Issue #16741: Remove testing of implementation artifact.
http://hg.python.org/cpython/rev/1b4772ab420f
[issue16741] `int()`, `float()`, etc think python strings are null-terminated
Serhiy Storchaka added the comment:

If there are no objections I'm going to commit patches soon.
[issue16741] `int()`, `float()`, etc think python strings are null-terminated
Christian Heimes added the comment:

I don't like the idea of changing the behavior of 2.7 so late in its release cycle. Benjamin, what's your opinion?

--
nosy: +benjamin.peterson, christian.heimes
[issue16741] `int()`, `float()`, etc think python strings are null-terminated
Benjamin Peterson added the comment:

Yeah, let's just fix Python 3.
[issue16741] `int()`, `float()`, etc think python strings are null-terminated
Serhiy Storchaka added the comment:

Here is a patch for 2.7.

--
Added file: http://bugs.python.org/file30916/int_from_str-2.7.patch
[issue16741] `int()`, `float()`, etc think python strings are null-terminated
Serhiy Storchaka added the comment:

Patch updated. It now reuses code for bytes->int in longobject.c and abstract.c, doesn't raise UnicodeDecodeError for non-utf-8 bytes, and always reports an invalid bytes literal as a bytes object.

--
Added file: http://bugs.python.org/file30515/int_from_str-3.3_2.patch
[issue16741] `int()`, `float()`, etc think python strings are null-terminated
Serhiy Storchaka added the comment:

Are there any other comments?
[issue16741] `int()`, `float()`, etc think python strings are null-terminated
Serhiy Storchaka added the comment:

Thanks. For 3.4 I will use the new formatting feature.
[issue16741] `int()`, `float()`, etc think python strings are null-terminated
Serhiy Storchaka added the comment:

Here is a patch based on Matthew's patch. It is smaller (+35 lines vs +59) but fixes error messages for more cases:

int(b'123\0') -- bytes string with null without base.
int(b'123\xbd') -- non-utf-8 bytes string.
int('123\ud800') -- lone surrogate in unicode string.

Unfortunately it is not easy to backport it to 2.7. PyErr_Format() in 2.7 works only with null-terminated strings. I propose to fix this issue on 3.3+ and declare it as won't fix for 2.7.

--
nosy: +chris.jerdonek
versions: -Python 3.2
Added file: http://bugs.python.org/file30141/int_from_str.patch
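The three cases listed above can be exercised with a short script. This is only a sketch for checking an interpreter, not part of the patch; the helper name `int_error_message` is hypothetical, and the exact message text depends on the Python version, so no output is shown.

```python
# The three inputs named in the comment above.
cases = [b"123\0", b"123\xbd", "123\ud800"]


def int_error_message(value):
    """Return the message of the ValueError raised by int(value), or None."""
    try:
        int(value)
    except ValueError as exc:  # UnicodeError is a ValueError subclass
        return str(exc)
    return None


for value in cases:
    # repr() escapes the NUL byte, the invalid byte and the lone surrogate,
    # so the line is safe to print on any terminal encoding.
    print(repr(value), "->", int_error_message(value))
```

On an interpreter that contains the fix, each message is expected to show the whole literal, including the part after the null byte.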
[issue16741] `int()`, `float()`, etc think python strings are null-terminated
Changes by Serhiy Storchaka storch...@gmail.com:

--
assignee: -> serhiy.storchaka
[issue16741] `int()`, `float()`, etc think python strings are null-terminated
STINNER Victor added the comment:

int_from_str.patch:

+    strobj = PySequence_GetSlice(u, 0, 200);
+    if (strobj != NULL) {
+        PyErr_Format(PyExc_ValueError,
+                     "invalid literal for int() with base %d: %R",
+                     base, strobj);
+        Py_DECREF(strobj);
+    }

Oh, this reminds me that #7330 is still open.
[issue16741] `int()`, `float()`, etc think python strings are null-terminated
Changes by Martin Morrison m...@ensoft.co.uk:

--
nosy: +isoschiz
[issue16741] `int()`, `float()`, etc think python strings are null-terminated
Matthew Barnett added the comment:

I've attached a small additional patch for truncating the UTF-8. I don't know whether it's strictly necessary, but I don't know that it's unnecessary either! (Better safe than sorry.)

--
Added file: http://bugs.python.org/file28492/issue16741#2.patch
[issue16741] `int()`, `float()`, etc think python strings are null-terminated
Changes by Serhiy Storchaka storch...@gmail.com:

--
nosy: +serhiy.storchaka
versions: +Python 3.4
[issue16741] `int()`, `float()`, etc think python strings are null-terminated
Matthew Barnett added the comment:

I've attached a patch. It now reports an invalid literal as-is:

>>> int("#\N{ARABIC-INDIC DIGIT ONE}")
Traceback (most recent call last):
  File "<pyshell#1>", line 1, in <module>
    int("#\N{ARABIC-INDIC DIGIT ONE}")
ValueError: invalid literal for int() with base 10: '#١'
>>> int("foo\x00bar")
Traceback (most recent call last):
  File "<pyshell#2>", line 1, in <module>
    int("foo\x00bar")
ValueError: invalid literal for int() with base 10: 'foo\x00bar'

There's a slight difference in that it truncates to 200 codepoints, not 200 UTF-8 bytes.

--
keywords: +patch
Added file: http://bugs.python.org/file28487/issue16741.patch
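The difference between truncating to 200 code points and truncating to 200 UTF-8 bytes can be illustrated with a small sketch. The sample string and the `LIMIT` name are assumptions chosen for illustration; the sample character takes two bytes in UTF-8, so the byte cut happens to land on a character boundary.

```python
LIMIT = 200

# 300 copies of ARABIC-INDIC DIGIT ONE (U+0661), two UTF-8 bytes each.
text = "\u0661" * 300

# Truncating the str keeps LIMIT code points.
by_codepoints = text[:LIMIT]

# Truncating the UTF-8 encoding keeps LIMIT bytes, i.e. fewer characters
# whenever the text contains anything outside ASCII.
by_bytes = text.encode("utf-8")[:LIMIT].decode("utf-8")

print(len(by_codepoints))  # 200
print(len(by_bytes))       # 100
```

For pure-ASCII input the two limits coincide, which is why the change is only visible with non-ASCII literals.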
[issue16741] `int()`, `float()`, etc think python strings are null-terminated
Changes by Ezio Melotti ezio.melo...@gmail.com:

--
nosy: +ezio.melotti, haypo
stage: -> patch review
[issue16741] `int()`, `float()`, etc think python strings are null-terminated
Matthew Barnett added the comment:

It occurred to me that the truncation of the string when building the error message could cause a UnicodeDecodeError:

>>> int("1".ljust(199) + "\u0100")
Traceback (most recent call last):
  File "<pyshell#0>", line 1, in <module>
    int("1".ljust(199) + "\u0100")
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xc4 in position 199: unexpected end of data

This is because it's truncating a UTF-8 string, and the truncation is in the middle of a multi-byte sequence.
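The failure mode can be reproduced in pure Python without going through int(). This sketch mimics the byte-level truncation described above; the `errors="ignore"` decode at the end is only one possible way to drop an incomplete trailing sequence and is an assumption, not what the attached patch does.

```python
# 199 ASCII characters followed by U+0100, which needs two UTF-8 bytes.
text = "1".ljust(199) + "\u0100"
encoded = text.encode("utf-8")   # 201 bytes in total

# Cutting at 200 bytes splits the final two-byte sequence in half.
truncated = encoded[:200]

try:
    truncated.decode("utf-8")
    failed = False
except UnicodeDecodeError as exc:
    failed = True
    print(exc)

# Illustration only: ignoring decode errors drops the dangling lead byte,
# but it would also silently drop invalid bytes anywhere else in the data.
safe = truncated.decode("utf-8", errors="ignore")
print(len(safe))
```

Slicing the str before encoding avoids the problem entirely, because a str slice can never split a code point.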
[issue16741] `int()`, `float()`, etc think python strings are null-terminated
Changes by Terry J. Reedy tjre...@udel.edu:

--
versions: -Python 2.6, Python 3.1
[issue16741] `int()`, `float()`, etc think python strings are null-terminated
Matthew Barnett added the comment:

Python takes a long way round when converting strings to int. It does the following (I'll be talking about Python 3.3 here):

1. In function 'fix_decimal_and_space_to_ascii', the different kinds of spaces are converted to " " and the different kinds of digits are converted to their equivalents in the ASCII range;

2. The resulting string is converted to UTF-8;

3. The resulting string is passed to 'PyLong_FromString', which expects a null-terminated string.

4. If 'PyLong_FromString' is unable to parse the string as an int, it builds an error message using the string that was passed into it, which it does by converting that string _back_ into Unicode.

As a result of step 4, the string that's reported as the value in the error message is _not_ necessarily correct.

For example:

>>> int("\N{ARABIC-INDIC DIGIT ONE}")
1
>>> int("#\N{ARABIC-INDIC DIGIT ONE}")
Traceback (most recent call last):
  File "<pyshell#1>", line 1, in <module>
    int("#\N{ARABIC-INDIC DIGIT ONE}")
ValueError: invalid literal for int() with base 10: '#1'

And it also means a "\x00" and anything after it will be omitted:

>>> int("foo\x00bar")
Traceback (most recent call last):
  File "<pyshell#2>", line 1, in <module>
    int("foo\x00bar")
ValueError: invalid literal for int() with base 10: 'foo'

And in a final point, 'PyLong_FromString' limits the length of the value it reports in the error message, and the code that does it includes this line:

    slen = strlen(orig_str) < 200 ? strlen(orig_str) : 200;

--
nosy: +mrabarnett
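The four steps above can be modelled in pure Python to show why the reported literal goes wrong. This is a rough sketch, not the CPython code: the function name `simulate_old_int_error` is hypothetical, the digit and space mapping is an approximation of 'fix_decimal_and_space_to_ascii', and the null termination is imitated by cutting the bytes at the first NUL.

```python
import unicodedata


def simulate_old_int_error(text, base=10):
    """Hypothetical model of the error message built before the fix."""
    # Step 1: map Unicode spaces to " " and Unicode decimal digits to ASCII.
    normalized = []
    for ch in text:
        if ch.isspace():
            normalized.append(" ")
        else:
            digit = unicodedata.decimal(ch, None)
            normalized.append(ch if digit is None else str(digit))

    # Step 2: convert the result to UTF-8.
    encoded = "".join(normalized).encode("utf-8")

    # Step 3: a consumer of a null-terminated C string stops at the first NUL.
    c_string = encoded.split(b"\x00", 1)[0]

    # Step 4: the message is rebuilt from at most 200 bytes of that C string,
    # so it shows the normalized, truncated text rather than the original.
    shown = c_string[:200].decode("utf-8", errors="replace")
    return "invalid literal for int() with base %d: %r" % (base, shown)


print(simulate_old_int_error("#\N{ARABIC-INDIC DIGIT ONE}"))
print(simulate_old_int_error("foo\x00bar"))
```

Under these assumptions the model reproduces both misleading messages quoted above: the Arabic-Indic digit appears as '1', and everything from the null character onward disappears.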
[issue16741] `int()`, `float()`, etc think python strings are null-terminated
New submission from ganges master:

I'm not sure if it's a bug or just an inconvenience, but when a string containing \x00 is passed to int/float/etc, they return a misleading exception:

>>> int("abc")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: invalid literal for int() with base 10: 'abc'
>>> int("\x00abc")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: invalid literal for int() with base 10: ''
>>> float("\x00abc")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: could not convert string to float:

I noticed the code does actually try to handle it: http://hg.python.org/cpython/file/39803c20c9bf/Objects/intobject.c#l1066
but still, the reported error is very misleading.

--
components: Interpreter Core
messages: 177863
nosy: gangesmaster
priority: normal
severity: normal
status: open
title: `int()`, `float()`, etc think python strings are null-terminated
type: behavior
versions: Python 2.6, Python 2.7, Python 3.1, Python 3.2, Python 3.3
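A quick way to see what a given interpreter reports for these inputs is to collect the messages programmatically. This is a minimal sketch; the helper name `conversion_error` is hypothetical, and no output is shown because the message text differs between Python versions.

```python
def conversion_error(func, text):
    """Return the ValueError message raised by func(text), or None."""
    try:
        func(text)
    except ValueError as exc:
        return str(exc)
    return None


for func in (int, float):
    for text in ("abc", "\x00abc"):
        # repr() keeps the embedded null character visible in the output.
        print(func.__name__, repr(text), "->", conversion_error(func, text))
```

A message that ends in an empty literal for the second input indicates the null-terminated behaviour described in this report.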
[issue16741] `int()`, `float()`, etc think python strings are null-terminated
Changes by Benjamin Peterson benja...@python.org:

--
nosy: +mark.dickinson