[issue25388] tokenizer crash/misbehavior -- heap use-after-free

2015-11-14 Thread Roundup Robot
Roundup Robot added the comment: New changeset 73da4fd7542b by Serhiy Storchaka in branch '3.4': Issue #25388: Fixed tokenizer crash when processing undecodable source code https://hg.python.org/cpython/rev/73da4fd7542b New changeset e4a69eb34ad7 by Serhiy Storchaka in branch '3.5': Issue

[issue25388] tokenizer crash/misbehavior -- heap use-after-free

2015-11-06 Thread Serhiy Storchaka
Serhiy Storchaka added the comment: Yes, there is a bug. When decoding_fgets() encounters non-UTF-8 bytes, it fails and frees the input buffer in error_ret(). But since tok->cur != tok->inp, the next call of tok_nextc() reads freed memory. if (tok->cur != tok->inp) { return
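The failure mode described in this comment can be modeled in a few lines of Python. This is only a toy sketch, not CPython's tokenizer: the class ToyTokenizer, the reset_cursors flag, and the helper read_after_error are hypothetical names invented for illustration, and setting the buffer to None stands in for freeing heap memory. Resetting the cursors is shown as one possible remedy; the excerpt does not show what the actual changeset did.

```python
class ToyTokenizer:
    """Toy model of tokenizer buffer state (hypothetical, not CPython's tok_state)."""

    def __init__(self, data: bytes):
        self.buf = bytearray(data)  # stands in for the heap-allocated input buffer
        self.cur = 0                # index of the next byte to read
        self.inp = len(data)        # index one past the last valid byte

    def error_ret(self, reset_cursors: bool) -> None:
        """Model the error path: release the buffer, optionally reset the cursors."""
        self.buf = None  # stands in for freeing the buffer
        if reset_cursors:
            # Assumed remedy for illustration: make cur == inp so no stale read occurs.
            self.cur = self.inp = 0

    def nextc(self):
        """Model the fast path: if cur != inp, read straight from the buffer."""
        if self.cur != self.inp:
            if self.buf is None:
                # In C this would be a read of freed memory, not a clean exception.
                raise RuntimeError("read from released buffer")
            c = self.buf[self.cur]
            self.cur += 1
            return c
        return None  # nothing buffered


def read_after_error(reset_cursors: bool) -> str:
    """Consume one byte, hit the error path, then try to read again."""
    tok = ToyTokenizer(b"ab")
    tok.nextc()
    tok.error_ret(reset_cursors)
    try:
        tok.nextc()
    except RuntimeError:
        return "stale read"
    return "no data"
```

With reset_cursors=False the cursors still claim that buffered data exists, so the second read touches the released buffer; with reset_cursors=True the fast-path check fails and nothing is read.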

[issue25388] tokenizer crash/misbehavior -- heap use-after-free

2015-11-03 Thread Serhiy Storchaka
Changes by Serhiy Storchaka: assignee: -> serhiy.storchaka

[issue25388] tokenizer crash/misbehavior -- heap use-after-free

2015-11-03 Thread Brian Cain
Brian Cain added the comment: Sorry, the report would have been clearer if I'd included a build with symbols and a stack trace. The test was inspired by the test from issue24022 (https://hg.python.org/cpython/rev/03b2259c6cd3); it sounds like it should not have been. But indeed it seems

[issue25388] tokenizer crash/misbehavior -- heap use-after-free

2015-11-03 Thread Brian Cain
Brian Cain added the comment: Here is a more useful ASan report: ==12168==ERROR: AddressSanitizer: heap-use-after-free on address 0x6251e110 at pc 0x00697238 bp 0x7fff412b9240 sp 0x7fff412b9238 READ of size 1 at

[issue25388] tokenizer crash/misbehavior -- heap use-after-free

2015-11-01 Thread Serhiy Storchaka
Serhiy Storchaka added the comment: Stack trace: #0 ascii_decode (start=0xa72f2008 "", end=0xf891 , dest=) at Objects/unicodeobject.c:4795 #1 0x08100c0f in PyUnicode_DecodeUTF8Stateful (s=s@entry=0xa72f2008 "", size=size@entry=1490081929, errors=errors@entry=0x81f4303 "replace",

[issue25388] tokenizer crash/misbehavior -- heap use-after-free

2015-10-16 Thread Terry J. Reedy
Terry J. Reedy added the comment: According to https://docs.python.org/3/reference/lexical_analysis.html#lexical-analysis, the encoding of a source file (in Python 3) defaults to utf-8* and a decoding error is (should be) reported as a SyntaxError. Since b"\x7f\x00\x00\n''s\x01\xfd\n'S" is
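The expected behavior described here, a decoding error surfacing as a SyntaxError rather than a crash, can be checked with a short sketch. It assumes a current CPython 3 interpreter and deliberately uses a simpler, hypothetical input (a string literal containing the byte 0xfd, which is not valid UTF-8) instead of the bytes from the report; the helper name compile_error_for is likewise made up for illustration.

```python
def compile_error_for(source: bytes):
    """Return the SyntaxError raised when compiling source, or None if it compiles."""
    try:
        compile(source, "<undecodable>", "exec")
    except SyntaxError as exc:
        return exc
    return None


# Hypothetical input: 0xfd cannot appear in well-formed UTF-8.
bad_source = b"x = '\xfd'\n"
good_source = b"x = 1\n"

err = compile_error_for(bad_source)
print(type(err).__name__ if err is not None else "compiled cleanly")
```

The same helper returns None for good_source, which gives a quick sanity check that the error comes from the undecodable byte and not from the helper itself.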

[issue25388] tokenizer crash/misbehavior -- heap use-after-free

2015-10-12 Thread Brian Cain
Changes by Brian Cain: type: -> crash

[issue25388] tokenizer crash/misbehavior -- heap use-after-free

2015-10-12 Thread Brian Cain
Changes by Brian Cain: title: tokenizer crash/misbehavior -> tokenizer crash/misbehavior -- heap use-after-free