[issue25388] tokenizer crash/misbehavior -- heap use-after-free

2015-11-14 Thread Roundup Robot

Roundup Robot added the comment:

New changeset 73da4fd7542b by Serhiy Storchaka in branch '3.4':
Issue #25388: Fixed tokenizer crash when processing undecodable source code
https://hg.python.org/cpython/rev/73da4fd7542b

New changeset e4a69eb34ad7 by Serhiy Storchaka in branch '3.5':
Issue #25388: Fixed tokenizer crash when processing undecodable source code
https://hg.python.org/cpython/rev/e4a69eb34ad7

New changeset ea0c4b811eae by Serhiy Storchaka in branch 'default':
Issue #25388: Fixed tokenizer crash when processing undecodable source code
https://hg.python.org/cpython/rev/ea0c4b811eae

New changeset 8e472cc258ec by Serhiy Storchaka in branch '2.7':
Issue #25388: Fixed tokenizer hang when processing undecodable source code
https://hg.python.org/cpython/rev/8e472cc258ec

--
nosy: +python-dev

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue25388] tokenizer crash/misbehavior -- heap use-after-free

2015-11-06 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

Yes, there is a bug. When decoding_fgets() encounters non-UTF-8 bytes, it fails 
and frees the input buffer in error_ret(). But since tok->cur != tok->inp, the 
next call to tok_nextc() reads freed memory.

if (tok->cur != tok->inp) {
    return Py_CHARMASK(*tok->cur++); /* Fast path */
}

If Python does not crash here, a new buffer is allocated and assigned to 
tok->buf, then PyTokenizer_Get returns an error and parsetok() calculates the 
position of the error:

err_ret->offset = (int)(tok->cur - tok->buf);

but tok->cur points inside the old freed buffer, so the offset becomes a huge 
integer. err_input() then tries to decode the part of the string before the 
error with the "replace" error handler, but since the position was wrongly 
calculated, it reads out of allocated memory.

The proposed patch fixes the issue. It sets tok->done and the pointers on a 
decoding error, so they are now in a consistent state. It also removes some 
duplicated or dead code.
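The user-visible effect of the fix can be checked from Python itself: feeding the interpreter an undecodable source file should end in a clean SyntaxError rather than a crash or hang. A minimal sketch (the temp-file handling is illustrative; it uses the undecodable byte from the reproducer in this issue, without the NUL byte, so that only the decoding-error path is exercised):

```python
import os
import subprocess
import sys
import tempfile

# A source file containing a byte that is invalid as UTF-8
# (taken from this issue's reproducer, minus the NUL byte).
with tempfile.NamedTemporaryFile(suffix=".py", delete=False) as f:
    f.write(b"\xfd\n")
    path = f.name

proc = subprocess.run([sys.executable, path], capture_output=True)
os.unlink(path)

# A fixed interpreter exits with an error instead of crashing,
# and reports the problem as a SyntaxError on stderr.
print(proc.returncode != 0)           # True
print(b"SyntaxError" in proc.stderr)  # True
```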

--
stage:  -> patch review
Added file: http://bugs.python.org/file40965/issue25388.patch




[issue25388] tokenizer crash/misbehavior -- heap use-after-free

2015-11-03 Thread Serhiy Storchaka

Changes by Serhiy Storchaka :


--
assignee:  -> serhiy.storchaka




[issue25388] tokenizer crash/misbehavior -- heap use-after-free

2015-11-03 Thread Brian Cain

Brian Cain added the comment:

Sorry, the report would have been clearer if I'd included a build with symbols 
and a stack trace.

The test was inspired by the test from issue24022 
(https://hg.python.org/cpython/rev/03b2259c6cd3); it sounds like it should not 
have been.

But indeed it seems like you've reproduced this issue, and you agree it's a bug?

--




[issue25388] tokenizer crash/misbehavior -- heap use-after-free

2015-11-03 Thread Brian Cain

Brian Cain added the comment:

Here is a more useful ASan report:

=
==12168==ERROR: AddressSanitizer: heap-use-after-free on address 0x6251e110 
at pc 0x00697238 bp 0x7fff412b9240 sp 0x7fff412b9238
READ of size 1 at 0x6251e110 thread T0
#0 0x697237 in tok_nextc 
/home/brian/src/fuzzpy/cpython/Parser/tokenizer.c:911:20
#1 0x68c63b in tok_get 
/home/brian/src/fuzzpy/cpython/Parser/tokenizer.c:1460:13
#2 0x689d93 in PyTokenizer_Get 
/home/brian/src/fuzzpy/cpython/Parser/tokenizer.c:1809:18
#3 0x67fec3 in parsetok 
/home/brian/src/fuzzpy/cpython/Parser/parsetok.c:208:16
#4 0x6837d4 in PyParser_ParseFileObject 
/home/brian/src/fuzzpy/cpython/Parser/parsetok.c:134:12
#5 0x52f50c in PyParser_ASTFromFileObject 
/home/brian/src/fuzzpy/cpython/Python/pythonrun.c:1150:15
#6 0x532e16 in PyRun_FileExFlags 
/home/brian/src/fuzzpy/cpython/Python/pythonrun.c:916:11
#7 0x52c3f8 in PyRun_SimpleFileExFlags 
/home/brian/src/fuzzpy/cpython/Python/pythonrun.c:396:13
#8 0x52a460 in PyRun_AnyFileExFlags 
/home/brian/src/fuzzpy/cpython/Python/pythonrun.c:80:16
#9 0x5cb04a in run_file /home/brian/src/fuzzpy/cpython/Modules/main.c:318:11
#10 0x5c5a42 in Py_Main /home/brian/src/fuzzpy/cpython/Modules/main.c:768:19
#11 0x4fbace in main 
/home/brian/src/fuzzpy/cpython/./Programs/python.c:69:11
#12 0x7fe8a9a4aa3f in __libc_start_main 
(/lib/x86_64-linux-gnu/libc.so.6+0x20a3f)
#13 0x431548 in _start (/home/brian/src/fuzzpy/cpython/python+0x431548)

0x6251e110 is located 16 bytes inside of 8224-byte region 
[0x6251e100,0x62520120)
freed by thread T0 here:
#0 0x4cdef0 in realloc 
/home/brian/src/fuzzpy/llvm_src/llvm/projects/compiler-rt/lib/asan/asan_malloc_linux.cc:61
#1 0x501280 in _PyMem_RawRealloc 
/home/brian/src/fuzzpy/cpython/Objects/obmalloc.c:84:12
#2 0x4fc68d in _PyMem_DebugRealloc 
/home/brian/src/fuzzpy/cpython/Objects/obmalloc.c:1921:18
#3 0x4fdf42 in PyMem_Realloc 
/home/brian/src/fuzzpy/cpython/Objects/obmalloc.c:343:12
#4 0x69a338 in tok_nextc 
/home/brian/src/fuzzpy/cpython/Parser/tokenizer.c:1050:34
#5 0x68a2c9 in tok_get 
/home/brian/src/fuzzpy/cpython/Parser/tokenizer.c:1357:17
#6 0x689d93 in PyTokenizer_Get 
/home/brian/src/fuzzpy/cpython/Parser/tokenizer.c:1809:18
#7 0x67fec3 in parsetok 
/home/brian/src/fuzzpy/cpython/Parser/parsetok.c:208:16
#8 0x6837d4 in PyParser_ParseFileObject 
/home/brian/src/fuzzpy/cpython/Parser/parsetok.c:134:12
#9 0x52f50c in PyParser_ASTFromFileObject 
/home/brian/src/fuzzpy/cpython/Python/pythonrun.c:1150:15
#10 0x532e16 in PyRun_FileExFlags 
/home/brian/src/fuzzpy/cpython/Python/pythonrun.c:916:11
#11 0x52c3f8 in PyRun_SimpleFileExFlags 
/home/brian/src/fuzzpy/cpython/Python/pythonrun.c:396:13
#12 0x52a460 in PyRun_AnyFileExFlags 
/home/brian/src/fuzzpy/cpython/Python/pythonrun.c:80:16
#13 0x5cb04a in run_file 
/home/brian/src/fuzzpy/cpython/Modules/main.c:318:11
#14 0x5c5a42 in Py_Main /home/brian/src/fuzzpy/cpython/Modules/main.c:768:19
#15 0x4fbace in main 
/home/brian/src/fuzzpy/cpython/./Programs/python.c:69:11
#16 0x7fe8a9a4aa3f in __libc_start_main 
(/lib/x86_64-linux-gnu/libc.so.6+0x20a3f)

previously allocated by thread T0 here:
#0 0x4cdb88 in malloc 
/home/brian/src/fuzzpy/llvm_src/llvm/projects/compiler-rt/lib/asan/asan_malloc_linux.cc:40
#1 0x501030 in _PyMem_RawMalloc 
/home/brian/src/fuzzpy/cpython/Objects/obmalloc.c:62:12
#2 0x5074db in _PyMem_DebugAlloc 
/home/brian/src/fuzzpy/cpython/Objects/obmalloc.c:1838:22
#3 0x4fc213 in _PyMem_DebugMalloc 
/home/brian/src/fuzzpy/cpython/Objects/obmalloc.c:1861:12
#4 0x4fdbfa in PyMem_Malloc 
/home/brian/src/fuzzpy/cpython/Objects/obmalloc.c:325:12
#5 0x68791d in PyTokenizer_FromFile 
/home/brian/src/fuzzpy/cpython/Parser/tokenizer.c:861:29
#6 0x68359e in PyParser_ParseFileObject 
/home/brian/src/fuzzpy/cpython/Parser/parsetok.c:126:16
#7 0x52f50c in PyParser_ASTFromFileObject 
/home/brian/src/fuzzpy/cpython/Python/pythonrun.c:1150:15
#8 0x532e16 in PyRun_FileExFlags 
/home/brian/src/fuzzpy/cpython/Python/pythonrun.c:916:11
#9 0x52c3f8 in PyRun_SimpleFileExFlags 
/home/brian/src/fuzzpy/cpython/Python/pythonrun.c:396:13
#10 0x52a460 in PyRun_AnyFileExFlags 
/home/brian/src/fuzzpy/cpython/Python/pythonrun.c:80:16
#11 0x5cb04a in run_file 
/home/brian/src/fuzzpy/cpython/Modules/main.c:318:11
#12 0x5c5a42 in Py_Main /home/brian/src/fuzzpy/cpython/Modules/main.c:768:19
#13 0x4fbace in main 
/home/brian/src/fuzzpy/cpython/./Programs/python.c:69:11
#14 0x7fe8a9a4aa3f in __libc_start_main 
(/lib/x86_64-linux-gnu/libc.so.6+0x20a3f)

SUMMARY: AddressSanitizer: heap-use-after-free 
/home/brian/src/fuzzpy/cpython/Parser/tokenizer.c:911:20 in tok_nextc
Shadow bytes around the buggy address:
  0x0c4a7fffbbd0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c4a7fffbbe0: fa 

[issue25388] tokenizer crash/misbehavior -- heap use-after-free

2015-11-01 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

Stack trace:

#0  ascii_decode (start=0xa72f2008 "", end=0xf891 , dest=) at 
Objects/unicodeobject.c:4795
#1  0x08100c0f in PyUnicode_DecodeUTF8Stateful (s=s@entry=0xa72f2008 "", 
size=size@entry=1490081929, errors=errors@entry=0x81f4303 "replace", 
consumed=consumed@entry=0x0)
at Objects/unicodeobject.c:4871
#2  0x081029c7 in PyUnicode_DecodeUTF8 (s=0xa72f2008 "", size=1490081929, 
errors=errors@entry=0x81f4303 "replace") at Objects/unicodeobject.c:4743
#3  0x0815179a in err_input (err=0xbfffec04) at Python/pythonrun.c:1352
#4  0x081525cf in PyParser_ASTFromFileObject (arena=0x8348118, errcode=0x0, 
flags=, ps2=0x0, ps1=0x0, start=257, enc=0x0, 
filename=0xb7950e00, fp=0x8347fb0)
at Python/pythonrun.c:1163
#5  PyRun_FileExFlags (fp=0x8347fb0, filename_str=0xb79e2eb8 "vuln.py", 
start=257, globals=0xb79e3d8c, locals=0xb79e3d8c, closeit=1, flags=0xbfffecec) 
at Python/pythonrun.c:916
#6  0x08152744 in PyRun_SimpleFileExFlags (fp=0x8347fb0, filename=, closeit=1, flags=0xbfffecec) at Python/pythonrun.c:396
#7  0x08063919 in run_file (p_cf=0xbfffecec, filename=0x82eda10 L"vuln.py", 
fp=0x8347fb0) at Modules/main.c:318
#8  Py_Main (argc=argc@entry=2, argv=argv@entry=0x82ed008) at Modules/main.c:768
#9  0x0805f345 in main (argc=2, argv=0xbfffee44) at ./Programs/python.c:69

At #2 PyUnicode_DecodeUTF8 is called with s="" and size=1490081929. size is 
err->offset, and err->offset is set only in parsetok() in Parser/parsetok.c. 
This is the tokenizer bug.

Minimal reproducer:

./python -c 'with open("vuln.py", "wb") as f: f.write(b"\x7f\x00\n\xfd\n")'
./python vuln.py

The crash is gone if the code at the end of decoding_fgets() that validates 
UTF-8 is commented out.

--
nosy: +benjamin.peterson, serhiy.storchaka




[issue25388] tokenizer crash/misbehavior -- heap use-after-free

2015-10-16 Thread Terry J. Reedy

Terry J. Reedy added the comment:

According to 
https://docs.python.org/3/reference/lexical_analysis.html#lexical-analysis, the 
encoding of a source file (in Python 3) defaults to utf-8* and a decoding error 
is (or should be) reported as a SyntaxError. Since 
b"\x7f\x00\x00\n''s\x01\xfd\n'S" is not valid utf-8, I expect a 
UnicodeDecodeError converted to a SyntaxError.

* compile(bytes, filename, mode) defaults to latin1 instead.  It has no 
decoding problem, but quits with "ValueError: source code string cannot contain 
null bytes".  On 2.7, I might expect that as the error.

I expect '''self.assertIn(b"Non-UTF-8", res.err)''' to always fail because 
error messages are strings, not bytes. That aside, have you ever seen that 
particular text (as a string) in a SyntaxError message?

Why do you think the crash happens during the tokenizing phase? I could not see 
anything in the ASan report.

--
nosy: +terry.reedy
versions: +Python 3.5




[issue25388] tokenizer crash/misbehavior -- heap use-after-free

2015-10-12 Thread Brian Cain

Changes by Brian Cain :


--
type:  -> crash




[issue25388] tokenizer crash/misbehavior -- heap use-after-free

2015-10-12 Thread Brian Cain

Changes by Brian Cain :


--
title: tokenizer crash/misbehavior -> tokenizer crash/misbehavior -- heap 
use-after-free
