[issue32583] Crash during decoding using UTF-16/32 and custom error handler

2018-01-31 Thread Ned Deily

Ned Deily  added the comment:


New changeset 86fdad093b863db7ef6a3a00c9cff724c09442e7 by Ned Deily (Xiang 
Zhang) in branch '3.7':
bpo-32583: Fix possible crashing in builtin Unicode decoders (#5325)
https://github.com/python/cpython/commit/86fdad093b863db7ef6a3a00c9cff724c09442e7


--
nosy: +ned.deily

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue32583] Crash during decoding using UTF-16/32 and custom error handler

2018-01-31 Thread Xiang Zhang

Change by Xiang Zhang :


--
resolution:  -> fixed
stage: patch review -> resolved
status: open -> closed




[issue32583] Crash during decoding using UTF-16/32 and custom error handler

2018-01-31 Thread Xiang Zhang

Xiang Zhang  added the comment:


New changeset ea94fce6960d90fffeeda131e31024617912d231 by Xiang Zhang in branch 
'3.6':
[3.6] bpo-32583: Fix possible crashing in builtin Unicode decoders (GH-5325) 
(#5459)
https://github.com/python/cpython/commit/ea94fce6960d90fffeeda131e31024617912d231





[issue32583] Crash during decoding using UTF-16/32 and custom error handler

2018-01-31 Thread Xiang Zhang

Change by Xiang Zhang :


--
pull_requests: +5285




[issue32583] Crash during decoding using UTF-16/32 and custom error handler

2018-01-31 Thread Xiang Zhang

Xiang Zhang  added the comment:


New changeset 2c7fd46e11333ef5e5cce34212f7d087694f3658 by Xiang Zhang in branch 
'master':
bpo-32583: Fix possible crashing in builtin Unicode decoders (#5325)
https://github.com/python/cpython/commit/2c7fd46e11333ef5e5cce34212f7d087694f3658





[issue32583] Crash during decoding using UTF-16/32 and custom error handler

2018-01-25 Thread Xiang Zhang

Change by Xiang Zhang :


--
pull_requests: +5170




[issue32583] Crash during decoding using UTF-16/32 and custom error handler

2018-01-21 Thread Xiang Zhang

Xiang Zhang  added the comment:

I wrote a draft patch, without tests yet; I'll add them later. Reviews are 
appreciated. I also checked the Windows code page equivalent and the encoders; 
as far as I can tell, they don't suffer from this problem.

--
keywords: +patch
stage: needs patch -> patch review
Added file: https://bugs.python.org/file47399/issue32583.patch




[issue32583] Crash during decoding using UTF-16/32 and custom error handler

2018-01-20 Thread Xiang Zhang

Xiang Zhang  added the comment:

Another way to crash:

>>> import codecs
>>> def replace_with_longer(exc):
...     exc.object = b'\xa0\x00' * 100
...     return ('\ufffd', exc.end)
...
>>> codecs.register_error('replace_with_longer', replace_with_longer)
>>> b'\xd8\xd8'.decode('utf-16-le', 'replace_with_longer')
Debug memory block at address p=0x10b3b8c40: API 'o'
92 bytes originally requested
The 7 pad bytes at p-7 are FORBIDDENBYTE, as expected.
The 8 pad bytes at tail=0x10b3b8c9c are not all FORBIDDENBYTE (0xfb):
at tail+0: 0xa0 *** OUCH
at tail+1: 0x00 *** OUCH
at tail+2: 0xa0 *** OUCH
at tail+3: 0x00 *** OUCH
at tail+4: 0xa0 *** OUCH
at tail+5: 0x00 *** OUCH
at tail+6: 0xa0 *** OUCH
at tail+7: 0x00 *** OUCH
The block was made by call #11529390970613309440 to debug malloc/realloc.
Data at p: 00 00 00 00 00 00 00 00 ... 00 00 00 00 fd ff a0 00

Fatal Python error: bad trailing pad byte

Current thread 0x7fffab9b4340 (most recent call first):
  File "/Users/angwer/Repositories/cpython/Lib/encodings/utf_16_le.py", line 16 in decode
  File "", line 1 in 
[1]64081 abort  ~/Repositories/cpython/python.exe




[issue32583] Crash during decoding using UTF-16/32 and custom error handler

2018-01-20 Thread Xiang Zhang

Change by Xiang Zhang :


--
stage: patch review -> needs patch




[issue32583] Crash during decoding using UTF-16/32 and custom error handler

2018-01-20 Thread Xiang Zhang

Xiang Zhang  added the comment:

The problem is that the UTF-16 decoder almost always assumes that two bytes 
decode to one Unicode character, so when allocating memory it assumes that 
(bytes_number+1)/2 Unicode slots are enough; there is even a comment in the 
code saying so. unicode_decode_call_errorhandler_writer only allocates more 
memory when the error handler returns a replacement string longer than one 
character. It does not account for a handler that advances by only one byte 
per call, in which case one byte maps to one Unicode character. So it's 
possible for the decoder to write out of bounds.
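
To make the arithmetic concrete, here is a rough back-of-the-envelope model in 
Python. It is not the CPython code: initial_slots and chars_written are 
hypothetical helpers, and the model assumes that every error handler call 
consumes exactly one byte and writes exactly one character, while the remaining 
bytes decode at two bytes per character.

```python
def initial_slots(n_bytes):
    """Slots reserved up front: one per two input bytes, rounded up."""
    return (n_bytes + 1) // 2


def chars_written(n_bytes, n_errors):
    """Characters produced under the simplified model described above.

    Each of the n_errors handler calls consumes one byte and writes one
    character; the remaining bytes decode at two bytes per character.
    """
    return n_errors + (n_bytes - n_errors) // 2


# Hypothetical numbers: a 9-byte input with 5 handler calls.
reserved = initial_slots(9)
written = chars_written(9, 5)
overflow = written - reserved
print(reserved, written, overflow)
```

In this model, whenever written exceeds reserved and the buffer is not grown, 
the extra characters land past the end of the allocation.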

The following example crashes reliably on my Mac with a debug build; it writes 
past the end of the internal Unicode buffer:

>>> import codecs
>>> def pace_by_one(exc):
...     return ('\ufffd', exc.start+1)
...
>>> codecs.register_error('pace_by_one', pace_by_one)
>>> b'\xd8\xd8\xd8\xd8\xd8\xd8\x00\x00\x00'.decode('utf-16-le', 'pace_by_one')
Debug memory block at address p=0x10210c260: API 'o'
100 bytes originally requested
The 7 pad bytes at p-7 are FORBIDDENBYTE, as expected.
The 8 pad bytes at tail=0x10210c2c4 are not all FORBIDDENBYTE (0xfb):
at tail+0: 0x00 *** OUCH
at tail+1: 0x00 *** OUCH
at tail+2: 0xfb
at tail+3: 0xfb
at tail+4: 0xfb
at tail+5: 0xfb
at tail+6: 0xfb
at tail+7: 0xfb
The block was made by call #30672 to debug malloc/realloc.
Data at p: 00 00 00 00 00 00 00 00 ... fd ff fd ff fd ff d8 00

Fatal Python error: bad trailing pad byte

Current thread 0x7fffab9b4340 (most recent call first):
  File "/Users/angwer/Repositories/cpython/Lib/encodings/utf_16_le.py", line 16 in decode
  File "", line 1 in 
[1]63997 abort  ~/Repositories/cpython/python.exe

I'll try to make a fix tomorrow.

--
nosy: +xiang.zhang




[issue32583] Crash during decoding using UTF-16/32 and custom error handler

2018-01-20 Thread Serhiy Storchaka

Change by Serhiy Storchaka :


--
components: +Interpreter Core
nosy: +serhiy.storchaka
stage: test needed -> patch review
versions: +Python 3.6 -Python 3.5




[issue32583] Crash during decoding using UTF-16/32 and custom error handler

2018-01-19 Thread Terry J. Reedy

Terry J. Reedy  added the comment:

As written, decode_crash.py also crashes on Windows.  Passing 'replace' instead 
of 'w3lib_replace' results in no crash and lots of boxes and blanks.

--
nosy: +benjamin.peterson, ezio.melotti, lemburg, terry.reedy, vstinner
stage:  -> test needed




[issue32583] Crash during decoding using UTF-16/32 and custom error handler

2018-01-17 Thread Alexander Sibiryakov

New submission from Alexander Sibiryakov :

The CPython interpreter receives SIGSEGV or SIGABRT while running the attached 
script. The script attempts to decode a binary file using the UTF-16-LE 
encoding and a custom error handler. The error handler is poorly built: it 
doesn't respect the Unicode standard and miscalculates the position at which 
the decoder should resume. This somehow interferes with the internal C code 
that does memory allocation. The result is invalid writes outside the 
allocated block.
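
For contrast, a handler that resumes at exc.end instead of inside the failing 
sequence might look like the sketch below. The handler name replace_and_skip 
and the sample bytes are made up for illustration and are not taken from 
decode_crash.py; the sketch only assumes the documented codecs.register_error 
contract, where the handler returns a (replacement, resume_position) tuple.

```python
import codecs


def replace_and_skip(exc):
    """Hypothetical handler: emit U+FFFD and resume after the bad sequence."""
    if not isinstance(exc, UnicodeDecodeError):
        # This sketch only handles decoding errors.
        raise exc
    # exc.end is the offset just past the undecodable bytes, so the decoder
    # never re-reads bytes it has already reported.
    return ('\ufffd', exc.end)


codecs.register_error('replace_and_skip', replace_and_skip)

# Illustrative input: a lone high surrogate followed by 'A' in UTF-16-LE.
data = b'\xd8\xd8\x41\x00'
result = data.decode('utf-16-le', 'replace_and_skip')
print(ascii(result))
```

Resuming at exc.end is also what the built-in 'replace' handler does when 
decoding, so the two should give the same result on the same input.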

Here is how the crash looks with Python 3.7.0a4+ (heads/master:44a70e9, Jan 17 
2018, 12:18:45) run under Valgrind 3.11.0. Please see the full Valgrind output 
in the attached valgrind.log.

==24836== Invalid write of size 4
==24836==    at 0x4C6B17: ucs4lib_utf16_decode (codecs.h:540)
==24836==    by 0x4C6B17: PyUnicode_DecodeUTF16Stateful (unicodeobject.c:5600)
==24836==    by 0x55AAD3: _codecs_utf_16_le_decode_impl (_codecsmodule.c:363)
==24836==    by 0x55AB6C: _codecs_utf_16_le_decode (_codecsmodule.c.h:371)
==24836==    by 0x4315D6: _PyMethodDef_RawFastCallKeywords (call.c:651)
==24836==    by 0x431840: _PyCFunction_FastCallKeywords (call.c:730)
==24836==    by 0x4ED159: call_function (ceval.c:4580)
==24836==    by 0x4ED159: _PyEval_EvalFrameDefault (ceval.c:3134)
==24836==    by 0x4E302D: PyEval_EvalFrameEx (ceval.c:545)
==24836==    by 0x4E3A42: _PyEval_EvalCodeWithName (ceval.c:3971)
==24836==    by 0x430EDD: _PyFunction_FastCallDict (call.c:376)
==24836==    by 0x4336B0: PyObject_Call (call.c:226)
==24836==    by 0x433839: PyEval_CallObjectWithKeywords (call.c:826)
==24836==    by 0x4FEAA6: _PyCodec_DecodeInternal (codecs.c:471)
==24836==  Address 0x6cf4bf8 is 0 bytes after a block of size 339,112 alloc'd
==24836==    at 0x4C2DB8F: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==24836==    by 0x467635: _PyMem_RawMalloc (obmalloc.c:75)
==24836==    by 0x467B7D: _PyMem_DebugRawAlloc (obmalloc.c:2033)
==24836==    by 0x467C1F: _PyMem_DebugRawMalloc (obmalloc.c:2062)
==24836==    by 0x467C40: _PyMem_DebugMalloc (obmalloc.c:2202)
==24836==    by 0x468BFF: PyObject_Malloc (obmalloc.c:616)
==24836==    by 0x493902: PyUnicode_New (unicodeobject.c:1293)
==24836==    by 0x4BEA4F: _PyUnicodeWriter_PrepareInternal (unicodeobject.c:13456)
==24836==    by 0x4C6D39: _PyUnicodeWriter_WriteCharInline (unicodeobject.c:13494)
==24836==    by 0x4C6D39: PyUnicode_DecodeUTF16Stateful (unicodeobject.c:5637)
==24836==    by 0x55AAD3: _codecs_utf_16_le_decode_impl (_codecsmodule.c:363)
==24836==    by 0x55AB6C: _codecs_utf_16_le_decode (_codecsmodule.c.h:371)
==24836==    by 0x4315D6: _PyMethodDef_RawFastCallKeywords (call.c:651)
--
Added file: https://bugs.python.org/file47393/valgrind.log




[issue32583] Crash during decoding using UTF-16/32 and custom error handler

2018-01-17 Thread Alexander Sibiryakov

Change by Alexander Sibiryakov :


Added file: https://bugs.python.org/file47392/test_string.bin




[issue32583] Crash during decoding using UTF-16/32 and custom error handler

2018-01-17 Thread Alexander Sibiryakov

Change by Alexander Sibiryakov :


--
files: decode_crash.py
nosy: sibiryakov
priority: normal
severity: normal
status: open
title: Crash during decoding using UTF-16/32 and custom error handler
type: crash
versions: Python 3.5, Python 3.7
Added file: https://bugs.python.org/file47391/decode_crash.py
