Antoine Pitrou <[email protected]> added the comment:
The following code at the beginning of PyUnicode_DecodeUTF32Stateful is buggy
when the codec's endianness doesn't match the native endianness, because it
reads each 4-byte unit in host byte order (not to mention that it could also
crash if the underlying CPU architecture doesn't support unaligned access to
4-byte integers):
    #ifndef Py_UNICODE_WIDE
        for (i = pairs = 0; i < size/4; i++)
            if (((Py_UCS4 *)s)[i] >= 0x10000)
                pairs++;
    #endif
Because the surrogate pair count comes out wrong, the preallocated unicode
object is too short, and Python writes past the end of its buffer. This can
produce hard crashes, such as:
    >>> l = unicode(b'\x00\x01\x00\x00' * 1024, 'utf-32be')
    Debug memory block at address p=0xf2b310:
        2050 bytes originally requested
        The 8 pad bytes at p-8 are FORBIDDENBYTE, as expected.
        The 8 pad bytes at tail=0xf2bb12 are not all FORBIDDENBYTE (0xfb):
            at tail+0: 0x00 *** OUCH
            at tail+1: 0xdc *** OUCH
            at tail+2: 0x00 *** OUCH
            at tail+3: 0xd8 *** OUCH
            at tail+4: 0x00 *** OUCH
            at tail+5: 0xdc *** OUCH
            at tail+6: 0x00 *** OUCH
            at tail+7: 0xd8 *** OUCH
        The block was made by call #61925422603698392 to debug malloc/realloc.
        Data at p: 00 d8 00 dc 00 d8 00 dc ... 00 dc 00 d8 00 dc 00 d8
    Fatal Python error: bad trailing pad byte
    Aborted
----------
priority: high -> critical
type: behavior -> crash
_______________________________________
Python tracker <[email protected]>
<http://bugs.python.org/issue8941>
_______________________________________