Antoine Pitrou <pit...@free.fr> added the comment:

The following code at the beginning of PyUnicode_DecodeUTF32Stateful is buggy when the codec's endianness doesn't match the native endianness (not to mention that it could also crash if the underlying CPU architecture doesn't support unaligned access to 4-byte integers):

#ifndef Py_UNICODE_WIDE
    for (i = pairs = 0; i < size/4; i++)
        if (((Py_UCS4 *)s)[i] >= 0x10000)
            pairs++;
#endif

As a result, the preallocated unicode object isn't long enough and Python writes into memory it shouldn't write into. It can produce hard crashes, such as:

>>> l = unicode(b'\x00\x01\x00\x00' * 1024, 'utf-32be')
Debug memory block at address p=0xf2b310:
    2050 bytes originally requested
    The 8 pad bytes at p-8 are FORBIDDENBYTE, as expected.
    The 8 pad bytes at tail=0xf2bb12 are not all FORBIDDENBYTE (0xfb):
        at tail+0: 0x00 *** OUCH
        at tail+1: 0xdc *** OUCH
        at tail+2: 0x00 *** OUCH
        at tail+3: 0xd8 *** OUCH
        at tail+4: 0x00 *** OUCH
        at tail+5: 0xdc *** OUCH
        at tail+6: 0x00 *** OUCH
        at tail+7: 0xd8 *** OUCH
    The block was made by call #61925422603698392 to debug malloc/realloc.
    Data at p: 00 d8 00 dc 00 d8 00 dc ... 00 dc 00 d8 00 dc 00 d8
Fatal Python error: bad trailing pad byte
Abandon

----------
priority: high -> critical
type: behavior -> crash

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue8941>
_______________________________________
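
To illustrate the miscount described in the comment, here is a minimal Python 3 sketch of the pair-counting loop. It is not the CPython code: count_pairs is a hypothetical helper, and its byteorder argument stands in for the byte order the C cast would pick up from the host ("<" for a little-endian machine, which is an assumption about where the crash was observed).

```python
import struct

def count_pairs(data, byteorder):
    """Count 4-byte units >= 0x10000, reading each unit in `byteorder`.

    `byteorder` is "<" (little-endian) or ">" (big-endian). Passing the
    host's order models the C expression ((Py_UCS4 *)s)[i]; passing the
    codec's order models what the decoder actually needs.
    """
    n = len(data) // 4
    units = struct.unpack("%s%dI" % (byteorder, n), data[:n * 4])
    return sum(1 for u in units if u >= 0x10000)

# Same input as in the report: U+10000 encoded as UTF-32-BE, 1024 times.
data = b"\x00\x01\x00\x00" * 1024

# Read in little-endian order, every unit looks like 0x00000100.
pairs_as_cast = count_pairs(data, "<")   # 0

# Read in the codec's big-endian order, every unit is 0x00010000.
pairs_needed = count_pairs(data, ">")    # 1024
```

Under these assumptions the loop finds no surrogate pairs where 1024 are needed, so a narrow build would size the buffer for 1024 code units instead of 2048. That appears consistent with the "2050 bytes originally requested" line in the debug output (1024 two-byte units plus a terminator) and with the d800/dc00 surrogate values overwriting the trailing pad bytes.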