Public bug reported:

hi,

i discovered a bug yesterday in repr() for unicode strings. this causes
an unpatched non-debug wide (UTF-32/UCS-4) build of python to abort:

python2.4 -c 'assert(repr(u"\U00010000" * 39 + u"\uffff" * 4096)) == (repr(u"\U00010000" * 39 + u"\uffff" * 4096))'

the problem is fixed by a change to unicodeobject.c.

in the process of fixing it i also found and fixed another bug in
repr() on UCS-4 python builds -- previously paired unicode surrogates
were being repr()'ed as a single "character" even though they are not
treated as such by a UCS-4 python build -- i.e.
eval(repr(u'\ud800\udc00')) != u'\ud800\udc00' in an unpatched UCS-4
build.

Package: python2.4
Version: 2.4.3-7ubuntu2
Severity: important

when i run this command:

python -c "repr(u'\u24ea\u059c\u200a\U0001d77e\uff07\u202f\u0747\u202f \U0001d56b\U0001d5b9\U0001d4e9\u20052\u14bf\U0001d7f8\u200a\U0001d795 \U0001d6e7Z\u2006\u2002\U0001d50a\uff27\u13c0\u2000\uff16\u0411\uff16 \U0001d7e7\uff4c\u2006\u2001\ufe39\u2008\u0313]\u2008\u3014\u3015')"

python aborts with the following backtrace and memory dump:

*** glibc detected *** python: realloc(): invalid next size: 0x081521e8 ***
======= Backtrace: =========
/lib/tls/i686/cmov/libc.so.6[0xb7e8acd4]
/lib/tls/i686/cmov/libc.so.6(__libc_realloc+0xff)[0xb7e8cc5f]
python(_PyString_Resize+0x80)[0x8082b4b]
python[0x80991f7]
python(PyObject_Repr+0x58)[0x807d1fd]
python(PyEval_EvalFrame+0x4b37)[0x80b5270]
python(PyEval_EvalCodeEx+0x836)[0x80b65d6]
python(PyEval_EvalCode+0x57)[0x80b6640]
python(PyRun_SimpleStringFlags+0xa8)[0x80d8b7c]
python(Py_Main+0x685)[0x8055862]
python(main+0x22)[0x80550e2]
/lib/tls/i686/cmov/libc.so.6(__libc_start_main+0xd8)[0xb7e378b8]
python[0x8055041]
======= Memory map: ========
08048000-0811a000 r-xp 00000000 08:03 622736 /usr/bin/python2.4
0811a000-0813b000 rw-p 000d1000 08:03 622736 /usr/bin/python2.4
0813b000-081b5000 rw-p 0813b000 00:00 0 [heap]
b7c00000-b7c21000 rw-p b7c00000 00:00 0
b7c21000-b7d00000 ---p b7c21000 00:00 0
b7d40000-b7d4a000 r-xp 00000000 08:03 376899 /lib/libgcc_s.so.1
b7d4a000-b7d4b000 rw-p 00009000 08:03 376899 /lib/libgcc_s.so.1
b7d68000-b7d9b000 r--p 00000000 08:03 82634 /usr/lib/locale/en_US.utf8/LC_CTYPE
b7d9b000-b7d9e000 r-xp 00000000 08:03 625529 /usr/lib/python2.4/lib-dynload/_locale.so
b7d9e000-b7d9f000 rw-p 00003000 08:03 625529 /usr/lib/python2.4/lib-dynload/_locale.so
b7d9f000-b7e22000 rw-p b7d9f000 00:00 0
b7e22000-b7f51000 r-xp 00000000 08:03 66543 /lib/tls/i686/cmov/libc-2.4.so
b7f51000-b7f53000 r--p 0012e000 08:03 66543 /lib/tls/i686/cmov/libc-2.4.so
b7f53000-b7f55000 rw-p 00130000 08:03 66543 /lib/tls/i686/cmov/libc-2.4.so
b7f55000-b7f58000 rw-p b7f55000 00:00 0
b7f58000-b7f7c000 r-xp 00000000 08:03 66547 /lib/tls/i686/cmov/libm-2.4.so
b7f7c000-b7f7e000 rw-p 00023000 08:03 66547 /lib/tls/i686/cmov/libm-2.4.so
b7f7e000-b7f80000 r-xp 00000000 08:03 68161 /lib/tls/i686/cmov/libutil-2.4.so
b7f80000-b7f82000 rw-p 00001000 08:03 68161 /lib/tls/i686/cmov/libutil-2.4.so
b7f82000-b7f83000 rw-p b7f82000 00:00 0
b7f83000-b7f85000 r-xp 00000000 08:03 66546 /lib/tls/i686/cmov/libdl-2.4.so
b7f85000-b7f87000 rw-p 00001000 08:03 66546 /lib/tls/i686/cmov/libdl-2.4.so
b7f87000-b7f96000 r-xp 00000000 08:03 68156 /lib/tls/i686/cmov/libpthread-2.4.so
b7f96000-b7f98000 rw-p 0000f000 08:03 68156 /lib/tls/i686/cmov/libpthread-2.4.so
b7f98000-b7f9a000 rw-p b7f98000 00:00 0
b7fb0000-b7fb7000 r--s 00000000 08:03 2130015 /usr/lib/gconv/gconv-modules.cache
b7fb7000-b7fb9000 rw-p b7fb7000 00:00 0
b7fb9000-b7fd2000 r-xp 00000000 08:03 2737127 /lib/ld-2.4.so
b7fd2000-b7fd4000 rw-p 00018000 08:03 2737127 /lib/ld-2.4.so
bf99b000-bf9b3000 rw-p bf99b000 00:00 0 [stack]
ffffe000-fffff000 ---p 00000000 00:00 0 [vdso]
Aborted

-- System Information:
Debian Release: testing/unstable
  APT prefers edgy-updates
  APT policy: (500, 'edgy-updates'), (500, 'edgy-security'), (500, 'edgy-backports'), (500, 'edgy')
Architecture: i386 (i686)
Shell: /bin/sh linked to /bin/dash
Kernel: Linux 2.6.17-5-386
Locale: LANG=en_US.UTF-8, LC_CTYPE=en_US.UTF-8 (charmap=UTF-8)

Versions of packages python2.4 depends on:
ii  libbz2-1.0         1.0.3-3         high-quality block-sorting file co
ii  libc6              2.4-1ubuntu8    GNU C Library: Shared libraries
ii  libdb4.4           4.4.20-6        Berkeley v4.4 Database Libraries [
ii  libncurses5        5.5-2ubuntu1    Shared libraries for terminal hand
ii  libncursesw5       5.5-2ubuntu1    Shared libraries for terminal hand
ii  libreadline5       5.1-7build1     GNU readline and history libraries
ii  libssl0.9.8        0.9.8b-2build1  SSL shared libraries
ii  mime-support       3.36-1          MIME files 'mime.types' & 'mailcap
ii  python2.4-minimal  2.4.3-7ubuntu2  A minimal subset of the Python lan

python2.4 recommends no packages.

-- no debconf information

the patch is online here:

http://zoehep.xent.com/~bsittler/python2.4-2.4.3_unicodeobject.c.diff

and also inlined here and attached to this message:

--- Objects/unicodeobject.c	2006-03-27 23:32:36.000000000 -0800
+++ /home/bsittler/pkgs/python2.4-2.4.3_unicodeobject.c.patched	2006-08-16 12:37:19.000000000 -0700
@@ -1968,7 +1968,29 @@
 
     static const char *hexdigit = "0123456789abcdef";
 
-    repr = PyString_FromStringAndSize(NULL, 2 + 6*size + 1);
+    /* Initial allocation is based on the longest-possible unichr
+       escape.
+
+       In wide (UTF-32) builds '\U00xxxxxx' is 10 chars per source
+       unichr, so in this case it's the longest unichr escape. In
+       narrow (UTF-16) builds this is five chars per source unichr
+       since there are two unichrs in the surrogate pair, so in narrow
+       (UTF-16) builds it's not the longest unichr escape.
+
+       In wide or narrow builds '\uxxxx' is 6 chars per source unichr,
+       so in the narrow (UTF-16) build case it's the longest unichr
+       escape.
+
+    */
+
+    repr = PyString_FromStringAndSize(NULL,
+        2
+#ifdef Py_UNICODE_WIDE
+        + 10*size
+#else
+        + 6*size
+#endif
+        + 1);
     if (repr == NULL)
         return NULL;
 
@@ -1993,15 +2015,6 @@
 #ifdef Py_UNICODE_WIDE
         /* Map 21-bit characters to '\U00xxxxxx' */
         else if (ch >= 0x10000) {
-            int offset = p - PyString_AS_STRING(repr);
-
-            /* Resize the string if necessary */
-            if (offset + 12 > PyString_GET_SIZE(repr)) {
-                if (_PyString_Resize(&repr, PyString_GET_SIZE(repr) + 100))
-                    return NULL;
-                p = PyString_AS_STRING(repr) + offset;
-            }
-
             *p++ = '\\';
             *p++ = 'U';
             *p++ = hexdigit[(ch >> 28) & 0x0000000F];
@@ -2014,8 +2027,8 @@
             *p++ = hexdigit[ch & 0x0000000F];
             continue;
         }
-#endif
-        /* Map UTF-16 surrogate pairs to Unicode \UXXXXXXXX escapes */
+#else
+        /* Map UTF-16 surrogate pairs to '\U00xxxxxx' */
         else if (ch >= 0xD800 && ch < 0xDC00) {
             Py_UNICODE ch2;
             Py_UCS4 ucs;
@@ -2040,6 +2053,7 @@
             s--;
             size++;
         }
+#endif
 
         /* Map 16-bit characters to '\uxxxx' */
         if (ch >= 256) {

** Affects: python2.4 (Ubuntu)
     Importance: Untriaged
         Status: Unconfirmed

** Visibility changed to: Public

-- 
buffer overrun in repr() for unicode strings
https://launchpad.net/bugs/56633
