Public bug reported:

hi,

i discovered a bug yesterday in repr() for unicode strings. this
causes an unpatched non-debug wide (UTF-32/UCS-4) build of python to
abort:

python2.4 -c 'assert(repr(u"\U00010000" * 39 + u"\uffff" * 4096)) == (repr(u"\U00010000" * 39 + u"\uffff" * 4096))'
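
for illustration, here is a rough model of the buffer arithmetic behind that
reproducer. it is a sketch in modern python 3 (where len() counts code points,
like "size" on a wide build), not the C code itself; the helper names
escaped_len, needed, old_alloc and patched_alloc are made up for this example,
and the quote handling is simplified to an upper bound.

```python
# Model of the output-buffer sizing for unicode repr() on a wide (UCS-4) build.
# Assumption: Python 3 str stands in for a wide-build unicode object.

def escaped_len(ch):
    """Bytes written for one code point by the escaping loop (wide build)."""
    cp = ord(ch)
    if cp >= 0x10000:
        return 10          # '\U00xxxxxx'
    if cp >= 256:
        return 6           # '\uxxxx'
    if ch in "'\\\t\n\r":
        return 2           # two-character escapes (quote treated as always escaped)
    if cp < 32 or cp >= 127:
        return 4           # '\xNN'
    return 1               # printable ASCII is copied through

def needed(s):
    """Total bytes for u'...': the 'u', two quotes, and every escape."""
    return 3 + sum(escaped_len(ch) for ch in s)

def old_alloc(s):
    """Initial allocation before the patch: 2 + 6*size + 1."""
    return 2 + 6 * len(s) + 1

def patched_alloc(s):
    """Initial allocation on a wide build after the patch: 2 + 10*size + 1."""
    return 2 + 10 * len(s) + 1

s = "\U00010000" * 39 + "\uffff" * 4096
print(old_alloc(s), needed(s), needed(s) - old_alloc(s))   # 24813 24969 156
print(patched_alloc(s) >= needed(s))                       # True
```

the 39 wide characters come first, while the write offset is still far from the
end of the buffer, so the old "resize if necessary" check (which only ran in the
ch >= 0x10000 branch) never fires; the 4096 '\uffff' escapes that follow are
written without any check and run 156 bytes past the allocation.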

the problem is fixed by a change to unicodeobject.c. in the process of
fixing it i also found and fixed another bug in repr() on UCS-4 python
builds -- previously paired unicode surrogates were being repr()'ed as a
single "character" even though they are not treated as such by a UCS-4
python build -- i.e. eval(repr(u'\ud800\udc00')) != u'\ud800\udc00' in
an unpatched UCS-4 build.
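
a small sketch of why that round trip fails, again modelled in python 3 rather
than taken from the C source (combine_surrogates is a hypothetical helper that
just applies the standard UTF-16 pairing rule): the unpatched repr() folds the
two surrogates into one '\U00010000' escape, and evaluating that escape on a
wide build yields one code point, not the original two.

```python
def combine_surrogates(hi, lo):
    """Fold a high/low surrogate pair into one code point (UTF-16 rule)."""
    if not (0xD800 <= hi < 0xDC00 and 0xDC00 <= lo < 0xE000):
        raise ValueError("not a high/low surrogate pair")
    return 0x10000 + ((hi - 0xD800) << 10) + (lo - 0xDC00)

# On a wide build the original string holds two separate code units.
original = "\ud800\udc00"

# The unpatched repr() emits one escape for the pair; eval() of that
# escape produces a single code point.
folded = chr(combine_surrogates(0xD800, 0xDC00))

print(len(original), len(folded))   # 2 1
print(original == folded)           # False
```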


Package: python2.4
Version: 2.4.3-7ubuntu2
Severity: important


when i run this command:

python -c "repr(u'\u24ea\u059c\u200a\U0001d77e\uff07\u202f\u0747\u202f
\U0001d56b\U0001d5b9\U0001d4e9\u20052\u14bf\U0001d7f8\u200a\U0001d795
\U0001d6e7Z\u2006\u2002\U0001d50a\uff27\u13c0\u2000\uff16\u0411\uff16
\U0001d7e7\uff4c\u2006\u2001\ufe39\u2008\u0313]\u2008\u3014\u3015')"

python aborts with the following backtrace and memory dump:

*** glibc detected *** python: realloc(): invalid next size: 0x081521e8 ***
======= Backtrace: =========
/lib/tls/i686/cmov/libc.so.6[0xb7e8acd4]
/lib/tls/i686/cmov/libc.so.6(__libc_realloc+0xff)[0xb7e8cc5f]
python(_PyString_Resize+0x80)[0x8082b4b]
python[0x80991f7]
python(PyObject_Repr+0x58)[0x807d1fd]
python(PyEval_EvalFrame+0x4b37)[0x80b5270]
python(PyEval_EvalCodeEx+0x836)[0x80b65d6]
python(PyEval_EvalCode+0x57)[0x80b6640]
python(PyRun_SimpleStringFlags+0xa8)[0x80d8b7c]
python(Py_Main+0x685)[0x8055862]
python(main+0x22)[0x80550e2]
/lib/tls/i686/cmov/libc.so.6(__libc_start_main+0xd8)[0xb7e378b8]
python[0x8055041]
======= Memory map: ========
08048000-0811a000 r-xp 00000000 08:03 622736     /usr/bin/python2.4
0811a000-0813b000 rw-p 000d1000 08:03 622736     /usr/bin/python2.4
0813b000-081b5000 rw-p 0813b000 00:00 0          [heap]
b7c00000-b7c21000 rw-p b7c00000 00:00 0
b7c21000-b7d00000 ---p b7c21000 00:00 0
b7d40000-b7d4a000 r-xp 00000000 08:03 376899     /lib/libgcc_s.so.1
b7d4a000-b7d4b000 rw-p 00009000 08:03 376899     /lib/libgcc_s.so.1
b7d68000-b7d9b000 r--p 00000000 08:03 82634      /usr/lib/locale/en_US.utf8/LC_CTYPE
b7d9b000-b7d9e000 r-xp 00000000 08:03 625529     /usr/lib/python2.4/lib-dynload/_locale.so
b7d9e000-b7d9f000 rw-p 00003000 08:03 625529     /usr/lib/python2.4/lib-dynload/_locale.so
b7d9f000-b7e22000 rw-p b7d9f000 00:00 0
b7e22000-b7f51000 r-xp 00000000 08:03 66543      /lib/tls/i686/cmov/libc-2.4.so
b7f51000-b7f53000 r--p 0012e000 08:03 66543      /lib/tls/i686/cmov/libc-2.4.so
b7f53000-b7f55000 rw-p 00130000 08:03 66543      /lib/tls/i686/cmov/libc-2.4.so
b7f55000-b7f58000 rw-p b7f55000 00:00 0
b7f58000-b7f7c000 r-xp 00000000 08:03 66547      /lib/tls/i686/cmov/libm-2.4.so
b7f7c000-b7f7e000 rw-p 00023000 08:03 66547      /lib/tls/i686/cmov/libm-2.4.so
b7f7e000-b7f80000 r-xp 00000000 08:03 68161      /lib/tls/i686/cmov/libutil-2.4.so
b7f80000-b7f82000 rw-p 00001000 08:03 68161      /lib/tls/i686/cmov/libutil-2.4.so
b7f82000-b7f83000 rw-p b7f82000 00:00 0
b7f83000-b7f85000 r-xp 00000000 08:03 66546      /lib/tls/i686/cmov/libdl-2.4.so
b7f85000-b7f87000 rw-p 00001000 08:03 66546      /lib/tls/i686/cmov/libdl-2.4.so
b7f87000-b7f96000 r-xp 00000000 08:03 68156      /lib/tls/i686/cmov/libpthread-2.4.so
b7f96000-b7f98000 rw-p 0000f000 08:03 68156      /lib/tls/i686/cmov/libpthread-2.4.so
b7f98000-b7f9a000 rw-p b7f98000 00:00 0
b7fb0000-b7fb7000 r--s 00000000 08:03 2130015    /usr/lib/gconv/gconv-modules.cache
b7fb7000-b7fb9000 rw-p b7fb7000 00:00 0
b7fb9000-b7fd2000 r-xp 00000000 08:03 2737127    /lib/ld-2.4.so
b7fd2000-b7fd4000 rw-p 00018000 08:03 2737127    /lib/ld-2.4.so
bf99b000-bf9b3000 rw-p bf99b000 00:00 0          [stack]
ffffe000-fffff000 ---p 00000000 00:00 0          [vdso]
Aborted

-- System Information:
Debian Release: testing/unstable
  APT prefers edgy-updates
  APT policy: (500, 'edgy-updates'), (500, 'edgy-security'), (500, 'edgy-backports'), (500, 'edgy')
Architecture: i386 (i686)
Shell:  /bin/sh linked to /bin/dash
Kernel: Linux 2.6.17-5-386
Locale: LANG=en_US.UTF-8, LC_CTYPE=en_US.UTF-8 (charmap=UTF-8)

Versions of packages python2.4 depends on:
ii  libbz2-1.0                1.0.3-3        high-quality block-sorting file co
ii  libc6                     2.4-1ubuntu8   GNU C Library: Shared libraries
ii  libdb4.4                  4.4.20-6       Berkeley v4.4 Database Libraries [
ii  libncurses5               5.5-2ubuntu1   Shared libraries for terminal hand
ii  libncursesw5              5.5-2ubuntu1   Shared libraries for terminal hand
ii  libreadline5              5.1-7build1    GNU readline and history libraries
ii  libssl0.9.8               0.9.8b-2build1 SSL shared libraries
ii  mime-support              3.36-1         MIME files 'mime.types' & 'mailcap
ii  python2.4-minimal         2.4.3-7ubuntu2 A minimal subset of the Python lan

python2.4 recommends no packages.

-- no debconf information

the patch is online here:

http://zoehep.xent.com/~bsittler/python2.4-2.4.3_unicodeobject.c.diff

and also inlined here and attached to this message:

--- Objects/unicodeobject.c     2006-03-27 23:32:36.000000000 -0800
+++ /home/bsittler/pkgs/python2.4-2.4.3_unicodeobject.c.patched 2006-08-16 12:37:19.000000000 -0700
@@ -1968,7 +1968,29 @@
 
     static const char *hexdigit = "0123456789abcdef";
 
-    repr = PyString_FromStringAndSize(NULL, 2 + 6*size + 1);
+    /* Initial allocation is based on the longest-possible unichr
+       escape.
+
+       In wide (UTF-32) builds '\U00xxxxxx' is 10 chars per source
+       unichr, so in this case it's the longest unichr escape. In
+       narrow (UTF-16) builds this is five chars per source unichr
+       since there are two unichrs in the surrogate pair, so in narrow
+       (UTF-16) builds it's not the longest unichr escape.
+
+       In wide or narrow builds '\uxxxx' is 6 chars per source unichr,
+       so in the narrow (UTF-16) build case it's the longest unichr
+       escape.
+
+    */
+
+    repr = PyString_FromStringAndSize(NULL,
+        2
+#ifdef Py_UNICODE_WIDE
+        + 10*size
+#else
+        + 6*size
+#endif
+        + 1);
     if (repr == NULL)
         return NULL;
 
@@ -1993,15 +2015,6 @@
 #ifdef Py_UNICODE_WIDE
         /* Map 21-bit characters to '\U00xxxxxx' */
         else if (ch >= 0x10000) {
-           int offset = p - PyString_AS_STRING(repr);
-
-           /* Resize the string if necessary */
-           if (offset + 12 > PyString_GET_SIZE(repr)) {
-               if (_PyString_Resize(&repr, PyString_GET_SIZE(repr) + 100))
-                   return NULL;
-               p = PyString_AS_STRING(repr) + offset;
-           }
-
             *p++ = '\\';
             *p++ = 'U';
             *p++ = hexdigit[(ch >> 28) & 0x0000000F];
@@ -2014,8 +2027,8 @@
             *p++ = hexdigit[ch & 0x0000000F];
            continue;
         }
-#endif
-       /* Map UTF-16 surrogate pairs to Unicode \UXXXXXXXX escapes */
+#else
+       /* Map UTF-16 surrogate pairs to '\U00xxxxxx' */
        else if (ch >= 0xD800 && ch < 0xDC00) {
            Py_UNICODE ch2;
            Py_UCS4 ucs;
@@ -2040,6 +2053,7 @@
            s--;
            size++;
        }
+#endif
 
         /* Map 16-bit characters to '\uxxxx' */
         if (ch >= 256) {

** Affects: python2.4 (Ubuntu)
     Importance: Untriaged
         Status: Unconfirmed

** Visibility changed to: Public

-- 
buffer overrun in repr() for unicode strings
https://launchpad.net/bugs/56633

-- 
ubuntu-bugs mailing list
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs
