New submission from Matt Giuca <matt.gi...@gmail.com>:

In unicodeobject.c's unicodeescape_string, in UCS2 builds, if the last 
character of the string is a high surrogate, i.e. the first half of a UTF-16 
surrogate pair (between '\ud800' and '\udbff'), the code reads one character 
past the end of the string. For example:

>>> repr(u'abcd\ud800')

Upon reading ch = 0xd800, the test (ch >= 0xD800 && ch < 0xDC00) succeeds, and 
the code then reads ch2 = *s++. At that point s already points one character 
past the end of the string, so the value read is garbage. I imagine that, 
unless the read falls on a segment boundary, the worst that could happen is 
that the character '\ud800' is combined with the garbage value and 
interpreted as some other wide character. Nevertheless, this is bad.

Note that in practice this never actually misbehaves, because _PyUnicode_New 
allocates an extra character and sets it to '\u0000', so the above example 
will always set ch2 to 0 and behave correctly. But this is a tenuous thing to 
rely on, especially given the comment above _PyUnicode_New:

/* We allocate one more byte to make sure the string is
   Ux0000 terminated -- XXX is this needed ?
*/

I thought about removing that XXX, but I'd rather fix the problem. Therefore, I 
have attached a patch which does a range check before reading ch2:

--- Objects/unicodeobject.c     (revision 81539)
+++ Objects/unicodeobject.c     (working copy)
@@ -3065,7 +3065,7 @@
         }
 #else
         /* Map UTF-16 surrogate pairs to '\U00xxxxxx' */
-        else if (ch >= 0xD800 && ch < 0xDC00) {
+        else if (ch >= 0xD800 && ch < 0xDC00 && size > 0) {
             Py_UNICODE ch2;
             Py_UCS4 ucs;

Also affects Python 3.
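
To make the effect of the range check easier to see outside CPython, here is a 
minimal standalone sketch of the same idea. It is not the code from 
unicodeobject.c: the names ucs2_t and escape_ucs2 are hypothetical, the output 
goes to a caller-supplied buffer, and the escaping rules are simplified 
(anything outside printable ASCII, plus the backslash, becomes \uXXXX). It also 
peeks at the next unit rather than reading it and backing up. The part that 
mirrors the patch is the "size > 0" condition, which ensures the second code 
unit is only read when one remains.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for Py_UNICODE on a UCS2 build. */
typedef unsigned short ucs2_t;

/*
 * Escape `size` UTF-16 code units from `s` into `out` (capacity `outsize`).
 * Returns the number of chars written (excluding the NUL terminator),
 * or -1 if `out` is too small.
 */
static int escape_ucs2(const ucs2_t *s, size_t size, char *out, size_t outsize)
{
    size_t used = 0;

    if (outsize == 0)
        return -1;
    out[0] = '\0';

    /* As in the original loop, `size` is the count of units still unread
       once the current unit `ch` has been fetched. */
    while (size-- > 0) {
        ucs2_t ch = *s++;
        size_t room = outsize - used;
        int n;

        /* High surrogate: only look at the next unit if one remains. */
        if (ch >= 0xD800 && ch < 0xDC00 && size > 0) {
            ucs2_t ch2 = *s; /* peek; consumed only if it completes a pair */
            if (ch2 >= 0xDC00 && ch2 <= 0xDFFF) {
                unsigned long ucs =
                    ((((unsigned long)ch & 0x03FF) << 10) |
                     ((unsigned long)ch2 & 0x03FF)) + 0x10000UL;
                s++;
                size--;
                n = snprintf(out + used, room, "\\U%08lx", ucs);
                if (n < 0 || (size_t)n >= room)
                    return -1;
                used += (size_t)n;
                continue;
            }
            /* Not a low surrogate: fall through and escape ch on its own. */
        }

        if (ch >= 0x20 && ch < 0x7F && ch != '\\')
            n = snprintf(out + used, room, "%c", (char)ch);
        else
            n = snprintf(out + used, room, "\\u%04x", (unsigned)ch);
        if (n < 0 || (size_t)n >= room)
            return -1;
        used += (size_t)n;
    }
    return (int)used;
}
```

With this sketch, the five units 'a', 'b', 'c', 'd', 0xD800 are escaped as 
abcd\ud800 without touching anything past the fifth unit, while the pair 
0xD800, 0xDC00 is still combined into \U00010000.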

----------
components: Unicode
files: unicode-range-check.patch
keywords: patch
messages: 106506
nosy: mgiuca
priority: normal
severity: normal
status: open
title: Range check on unicode repr
type: behavior
versions: Python 2.6, Python 2.7, Python 3.1, Python 3.2, Python 3.3
Added file: http://bugs.python.org/file17465/unicode-range-check.patch

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue8821>
_______________________________________