Patches item #1309009 was opened at 2005-09-29 15:46
Message generated for change (Comment added) made by nnorwitz
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1309009&group_id=5470
Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
>Category: Modules
Group: None
>Status: Closed
>Resolution: Accepted
Priority: 5
Submitted By: Evan Jones (vulturex)
Assigned to: Nobody/Anonymous (nobody)
Summary: pyexpat.c: Two line fix for decoding crash
Initial Comment:
The attached Python script "test.py" crashes Python 2.3, 2.4, and
current CVS. The problem is that expat can pass back a string that is
not valid UTF-8 when the character encoding is not specified. In the
example "test.py" the XML document is encoded as latin_1, but Python
assumes it is UTF-8.
The workaround is to decode the string into Unicode first, then
encode it as UTF-8. However, if the data comes from a file or a
user and is passed to the parser unchecked, it can crash the
interpreter.
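The workaround described above can be sketched roughly as follows. This is
only an illustration, written for Python 3 rather than the 2.x versions the
report targets; the helper name parse_latin1_xml and the sample document are
made up for the example and are not part of the attached "test.py".

```python
import xml.dom.minidom


def parse_latin1_xml(raw):
    """Parse XML bytes known to be latin_1 that carry no encoding declaration.

    Hypothetical helper: decode to Unicode first, then re-encode as UTF-8,
    so that expat only ever sees valid UTF-8 input.
    """
    text = raw.decode("latin_1")      # bytes -> Unicode
    utf8_bytes = text.encode("utf-8")  # Unicode -> UTF-8 bytes
    return xml.dom.minidom.parseString(utf8_bytes)


# Sample input (an assumption for illustration): latin_1 bytes, no declaration.
raw = "<greeting>Tr\xe8s bien</greeting>".encode("latin_1")
dom = parse_latin1_xml(raw)
text = dom.documentElement.firstChild.data  # the decoded character data
```

This only helps when the caller already knows the real encoding; it does
nothing for input whose encoding is unknown.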
With the attached patch, Python raises an exception instead of
causing a segmentation fault, which is exactly what Python 2.2 does
in this case:
Traceback (most recent call last):
  File "/home/ejones/test.py", line 5, in ?
    dom = xml.dom.minidom.parseString( x.encode( 'latin_1' ) )
  File "/home/ejones/python/dist/src/Lib/xml/dom/minidom.py", line 1925, in parseString
    return expatbuilder.parseString(string)
  File "/home/ejones/python/dist/src/Lib/xml/dom/expatbuilder.py", line 940, in parseString
    return builder.parseString(string)
  File "/home/ejones/python/dist/src/Lib/xml/dom/expatbuilder.py", line 223, in parseString
    parser.Parse(string, True)
UnicodeDecodeError: 'utf8' codec can't decode bytes in position 4-6: invalid data
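Once the failure surfaces as an exception, callers handling untrusted input
can catch it. The sketch below is an assumption-laden illustration in
Python 3: try_parse is a hypothetical helper, and catching ExpatError
alongside UnicodeDecodeError is a precaution, since newer expat builds may
reject the malformed bytes themselves instead of the codec doing so.

```python
import xml.dom.minidom
from xml.parsers.expat import ExpatError


def try_parse(raw):
    """Return a parsed document, or None if the bytes cannot be decoded.

    Hypothetical helper; which of the two exceptions is raised is assumed
    to depend on the Python and expat versions in use.
    """
    try:
        return xml.dom.minidom.parseString(raw)
    except (UnicodeDecodeError, ExpatError):
        return None


# latin_1 bytes with no encoding declaration: not valid UTF-8.
bad = "<a>\xe7a va</a>".encode("latin_1")
good = b"<a>ok</a>"

bad_result = try_parse(bad)
good_result = try_parse(good)
```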
----------------------------------------------------------------------
>Comment By: Neal Norwitz (nnorwitz)
Date: 2005-09-29 21:58
Message:
Logged In: YES
user_id=33168
Thanks!
Note: I had to modify the patch a little because the
result of string_intern() was passed to Py_BuildValue().
Since string_intern() returned NULL, Py_BuildValue()
returned NULL as well, and container wound up being
Py_DECREF()ed twice, which showed up in a debug build.
Checking in Lib/test/test_minidom.py 1.42, 1.39.4.3
Checking in Misc/ACKS 1.297, 1.289.2.4
Checking in Misc/NEWS 1.1381, 1.1193.2.115
Checking in Modules/pyexpat.c 2.91, 2.89.2.2
----------------------------------------------------------------------
Comment By: Evan Jones (vulturex)
Date: 2005-09-29 17:25
Message:
Logged In: YES
user_id=539295
I've also attached a patch which adds this as a test case to
test_minidom.py.
----------------------------------------------------------------------
_______________________________________________
Patches mailing list
http://mail.python.org/mailman/listinfo/patches