Patches item #1309009 was opened at 2005-09-29 15:46.
Message generated for change (Comment added) made by nnorwitz
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1309009&group_id=5470

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
>Category: Modules
Group: None
>Status: Closed
>Resolution: Accepted
Priority: 5
Submitted By: Evan Jones (vulturex)
Assigned to: Nobody/Anonymous (nobody)
Summary: pyexpat.c: Two line fix for decoding crash

Initial Comment:
The attached Python script "test.py" crashes Python versions 2.3 and
2.4, as well as current CVS. The problem is that expat can pass back a
string that is not valid UTF-8 when the character encoding is not
specified. In the example "test.py", the XML document is encoded as
latin_1, but Python assumes it is UTF-8.
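The original test.py is not included in this message, so the sketch below uses a hypothetical one-element document and is written for current Python 3 rather than 2.x. It shows why the missing declaration matters: when the XML declaration names the encoding, expat decodes the latin_1 bytes correctly; without one, expat falls back to UTF-8.

```python
import xml.dom.minidom

# Hypothetical document text; the real test.py is not shown in this report.
text = "<doc>caf\u00e9</doc>"

# With an explicit encoding declaration, expat knows the bytes are latin_1
# and decodes the non-ASCII character correctly.
declared = ('<?xml version="1.0" encoding="iso-8859-1"?>' + text).encode("latin_1")
dom = xml.dom.minidom.parseString(declared)
print(dom.documentElement.firstChild.data)  # café
```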

The workaround is to decode the string into Unicode first, then
encode it as UTF-8. However, if the data comes from a file or from
user input, it can still crash the interpreter.
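A minimal sketch of that workaround in current Python 3, assuming the true encoding of the input is known; the variable names `x` and `raw` are illustrative only. It hands expat UTF-8 bytes, which is what expat assumes when no encoding is declared.

```python
import xml.dom.minidom

# Hypothetical input: a Unicode string containing a non-ASCII character.
x = "<doc>caf\u00e9</doc>"

# Encode as UTF-8 instead of latin_1, so the bytes match expat's default.
dom = xml.dom.minidom.parseString(x.encode("utf-8"))
print(dom.documentElement.firstChild.data)  # café

# If the bytes are already latin_1 (e.g. read from a file), decode them to
# Unicode first, then re-encode as UTF-8 before parsing.
raw = x.encode("latin_1")
dom2 = xml.dom.minidom.parseString(raw.decode("latin_1").encode("utf-8"))
```

This only helps when the caller knows the real encoding; input of unknown origin still reaches the parser unchanged, which is the case the patch addresses.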

With the attached patch, Python raises an exception instead of
causing a segmentation fault, which is exactly what Python 2.2 does in
this case:

Traceback (most recent call last):
  File "/home/ejones/test.py", line 5, in ?
    dom = xml.dom.minidom.parseString( x.encode( 'latin_1' ) )
  File "/home/ejones/python/dist/src/Lib/xml/dom/minidom.py", line 1925, in parseString
    return expatbuilder.parseString(string)
  File "/home/ejones/python/dist/src/Lib/xml/dom/expatbuilder.py", line 940, in parseString
    return builder.parseString(string)
  File "/home/ejones/python/dist/src/Lib/xml/dom/expatbuilder.py", line 223, in parseString
    parser.Parse(string, True)
UnicodeDecodeError: 'utf8' codec can't decode bytes in position 4-6: invalid data


----------------------------------------------------------------------

>Comment By: Neal Norwitz (nnorwitz)
Date: 2005-09-29 21:58

Message:
Logged In: YES 
user_id=33168

Thanks!  

Note: I had to modify the patch slightly because the result of
string_intern() was passed directly to Py_BuildValue(). When
string_intern() returned NULL, Py_BuildValue() also returned NULL,
and container ended up being Py_DECREF()ed twice, which showed up
in a debug build.

Checking in Lib/test/test_minidom.py 1.42, 1.39.4.3
Checking in Misc/ACKS 1.297, 1.289.2.4
Checking in Misc/NEWS 1.1381, 1.1193.2.115
Checking in Modules/pyexpat.c 2.91, 2.89.2.2



----------------------------------------------------------------------

Comment By: Evan Jones (vulturex)
Date: 2005-09-29 17:25

Message:
Logged In: YES 
user_id=539295

I've also attached a patch that adds this as a test case to
test_minidom.py.
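The attached test is not reproduced here; the following is a hypothetical regression test in the same spirit, written for current Python 3's unittest. Its only claim is the behaviour described above: parsing undeclared latin_1 bytes must raise an exception rather than crash. Modern Python reports this as an `ExpatError`; the test also accepts `UnicodeDecodeError`, the exception shown in the 2.x traceback.

```python
import unittest
import xml.dom.minidom
from xml.parsers.expat import ExpatError


class UndeclaredEncodingTest(unittest.TestCase):
    # Hypothetical test; not the one attached to this patch.
    def test_latin1_without_declaration_raises(self):
        # latin_1 bytes with no XML declaration: expat assumes UTF-8,
        # and the lone 0xE9 byte is not valid UTF-8.
        data = "<doc>caf\u00e9</doc>".encode("latin_1")
        # The parser must report an error instead of crashing.
        with self.assertRaises((ExpatError, UnicodeDecodeError)):
            xml.dom.minidom.parseString(data)
```

Run it with `python -m unittest` from the directory containing the file.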

----------------------------------------------------------------------

_______________________________________________
Patches mailing list
http://mail.python.org/mailman/listinfo/patches
