The following Python script causes Python 2.3, 2.4 and the latest CVS to crash with a Segmentation Fault:

    import xml.dom.minidom
    x = u'<?xml version="1.0"?>\n<fran\xe7ais>Comment \xe7a va ? Tr\xe8s bien ?</fran\xe7ais>'
    dom = xml.dom.minidom.parseString( x.encode( 'latin_1' ) )
    print repr( dom.childNodes[0].localName )

The problem is that this XML document does not specify an encoding. In this case, minidom assumes that it is encoded in UTF-8. However, it is in fact encoded in Latin-1. My two-line patch, in the SourceForge tracker at the URL below, causes this to raise a UnicodeDecodeError instead.

http://sourceforge.net/tracker/index.php?func=detail&aid=1309009&group_id=5470&atid=305470

Any chance that someone wants to commit this tiny two-line fix? This may be the kind of fix that is eligible to be backported to Python 2.4 as well. It passes "make test" on both my Linux system and my Mac. I've also attached a patch that adds this test case to test_minidom.py.

Thanks,

Evan Jones

--
Evan Jones
http://evanjones.ca/

_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
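
For illustration only, here is a minimal sketch of the encoding mismatch described above. It is written in Python 3 syntax so that it runs on a current interpreter, it is not the patch from the tracker, and the variable names are hypothetical. It shows that the Latin-1 bytes are not valid UTF-8, and that declaring the encoding in the XML declaration (assumed here to be an acceptable workaround on the caller's side) lets minidom parse the document.

```python
import xml.dom.minidom

# Same document as in the report, as a text string (Python 3 str).
text = ('<?xml version="1.0"?>\n'
        '<fran\xe7ais>Comment \xe7a va ? Tr\xe8s bien ?</fran\xe7ais>')

# Encode as Latin-1 without declaring it, as the crashing script does.
latin1_bytes = text.encode('latin_1')

# A parser that assumes UTF-8 cannot decode these bytes:
# 0xE7 starts a multi-byte sequence that is never completed.
try:
    latin1_bytes.decode('utf-8')
    valid_utf8 = True
except UnicodeDecodeError:
    valid_utf8 = False

# Workaround sketch: state the real encoding in the XML declaration.
declared_bytes = text.replace(
    '<?xml version="1.0"?>',
    '<?xml version="1.0" encoding="iso-8859-1"?>',
).encode('latin_1')

dom = xml.dom.minidom.parseString(declared_bytes)
local_name = dom.childNodes[0].localName
```

With the declaration in place the parser no longer has to guess, so the element name comes back as the expected non-ASCII string.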