[issue9241] SAXParseError on unicode (Japanese) file
New submission from Gianfranco gianz...@tin.it:

When parsing a UTF-16 little-endian encoded XML file containing some Japanese characters, the xml.sax.parse function raises a SAXParseException saying "no element found".

The problem arises with/on:
Python 2.5.2/Windows XP Pro SP3 32 bit
Python 2.6.4/Windows XP Pro SP3 32 bit
Python 2.5.2/Windows 2008 Server SP2 64 bit

The same file is successfully processed with/on:
Python 2.4.3/CentOS 5.4
Python 2.6.3/CentOS 5.4

I've attached a minimal XML file that contains a single U+FF1A Japanese character that triggers the exception. Code for parsing the file follows:

import xml.sax
xml.sax.parse(open('ff1a.xml'), xml.sax.ContentHandler())

Best regards,
Gianfranco

--
components: XML
files: ff1a.xml
messages: 110163
nosy: gianzula
priority: normal
severity: normal
status: open
title: SAXParseError on unicode (Japanese) file
type: behavior
versions: Python 2.5

Added file: http://bugs.python.org/file17979/ff1a.xml

Python tracker rep...@bugs.python.org
http://bugs.python.org/issue9241
[issue9241] SAXParseError on unicode (Japanese) file
Amaury Forgeot d'Arc amaur...@gmail.com added the comment:

Your file contains the byte \x1a == EOF (Ctrl-Z, which Windows treats as end-of-file when reading in text mode). You should not open it in text mode but in binary mode; otherwise it is truncated at that byte.

import xml.sax
xml.sax.parse(open('ff1a.xml', 'rb'), xml.sax.ContentHandler())

works on all versions I tried.

--
nosy: +amaury.forgeotdarc
resolution: -> invalid
status: open -> closed
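
For readers who want to reproduce the point without the attached ff1a.xml, here is a minimal sketch. It is not the original attachment: the sample document, the TextCollector handler, and the write_sample/parse_text helpers are all hypothetical names made up for illustration. It is written for Python 3, whereas the thread concerns Python 2.x, so it only demonstrates the binary-mode parse and the presence of the 0x1A byte, not the Windows text-mode truncation itself.

```python
import os
import tempfile
import xml.sax


class TextCollector(xml.sax.ContentHandler):
    """Hypothetical handler: collects character data for inspection."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def characters(self, content):
        # Character data may arrive in several pieces, so accumulate it.
        self.chunks.append(content)


def write_sample(path):
    """Write a UTF-16 little-endian document holding one U+FF1A character."""
    text = '<?xml version="1.0" encoding="UTF-16"?><root>\uff1a</root>'
    with open(path, "wb") as f:
        # Little-endian byte order mark followed by the encoded document.
        f.write(b"\xff\xfe" + text.encode("utf-16-le"))


def parse_text(path):
    """Parse the file in binary mode and return the collected text."""
    handler = TextCollector()
    with open(path, "rb") as f:  # binary mode, so 0x1A is passed through
        xml.sax.parse(f, handler)
    return "".join(handler.chunks)


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "sample.xml")
    write_sample(path)
    with open(path, "rb") as f:
        # In UTF-16-LE, U+FF1A is stored as the two bytes 1A FF.
        print(b"\x1a" in f.read())
    print(parse_text(path) == "\uff1a")
```

The 0x1A byte comes from the low-order half of U+FF1A, which UTF-16 little-endian writes first. Opening the file with 'rb' hands the parser the raw bytes, so nothing between the file and expat can interpret that byte as an end-of-file marker.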