[issue9241] SAXParseError on unicode (Japanese) file

2010-07-13 Thread Gianfranco

New submission from Gianfranco gianz...@tin.it:

When parsing a UTF-16 little-endian encoded XML file containing some Japanese 
characters, the xml.sax.parse function raises a SAXParseException 
saying "no element found". The problem occurs on:

Python 2.5.2/Windows XP Pro SP3 32 bit
Python 2.6.4/Windows XP Pro SP3 32 bit
Python 2.5.2/Windows 2008 Server SP2 64 bit

The same file is successfully processed with/on:

Python 2.4.3/CentOS 5.4
Python 2.6.3/CentOS 5.4

I've attached a minimal XML file containing a single Japanese character, 
U+FF1A, that triggers the exception. Code for parsing the file follows:

import xml.sax
xml.sax.parse(open('ff1a.xml'), xml.sax.ContentHandler())

Best regards,
Gianfranco

--
components: XML
files: ff1a.xml
messages: 110163
nosy: gianzula
priority: normal
severity: normal
status: open
title: SAXParseError on unicode (Japanese) file
type: behavior
versions: Python 2.5
Added file: http://bugs.python.org/file17979/ff1a.xml

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue9241
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue9241] SAXParseError on unicode (Japanese) file

2010-07-13 Thread Amaury Forgeot d'Arc

Amaury Forgeot d'Arc amaur...@gmail.com added the comment:

Your file contains the byte \x1a, which Windows treats as an EOF marker 
(Ctrl-Z) when a file is opened in text mode.
You should open it in binary mode instead; otherwise the data is truncated 
at that byte.

import xml.sax
xml.sax.parse(open('ff1a.xml', 'rb'), xml.sax.ContentHandler())

works on all versions I tried.
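
For illustration, here is a rough sketch of why the byte shows up and what 
truncation does to the parse. It does not use the attached ff1a.xml; the 
document string, the <root> element name, and the TextCollector handler are 
made up for this example, and cutting the bytes at the first \x1a is only an 
assumed stand-in for what a Windows text-mode read would deliver.

```python
import codecs
import io
import xml.sax


class TextCollector(xml.sax.ContentHandler):
    """Hypothetical handler that gathers character data for inspection."""

    def __init__(self):
        xml.sax.ContentHandler.__init__(self)
        self.chunks = []

    def characters(self, content):
        self.chunks.append(content)


# U+FF1A encoded as UTF-16 little-endian is the byte pair 1A FF,
# so such a file really does contain a 0x1A (Ctrl-Z) byte.
encoded_char = u"\uff1a".encode("utf-16-le")
contains_ctrl_z = b"\x1a" in encoded_char

# Made-up stand-in for ff1a.xml: a BOM followed by UTF-16-LE markup.
document = u'<?xml version="1.0" encoding="UTF-16"?><root>\uff1a</root>'
raw_bytes = codecs.BOM_UTF16_LE + document.encode("utf-16-le")

# io.BytesIO behaves like a file opened with mode 'rb': every byte
# reaches the parser unchanged.
handler = TextCollector()
xml.sax.parse(io.BytesIO(raw_bytes), handler)
parsed_text = u"".join(handler.chunks)

# Assumption: simulate a text-mode read on Windows by dropping everything
# from the first 0x1A byte onwards, then try to parse what is left.
truncated = raw_bytes[:raw_bytes.index(b"\x1a")]
try:
    xml.sax.parse(io.BytesIO(truncated), xml.sax.ContentHandler())
    truncation_failed = False
except xml.sax.SAXParseException:
    truncation_failed = True
```

With the full byte string the character comes back intact, while the 
truncated copy ends before the closing tag and the parser gives up.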

--
nosy: +amaury.forgeotdarc
resolution:  -> invalid
status: open -> closed
