[issue11159] Sax parser crashes if given unicode file name
Roundup Robot added the comment:

New changeset d3e7aea8a550 by Serhiy Storchaka in branch '2.7':
Issue #11159: SAX parser now supports unicode file names.
http://hg.python.org/cpython/rev/d3e7aea8a550

New changeset d2622ca8493a by Serhiy Storchaka in branch '3.2':
Issue #11159: Add tests for testing SAX parser support of non-ascii file names.
http://hg.python.org/cpython/rev/d2622ca8493a

New changeset b85ba45b9579 by Serhiy Storchaka in branch '3.3':
Issue #11159: Add tests for testing SAX parser support of non-ascii file names.
http://hg.python.org/cpython/rev/b85ba45b9579

New changeset 107a06f1a542 by Serhiy Storchaka in branch 'default':
Issue #11159: Add tests for testing SAX parser support of non-ascii file names.
http://hg.python.org/cpython/rev/107a06f1a542

--
nosy: +python-dev

___
Python tracker <rep...@bugs.python.org>
http://bugs.python.org/issue11159
___
___
Python-bugs-list mailing list
Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue11159] Sax parser crashes if given unicode file name
Serhiy Storchaka added the comment:

Fixed. Thank you for the report.

--
resolution:  -> fixed
stage: patch review -> committed/rejected
status: open -> closed
[issue11159] Sax parser crashes if given unicode file name
Roundup Robot added the comment:

New changeset 706218e0facb by Serhiy Storchaka in branch '2.7':
Fix tests for issue #11159.
http://hg.python.org/cpython/rev/706218e0facb

New changeset a7c074d9cbfb by Serhiy Storchaka in branch '3.2':
Fix tests for issue #11159.
http://hg.python.org/cpython/rev/a7c074d9cbfb

New changeset 2bf01f03ff40 by Serhiy Storchaka in branch '3.3':
Fix tests for issue #11159.
http://hg.python.org/cpython/rev/2bf01f03ff40

New changeset 4ab386b00aaf by Serhiy Storchaka in branch 'default':
Fix tests for issue #11159.
http://hg.python.org/cpython/rev/4ab386b00aaf
[issue11159] Sax parser crashes if given unicode file name
Serhiy Storchaka added the comment:

Yes, I had doubts about this too. I proceeded from the following considerations.

1. The system id is often used for file operations, and in that case you need to use the file system encoding. Unfortunately, Python 2 does not have the 'surrogateescape' error handler, which would allow encoding an arbitrary name and then restoring and re-encoding it for file operations.

2. Python 2, in contrast to Python 3, accepts bytes, and they may not be valid UTF-8.

We have to choose between compatibility with Python 2 and compatibility with Python 3. I chose the first because it is more important for a bugfix. Maybe I am wrong.
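For readers unfamiliar with the handler mentioned in point 1, here is a minimal Python 3 sketch of the round trip that 'surrogateescape' makes possible. The file name `caf\xe9.xml` is a made-up example of bytes that are not valid UTF-8; nothing here is taken from the patch itself.

```python
# Python 3 only: the 'surrogateescape' handler does not exist in Python 2.
raw_name = b"caf\xe9.xml"  # hypothetical Latin-1 encoded name, invalid as UTF-8

# Strict decoding rejects the name outright.
try:
    raw_name.decode("utf-8")
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False

# With surrogateescape, each undecodable byte becomes a lone surrogate.
as_text = raw_name.decode("utf-8", "surrogateescape")

# Encoding with the same handler restores the original bytes exactly.
round_tripped = as_text.encode("utf-8", "surrogateescape")

print(strict_ok, ascii(as_text), round_tripped == raw_name)
```

Without such a handler, a byte string that cannot be decoded has no lossless text form, which is the constraint described for Python 2 above.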
[issue11159] Sax parser crashes if given unicode file name
Serhiy Storchaka added the comment:

Here is an alternative patch. It doesn't encode the system id when it is set; instead, the system id attribute can be bytes or unicode, and encoding/decoding happens only when a file is opened.

--
Added file: http://bugs.python.org/file28722/sax_unicode_fn_alt-2.7.patch
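The idea of deferring the conversion can be sketched as follows. This is an illustration in Python 3, not the contents of the attached 2.7 patch: `LazySource` is a hypothetical stand-in class, and the temporary file exists only for the demonstration. It assumes a file system that accepts non-ASCII names.

```python
import os
import tempfile


class LazySource:
    """Hypothetical stand-in for an input source; not the xml.sax class."""

    def __init__(self, system_id):
        # Store the system id exactly as given (str or bytes), unconverted.
        self.system_id = system_id

    def open(self):
        # Convert only at open time. os.fsdecode accepts str or bytes
        # and returns str, using the file system encoding for bytes.
        return open(os.fsdecode(self.system_id), "rb")


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "\xe5.timeline")  # "å.timeline"
    with open(path, "wb") as f:
        f.write(b"<timeline/>")

    contents = []
    # The same file is reachable through a str id and a bytes id.
    for system_id in (path, os.fsencode(path)):
        with LazySource(system_id).open() as f:
            contents.append(f.read())

print(contents)
```

Keeping the attribute untouched means callers read back whatever type they stored, and only the file-opening code needs to know about encodings.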
[issue11159] Sax parser crashes if given unicode file name
Changes by Serhiy Storchaka storch...@gmail.com: Removed file: http://bugs.python.org/file28268/sax_unicode_fn-2.7.patch
[issue11159] Sax parser crashes if given unicode file name
Changes by Serhiy Storchaka storch...@gmail.com: Added file: http://bugs.python.org/file28268/sax_unicode_fn-2.7.patch
[issue11159] Sax parser crashes if given unicode file name
Serhiy Storchaka added the comment:

Ported the tests for non-ASCII system ids to 3.x. If no one objects, I'll commit this next week.

--
Added file: http://bugs.python.org/file28714/sax_unicode_fn-3.x.patch
[issue11159] Sax parser crashes if given unicode file name
Changes by Ezio Melotti ezio.melo...@gmail.com: -- nosy: +ezio.melotti
[issue11159] Sax parser crashes if given unicode file name
Christian Heimes added the comment:

I don't think that the file system encoding is the correct answer here. AFAIR expat uses UTF-8 encoded strings. Python 3.x uses PyArg_ParseTupleAndKeywords() with the "s" format, which converts PyUnicode to PyBytes with the utf-8 codec.

--
nosy: +christian.heimes
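The Python 3 behaviour described here can be observed from the `xml.parsers.expat` module, whose parser objects expose `SetBase()` and `GetBase()`. The short sketch below only shows that a non-ASCII base survives the round trip in Python 3; it does not inspect how the string is converted internally.

```python
from xml.parsers import expat

# Python 3: SetBase() takes a str and GetBase() returns a str.
parser = expat.ParserCreate()
parser.SetBase("\xe5.timeline")  # "å.timeline"
base = parser.GetBase()

print(ascii(base))
```

This is the same `SetBase()` call that appears at the bottom of the traceback in the original report, where Python 2.7 failed on a unicode argument.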
[issue11159] Sax parser crashes if given unicode file name
Changes by Sergey Prokhorov sergey.prokho...@gmail.com: -- nosy: +Sergey.Prokhorov
[issue11159] Sax parser crashes if given unicode file name
Changes by Serhiy Storchaka <storch...@gmail.com>:

--
assignee:  -> serhiy.storchaka
[issue11159] Sax parser crashes if given unicode file name
Serhiy Storchaka added the comment:

However, Python doesn't work with bytes filenames (I don't think this is a bug). The proposed patch allows unicode filenames to be used in the SAX parser.

--
keywords: +patch
nosy: +serhiy.storchaka
stage:  -> patch review
Added file: http://bugs.python.org/file28268/sax_unicode_fn-2.7.patch
[issue11159] Sax parser crashes if given unicode file name
Changes by Carsten Grohmann carstengrohm...@gmx.de: -- nosy: +cgrohmann
[issue11159] Sax parser crashes if given unicode file name
Changes by Daniel Urban <urban.dani...@gmail.com>:

--
type: crash -> behavior
[issue11159] Sax parser crashes if given unicode file name
John Chandler <therealmetal...@gmail.com> added the comment:

Confirmed about not being an issue in Python 3. Just checked with Python 3.3.0a0 and the example works fine - no exception raised.

--
nosy: +John.Chandler
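A self-contained way to repeat this check on Python 3 is sketched below. It is not the test added in the changesets: `ElementCounter` and the shortened XML document are illustrative assumptions, and the script assumes a file system that accepts non-ASCII names.

```python
import os
import tempfile
from xml.sax import parse
from xml.sax.handler import ContentHandler


class ElementCounter(ContentHandler):
    """Illustrative handler that counts start tags."""

    def __init__(self):
        super().__init__()
        self.count = 0

    def startElement(self, name, attrs):
        self.count += 1


# Shortened stand-in for the timeline file from the original report.
XML = (b'<?xml version="1.0" encoding="utf-8"?>'
       b"<timeline><version>0.13.0</version></timeline>")

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "\xe5.timeline")  # "å.timeline"
    with open(path, "wb") as f:
        f.write(XML)

    handler = ElementCounter()
    parse(path, handler)  # a str file name with a non-ASCII character

element_count = handler.count
print(element_count)
```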
[issue11159] Sax parser crashes if given unicode file name
New submission from Rickard Lindberg <ricl...@gmail.com>:

The error is the following:

    Traceback (most recent call last):
      File "<stdin>", line 4, in <module>
      File "/usr/lib64/python2.7/site-packages/_xmlplus/sax/__init__.py", line 31, in parse
        parser.parse(filename_or_stream)
      File "/usr/lib64/python2.7/site-packages/_xmlplus/sax/expatreader.py", line 109, in parse
        xmlreader.IncrementalParser.parse(self, source)
      File "/usr/lib64/python2.7/site-packages/_xmlplus/sax/xmlreader.py", line 119, in parse
        self.prepareParser(source)
      File "/usr/lib64/python2.7/site-packages/_xmlplus/sax/expatreader.py", line 121, in prepareParser
        self._parser.SetBase(source.getSystemId())
    UnicodeEncodeError: 'ascii' codec can't encode character u'\xe5' in position 0: ordinal not in range(128)

The following shell script can be used to reproduce the error:

    #!/bin/sh

    cat > å.timeline <<EOF
    <?xml version="1.0" encoding="utf-8"?>
    <timeline>
      <version>0.13.0devb38ace0a572b+</version>
      <categories>
      </categories>
      <events>
        <event>
          <start>2011-02-01 00:00:00</start>
          <end>2011-02-03 08:46:00</end>
          <text>asdsd</text>
        </event>
      </events>
      <view>
        <displayed_period>
          <start>2011-01-24 16:38:11</start>
          <end>2011-02-23 16:38:11</end>
        </displayed_period>
        <hidden_categories>
        </hidden_categories>
      </view>
    </timeline>
    EOF

    python <<EOF
    # encoding: utf-8
    from xml.sax import parse
    from xml.sax.handler import ContentHandler
    parse(open(u"å.timeline", 'r'), ContentHandler())
    EOF

If I instead do this, it works fine:

    parse(u"å.timeline".encode("utf-8"), ContentHandler())

Also:

    >>> sys.getfilesystemencoding()
    'UTF-8'

I heard from another user that this was not a problem with Python 3.1.2.

--
components: XML
messages: 128212
nosy: ricli85
priority: normal
severity: normal
status: open
title: Sax parser crashes if given unicode file name
type: crash
versions: Python 2.7
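The last line of the traceback comes from an implicit conversion of the unicode file name with the ASCII codec. The Python 3 sketch below reproduces just that encoding step in isolation, outside the SAX parser; the helper `encode_or_error` is a hypothetical name introduced only for this illustration.

```python
name = "\xe5.timeline"  # "å.timeline", as in the report


def encode_or_error(text, encoding):
    """Return the encoded bytes, or the error message if encoding fails."""
    try:
        return text.encode(encoding)
    except UnicodeEncodeError as exc:
        return str(exc)


# ASCII cannot represent U+00E5, so this yields the error message.
ascii_result = encode_or_error(name, "ascii")

# UTF-8 can, which is why the explicit .encode("utf-8") workaround succeeds.
utf8_result = encode_or_error(name, "utf-8")

print(ascii_result)
print(utf8_result)
```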