[issue17915] Encoding error with sax and codecs
Roundup Robot added the comment: New changeset 1c01571ce0f4 by Georg Brandl in branch '3.2': Issue #17915: Fix interoperability of xml.sax with file objects returned by http://hg.python.org/cpython/rev/1c01571ce0f4 -- nosy: +python-dev ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue17915 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue17915] Encoding error with sax and codecs
Georg Brandl added the comment: Fixed in 3.2, 3.3 and default.
--
resolution: -> fixed
status: open -> closed
[issue17915] Encoding error with sax and codecs
Simon Conseil added the comment: Thanks, everybody!
[issue17915] Encoding error with sax and codecs
Serhiy Storchaka added the comment:

It is not working fine on Python 3.3.0.

    >>> import codecs
    >>> from xml.sax.saxutils import XMLGenerator
    >>> with codecs.open('/tmp/test.txt', 'w', encoding='iso-8859-1') as f:
    ...     xml = XMLGenerator(f, encoding='iso-8859-1')
    ...     xml.startDocument()
    ...     xml.startElement('root', {'attr': u'\u20ac'})
    ...     xml.endElement('root')
    ...     xml.endDocument()
    ...
    Traceback (most recent call last):
      File "<stdin>", line 4, in <module>
      File "/home/serhiy/py/cpython-3.3.0/Lib/xml/sax/saxutils.py", line 141, in startElement
        self._write(' %s=%s' % (name, quoteattr(value)))
      File "/home/serhiy/py/cpython-3.3.0/Lib/xml/sax/saxutils.py", line 96, in _write
        self._out.write(text)
      File "/home/serhiy/py/cpython-3.3.0/Lib/codecs.py", line 699, in write
        return self.writer.write(data)
      File "/home/serhiy/py/cpython-3.3.0/Lib/codecs.py", line 355, in write
        data, consumed = self.encode(object, self.errors)
    UnicodeEncodeError: 'latin-1' codec can't encode character '\u20ac' in position 7: ordinal not in range(256)

And it shouldn't. On Python 2, XMLGenerator works correctly only with binary files; it works with text files only because of the implicit str-to-unicode conversion. On Python 3, working with binary files was broken. Issue1470548 restored support for binary files (the only kind of file with which XMLGenerator can work correctly), but acceptance of text files was kept for backward compatibility.

The problem is that there is no trustworthy way to determine whether a file-like object is binary or text. Accepting text streams in XMLGenerator should be deprecated in future versions.
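For comparison, here is a minimal sketch of the binary-stream usage described above. It assumes a Python 3 version that includes the Issue1470548 fix; the io.BytesIO target is an assumption made to keep the example self-contained and stands in for a file opened in 'wb' mode.

```python
import io
from xml.sax.saxutils import XMLGenerator

# Assumption: an in-memory binary stream stands in for open(path, 'wb').
out = io.BytesIO()

gen = XMLGenerator(out, encoding='iso-8859-1')
gen.startDocument()
# U+20AC (euro sign) cannot be represented in ISO-8859-1.
gen.startElement('root', {'attr': '\u20ac'})
gen.endElement('root')
gen.endDocument()

data = out.getvalue()
print(data)
```

With a binary stream, XMLGenerator does the encoding itself, so the unencodable character should come out as the character reference &#8364; instead of raising UnicodeEncodeError.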
[issue17915] Encoding error with sax and codecs
STINNER Victor added the comment:

2013/5/7 Serhiy Storchaka <rep...@bugs.python.org>:
> Accepting of text streams in XMLGenerator should be deprecated in future versions.

I agree that the following pattern is strange:

    with codecs.open('/tmp/test.txt', 'w', encoding='iso-8859-1') as f:
        xml = XMLGenerator(f, encoding='iso-8859-1')

Why would I specify a codec twice? What happens if I specify two different codecs?

    with codecs.open('/tmp/test.txt', 'w', encoding='utf-8') as f:
        xml = XMLGenerator(f, encoding='iso-8859-1')

It may be simpler (and safer?) to reject text files. If you cannot detect that f is a text file, just make it explicit in the documentation that f must be a binary file.
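The question about two different codecs can be explored with a small sketch. It assumes a Python version that includes the fix for this issue, so that codecs stream writers are passed through unchanged; codecs.getwriter over io.BytesIO is used here as a stand-in for codecs.open so that no file is touched.

```python
import codecs
import io
from xml.sax.saxutils import XMLGenerator

raw = io.BytesIO()
# Assumption: this text-like stream plays the role of
# codecs.open(path, 'w', encoding='utf-8').
stream = codecs.getwriter('utf-8')(raw)

# The generator is told a different encoding than the stream uses.
gen = XMLGenerator(stream, encoding='iso-8859-1')
gen.startDocument()
gen.startElement('root', {})
gen.characters('\u00e9')  # e with acute accent
gen.endElement('root')
gen.endDocument()

data = raw.getvalue()
print(data)
```

In this sketch the XML declaration announces iso-8859-1 while the stream encodes the content as UTF-8, so the document is mislabeled. That is one argument for accepting only binary files and letting XMLGenerator own the encoding.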
[issue17915] Encoding error with sax and codecs
Serhiy Storchaka added the comment: Here is a patch which adds explicit checks for codecs stream writers and adds tests for these cases. The tests are not entirely honest: they check only that XMLGenerator works with some specially prepared streams. XMLGenerator does not work with a stream that has an arbitrary encoding and error handler.
--
keywords: +patch
Added file: http://bugs.python.org/file30164/XMLGenerator_codecs_stream.patch
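A rough sketch of what an explicit check for codecs stream writers could look like follows. The helper name is_text_stream is hypothetical and is not taken from the attached patch; it only illustrates the isinstance-based approach.

```python
import codecs
import io


def is_text_stream(out):
    """Hypothetical helper: guess whether `out` expects str rather than bytes.

    Streams from the io text hierarchy and codecs stream writers are
    treated as text; everything else is assumed to be binary.
    """
    return isinstance(
        out,
        (io.TextIOBase, codecs.StreamWriter, codecs.StreamReaderWriter),
    )


class WriteOnly:
    """Duck-typed object with only write(); its kind cannot be detected."""

    def write(self, data):
        pass


print(is_text_stream(io.StringIO()))
print(is_text_stream(codecs.getwriter('utf-8')(io.BytesIO())))
print(is_text_stream(io.BytesIO()))
print(is_text_stream(WriteOnly()))
```

The last case shows the limitation mentioned earlier in the thread: an arbitrary object that only offers write() falls through to the binary assumption, whether or not that is correct.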
[issue17915] Encoding error with sax and codecs
Serhiy Storchaka added the comment: Of course, if this patch is committed, it may be worth applying it to 3.2 as well, which has the same regression.
--
components: +XML
stage: needs patch -> patch review
versions: +Python 3.2
[issue17915] Encoding error with sax and codecs
Serhiy Storchaka added the comment: Perhaps we should add a deprecation warning for codecs streams right in this patch?
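One way such a warning could be wired in is sketched below. The function name warn_if_codecs_stream and the message text are hypothetical; nothing here is taken from the patch or from xml.sax.saxutils.

```python
import codecs
import io
import warnings


def warn_if_codecs_stream(out):
    """Hypothetical helper: emit a DeprecationWarning for codecs streams."""
    if isinstance(out, (codecs.StreamWriter, codecs.StreamReaderWriter)):
        warnings.warn(
            "passing a codecs stream is deprecated; use a binary file",
            DeprecationWarning,
            stacklevel=2,
        )
        return True
    return False


# Record warnings so the behaviour can be inspected.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter('always')
    flagged = warn_if_codecs_stream(codecs.getwriter('utf-8')(io.BytesIO()))
    not_flagged = warn_if_codecs_stream(io.BytesIO())

print(flagged, not_flagged, len(caught))
```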
[issue17915] Encoding error with sax and codecs
New submission from Simon Conseil: There is an encoding issue between codecs.open and sax (see attached file). The issue is reproducible on Python 3.3.1; it works fine on Python 3.3.0.
--
components: Library (Lib)
files: report.txt
messages: 188508
nosy: sconseil
priority: normal
severity: normal
status: open
title: Encoding error with sax and codecs
versions: Python 3.3
Added file: http://bugs.python.org/file30146/report.txt
[issue17915] Encoding error with sax and codecs
Changes by Antoine Pitrou <pit...@free.fr>:
--
nosy: +haypo, serhiy.storchaka
type: -> behavior
versions: +Python 3.4
[issue17915] Encoding error with sax and codecs
Antoine Pitrou added the comment: Since this is a regression, setting (temporarily perhaps) as release blocker.
--
nosy: +georg.brandl, larry, pitrou
priority: normal -> release blocker
stage: -> needs patch
[issue17915] Encoding error with sax and codecs
STINNER Victor added the comment: It looks like a regression introduced by the fix for issue #1470548, changeset 66f92f76b2ce.
[issue17915] Encoding error with sax and codecs
STINNER Victor added the comment:

Extracted test from report.txt. Test with Python 3.4:

    $ ./python test_codecs.py
    Traceback (most recent call last):
      File "test_codecs.py", line 7, in <module>
        xml.startDocument()
      File "/home/haypo/prog/python/default/Lib/xml/sax/saxutils.py", line 148, in startDocument
        self._encoding)
      File "/home/haypo/prog/python/default/Lib/codecs.py", line 699, in write
        return self.writer.write(data)
      File "/home/haypo/prog/python/default/Lib/codecs.py", line 355, in write
        data, consumed = self.encode(object, self.errors)
    TypeError: Can't convert 'bytes' object to str implicitly

_gettextwriter() of xml.sax.saxutils does not recognize codecs classes. (See also PEP 400 :-))
--
Added file: http://bugs.python.org/file30158/test_codecs.py
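The following sketch illustrates why a check based only on the io hierarchy would miss codecs classes, and why handing bytes to such a stream fails. It does not reproduce the saxutils internals; the variable names are chosen for illustration only.

```python
import codecs
import io

# A codecs stream writer, similar to what codecs.open(..., 'w') wraps.
stream = codecs.getwriter('iso-8859-1')(io.BytesIO())

# codecs stream writers are not part of the io text hierarchy, so an
# isinstance check against io.TextIOBase does not recognize them.
recognized_as_text = isinstance(stream, io.TextIOBase)

# Writing bytes to a stream that expects str fails inside the encoder.
try:
    stream.write(b'<?xml version="1.0"?>\n')
except TypeError as exc:
    error = exc
else:
    error = None

print(recognized_as_text, type(error).__name__)
```

This is consistent with the TypeError in the traceback above: the stream expects str, so it cannot be treated as a binary target.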