[issue17915] Encoding error with sax and codecs

2013-05-12 Thread Roundup Robot

Roundup Robot added the comment:

New changeset 1c01571ce0f4 by Georg Brandl in branch '3.2':
Issue #17915: Fix interoperability of xml.sax with file objects returned by
http://hg.python.org/cpython/rev/1c01571ce0f4

--
nosy: +python-dev

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue17915
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue17915] Encoding error with sax and codecs

2013-05-12 Thread Georg Brandl

Georg Brandl added the comment:

Fixed in 3.2, 3.3 and default.

--
resolution:  -> fixed
status: open -> closed




[issue17915] Encoding error with sax and codecs

2013-05-12 Thread Simon Conseil

Simon Conseil added the comment:

Thanks, everybody!

--




[issue17915] Encoding error with sax and codecs

2013-05-07 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

It is not working fine on Python 3.3.0.

>>> with codecs.open('/tmp/test.txt', 'w', encoding='iso-8859-1') as f:
...     xml = XMLGenerator(f, encoding='iso-8859-1')
...     xml.startDocument()
...     xml.startElement('root', {'attr': u'\u20ac'})
...     xml.endElement('root')
...     xml.endDocument()
...
Traceback (most recent call last):
  File "<stdin>", line 4, in <module>
  File "/home/serhiy/py/cpython-3.3.0/Lib/xml/sax/saxutils.py", line 141, in startElement
    self._write(' %s=%s' % (name, quoteattr(value)))
  File "/home/serhiy/py/cpython-3.3.0/Lib/xml/sax/saxutils.py", line 96, in _write
    self._out.write(text)
  File "/home/serhiy/py/cpython-3.3.0/Lib/codecs.py", line 699, in write
    return self.writer.write(data)
  File "/home/serhiy/py/cpython-3.3.0/Lib/codecs.py", line 355, in write
    data, consumed = self.encode(object, self.errors)
UnicodeEncodeError: 'latin-1' codec can't encode character '\u20ac' in position 7: ordinal not in range(256)

And it shouldn't. On Python 2, XMLGenerator works properly only with binary
files; it works with text files only thanks to implicit str-to-unicode
conversion. On Python 3, working with binary files was broken. Issue1470548
restored support for binary files (the only kind of file with which
XMLGenerator can work correctly), but accepting text files was kept for
backward compatibility. The problem is that there is no trustworthy way to
determine whether a file-like object is binary or text.

Accepting text streams in XMLGenerator should be deprecated in future
versions.
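
To illustrate why the detection is not trustworthy, here is a minimal sketch of a best-effort check. It is an assumption about how such a heuristic might look, not the code in xml.sax.saxutils; looks_like_text_stream and DuckWriter are hypothetical names introduced only for this example.

```python
import codecs
import io


def looks_like_text_stream(out):
    """Best-effort guess; hypothetical helper, not part of the stdlib."""
    # Streams from the io module declare their kind via base classes.
    if isinstance(out, io.TextIOBase):
        return True
    # codecs stream writers accept str but sit outside the io hierarchy.
    if isinstance(out, (codecs.StreamWriter, codecs.StreamReaderWriter)):
        return True
    # Anything else is assumed to be binary, which may be wrong.
    return False


class DuckWriter:
    """Hypothetical file-like object that only accepts str."""

    def __init__(self):
        self.chunks = []

    def write(self, text):
        if not isinstance(text, str):
            raise TypeError("DuckWriter accepts only str")
        self.chunks.append(text)
        return len(text)


text_guess = looks_like_text_stream(io.StringIO())
binary_guess = looks_like_text_stream(io.BytesIO())
codecs_guess = looks_like_text_stream(codecs.getwriter("utf-8")(io.BytesIO()))
duck_guess = looks_like_text_stream(DuckWriter())  # misclassified as binary
```

The last case is the problematic one: an object that only offers write() gives no reliable hint about whether it expects str or bytes.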

--




[issue17915] Encoding error with sax and codecs

2013-05-07 Thread STINNER Victor

STINNER Victor added the comment:

> Accepting text streams in XMLGenerator should be deprecated in future
> versions.

I agree that the following pattern is strange:

with codecs.open('/tmp/test.txt', 'w', encoding='iso-8859-1') as f:
   xml = XMLGenerator(f, encoding='iso-8859-1')

Why would I specify a codec twice? What happens if I specify two
different codecs?

with codecs.open('/tmp/test.txt', 'w', encoding='utf-8') as f:
   xml = XMLGenerator(f, encoding='iso-8859-1')

It may be simpler (and safer?) to reject text files. If you cannot
detect that f is a text file, just make it explicit in the
documentation that f must be a binary file.
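
As a rough sketch of the two cases, assuming a Python version that already contains the fix for this issue: the first call passes a codecs writer whose encoding differs from the one declared to XMLGenerator, the second passes a plain binary stream. write_doc is a hypothetical helper, and io.BytesIO stands in for a real file.

```python
import codecs
import io
from xml.sax.saxutils import XMLGenerator


def write_doc(out, declared_encoding):
    """Write a tiny document whose attribute holds a euro sign."""
    gen = XMLGenerator(out, encoding=declared_encoding)
    gen.startDocument()
    gen.startElement("root", {"attr": "\u20ac"})
    gen.endElement("root")
    gen.endDocument()


# Mismatch: the stream encodes as UTF-8, the declaration says iso-8859-1.
raw_mismatch = io.BytesIO()
write_doc(codecs.getwriter("utf-8")(raw_mismatch), "iso-8859-1")
mismatched = raw_mismatch.getvalue()

# Binary stream: XMLGenerator does the encoding itself.
raw_binary = io.BytesIO()
write_doc(raw_binary, "iso-8859-1")
consistent = raw_binary.getvalue()
```

In the mismatched case the XML declaration claims iso-8859-1 while the euro sign is stored as UTF-8 bytes, so the file contradicts itself. With the binary stream the character is written as a numeric character reference that is valid in the declared encoding.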


--




[issue17915] Encoding error with sax and codecs

2013-05-07 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

Here is a patch which adds explicit checks for codecs stream writers and adds
tests for these cases. The tests are not entirely honest: they only check that
XMLGenerator works with some specially prepared streams. XMLGenerator doesn't
work with a stream that has an arbitrary encoding and error handler.
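
A small sketch of that limitation, assuming a Python version in which codecs stream writers are passed through unchanged: the same document is written through an iso-8859-1 codecs writer with two different error handlers. generate is a hypothetical helper, not part of the patch.

```python
import codecs
import io
from xml.sax.saxutils import XMLGenerator


def generate(errors):
    """Return the bytes written, or the UnicodeEncodeError that was raised."""
    raw = io.BytesIO()
    stream = codecs.getwriter("iso-8859-1")(raw, errors=errors)
    gen = XMLGenerator(stream, encoding="iso-8859-1")
    try:
        gen.startDocument()
        gen.startElement("root", {"attr": "\u20ac"})
        gen.endElement("root")
        gen.endDocument()
    except UnicodeEncodeError as exc:
        return exc
    return raw.getvalue()


strict_result = generate("strict")
charref_result = generate("xmlcharrefreplace")
```

With the default "strict" handler the euro sign cannot be encoded and the write fails; only a stream prepared with "xmlcharrefreplace" produces a usable document.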

--
keywords: +patch
Added file: http://bugs.python.org/file30164/XMLGenerator_codecs_stream.patch




[issue17915] Encoding error with sax and codecs

2013-05-07 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

Of course, if this patch is committed, it is probably worth applying it to
3.2 as well, which has the same regression.

--
components: +XML
stage: needs patch -> patch review
versions: +Python 3.2




[issue17915] Encoding error with sax and codecs

2013-05-07 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

Perhaps we should add a deprecation warning for codecs streams right in this 
patch?

--




[issue17915] Encoding error with sax and codecs

2013-05-06 Thread Simon Conseil

New submission from Simon Conseil:

There is an encoding issue between codecs.open and sax (see attached file). The
issue is reproducible on Python 3.3.1; it works fine on Python 3.3.0.

--
components: Library (Lib)
files: report.txt
messages: 188508
nosy: sconseil
priority: normal
severity: normal
status: open
title: Encoding error with sax and codecs
versions: Python 3.3
Added file: http://bugs.python.org/file30146/report.txt




[issue17915] Encoding error with sax and codecs

2013-05-06 Thread Antoine Pitrou

Changes by Antoine Pitrou pit...@free.fr:


--
nosy: +haypo, serhiy.storchaka
type:  -> behavior
versions: +Python 3.4




[issue17915] Encoding error with sax and codecs

2013-05-06 Thread Antoine Pitrou

Antoine Pitrou added the comment:

Since this is a regression, setting (temporarily perhaps) as release blocker.

--
nosy: +georg.brandl, larry, pitrou
priority: normal -> release blocker
stage:  -> needs patch




[issue17915] Encoding error with sax and codecs

2013-05-06 Thread STINNER Victor

STINNER Victor added the comment:

It looks like a regression introduced by the fix for issue #1470548,
changeset 66f92f76b2ce.

--




[issue17915] Encoding error with sax and codecs

2013-05-06 Thread STINNER Victor

STINNER Victor added the comment:

Extracted test from report.txt. Test with Python 3.4:

$ ./python test_codecs.py
Traceback (most recent call last):
  File "test_codecs.py", line 7, in <module>
    xml.startDocument()
  File "/home/haypo/prog/python/default/Lib/xml/sax/saxutils.py", line 148, in startDocument
    self._encoding)
  File "/home/haypo/prog/python/default/Lib/codecs.py", line 699, in write
    return self.writer.write(data)
  File "/home/haypo/prog/python/default/Lib/codecs.py", line 355, in write
    data, consumed = self.encode(object, self.errors)
TypeError: Can't convert 'bytes' object to str implicitly

_gettextwriter() in xml.sax.saxutils does not recognize codecs classes. (See
also PEP 400 :-)).
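
The mechanism can be sketched without xml.sax at all. This is only an illustration of why a check based on the io class hierarchy misses codecs writers, not the actual code of _gettextwriter(); io.BytesIO stands in for the file opened by codecs.open.

```python
import codecs
import io

writer = codecs.getwriter("iso-8859-1")(io.BytesIO())

# A codecs stream writer is not part of the io class hierarchy.
is_io_text = isinstance(writer, io.TextIOBase)

# Text passes through the codec; pre-encoded bytes are rejected.
writer.write('<?xml version="1.0"?>\n')
try:
    writer.write(b"<root/>")
    bytes_rejected = False
except TypeError:
    bytes_rejected = True
```

A writer that is not recognized as text gets treated as binary and receives already encoded bytes, which its own codec then refuses with a TypeError like the one in the traceback above.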

--
Added file: http://bugs.python.org/file30158/test_codecs.py
