[issue1767933] Badly formed XML using etree and utf-16

2012-07-18 Thread Serhiy Storchaka

Serhiy Storchaka storch...@gmail.com added the comment:

Python 3.2 is currently shipped in the latest Ubuntu LTS and will be in
production for at least the next 5 years. I think it will be the main Python
version for many years.

Here is a compound patch (changesets 6120cf695574, 64ff90e07d71, 51978f89e5ed 
and 63ba0c32b81a) for Python 3.2, without tests. It is almost the same as for 
3.3, except that it uses manual finalizing instead of ExitStack.
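
For readers unfamiliar with the difference, here is a minimal sketch of the two finalizing styles. It is not taken from the patch: both function names are hypothetical, and a BytesIO stands in for the real output file. contextlib.ExitStack only exists from Python 3.3 on, so a 3.2 version has to fall back to something like the try/finally form.

```python
import contextlib
import io


def encode_with_exitstack(text, encoding):
    """3.3 style: register cleanup on an ExitStack (hypothetical helper)."""
    raw = io.BytesIO()
    with contextlib.ExitStack() as stack:
        wrapper = io.TextIOWrapper(raw, encoding=encoding, newline="\n")
        # detach() flushes pending text and leaves the BytesIO open.
        stack.callback(wrapper.detach)
        wrapper.write(text)
    return raw.getvalue()


def encode_with_try_finally(text, encoding):
    """3.2 style: the same cleanup, finalized by hand (hypothetical helper)."""
    raw = io.BytesIO()
    wrapper = io.TextIOWrapper(raw, encoding=encoding, newline="\n")
    try:
        wrapper.write(text)
    finally:
        wrapper.detach()
    return raw.getvalue()


sample = "<root />"
print(encode_with_exitstack(sample, "utf-16") ==
      encode_with_try_finally(sample, "utf-16"))
```

With a single resource the two forms are equivalent; ExitStack mostly pays off when the set of cleanups is decided at run time.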

--
Added file: 
http://bugs.python.org/file26425/etree_write_utf16_without_tests-3.2.patch

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue1767933
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue1767933] Badly formed XML using etree and utf-16

2012-07-17 Thread Roundup Robot

Roundup Robot devn...@psf.upfronthosting.co.za added the comment:

New changeset 51978f89e5ed by Eli Bendersky in branch 'default':
Optimize tostringlist by taking the stream class outside the function. It's now 
2x faster on short calls. Related to #1767933
http://hg.python.org/cpython/rev/51978f89e5ed

--




[issue1767933] Badly formed XML using etree and utf-16

2012-07-17 Thread Serhiy Storchaka

Serhiy Storchaka storch...@gmail.com added the comment:

How about porting this to 3.2? The main difficulty I see is with the tests, 
which differ significantly between 3.2 and 3.3.

--




[issue1767933] Badly formed XML using etree and utf-16

2012-07-17 Thread Eli Bendersky

Eli Bendersky eli...@gmail.com added the comment:

Frankly, I don't think the problem is serious enough to warrant a backport to 
3.2, given that 3.3 is going to be out in a few weeks. The issue was open for 
5 years without anyone seriously complaining :)

--




[issue1767933] Badly formed XML using etree and utf-16

2012-07-16 Thread Eli Bendersky

Eli Bendersky eli...@gmail.com added the comment:

Fixed the invariant violation in changeset 64ff90e07d71

I'll review the performance difference separately

--




[issue1767933] Badly formed XML using etree and utf-16

2012-07-16 Thread Eli Bendersky

Eli Bendersky eli...@gmail.com added the comment:

I posted a message to python-dev about the performance issue

--




[issue1767933] Badly formed XML using etree and utf-16

2012-07-15 Thread Serhiy Storchaka

Serhiy Storchaka storch...@gmail.com added the comment:

Thank you, Eli.

However, the changes to tostring() and tostringlist() break the invariant 
b"".join(tostringlist(element, 'utf-16')) == tostring(element, 'utf-16'). You 
should add the following methods to DataStream:

    def seekable(self):
        return True

    def tell(self):
        return len(data)

Note that the monkey-patched version is faster.

stream = io.BufferedIOBase()
stream.writable = lambda: True
stream.write = data.append
stream.seekable = lambda: True
stream.tell = data.__len__
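
To make the role of seekable() and tell() concrete, here is a small stand-alone sketch of a list-backed stream wrapped in io.TextIOWrapper. The class name ListStream is hypothetical and this is not the DataStream code from the changeset. As far as I can tell, CPython's TextIOWrapper calls tell() on a seekable buffer when it is created and only starts with a BOM if the reported position is 0.

```python
import io


class ListStream(io.BufferedIOBase):
    """Hypothetical stream that appends every written chunk to a list."""

    def __init__(self, chunks):
        self.chunks = chunks

    def writable(self):
        return True

    def seekable(self):
        return True

    def write(self, b):
        self.chunks.append(bytes(b))
        return len(b)

    def tell(self):
        # Only "zero or non-zero" matters here: an empty list means
        # nothing has been written yet.
        return len(self.chunks)


chunks = []
text = io.TextIOWrapper(ListStream(chunks), encoding="utf-16", newline="\n")
text.write("<root>")
text.write("</root>")
text.detach()  # flushes pending text into the list

data = b"".join(chunks)
print(data.decode("utf-16"))
```

The joined chunks decode cleanly because the BOM is emitted once, at the start, rather than once per write() call.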

Benchmark results:

tostring() with BytesIO:
$ ./python -m timeit -s "import xml.etree.ElementTree as ET; e=ET.XML('<root/>')" "ET.tostring(e, 'utf-16')"
1000 loops, best of 3: 268 usec per loop
$ ./python -m timeit -s "import xml.etree.ElementTree as ET; e=ET.XML('<root>'+'<child/>'*100+'</root>')" "ET.tostring(e, 'utf-16')"
100 loops, best of 3: 4.63 msec per loop

tostring() with monkey-patching:
$ ./python -m timeit -s "import xml.etree.ElementTree as ET; e=ET.XML('<root/>')" "ET.tostring(e, 'utf-16')"
1000 loops, best of 3: 263 usec per loop
$ ./python -m timeit -s "import xml.etree.ElementTree as ET; e=ET.XML('<root>'+'<child/>'*100+'</root>')" "ET.tostring(e, 'utf-16')"
100 loops, best of 3: 3.84 msec per loop

tostringlist() with DataStream class:
$ ./python -m timeit -s "import xml.etree.ElementTree as ET; e=ET.XML('<root/>')" "ET.tostringlist(e, 'utf-16')"
1000 loops, best of 3: 624 usec per loop
$ ./python -m timeit -s "import xml.etree.ElementTree as ET; e=ET.XML('<root>'+'<child/>'*100+'</root>')" "ET.tostringlist(e, 'utf-16')"
100 loops, best of 3: 4.09 msec per loop

tostringlist() with monkey-patching:
$ ./python -m timeit -s "import xml.etree.ElementTree as ET; e=ET.XML('<root/>')" "ET.tostringlist(e, 'utf-16')"
1000 loops, best of 3: 259 usec per loop
$ ./python -m timeit -s "import xml.etree.ElementTree as ET; e=ET.XML('<root>'+'<child/>'*100+'</root>')" "ET.tostringlist(e, 'utf-16')"
100 loops, best of 3: 3.81 msec per loop
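
The same measurements can be repeated from a script with the timeit module, which also makes it easy to check the invariant next to the timings. This is only a sketch: the helper name, loop counts and output format are arbitrary choices, and the absolute numbers will differ from the ones above.

```python
import timeit
import xml.etree.ElementTree as ET

small = ET.XML("<root/>")
large = ET.XML("<root>" + "<child/>" * 100 + "</root>")


def invariant_holds(elem, encoding="utf-16"):
    """Check b"".join(tostringlist(...)) == tostring(...) for one element."""
    return b"".join(ET.tostringlist(elem, encoding)) == ET.tostring(elem, encoding)


for label, elem in (("small", small), ("large", large)):
    for func in (ET.tostring, ET.tostringlist):
        # Best of 3 runs of 20 calls each, reported per call.
        best = min(timeit.repeat(lambda: func(elem, "utf-16"),
                                 number=20, repeat=3)) / 20
        print("%-5s %-12s %8.1f usec per call"
              % (label, func.__name__, best * 1e6))
    print("%-5s invariant holds: %s" % (label, invariant_holds(elem)))
```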

--




[issue1767933] Badly formed XML using etree and utf-16

2012-07-14 Thread Roundup Robot

Roundup Robot devn...@psf.upfronthosting.co.za added the comment:

New changeset 6120cf695574 by Eli Bendersky in branch 'default':
Close #1767933: Badly formed XML using etree and utf-16. Patch by Serhiy 
Storchaka, with some minor fixes by me
http://hg.python.org/cpython/rev/6120cf695574

--
nosy: +python-dev
resolution:  -> fixed
stage: needs patch -> committed/rejected
status: open -> closed




[issue1767933] Badly formed XML using etree and utf-16

2012-07-13 Thread Eli Bendersky

Eli Bendersky eli...@gmail.com added the comment:

Serhiy, can you also take a look at #9458 - it may be related?

--




[issue1767933] Badly formed XML using etree and utf-16

2012-07-13 Thread Serhiy Storchaka

Serhiy Storchaka storch...@gmail.com added the comment:

Patch updated with some comments.

--
Added file: http://bugs.python.org/file26377/etree_write_utf16_5.patch

diff -r 677a9326b4d4 Lib/test/test_xml_etree.py
--- a/Lib/test/test_xml_etree.py	Mon Jul 09 18:16:11 2012 -0700
+++ b/Lib/test/test_xml_etree.py	Fri Jul 13 23:23:04 2012 +0300
@@ -21,7 +21,7 @@
 import weakref
 
 from test import support
-from test.support import findfile, import_fresh_module, gc_collect
+from test.support import TESTFN, findfile, unlink, import_fresh_module, gc_collect
 
 pyET = None
 ET = None
@@ -888,65 +888,6 @@
 
 ET.XML(?xml version='1.0' encoding='%s'?xml / % encoding)
 
-def encoding():
-r
-Test encoding issues.
-
- elem = ET.Element(tag)
- elem.text = abc
- serialize(elem)
-'tagabc/tag'
- serialize(elem, encoding=utf-8)
-b'tagabc/tag'
- serialize(elem, encoding=us-ascii)
-b'tagabc/tag'
- serialize(elem, encoding=iso-8859-1)
-b?xml version='1.0' encoding='iso-8859-1'?\ntagabc/tag
-
- elem.text = \\'
- serialize(elem)
-'taglt;amp;\'gt;/tag'
- serialize(elem, encoding=utf-8)
-b'taglt;amp;\'gt;/tag'
- serialize(elem, encoding=us-ascii) # cdata characters
-b'taglt;amp;\'gt;/tag'
- serialize(elem, encoding=iso-8859-1)
-b'?xml version=\'1.0\' 
encoding=\'iso-8859-1\'?\ntaglt;amp;\'gt;/tag'
-
- elem.attrib[key] = \\'
- elem.text = None
- serialize(elem)
-'tag key=lt;amp;quot;\'gt; /'
- serialize(elem, encoding=utf-8)
-b'tag key=lt;amp;quot;\'gt; /'
- serialize(elem, encoding=us-ascii)
-b'tag key=lt;amp;quot;\'gt; /'
- serialize(elem, encoding=iso-8859-1)
-b'?xml version=\'1.0\' encoding=\'iso-8859-1\'?\ntag 
key=lt;amp;quot;\'gt; /'
-
- elem.text = '\xe5\xf6\xf6'
- elem.attrib.clear()
- serialize(elem)
-'tag\xe5\xf6\xf6lt;gt;/tag'
- serialize(elem, encoding=utf-8)
-b'tag\xc3\xa5\xc3\xb6\xc3\xb6lt;gt;/tag'
- serialize(elem, encoding=us-ascii)
-b'tag#229;#246;#246;lt;gt;/tag'
- serialize(elem, encoding=iso-8859-1)
-b?xml version='1.0' 
encoding='iso-8859-1'?\ntag\xe5\xf6\xf6lt;gt;/tag
-
- elem.attrib[key] = '\xe5\xf6\xf6'
- elem.text = None
- serialize(elem)
-'tag key=\xe5\xf6\xf6lt;gt; /'
- serialize(elem, encoding=utf-8)
-b'tag key=\xc3\xa5\xc3\xb6\xc3\xb6lt;gt; /'
- serialize(elem, encoding=us-ascii)
-b'tag key=#229;#246;#246;lt;gt; /'
- serialize(elem, encoding=iso-8859-1)
-b'?xml version=\'1.0\' encoding=\'iso-8859-1\'?\ntag 
key=\xe5\xf6\xf6lt;gt; /'
-
-
 def methods():
 r
 Test serialization methods.
@@ -2166,16 +2107,185 @@
 self.assertEqual(self._subelem_tags(e), ['a1'])
 
 
-class StringIOTest(unittest.TestCase):
+class IOTest(unittest.TestCase):
+def tearDown(self):
+unlink(TESTFN)
+
+def test_encoding(self):
+# Test encoding issues.
+elem = ET.Element(tag)
+elem.text = abc
+self.assertEqual(serialize(elem), 'tagabc/tag')
+self.assertEqual(serialize(elem, encoding=utf-8),
+b'tagabc/tag')
+self.assertEqual(serialize(elem, encoding=us-ascii),
+b'tagabc/tag')
+for enc in (iso-8859-1, utf-16, utf-32):
+self.assertEqual(serialize(elem, encoding=enc),
+(?xml version='1.0' encoding='%s'?\n
+ tagabc/tag % enc).encode(enc))
+
+elem = ET.Element(tag)
+elem.text = \\'
+self.assertEqual(serialize(elem), 'taglt;amp;\'gt;/tag')
+self.assertEqual(serialize(elem, encoding=utf-8),
+b'taglt;amp;\'gt;/tag')
+self.assertEqual(serialize(elem, encoding=us-ascii),
+b'taglt;amp;\'gt;/tag')
+for enc in (iso-8859-1, utf-16, utf-32):
+self.assertEqual(serialize(elem, encoding=enc),
+(?xml version='1.0' encoding='%s'?\n
+ taglt;amp;\'gt;/tag % enc).encode(enc))
+
+elem = ET.Element(tag)
+elem.attrib[key] = \\'
+self.assertEqual(serialize(elem), 'tag key=lt;amp;quot;\'gt; 
/')
+self.assertEqual(serialize(elem, encoding=utf-8),
+b'tag key=lt;amp;quot;\'gt; /')
+self.assertEqual(serialize(elem, encoding=us-ascii),
+b'tag key=lt;amp;quot;\'gt; /')
+for enc in (iso-8859-1, utf-16, utf-32):
+self.assertEqual(serialize(elem, encoding=enc),
+(?xml version='1.0' encoding='%s'?\n
+ tag key=\lt;amp;quot;'gt;\ / % enc).encode(enc))
+
+elem = ET.Element(tag)
+elem.text = '\xe5\xf6\xf6'
+self.assertEqual(serialize(elem), 'tag\xe5\xf6\xf6lt;gt;/tag')
+self.assertEqual(serialize(elem, encoding=utf-8),
+

[issue1767933] Badly formed XML using etree and utf-16

2012-07-12 Thread Eli Bendersky

Eli Bendersky eli...@gmail.com added the comment:

Thanks, this looks much better. I've reviewed the _4 patch with some minor 
comments.

--




[issue1767933] Badly formed XML using etree and utf-16

2012-07-08 Thread Serhiy Storchaka

Serhiy Storchaka storch...@gmail.com added the comment:

Here is a patch that uses context management (as Eli advised). This
makes error handling much safer and probably makes the code a little
simpler. Several new tests are added.

--




[issue1767933] Badly formed XML using etree and utf-16

2012-07-08 Thread Serhiy Storchaka

Changes by Serhiy Storchaka storch...@gmail.com:


Added file: http://bugs.python.org/file26316/etree_write_utf16_4.patch




[issue1767933] Badly formed XML using etree and utf-16

2012-07-07 Thread Serhiy Storchaka

Serhiy Storchaka storch...@gmail.com added the comment:

> Serhiy, note that _SimpleElementPath is now gone in 3.3, since ElementPath.py 
> is always there in stdlib. Could you update the patch to reflect this?

Don't worry, _SimpleElementPath is not used in the changes.

> Another thing. I'm trying really hard to phase out the doctest tests of 
> etree, replacing them with unittest-based tests as much as possible. The 
> doctests are causing all kinds of trouble with parametrized testing for both 
> the Python and the C implementations. Please don't add new doctests. If you 
> add tests, add them to existing TestCase classes, or create new ones.

Done. I replaced the encoding doctest with unittest-based tests and merged
them with StringIOTest and the user IO tests into one IOTest class. I also
added a test for StringIO writing.

Also I've improved support for unbuffered file objects (as for
issue1470548).

--
Added file: http://bugs.python.org/file26300/etree_write_utf16_3.patch

diff -r d03dbc324b60 Lib/test/test_xml_etree.py
--- a/Lib/test/test_xml_etree.py	Sat Jul 07 22:15:22 2012 +1000
+++ b/Lib/test/test_xml_etree.py	Sat Jul 07 17:23:00 2012 +0300
@@ -888,65 +888,6 @@
 
 ET.XML(?xml version='1.0' encoding='%s'?xml / % encoding)
 
-def encoding():
-r
-Test encoding issues.
-
- elem = ET.Element(tag)
- elem.text = abc
- serialize(elem)
-'tagabc/tag'
- serialize(elem, encoding=utf-8)
-b'tagabc/tag'
- serialize(elem, encoding=us-ascii)
-b'tagabc/tag'
- serialize(elem, encoding=iso-8859-1)
-b?xml version='1.0' encoding='iso-8859-1'?\ntagabc/tag
-
- elem.text = \\'
- serialize(elem)
-'taglt;amp;\'gt;/tag'
- serialize(elem, encoding=utf-8)
-b'taglt;amp;\'gt;/tag'
- serialize(elem, encoding=us-ascii) # cdata characters
-b'taglt;amp;\'gt;/tag'
- serialize(elem, encoding=iso-8859-1)
-b'?xml version=\'1.0\' 
encoding=\'iso-8859-1\'?\ntaglt;amp;\'gt;/tag'
-
- elem.attrib[key] = \\'
- elem.text = None
- serialize(elem)
-'tag key=lt;amp;quot;\'gt; /'
- serialize(elem, encoding=utf-8)
-b'tag key=lt;amp;quot;\'gt; /'
- serialize(elem, encoding=us-ascii)
-b'tag key=lt;amp;quot;\'gt; /'
- serialize(elem, encoding=iso-8859-1)
-b'?xml version=\'1.0\' encoding=\'iso-8859-1\'?\ntag 
key=lt;amp;quot;\'gt; /'
-
- elem.text = '\xe5\xf6\xf6'
- elem.attrib.clear()
- serialize(elem)
-'tag\xe5\xf6\xf6lt;gt;/tag'
- serialize(elem, encoding=utf-8)
-b'tag\xc3\xa5\xc3\xb6\xc3\xb6lt;gt;/tag'
- serialize(elem, encoding=us-ascii)
-b'tag#229;#246;#246;lt;gt;/tag'
- serialize(elem, encoding=iso-8859-1)
-b?xml version='1.0' 
encoding='iso-8859-1'?\ntag\xe5\xf6\xf6lt;gt;/tag
-
- elem.attrib[key] = '\xe5\xf6\xf6'
- elem.text = None
- serialize(elem)
-'tag key=\xe5\xf6\xf6lt;gt; /'
- serialize(elem, encoding=utf-8)
-b'tag key=\xc3\xa5\xc3\xb6\xc3\xb6lt;gt; /'
- serialize(elem, encoding=us-ascii)
-b'tag key=#229;#246;#246;lt;gt; /'
- serialize(elem, encoding=iso-8859-1)
-b'?xml version=\'1.0\' encoding=\'iso-8859-1\'?\ntag 
key=\xe5\xf6\xf6lt;gt; /'
-
-
 def methods():
 r
 Test serialization methods.
@@ -2166,16 +2107,129 @@
 self.assertEqual(self._subelem_tags(e), ['a1'])
 
 
-class StringIOTest(unittest.TestCase):
+class IOTest(unittest.TestCase):
+def test_encoding(self):
+# Test encoding issues.
+elem = ET.Element(tag)
+elem.text = abc
+self.assertEqual(serialize(elem), 'tagabc/tag')
+self.assertEqual(serialize(elem, encoding=utf-8),
+b'tagabc/tag')
+self.assertEqual(serialize(elem, encoding=us-ascii),
+b'tagabc/tag')
+for enc in (iso-8859-1, utf-16, utf-32):
+self.assertEqual(serialize(elem, encoding=enc),
+(?xml version='1.0' encoding='%s'?\n
+ tagabc/tag % enc).encode(enc))
+
+elem = ET.Element(tag)
+elem.text = \\'
+self.assertEqual(serialize(elem), 'taglt;amp;\'gt;/tag')
+self.assertEqual(serialize(elem, encoding=utf-8),
+b'taglt;amp;\'gt;/tag')
+self.assertEqual(serialize(elem, encoding=us-ascii),
+b'taglt;amp;\'gt;/tag')
+for enc in (iso-8859-1, utf-16, utf-32):
+self.assertEqual(serialize(elem, encoding=enc),
+(?xml version='1.0' encoding='%s'?\n
+ taglt;amp;\'gt;/tag % enc).encode(enc))
+
+elem = ET.Element(tag)
+elem.attrib[key] = \\'
+self.assertEqual(serialize(elem), 'tag key=lt;amp;quot;\'gt; 
/')
+self.assertEqual(serialize(elem, encoding=utf-8),
+b'tag key=lt;amp;quot;\'gt; /')
+self.assertEqual(serialize(elem, 

[issue1767933] Badly formed XML using etree and utf-16

2012-07-07 Thread Eli Bendersky

Eli Bendersky eli...@gmail.com added the comment:

Thanks for your work on this, Serhiy. I made some comments in the code-review 
tool, mainly about the complexity of the resulting code. 

Great work on switching the tests to unittest, much appreciated.

--




[issue1767933] Badly formed XML using etree and utf-16

2012-07-05 Thread Eli Bendersky

Eli Bendersky eli...@gmail.com added the comment:

Serhiy, note that _SimpleElementPath is now gone in 3.3, since ElementPath.py 
is always there in stdlib. Could you update the patch to reflect this?

Another thing. I'm trying really hard to phase out the doctest tests of etree, 
replacing them with unittest-based tests as much as possible. The doctests are 
causing all kinds of trouble with parametrized testing for both the Python and 
the C implementations. Please don't add new doctests. If you add tests, add 
them to existing TestCase classes, or create new ones.

--




[issue1767933] Badly formed XML using etree and utf-16

2012-06-24 Thread Serhiy Storchaka

Serhiy Storchaka storch...@gmail.com added the comment:

It would be nice to fix this bug before forking of the 3.3.0b1 release clone.

--




[issue1767933] Badly formed XML using etree and utf-16

2012-06-24 Thread Eli Bendersky

Eli Bendersky eli...@gmail.com added the comment:

I will try to find time to review it before the fork, but since time is tight I 
don't promise.

That said, this patch falls more into the bugfix category than a new feature, 
so I think it will be OK after beta as well.

--




[issue1767933] Badly formed XML using etree and utf-16

2012-06-16 Thread Eli Bendersky

Changes by Eli Bendersky eli...@gmail.com:


--
nosy: +eli.bendersky




[issue1767933] Badly formed XML using etree and utf-16

2012-05-20 Thread Amaury Forgeot d'Arc

Amaury Forgeot d'Arc amaur...@gmail.com added the comment:

The patch needs some tests.
Also, it seems that ElementTree.write() will only accept files inheriting from 
io.IOBase, where only a .write() method was expected before. Is that the case?

--




[issue1767933] Badly formed XML using etree and utf-16

2012-05-20 Thread Serhiy Storchaka

Serhiy Storchaka storch...@gmail.com added the comment:

Here is an updated patch, with tests and support for objects that have only a 
'write' method.

--
Added file: http://bugs.python.org/file25652/etree_write_utf16_2.patch




[issue1767933] Badly formed XML using etree and utf-16

2012-05-18 Thread Serhiy Storchaka

Serhiy Storchaka storch...@gmail.com added the comment:

Can anyone review the patch?

--




[issue1767933] Badly formed XML using etree and utf-16

2012-04-27 Thread Serhiy Storchaka

Serhiy Storchaka storch...@gmail.com added the comment:

Here is a patch which solves the problem of writing ElementTree with utf-16 or 
utf-32 encoding.

--
keywords: +patch
nosy: +storchaka
versions: +Python 3.3
Added file: http://bugs.python.org/file25386/etree_write_utf16.patch




[issue1767933] Badly formed XML using etree and utf-16

2010-10-02 Thread Amaury Forgeot d'Arc

Amaury Forgeot d'Arc amaur...@gmail.com added the comment:

Python 3.1 improves the situation: the file looks more like utf-16, except that 
the BOM (\xff\xfe) is repeated all the time, probably on every internal call 
to file.write().

Here is a test script that should work on both 2.7 and 3.1.

from io import BytesIO
from xml.etree.ElementTree import ElementTree
content = "<?xml version='1.0' encoding='UTF-16'?><html></html>"
input = BytesIO(content.encode('utf-16'))
tree = ElementTree()
tree.parse(input)
# Write content
output = BytesIO()
tree.write(output, encoding="utf-16")
assert output.getvalue().decode('utf-16') == content
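
The repeated-BOM symptom can also be checked directly. The following is a sketch with hypothetical helper names; it counts BOMs at 2-byte-aligned offsets in the serialized output and assumes an ASCII-only document, where a BOM byte pair cannot occur as ordinary content.

```python
import codecs
from io import BytesIO
import xml.etree.ElementTree as ET


def write_utf16(xml_text):
    """Parse xml_text and serialize it again as UTF-16 bytes."""
    tree = ET.ElementTree(ET.fromstring(xml_text))
    output = BytesIO()
    tree.write(output, encoding="utf-16")
    return output.getvalue()


def count_boms(data):
    """Count native-order UTF-16 BOMs at 2-byte-aligned offsets."""
    return sum(data[i:i + 2] == codecs.BOM_UTF16
               for i in range(0, len(data), 2))


data = write_utf16("<html><body>text</body></html>")
print("BOM count:", count_boms(data))
print(data.decode("utf-16"))
```

On an interpreter with the fix the count should be 1; on the affected versions described in this thread, a BOM shows up before many of the written fragments.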

--
stage: unit test needed -> needs patch




[issue1767933] Badly formed XML using etree and utf-16

2010-07-26 Thread Richard Urwin

Richard Urwin soron...@googlemail.com added the comment:

I can't produce an automated test, for want of time, but here is a demonstrator.

Grab the example XHTML from 
http://docs.python.org/library/xml.etree.elementtree.html#elementtree-objects 
or use some tiny ASCII-encoded xml file. Save it as file.xml in the same 
folder as bug-test.py attached here.

Execute bug-test.xml

file.xml is read and then written in UTF-16. The output file is then read and 
dumped to stdout as a byte-stream.

1. To be correct UTF-16, the output should start with 255 254, which should 
never occur in the rest of the file.

2. The rest of the output (including the first line) should alternate zeros 
with ASCII character codes.

3. The file output.xml should be loadable in a UTF16-capable text editor (eg 
jEdit), be recognised as UTF-16 and be identical in terms of content to file.xml
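
The first two criteria can be automated. The function below is a sketch only (its name and messages are made up here) and assumes little-endian UTF-16 with ASCII-only content, matching the 255 254 BOM mentioned above.

```python
import codecs

BOM = codecs.BOM_UTF16_LE  # the bytes 255 254


def utf16_le_ascii_problems(data):
    """Return a list of problems found; an empty list means the checks pass."""
    problems = []
    if not data.startswith(BOM):
        problems.append("missing BOM at start")
    if BOM in data[2:]:
        problems.append("BOM repeated inside the data")
    body = data[2:] if data.startswith(BOM) else data
    # For ASCII text every second byte must be zero.
    if len(body) % 2 or any(body[i] != 0 for i in range(1, len(body), 2)):
        problems.append("ASCII characters not interleaved with zero bytes")
    return problems


good = BOM + "<html />".encode("utf-16-le")
print(utf16_le_ascii_problems(good))
print(utf16_le_ascii_problems(b"<html />"))
```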

--
nosy: +Richard.Urwin
Added file: http://bugs.python.org/file18211/bug-test.py




[issue1767933] Badly formed XML using etree and utf-16

2010-07-26 Thread Richard Urwin

Richard Urwin soron...@googlemail.com added the comment:

> Execute bug-test.xml

I meant bug-test.py, of course

--




[issue1767933] Badly formed XML using etree and utf-16

2010-07-26 Thread Mark Lawrence

Mark Lawrence breamore...@yahoo.co.uk added the comment:

@Florent: is this something you could pick up? I think it's out of my league.

--




[issue1767933] Badly formed XML using etree and utf-16

2010-07-26 Thread Richard Urwin

Richard Urwin soron...@googlemail.com added the comment:

As an example, here are the first two lines of output when I use Python 2.6.3:
60 63 120 109 108 32 118 101 114 115 105 111 110 61 39 49 46 48 39 32 101 110 
99 111 100 105 110 103 61 39 85 84 70 45 49 54 39 63 62 10
60 255 254 104 0 116 0 109 0 108 0 62 255 254 10

Note:
No 255 254 at the start of the file, but several within it.
No zeros interspersing the first line and the odd one missing thereafter.

--




[issue1767933] Badly formed XML using etree and utf-16

2010-07-25 Thread Mark Lawrence

Mark Lawrence breamore...@yahoo.co.uk added the comment:

@Richard: Could you provide a test case for this, or do you consider it beyond 
your Python capabilities, given your comments in msg75875?

--
nosy: +BreamoreBoy




[issue1767933] Badly formed XML using etree and utf-16

2008-11-14 Thread Amaury Forgeot d'Arc

Amaury Forgeot d'Arc [EMAIL PROTECTED] added the comment:

Would you provide a patch?

--
nosy: +amaury.forgeotdarc




[issue1767933] Badly formed XML using etree and utf-16

2008-11-14 Thread Richard urwin

Richard urwin [EMAIL PROTECTED] added the comment:

Here is a patch of my quick hack, more for interest than any suggestion
it gets used. Although it does produce good output so long as you avoid
the BOM.

The full solution is beyond my (very weak) Python skills. The character
encoding is tied in with XML character substitution (&amp; etc. and
hexadecimal representation of multibyte characters). I could disentangle
it, but I probably wouldn't produce optimal Python, or indeed anything
that wouldn't inspire mirth and/or incredulity.

NB. The workaround suggested by Fredrik Lundh doesn't solve our
particular problems, since the downsize to UTF-8 causes the multi-byte
characters to be represented in hex. Our software doesn't read those. (I
know that's our problem.)
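
The character-reference effect can be reproduced in current Python 3 with an ASCII target encoding, where non-ASCII characters come out as numeric references. The sketch below also shows one possible way around it: serialize to text first and encode by hand. This is an assumption for illustration, not the workaround referred to above (which is not quoted in this thread) and not the attached patch.

```python
import xml.etree.ElementTree as ET

elem = ET.Element("tag")
elem.text = "\xe5\xf6\xf6"

# Narrow target encoding: characters it cannot represent are written
# as numeric character references.
ascii_bytes = ET.tostring(elem, encoding="us-ascii")
print(ascii_bytes)

# Serialize to str, then add the declaration and encode manually.
text = ET.tostring(elem, encoding="unicode")
utf16_bytes = ("<?xml version='1.0' encoding='utf-16'?>\n" + text).encode("utf-16")
print(utf16_bytes.decode("utf-16"))
```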

Added file: http://bugs.python.org/file12009/patch.txt
