New submission from Moriyoshi Koizumi <mozo+pyt...@mozo.jp>:

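ElementTree doesn't correctly serialize end-of-line characters (#xa, 
#xd) in attribute values.  According to the attribute-value 
normalization rules in the specification [1], the parser converts bare 
end-of-line characters to #x20, so an end-of-line character that reaches 
the tree can only have come from a character reference in the original 
document.  The serializer must therefore write such characters back as 
character references (&#10;, &#13;); if they are written literally, 
they turn into spaces when the output is parsed again.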

[1] http://www.w3.org/TR/xml11/#AVNormalize   
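
As a quick illustration of the normalization rule in [1], here is a minimal sketch that should run on both Python 2 and Python 3. The element and attribute names are invented for the example; only `xml.etree.ElementTree.fromstring` and `Element.get` are used.

```python
import xml.etree.ElementTree as ET

# The attribute contains one literal newline (between "a" and "b")
# and one newline written as a character reference (between "b" and "c").
elem = ET.fromstring('<foo attr="a\nb&#10;c"/>')
value = elem.get("attr")

# The literal newline is normalized to a space by the parser,
# while the character reference yields a real line feed.
print(repr(value))
```

The literal newline comes back as a space and the `&#10;` reference comes back as a line feed, which is why the serializer cannot safely emit the line feed literally.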

### sample code

from xml.etree.ElementTree import ElementTree
from cStringIO import StringIO

# builder = ElementTree(file=StringIO("<foo>\x0d</foo>"))
# out = StringIO()
# builder.write(out)
# print out.getvalue()

out = StringIO()
ElementTree(file=StringIO(
'''<?xml version="1.0" encoding="UTF-8" ?>
<!DOCTYPE foo [
<!ELEMENT foo (#PCDATA)>
<!ATTLIST foo attr CDATA "">
]>
<foo attr="   test
&#13;test&#32; test&#10;a  ">&#10;</foo>
''')).write(out)
# should be: <foo attr="   test &#13;test  test&#10;a  ">\x0a</foo>
print out.getvalue()

out = StringIO()
ElementTree(file=StringIO(
'''<?xml version="1.0" encoding="UTF-8" ?>
<!DOCTYPE foo [
<!ELEMENT foo (#PCDATA)>
<!ATTLIST foo attr NMTOKENS "">
]>
<foo attr="   test
&#13;test&#32; test&#10;a  ">&#10;</foo>
''')).write(out)
# should be: <foo attr="test &#13;test test&#10;a">\x0a</foo>
print out.getvalue()
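
For reference, the expected behavior could be sketched roughly as follows. This is not ElementTree's actual code: `escape_attrib_eol` is a hypothetical helper written for this report, and escaping the tab character as well is an assumption based on the same normalization rule in [1]. It should run on both Python 2 and Python 3.

```python
import xml.etree.ElementTree as ET

def escape_attrib_eol(text):
    """Hypothetical helper: escape an attribute value so that
    whitespace characters survive a parse/serialize round trip."""
    # Escape the ampersand first so later replacements are not re-escaped.
    text = text.replace("&", "&amp;")
    text = text.replace("<", "&lt;")
    text = text.replace(">", "&gt;")
    text = text.replace('"', "&quot;")
    # Emit end-of-line characters as character references; written
    # literally, the parser would normalize them to #x20.
    text = text.replace("\r", "&#13;")
    text = text.replace("\n", "&#10;")
    # Assumption: tab is treated the same way, since it is also
    # normalized to a space inside attribute values.
    text = text.replace("\t", "&#9;")
    return text

# The attribute value the parser produces for the CDATA sample above.
original = "   test \rtest  test\na  "
serialized = '<foo attr="%s"/>' % escape_attrib_eol(original)
roundtrip = ET.fromstring(serialized).get("attr")

print(serialized)
print(roundtrip == original)
```

With this kind of escaping, parsing the serialized output gives back the same attribute value, which is the round-trip property the samples above check for.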

----------
components: XML
messages: 94074
nosy: moriyoshi
severity: normal
status: open
title: Incorrect serialization of end-of-line characters in attribute values
type: behavior
versions: Python 2.6

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue7139>
_______________________________________