My apologies to Fredrik Lundh of Pythonware for omitting ElementTree+sgmlop from my recent listing of Python-XML packages that handle XML 1.1. The list (as far as I'm aware) currently includes:

1. pxdom by Andrew Clover
   (http://www.doxdesk.com/software/py/pxdom.html,
   http://www.doxdesk.com/file/software/py/pxdom.py)
2. pyLTXML from the Univ. of Edinburgh
   (http://www.ltg.ed.ac.uk/software/xml,
   http://www.ltg.ed.ac.uk/software/gpl_xml.html,
   http://www.ltg.ed.ac.uk/software/xml/xmldoc/xmldoc.html)
3. elementtree library from Pythonware
   (http://effbot.org/zone/element.htm,
   http://effbot.org/zone/element-index.htm)

If I've forgotten anyone, please help me complete the list. I'm still a Python-XML beginner, and any omissions are unintentional.

Thanks again to all those who have provided such tools.

Ken

Fredrik Lundh <[EMAIL PROTECTED]> wrote:
fwiw, as the following snippet illustrates, ET+sgmlop can read files with 1.1-style character references, but the ET serializer doesn't encode such characters on the way out.

this script

    from elementtree import ElementTree, SgmlopXMLTreeBuilder
    from StringIO import StringIO

    file = StringIO("<test>this is a backspace: &#8;</test>")
    doc = ElementTree.parse(file, SgmlopXMLTreeBuilder.TreeBuilder())
    root = doc.getroot()
    print repr(root.text)
    print repr(ElementTree.tostring(root))

prints

    'this is a backspace: \x08'
    '<test>this is a backspace: \x08</test>'

which isn't entirely correct: the backspace is written out as a raw control character instead of a character reference.

fixing this in ElementTree is pretty straightforward; just tweak the RE, and make sure _encode_entity is called for all cdata sections. you can also use the following brute-force runtime patch:

    # patch the ET serializer (works with 1.2.X, may break beyond that)
    import re
    from elementtree import ElementTree
    escape = re.compile(u'[&<>\"\x01-\x09\x0b\x0c\x0e-\x1f\u0080-\uffff]+')
    ElementTree._encode_entity.func_defaults = (escape,)
    ElementTree._escape_cdata = lambda a, b: ElementTree._encode_entity(a)
    # end

</FredrikLundh>
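To see what the widened RE in the patch buys you, here is a minimal standalone sketch of the escaping step, written for Python 3 and independent of ElementTree. The names ESCAPE, NAMED and escape_text are hypothetical, made up for this illustration; this is not the library's own code, only a rough approximation of what the patched serializer would be expected to do with text content.

```python
import re

# Same character class as in the patch above: markup characters, the
# control characters XML 1.1 wants written as references, and non-ASCII.
# (ESCAPE, NAMED and escape_text are hypothetical names for this sketch.)
ESCAPE = re.compile('[&<>"\x01-\x09\x0b\x0c\x0e-\x1f\u0080-\uffff]+')

# Markup characters get their predefined entities; everything else that
# matches becomes a decimal character reference.
NAMED = {"&": "&amp;", "<": "&lt;", ">": "&gt;", '"': "&quot;"}


def escape_text(text):
    """Replace every character matched by ESCAPE with an entity or reference."""
    def replace(match):
        return "".join(NAMED.get(ch) or "&#%d;" % ord(ch) for ch in match.group())
    return ESCAPE.sub(replace, text)


print(escape_text("this is a backspace: \x08"))
```

With this character class, newline (\x0a) and carriage return (\x0d) pass through untouched, while tab (\x09) is written as a reference along with the other control characters.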