I wrote:

>> In a few sentences, could some kind soul summarize the
>> status of XML 1.1 processing using Python XML modules?
>
> I haven't done any extensive testing, but I'm quite sure that sgmlop
> 1.1 supports it.

fwiw, as the following snippet illustrates, ET+sgmlop can read files
with 1.1-style character references, but the ET serializer doesn't
encode such characters on the way out.

this script

    from elementtree import ElementTree, SgmlopXMLTreeBuilder
    from StringIO import StringIO

    file = StringIO("<test>this is a backspace: &#8;</test>")

    doc = ElementTree.parse(file, SgmlopXMLTreeBuilder.TreeBuilder())

    root = doc.getroot()

    print repr(root.text)
    print repr(ElementTree.tostring(root))

prints

    'this is a backspace: \x08'
    '<test>this is a backspace: \x08</test>'

which isn't entirely correct: the backspace is written out as a raw
character instead of as a character reference.

fixing this in ElementTree is pretty straightforward; just tweak the
RE, and make sure _encode_entity is called for all cdata sections.

you can also use the following brute-force runtime patch:

    # patch the ET serializer (works with 1.2.X, may break beyond that)
    import re
    from elementtree import ElementTree
    escape = re.compile(u'[&<>\"\x01-\x09\x0b\x0c\x0e-\x1f\u0080-\uffff]+')
    ElementTree._encode_entity.func_defaults = (escape,)
    ElementTree._escape_cdata = lambda a, b: ElementTree._encode_entity(a)
    # end

</F>
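
for reference, here's a rough standalone sketch of the kind of escaping the patched serializer is meant to apply to text. it reuses the character class from the patch above, but escape_text and _named are hypothetical names made up for this illustration; they're not part of ElementTree, and ElementTree's real _encode_entity may differ in the details.

```python
import re

# same character class as in the patch above: markup characters,
# C0 controls other than NUL, LF and CR, and U+0080 through U+FFFF
_escape = re.compile(u'[&<>"\x01-\x09\x0b\x0c\x0e-\x1f\u0080-\uffff]+')

# named entities for the markup characters (hypothetical table)
_named = {u'&': u'&amp;', u'<': u'&lt;', u'>': u'&gt;', u'"': u'&quot;'}

def escape_text(text):
    # hypothetical helper, not an ElementTree function
    def replace(match):
        out = []
        for char in match.group():
            # use a named entity if there is one, otherwise a
            # decimal character reference
            out.append(_named.get(char) or u'&#%d;' % ord(char))
        return u''.join(out)
    return _escape.sub(replace, text)

print(repr(escape_text(u'this is a backspace: \x08')))
```

with this, the backspace from the test document comes out as &#8; again, while plain ASCII text and newlines pass through untouched.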