Hello Stefan,

Thanks for your response.

> Stuart McGraw wrote:
> > I am probably missing something elementary (I am new
> > to both xml and lxml), but I am having problems figuring
> > out how to get comments when using lxml's iterparse().
> > When I parse xml with parse() and iterate through the
> > result, I get the comments.  But when I try to do the
> > same thing (approximately I think) with iterparse,
> > I don't see any comments.
>
> While the comments end up in the tree that iterparse generates,
> they do not show up in the events. Now that you mention it, I
> actually think that should change. There should be events
> "comment" and "pi" that yield them if requested.

That would be ideal, from my perspective.  It also seems more
consistent with the other interfaces (parse, parse target, etc).

> > I was using the standard Python ElementTree but my
> > understanding is that it doesn't save comments at all.
>
> ElementTree strips comments in the parser, that's right.
>
> > The real file is ~50MB and has about 1M nodes under the
> > root so I have to use iterparse and I also have to process
> > comments, so I would really appreciate a clue about how
> > to do it.  Thanks.
>
> Have you tried the parser target interface? It's a SAX-like
> interface that uses callbacks.
>
> http://codespeak.net/lxml/parsing.html#the-target-parser-interface
> http://effbot.org/elementtree/elementtree-xmlparser.htm#the-target-interface

Thanks for pointing that out.  I'd seen it in the docs but
hadn't appreciated that it was relevant.

However, I am having trouble getting it to work.  Specifically,
the test code below produces the output I expected when run with
cElementTree, but with lxml it is missing the "end" callbacks,
the second "start(entry)" callback, and the resolved entity text.
Am I doing something wrong?

Test code:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
#import xml.etree.cElementTree as ET
import lxml.etree as ET
from cStringIO import StringIO

# XML data...
#=============================================
xmltxt = \
'''<?xml version="1.0" encoding="UTF-8"?>
<!-- Rev 1.06 -->
<!DOCTYPE Test [
<!ELEMENT Test (entry*)>
<!ELEMENT entry (#PCDATA)>
<!-- Description of <entry> element. -->
<!ENTITY ex "an existential entity">
]>
<!-- File created: 2008-02-27 -->
<Test>
<!-- Chronosynclastic Infindibulum Listing -->
<entry>text 1 is &ex;</entry>
<!-- Deleted: A1500477 -->
<entry>text 2</entry>
</Test>'''
#=============================================

print '\nTargetParser:\n-------------'

try: XMLParser = ET.XMLParser
except AttributeError: XMLParser = ET.XMLTreeBuilder

class EchoTarget:
    def comment(self, tag):
        print "comment", tag
    def start(self, tag, attrib):
        print "start", tag, attrib
    def end(self, tag):
        print "end", tag
    def data(self, data):
        print "data", repr(data)
    def close(self):
        print "close"
        return "closed!"

parser = XMLParser( target = EchoTarget())
result = ET.parse( StringIO (xmltxt), parser)

_______________________________________________
XML-SIG maillist - XML-SIG@python.org
http://mail.python.org/mailman/listinfo/xml-sig
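
For anyone repeating this test on Python 3, here is a minimal sketch of the same target-parser idea using only the standard library's xml.etree.ElementTree. It is not the lxml setup from the message, and lxml may behave differently. The class name RecordingTarget and the shortened XML sample are assumptions made for illustration: the target records each callback in a list instead of printing it, so the sequence of start, end, data and comment callbacks can be inspected afterwards.

```python
# Python 3 sketch of the echo-target test, standard library only.
# Assumption: xml.etree.ElementTree stands in for lxml; lxml may differ.
import xml.etree.ElementTree as ET

# Shortened version of the test document (bytes, so the encoding
# declaration in the XML header is honoured).
XML_BYTES = b"""<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE Test [
<!ELEMENT Test (entry*)>
<!ELEMENT entry (#PCDATA)>
<!ENTITY ex "an existential entity">
]>
<Test>
<!-- Chronosynclastic Infindibulum Listing -->
<entry>text 1 is &ex;</entry>
<!-- Deleted: A1500477 -->
<entry>text 2</entry>
</Test>"""


class RecordingTarget:
    """Hypothetical target that records callbacks instead of printing them."""

    def __init__(self):
        self.events = []

    def start(self, tag, attrib):
        self.events.append(("start", tag))

    def end(self, tag):
        self.events.append(("end", tag))

    def data(self, data):
        # Text may arrive in several chunks, e.g. around an expanded entity.
        self.events.append(("data", data))

    def comment(self, text):
        self.events.append(("comment", text))

    def close(self):
        # Whatever close() returns is what parser.close() hands back.
        return self.events


parser = ET.XMLParser(target=RecordingTarget())
parser.feed(XML_BYTES)
events = parser.close()

for kind, value in events:
    print(kind, repr(value))
```

Recording the events rather than printing them makes it easy to compare two parser implementations by comparing the resulting lists.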