Thanks, Stefan, for pointing me to lxml. It looks like a good alternative to SAX in this situation. However, I'm a little confused as to the best way to remove elements from the tree while keeping their tail text. This is what I have so far:
    from lxml import etree

    context = etree.iterparse("test.xml")
    for event, element in context:
        for title in element.xpath("child::title"):
            element.remove(title)

Do I need to explicitly assign the tail text to either the parent or the preceding sibling? If so, what's the best way to accomplish that?

Thanks,
-James

On Sun, Jul 27, 2008 at 3:38 PM, Stefan Behnel <[EMAIL PROTECTED]> wrote:
> Hi,
>
> James Sulak wrote:
>> I'm attempting to use xml.sax.saxutils.XMLFilterBase and XMLGenerator to
>> take an input XML document, filter out certain elements, and output
>> the result to a second XML file. I have it mostly working, except
>> that I lose the DTD declaration and anything (processing instructions
>> or comments) before the root element. I believe I'm supposed to be
>> using a LexicalHandler to get the information from the DTD, but I have
>> not been able to figure out how to do this, or how to integrate it
>> with the rest of the code.
>>
>> I'm pretty new at using Python (and SAX, for that matter) to work with
>> XML
>
> Try lxml's iterparse() instead of SAX. It will build an in-memory tree
> (including the DTD or its reference if you want, see the parser docs), but you
> can remove the unwanted elements from the tree while it parses. It's still
> pretty memory friendly and definitely a lot easier to work with than SAX.
>
> http://codespeak.net/lxml/parsing.html#iterparse-and-iterwalk
> http://codespeak.net/lxml/tutorial.html#parsing-from-strings-and-files
>
> Stefan
> _______________________________________________
XML-SIG maillist - XML-SIG@python.org
http://mail.python.org/mailman/listinfo/xml-sig
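
For anyone reading this thread later, here is a minimal sketch of one way to keep the tail text when removing an element. It assumes the tail should be appended to the preceding sibling's tail, or to the parent's text when there is no preceding sibling. The helper name remove_keep_tail and the sample document are made up for illustration, and the fallback import is only there so the snippet also runs where lxml is not installed; it has not been tried against the real test.xml.

```python
try:
    from lxml import etree  # the library discussed in this thread
except ImportError:
    # Standard-library fallback; it offers the same basic Element API.
    import xml.etree.ElementTree as etree


def remove_keep_tail(parent, child):
    """Remove child from parent, but keep the text that followed it.

    Hypothetical helper: it relies only on .text, .tail and remove(),
    which both lxml and the standard library provide.
    """
    tail = child.tail
    if tail:
        siblings = list(parent)
        index = siblings.index(child)
        if index > 0:
            # Append the tail to the preceding sibling's tail.
            previous = siblings[index - 1]
            previous.tail = (previous.tail or "") + tail
        else:
            # No preceding sibling: append the tail to the parent's text.
            parent.text = (parent.text or "") + tail
    parent.remove(child)


# Made-up sample document, not the real test.xml.
root = etree.fromstring("<doc>before<title>T</title>after<p>body</p></doc>")
for title in root.findall("title"):
    remove_keep_tail(root, title)

result = etree.tostring(root).decode()
print(result)  # <doc>beforeafter<p>body</p></doc>
```

The same helper could be called from inside the iterparse() loop in place of the bare element.remove(title) call, passing element as the parent.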