Walter Underwood wrote:
> I'd just use a SAX interface. When you see id=HL as an attribute,
> close the old record and start a new one. Do the same thing at
> end of file. Done.
>
> Generally, if the structure is fairly fixed and you are extracting
> the data, think about using SAX. If the shape of the structure
> carries a lot of the information, you might need a DOM.

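(for reference, the SAX approach described in the quote could look roughly like the sketch below. it is written for current Python and is only an illustration: the `RecordHandler` class, the `parse_records` helper and the sample document are hypothetical names made up for this example, not code from the thread. note that all state lives in instance variables and the logic is spread over callback methods.)

```python
import xml.sax


class RecordHandler(xml.sax.ContentHandler):
    """Hypothetical handler: starts a new record at each id="HL" attribute."""

    def __init__(self):
        super().__init__()
        self.records = []   # finished records
        self.current = []   # record currently being collected

    def startElement(self, name, attrs):
        # an element carrying id="HL" closes the old record and starts a new one
        if attrs.get("id") == "HL" and self.current:
            self.records.append(self.current)
            self.current = []
        self.current.append((name, attrs.get("id")))

    def endDocument(self):
        # "do the same thing at end of file"
        if self.current:
            self.records.append(self.current)


def parse_records(xml_bytes):
    """Parse an in-memory document and return the list of records."""
    handler = RecordHandler()
    xml.sax.parseString(xml_bytes, handler)
    return handler.records


# made-up sample document, for illustration only
sample = b'<doc><seg id="HL"/><seg id="N1"/><seg id="HL"/><seg id="N2"/></doc>'
for record in parse_records(sample):
    print(record)
```

(in this sketch the root element ends up in a record of its own, because it is seen before the first HL segment.)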
SAX is dead. if you're not using higher-level APIs, you're doing more
work than you have to, and your code is likely to be slower and buggier
than it should be.

here's the (c)ElementTree iterparse version:

    try:
        import cElementTree as ET
    except ImportError:
        from elementtree import ElementTree as ET

    def process(record):
        # receives a list of elements for this record
        for elem in record:
            print elem.tag,
            elem.clear() # won't need this any more
        print

    record = []

    for event, elem in ET.iterparse("test.xml"):
        if elem.tag == "seg" and elem.get("id") == "HL":
            process(record)
            record = []
        record.append(elem)

    if record:
        process(record)

(the cElementTree version of iterparse is about 5 times faster than
xml.sax on the parsing part, and putting state in local variables and
logic in the loop body is a lot more efficient than putting state in
instance variables and logic in a bunch of callback methods).

here's a "functional" version of the same thing, btw:

    import cElementTree as ET
    from itertools import groupby
    from operator import itemgetter

    def source(file):
        # assign a unique serial to each HL group
        serial = 0
        for event, elem in ET.iterparse(file):
            if elem.tag == "seg" and elem.get("id") == "HL":
                serial += 1
            yield serial, elem

    for dummy, record in groupby(source("test.xml"), itemgetter(0)):
        # process record
        for dummy, elem in record:
            print elem,
            elem.clear()
        print

</F>
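(a note for readers on current Python: ElementTree now ships in the standard library as `xml.etree.ElementTree`, and `print` is a function. the iterparse loop above could be adapted roughly as in the sketch below. the `iter_records` generator and the in-memory sample document are hypothetical, introduced only for illustration. unlike the loop above, this sketch copies the values it needs before clearing each element, and it skips an empty first record.)

```python
import io
import xml.etree.ElementTree as ET


def iter_records(source):
    """Yield one list of (tag, id, text) tuples per HL-delimited record.

    Hypothetical helper; `source` is a filename or a binary file object.
    """
    record = []
    for event, elem in ET.iterparse(source):  # default: "end" events only
        if elem.tag == "seg" and elem.get("id") == "HL":
            if record:
                yield record
            record = []
        # copy what we need, then release the element's contents
        record.append((elem.tag, elem.get("id"), elem.text))
        elem.clear()
    if record:
        yield record


# made-up sample document, for illustration only
sample = io.BytesIO(
    b'<doc><seg id="HL">a</seg><seg id="N1">b</seg>'
    b'<seg id="HL">c</seg><seg id="N2">d</seg></doc>'
)
for record in iter_records(sample):
    print(record)
```

(as in the loop above, the state stays in local variables and there are no callbacks. since iterparse reports each element when it ends, the root element shows up as the last item of the final record.)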