I managed to solve this using the following method:

        """Returns a dictionary of indexes of spectra for which there are
        secondary scans, along with the indexes of those scans
        """
        scans = dict()

        # get an iterable; "start" events are requested as well, so that
        # the first event delivers the root element rather than the first
        # element to be closed
        context = cElementTree.iterparse(self.info['filename'],
                                         events=("start", "end"))
        # turn it into an iterator
        context = iter(context)
        # get the root element
        event, root = next(context)

        for event, elem in context:
            if event == "end" and elem.tag == self.XML_SPACE + "scan":
                parentId = int(elem.get('num'))
                for child in elem.findall(self.XML_SPACE + 'scan'):
                    childId = int(child.get('num'))
                    try:
                        indexes = scans[parentId]
                    except KeyError:
                        indexes = []
                        scans[parentId] = indexes
                    indexes.append(childId)
                    child.clear()
                root.clear()
        return scans

I think the trick is to act only on the 'end' event, when a scan element has been read in full, and to clear each element once it has been handled, so that iterparse never holds more of the file than it needs. I'm still not quite clear on whether this is the best way to do it.
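For anyone who wants to try the idea outside my class, here is a minimal standalone sketch of the same approach. It assumes Python 3 and xml.etree.ElementTree. The function name child_scans_by_parent, the ns parameter, and the tiny sample document are made up for illustration and are not part of my real code or file format.

```python
import io
import xml.etree.ElementTree as ET


def child_scans_by_parent(source, ns=""):
    """Map each parent scan number to the numbers of its directly nested scans.

    `source` is a filename or a binary file object; `ns` is an optional
    "{namespace}" prefix (hypothetical parameter, empty for the sample below).
    """
    scans = {}
    scan_tag = ns + "scan"

    # Ask for "start" events too, so the very first event is the root element.
    context = ET.iterparse(source, events=("start", "end"))
    _, root = next(context)

    for event, elem in context:
        # An "end" event means the element and all its children are complete.
        if event == "end" and elem.tag == scan_tag:
            parent_id = int(elem.get("num"))
            for child in elem.findall(scan_tag):
                scans.setdefault(parent_id, []).append(int(child.get("num")))
                # The child has been recorded, so drop its contents.
                child.clear()
            # Detach already-processed elements from the root.
            root.clear()
    return scans


# Made-up sample: scans 2 and 3 are nested in scan 1, scan 6 in scan 5.
SAMPLE = b"""<run>
  <scan num="1">
    <scan num="2"/>
    <scan num="3"/>
  </scan>
  <scan num="4"/>
  <scan num="5">
    <scan num="6"/>
  </scan>
</run>"""

result = child_scans_by_parent(io.BytesIO(SAMPLE))
print(result)  # {1: [2, 3], 5: [6]}
```

Scan 4 does not appear in the result because it has no nested scans. Whether clearing the root after every scan is the most economical choice for deeply wrapped files is something I have not measured.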