street.swee...@mailworks.org wrote:

> Hello all,
>
> I'm trying to merge and filter some xml. This is working well, but I'm
> getting one node that's not in my list to include. Python version is
> 3.4.0.
>
> The goal is to merge multiple xml files and then write a new one based
> on whether or not <pid> is in an include list. In the mock data below,
> the 3 xml files have a total of 8 <rec> nodes, and I have 4 <pid> values
> in my list. The output is correctly formed xml, but it includes 5 <rec>
> nodes: the 4 in the list, plus 89012 from input1.xml. It runs without
> error. I've used type() to compare
> rec.find('part').find('pid').text and the items in the list; they're
> strings. When the first for loop is done, xmlet has 8 rec nodes. Is
> there a problem in the iteration in the second for loop? Any other
> recommendations also welcome. Thanks!
>
>
> The code itself was cobbled together from two sources,
> http://stackoverflow.com/questions/9004135/merge-multiple-xml-files-from-command-line/11315257#11315257
> and http://bryson3gps.wordpress.com/tag/elementtree/
>
> Here's the code and data:
>
> #!/usr/bin/env python3
>
> import os, glob
> from xml.etree import ElementTree as ET
>
> xmls = glob.glob('input*.xml')
> ilf = os.path.join(os.path.expanduser('~'),'include_list.txt')
> xo = os.path.join(os.path.expanduser('~'),'mergedSortedOutput.xml')
>
> il = [x.strip() for x in open(ilf)]
>
> xmlet = None
>
> for xml in xmls:
>     d = ET.parse(xml).getroot()
>     for rec in d.iter('inv'):
>         if xmlet is None:
>             xmlet = d
>         else:
>             xmlet.extend(rec)
>
> for rec in xmlet:
>     if rec.find('part').find('pid').text not in il:
>         xmlet.remove(rec)
>
> ET.ElementTree(xmlet).write(xo)
>
> quit()
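The extra node is what you would expect from removing children of an element while a for loop is walking over that same element. After a removal, the next child slides into the current position, and the loop steps past it without checking it. The following is a minimal sketch that reproduces this on a small in-memory tree. The pids and the include set are hypothetical stand-ins, not the mock data from the question. Iterating over a snapshot, list(root), avoids the skip.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for the merged tree: four <rec> children,
# each with a <part><pid> value. The pid values are invented.
XML = """<inv>
<rec><part><pid>1</pid></part></rec>
<rec><part><pid>2</pid></part></rec>
<rec><part><pid>3</pid></part></rec>
<rec><part><pid>4</pid></part></rec>
</inv>"""

keep = {"4"}  # hypothetical include list


def buggy_filter(root, keep):
    # Removes children while iterating over the same element: after a
    # removal the following child shifts into the current slot, so the
    # loop moves past it without ever checking it.
    for rec in root:
        if rec.find("part/pid").text not in keep:
            root.remove(rec)


def fixed_filter(root, keep):
    # Iterate over a snapshot (a plain list of the children), so
    # removing from root cannot shift what is being iterated.
    for rec in list(root):
        if rec.find("part/pid").text not in keep:
            root.remove(rec)


def pids(root):
    # Collect the pid text of every remaining <rec>.
    return [rec.find("part/pid").text for rec in root]


buggy = ET.fromstring(XML)
buggy_filter(buggy, keep)
print(pids(buggy))

fixed = ET.fromstring(XML)
fixed_filter(fixed, keep)
print(pids(fixed))
```

The first print shows ['2', '4']. pid 2 survives even though it is not in the include set, which mirrors the stray 89012. The second print shows only ['4'].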
I believe Alan answered your question. I just want to thank you for taking
the time to describe your problem clearly and for providing all the
necessary parts to reproduce it.

Bonus part:

Other options to filter a mutable sequence:

(1) assign to the slice:

items[:] = [item for item in items if is_match(item)]

(2) iterate over it in reverse order:

for item in reversed(items):
    if not is_match(item):
        items.remove(item)

Below is a way to integrate method 1 in your code:

[...]
# set lookup is more efficient than lookup in a list
il = set(x.strip() for x in open(ilf))

def matching_recs(recs):
    return (rec for rec in recs if rec.find("part/pid").text in il)

xmlet = None
for xml in xmls:
    inv = ET.parse(xml).getroot()
    if xmlet is None:
        xmlet = inv
        # replace all recs with matching recs
        xmlet[:] = matching_recs(inv)
    else:
        # append only matching recs
        xmlet.extend(matching_recs(inv))

ET.ElementTree(xmlet).write(xo)

# the script will end happily without a quit() or exit() call
# quit()

At least with your sample data,

> for rec in d.iter('inv'):

iterates over a single node (the root), so I omitted that loop.

_______________________________________________
Tutor maillist  -  Tutor@python.org
To unsubscribe or change subscription options:
https://mail.python.org/mailman/listinfo/tutor
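The two filtering options from the bonus part can be tried on a plain list. The following is a minimal sketch. `is_match` is a hypothetical predicate (keep even numbers) standing in for the pid check, and the values are made up.

```python
def is_match(item):
    # Hypothetical predicate: keep even numbers.
    return item % 2 == 0


# Option 1: replace the contents in place by assigning to the full slice.
# The right-hand list is built completely before the list is modified,
# and every other reference to the same list object sees the result.
items = [1, 2, 3, 4, 5, 6]
alias = items
items[:] = [item for item in items if is_match(item)]
print(items, alias is items)

# Option 2: walk the list backwards. With distinct values, each removal
# only shifts elements that have already been visited, so none is skipped.
items2 = [1, 2, 3, 4, 5, 6]
for item in reversed(items2):
    if not is_match(item):
        items2.remove(item)
print(items2)
```

Both approaches print [2, 4, 6], and option 1 also prints True because the list object itself is kept. Because `list.remove` deletes the first equal item, this sketch uses distinct values, so the removed item is always the one just visited.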
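The merge-while-filtering approach can be checked without any files. The following is a minimal sketch under these assumptions. The three XML strings and the include set are hypothetical stand-ins for input1.xml to input3.xml and include_list.txt, and ET.fromstring replaces ET.parse(path).getroot(). The code departs from the reply's code in one way: `matching_recs` returns a list rather than a generator. This is a defensive choice that builds each filtered batch completely before the tree is modified.

```python
import xml.etree.ElementTree as ET

# Hypothetical in-memory stand-ins for the input files; the pids are invented.
SOURCES = [
    "<inv><rec><part><pid>1</pid></part></rec>"
    "<rec><part><pid>2</pid></part></rec></inv>",
    "<inv><rec><part><pid>3</pid></part></rec>"
    "<rec><part><pid>4</pid></part></rec></inv>",
    "<inv><rec><part><pid>5</pid></part></rec></inv>",
]
include = {"2", "3", "5"}  # hypothetical include list, held in a set


def matching_recs(recs):
    # List of the <rec> children whose part/pid text is in the include set.
    return [rec for rec in recs if rec.find("part/pid").text in include]


merged = None
for text in SOURCES:
    inv = ET.fromstring(text)
    if merged is None:
        # The first root becomes the output root; keep only its matching recs.
        merged = inv
        merged[:] = matching_recs(inv)
    else:
        # Later roots contribute only their matching recs.
        merged.extend(matching_recs(inv))

print([rec.find("part/pid").text for rec in merged])
print(ET.tostring(merged, encoding="unicode"))
```

The first print shows ['2', '3', '5']. Non-matching recs are never added to the merged tree, so no removal pass, and no removal-during-iteration, is needed.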