Hal Styli, 27.02.2010 21:50: > I have a sed solution to the problems below but would like to rewrite > in python...
Note that sed (or any other line based or text based tool) is not a sensible way to handle XML. If you want to read XML, use an XML parser. They are designed to do exactly what you want in a standard compliant way, and they can deal with all sorts of XML formatting and encoding, for example. > I need to strip out some data from a quirky xml file into a csv: > > from something like this > > < ..... cust="dick" .... product="eggs" ... quantity="12" .... > > < .... cust="tom" .... product="milk" ... quantity="2" ...> > < .... cust="harry" .... product="bread" ... quantity="1" ...> > < .... cust="tom" .... product="eggs" ... quantity="6" ...> > < ..... cust="dick" .... product="eggs" ... quantity="6" .... > As others have noted, this doesn't tell much about your XML. A more complete example would be helpful. > to this > > dick,eggs,12 > tom,milk,2 > harry,bread,1 > tom,eggs,6 > dick,eggs,6 > > I am new to python and xml and it would be great to see some slick > ways of achieving the above by using python's XML capabilities to > parse the original file or python's regex to achive what I did using > sed. It's funny how often people still think that SAX is a good way to solve XML problems. Here's an untested solution that uses xml.etree.ElementTree: from xml.etree import ElementTree as ET csv_field_order = ['cust', 'product', 'quantity'] clean_up_used_elements = None for event, element in ET.iterparse("thefile.xml", events=['start']): # you may want to select a specific element.tag here # format and print the CSV line to the standard output print(','.join(element.attrib.get(title, '') for title in csv_field_order)) # safe some memory (in case the XML file is very large) if clean_up_used_elements is None: # this assigns the clear() method of the root (first) element clean_up_used_elements = element.clear clean_up_used_elements() You can strip everything dealing with 'clean_up_used_elements' (basically the last section) if your XML file is small enough to fit into memory (a couple of MB is usually fine). Stefan -- http://mail.python.org/mailman/listinfo/python-list