On Wed, 06 Feb 2008 00:54:49 -0200, Tess <[EMAIL PROTECTED]> wrote:

> I have a text file with marked up data that I need to convert into a
> text tab separated file.
>
> The structure of the input file is listed below (see file 1) and the
> desired output file is below as well (see file 2).
>
> I am a complete novice with python and would appreciate any tips you
> may be able to provide.
>
>
> file 1:
> <item>TABLE</table>
> <color>black</color>
> <color>blue</color>
> <color>red</color>
> <item>CHAIR</table>
> <color>yellow</color>
> <color>black</color>
> <color>red</color>
> <item>TABLE</table>
> <color>white</color>
> <color>gray</color>
> <color>pink</color>

Are you sure it says <item>...</table>? Are there ALWAYS three colors per
item, as in your example? If so, just read groups of 4 lines and ignore
the tags.

> file 2 (tab separated):
> TABLE black blue red
> CHAIR yellow black red
> TABLE white gray pink

The best way to produce this output is using the csv module:
http://docs.python.org/lib/module-csv.html
So we need a list of rows, each row being a list of column data. A simple
way of building such a structure from the input file would be:

rows = []
row = None
for line in open('file1.txt'):
    line = line.strip()  # remove leading and trailing whitespace
    if line.startswith('<item>'):
        if row: rows.append(row)  # the previous item is complete
        j = line.index("</")      # position of the closing tag
        item = line[6:j]          # text after '<item>' (6 characters)
        row = [item]
    elif line.startswith('<color>'):
        j = line.index("</")
        color = line[7:j]         # text after '<color>' (7 characters)
        row.append(color)
    else:
        raise ValueError("can't understand line: %r" % line)
if row: rows.append(row)          # don't forget the last item

This allows for a variable number of "color" lines per item. Once the
`rows` list is built, we only have to create a csv writer for the right
dialect ('excel-tab' looks promising) and feed the rows to it:

import csv
# newline='' lets the csv module control the line endings itself
fout = open('file2.txt', 'w', newline='')
writer = csv.writer(fout, dialect='excel-tab')
writer.writerows(rows)
fout.close()

That's all folks!

-- 
Gabriel Genellina
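
If every item really is followed by exactly three colors, the "read groups of 4 lines and ignore the tags" idea mentioned above might look roughly like the sketch below. The names `rows_in_groups`, `TAG` and the inline `sample` are made up for illustration, and the code assumes Python 3 and that each record sits on its own line; it does no checking of which tag is which.

```python
import csv
import io
import re
from itertools import islice

TAG = re.compile(r"<[^>]*>")  # matches any opening or closing tag


def rows_in_groups(lines, group_size=4):
    """Yield one row per `group_size` non-blank lines, with the tags removed."""
    stripped = (line.strip() for line in lines)
    cleaned = (TAG.sub("", text) for text in stripped if text)  # skip blank lines
    while True:
        group = list(islice(cleaned, group_size))  # take the next 4 values
        if not group:
            break
        yield group


# Hypothetical sample data, shaped like "file 1" in the question.
sample = """<item>TABLE</table>
<color>black</color>
<color>blue</color>
<color>red</color>
<item>CHAIR</table>
<color>yellow</color>
<color>black</color>
<color>red</color>
"""

out = io.StringIO()
writer = csv.writer(out, dialect="excel-tab")
writer.writerows(rows_in_groups(sample.splitlines()))
print(out.getvalue())
```

This is shorter, but it silently produces wrong rows as soon as one item has more or fewer than three colors, so the tag-aware loop is the safer choice when the input is not perfectly regular.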