And now for something completely different... I've been reading up a bit about Python and Excel, and I quickly got the program to output to Excel. However, what if the input file were a Word document? I can't find much information about parsing Word files. What could I add to make the same program work with a Word file?
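Since the program already uses win32com to drive Excel, one option is to let Word itself open the document through COM and return its text, then split that text into tokens exactly as before. This is a minimal sketch, assuming pywin32 and an installed copy of Word; `read_word_text` and `clean_word_text` are hypothetical helper names, not part of any library. Word reports paragraph marks as `'\r'` (which `split()` already treats as whitespace) and table end-of-cell markers as `'\r\x07'` (the `'\x07'` is not whitespace, so it is replaced here).

```python
def clean_word_text(text):
    # Word ends paragraphs with '\r' and table cells with '\r\x07';
    # turn the '\x07' into a space so str.split() does not glue it to a token.
    return text.replace('\x07', ' ')


def read_word_text(path):
    # Assumes pywin32 and Word are installed; imported here so the helper
    # above can be used (and tested) on machines without them.
    from win32com.client import Dispatch

    word = Dispatch("Word.Application")
    word.Visible = False
    try:
        doc = word.Documents.Open(path)
        try:
            # Content is a Range covering the whole main text of the document.
            return clean_word_text(doc.Content.Text)
        finally:
            doc.Close(False)  # close without saving any changes
    finally:
        word.Quit()
```

In the program, the `codecs.open(...)` / `read()` lines would then become something like `tokens = read_word_text("c:\\text_samples\\chem_1.doc").split()` (the file name is only an example), and `iter_elements` and the Excel part stay unchanged.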
Again, thanks a lot. And here is the Excel add-on:

import codecs
import re
from win32com.client import Dispatch

path = "c:\\text_samples\\chem_1_utf8.txt"
path2 = "c:\\text_samples\\chem_2.txt"

infile = codecs.open(path, 'r', 'utf8')   # renamed so it no longer shadows input()
output = codecs.open(path2, 'w', 'utf8')  # not written to by the Excel version

NR_RE = re.compile(r'^\d+-\d+-\d+$')  # pattern for EINECS number

tokens = infile.read().split()
infile.close()

def iter_elements(tokens):
    # Start a new record at an EINECS number once the current one has at
    # least four tokens; the third through next-to-last tokens of each
    # record are joined into a single field.
    product = []
    for tok in tokens:
        if NR_RE.match(tok) and len(product) >= 4:
            product[2:-1] = [' '.join(product[2:-1])]
            yield product
            product = []
        product.append(tok)
    # The last record needs the same joining as the others.
    if len(product) >= 4:
        product[2:-1] = [' '.join(product[2:-1])]
    if product:
        yield product

xlApp = Dispatch("Excel.Application")
xlApp.Visible = 1
xlApp.Workbooks.Add()
sheet = xlApp.ActiveSheet

for row, element in enumerate(iter_elements(tokens), 1):
    # Write at most four fields, so a short record cannot raise IndexError.
    for col, value in enumerate(element[:4], 1):
        sheet.Cells(row, col).Value = value

xlApp.ActiveWorkbook.Close(SaveChanges=1)
xlApp.Quit()  # don't touch xlApp after Quit(); Excel is gone by then
del xlApp

output.close()

-- 
http://mail.python.org/mailman/listinfo/python-list
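Writing one cell at a time costs one COM call per cell, which can get slow for long lists. A possible alternative is to build a rectangular block of rows and assign it to a whole `Range` in one go. This sketch assumes pywin32 converts a tuple of tuples into an Excel array when it is assigned to `Range.Value`; `to_rows` and `write_rows` are hypothetical helper names.

```python
def to_rows(elements, width=4):
    # Pad or trim every record to exactly `width` fields so the result is
    # a rectangular block that fits a single Excel range.
    rows = []
    for element in elements:
        row = list(element[:width])
        row.extend([''] * (width - len(row)))
        rows.append(tuple(row))
    return tuple(rows)


def write_rows(sheet, rows):
    # `sheet` is an Excel Worksheet object such as xlApp.ActiveSheet.
    if not rows:
        return
    width = len(rows[0])
    # Range(Cell1, Cell2) spans from the top-left to the bottom-right cell.
    target = sheet.Range(sheet.Cells(1, 1), sheet.Cells(len(rows), width))
    target.Value = rows  # one assignment for the whole block
```

With these helpers, the nested cell-by-cell loop could be replaced by `write_rows(xlApp.ActiveSheet, to_rows(iter_elements(tokens)))`.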