Hi, I'm trying to create a script that will search an SGML file for the numbers and titles of the hierarchical elements (section level headings) and create a dictionary with the section number as the key and the title as the value.
I've managed to make some progress, but I'd like some general feedback on what I have so far, and I also have a question. When I run this script on a directory that contains multiple files, even the files that don't contain any matches generate log files, usually with the contents of the last file that contained matches. I'm not sure what I'm missing, so I'd appreciate some advice.

Thanks,
Greg

Here's a very simplified version of my SGML:

<sec-main no="1.01"><title>section title 1.01
<sec-sub1 no="1"><title>title 1
<sec-sub1 no="2"><title>title 2
<sec-sub2 no="a"><title>title a
<sec-sub2 no="b"><title>title b
<sec-sub3 no="i"><title>title i
<sec-main no="2.02"><title>section title 2.02
<sec-main no="3.03"><title>section title 3.03
<sec-sub1 no="1"><title>title 1
<sec-sub1 no="2"><title>title 2
<sec-main no="4.04"><title>section title 4.04
<sec-main no="5.05"><title>section title 5.05

And here's what I've written so far:

import os
import re

setpath = raw_input("Enter the path where the program should run: ")
print
table ={}

for root, dirs, files in os.walk(setpath):
    fname = files
    for fname in files:
        inputFile = file(os.path.join(root,fname), 'r')
        while 1:
            lines = inputFile.readlines(10000)
            if not lines:
                break
            for line in lines:
                main = re.search(r'(?i)<sec-main no=\"(\d+\.\d\d)\">\n?<title>(.*?)\n' , line)
                sub_one = re.search(r'(?i)<sec-sub1 no=\"(\w*)\">\n?<title>(.*?)\n' , line)
                sub_two = re.search(r'(?i)<sec-sub2 no=\"(\w*)\">\n?<title>(.*?)\n' , line)
                sub_three = re.search(r'(?i)<sec-sub3 no=\"(\w*)\">\n?<title>(.*?)\n' , line)
                if main is not None:
                    table[main.group(1)] = main.group(2)
                    m = main.group(1)
                if main is None:
                    pass
                if sub_one is not None:
                    one = m + '[' + sub_one.group(1) + ']'
                    table[one] = sub_one.group(2)
                if sub_one is None:
                    pass
                if sub_two is not None:
                    two = one + '[' + sub_two.group(1) + ']'
                    table[two] = sub_two.group(2)
                if sub_two is None:
                    pass
                if sub_three is not None:
                    three = two + '[' + sub_three.group(1) + ']'
                    table[three] = sub_three.group(2)
                if sub_three is None:
                    pass
        str_table = str(table)
        (name,ext) = os.path.splitext(fname)
        output_name = name + '.log'
        outputFile = file(os.path.join(root,output_name), 'w')
        outputFile.write(str_table)
        outputFile.close()
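
One possible cause, offered as a guess rather than a confirmed diagnosis: table is created once, before os.walk starts, so every file shares the same dictionary, and the log is written for every file whether or not anything matched. Below is a minimal sketch of one way to restructure it, assuming the files are plain text with one heading per line. The names build_table and write_logs are hypothetical, and it uses open() and with instead of file(), so it is not a drop-in replacement for the script above. The patterns also drop the trailing \n so that a heading on the last line of a file can still match.

```python
import os
import re

# One pattern per heading level, in nesting order. These follow the regexes
# in the script above, except that the trailing \n is not required.
MAIN = re.compile(r'(?i)<sec-main no="(\d+\.\d\d)">\n?<title>(.*)')
SUB = [re.compile(r'(?i)<sec-sub%d no="(\w*)">\n?<title>(.*)' % n)
       for n in (1, 2, 3)]


def build_table(lines):
    """Return a fresh {section key: title} dict for one file's lines."""
    table = {}  # created on every call, so nothing carries over between files
    path = []   # pieces of the current key, e.g. ['1.01', '[2]', '[b]']
    for line in lines:
        m = MAIN.search(line)
        if m:
            path = [m.group(1)]
            table[path[0]] = m.group(2).strip()
            continue
        for depth, pattern in enumerate(SUB, start=1):
            m = pattern.search(line)
            # Only accept a sub-level heading if its parent level was seen.
            if m and len(path) >= depth:
                path = path[:depth] + ['[' + m.group(1) + ']']
                table[''.join(path)] = m.group(2).strip()
                break
    return table


def write_logs(setpath):
    """Write name.log next to each file that contains at least one heading."""
    for root, dirs, files in os.walk(setpath):
        for fname in files:
            with open(os.path.join(root, fname)) as input_file:
                table = build_table(input_file)
            if not table:
                continue  # no matches in this file, so no log file
            name, ext = os.path.splitext(fname)
            with open(os.path.join(root, name + '.log'), 'w') as output_file:
                output_file.write(str(table))
```

Calling write_logs(setpath) with the path entered at the prompt would then take the place of the os.walk loop. Keeping the parsing in its own function also makes it possible to check the keys against a handful of sample lines without touching the file system.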