On Fri, Jan 7, 2011 at 8:43 PM, Keith Anthony <kanth...@woh.rr.com> wrote:
> My previous question asked how to read a file into a structure
> a line at a time. Figured it out. Now I'm trying to use .find
> to separate out the PDF objects. (See code) PROBLEM/QUESTION:
> My call to lines[i].find does NOT find all instances of endobj.
> Any help available? Any insights?
>
> #!/usr/bin/python
>
> inputfile = file('sample.pdf','rb')  # This is PDF with which we will work
> lines = inputfile.readlines()  # read file one line at a time
>
> linestart = []  # Starting address for each line
> lineend = []  # Ending address for each line
> linetype = []
>
> print len(lines)  # print number of lines
>
> i = 0  # define an iterator, i
> addr = 0  # and address pointer
>
> while i < len(lines):  # Go through each line
>     linestart = linestart + [addr]
>     length = len(lines[i])
>     lineend = lineend + [addr + (length-1)]
>     addr = addr + length
>     i = i + 1
>
> i = 0
> while i < len(lines):  # Initialize line types as normal
>     linetype = linetype + ['normal']
>     i = i + 1
>
> i = 0
> while i < len(lines):  #
>     if lines[i].find(' obj') > 0:
>         linetype[i] = 'object'
>         print "At address ",linestart[i],"object found at line ",i,": ", lines[i]
>     if lines[i].find('endobj') > 0:
>         linetype[i] = 'endobj'
>         print "At address ",linestart[i],"endobj found at line ",i,": ", lines[i]
>     i = i + 1

Your code can be simplified significantly. In particular:
- Don't add single-element lists. Use the list.append() method instead.
- One seldom manually tracks counters like `i` in Python; use range() or
  enumerate() instead.
- Lists support multiplication by an integer, which gives the concatenation
  of n copies of the list.

Revised version (untested, obviously):

inputfile = file('sample.pdf','rb')  # This is PDF with which we will work
lines = inputfile.readlines()  # read file one line at a time

linestart = []  # Starting address for each line
lineend = []  # Ending address for each line
linetype = ['normal']*len(lines)

print len(lines)  # print number of lines

addr = 0  # and address pointer

for line in lines:  # Go through each line
    linestart.append(addr)
    length = len(line)
    lineend.append(addr + (length-1))
    addr += length

for i, line in enumerate(lines):
    if line.find(' obj') > 0:
        linetype[i] = 'object'
        print "At address ",linestart[i],"object found at line ",i,": ", line
    if line.find('endobj') > 0:
        linetype[i] = 'endobj'
        print "At address ",linestart[i],"endobj found at line ",i,": ", line

As to the bug: I think you want "!= -1" rather than "> 0" for your
conditionals (the revised version above leaves them as you wrote them).
str.find() returns -1 when the substring is absent and the index of the
match otherwise, so a line that begins with "endobj" gives 0 and fails the
"> 0" test; remember that Python list/string indices are 0-based.

Cheers,
Chris
--
http://blog.rebertia.com
--
http://mail.python.org/mailman/listinfo/python-list
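
To make the find() point concrete, here is a minimal sketch of the same classification with the test corrected. It is written in Python 3 syntax (print as a function, bytes literals), not the Python 2 used in the thread, and it uses the `in` operator in place of comparing find() against -1. The function name classify_lines and the three sample lines are made up for illustration; they are not from the original post or from any real PDF.

```python
# Sketch only: classify_lines and the sample data are hypothetical.

def classify_lines(lines):
    """Return a list of (offset, kind) pairs, one per line.

    kind is 'endobj', 'object' or 'normal'; offset is the position of the
    line's first byte, counted from the start of the data.
    """
    results = []
    offset = 0
    for line in lines:
        # "in" is True wherever the substring occurs, including index 0.
        if b'endobj' in line:
            kind = 'endobj'
        elif b' obj' in line:
            kind = 'object'
        else:
            kind = 'normal'
        results.append((offset, kind))
        offset += len(line)
    return results


sample = [b'1 0 obj\n', b'<< /Type /Catalog >>\n', b'endobj\n']

# find() returns the index of the match, or -1 when there is none.
print(sample[2].find(b'endobj'))  # 0: match at the very start, so "> 0" is False
print(sample[1].find(b'endobj'))  # -1: no match
print(classify_lines(sample))
```

Writing `substring in line` avoids the sentinel value altogether, which is why it is usually preferred when the position of the match is not needed.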