On 12/03/12 03:28, Steven D'Aprano wrote:

Another approach may be to read the whole file into memory in one big
chunk. 1.1 million lines at (say) 50 characters per line comes to about
53 MB per file, which should be small enough to read into memory and
process in one chunk. Something like this:

# again untested
text = open('filename').read()
results = []
i = 0
while i < len(text):
    i = text.find(key, i)
    if i == -1:
        break
    i += len(key)                  # skip the rest of the key
    # read ahead to the next newline, twice
    i = text.find('\n', i)
    if i != -1:
        i = text.find('\n', i + 1)
    if i == -1:
        break                      # no second line after the match
    i += 1                         # start of the second line after the match
    # now find the following newline, and save everything up to that
    p = text.find('\n', i)
    if p == -1:
        p = len(text)
    results.append(text[i:p])
    i = p                          # skip ahead and keep searching

Or using readlines:

results = []
index = 0
text = open('filename').readlines()
while True:
    try:
        # list.index() needs an exact match, so here key must be the
        # complete line, including its trailing newline
        index = text.index(key, index) + 2
        results.append(text[index])
    except (ValueError, IndexError):
        break

readlines will take slightly more memory.

But I suspect a tool like grep will be faster. grep
can be downloaded for Windows.

To use grep for this, explore the -A option, which prints a given
number of lines after each matching line.

Even using grep as a pre-filter to pipe into your
program might work.
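
If you go that route, here is a rough, untested sketch of driving grep
from Python with the subprocess module; it assumes grep is on the PATH,
and the key value and 'filename' are only placeholders:

import subprocess

key = 'SOME KEY'                       # placeholder for the real marker
try:
    # -A 2 keeps each matching line plus the two lines that follow it
    output = subprocess.check_output(['grep', '-A', '2', key, 'filename'])
except subprocess.CalledProcessError:  # grep exits non-zero if nothing matches
    output = b''
lines = output.splitlines()
# pick out the second line after each match from the much smaller output
results = [lines[n + 2] for n, line in enumerate(lines)
           if key.encode() in line and n + 2 < len(lines)]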

But you may also have to accept that processing 450
large files will take some time! You can help by
processing the files in parallel, using up to the
number of cores (less one) in your PC, but other than
that you may just need a faster computer! Either more
RAM or an SSD will help greatly.
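
If you want to try the parallel route, below is a minimal, untested
sketch using the standard library's multiprocessing.Pool; the
scan_file() worker is a simplified stand-in for either of the search
loops above, and the '*.log' pattern and KEY value are only
illustrative:

import glob
import multiprocessing

KEY = 'SOME KEY'                # placeholder for the real marker

def scan_file(filename):
    # collect the line two lines after each line containing KEY
    results = []
    lines = open(filename).readlines()
    for n, line in enumerate(lines):
        if KEY in line and n + 2 < len(lines):
            results.append(lines[n + 2])
    return results

if __name__ == '__main__':      # the guard matters on Windows
    files = glob.glob('*.log')                        # the 450 data files
    workers = max(1, multiprocessing.cpu_count() - 1)
    pool = multiprocessing.Pool(workers)
    per_file = pool.map(scan_file, files)             # one list of hits per file
    pool.close()
    pool.join()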

--
Alan G
Author of the Learn to Program web site
http://www.alan-g.me.uk/
