[EMAIL PROTECTED] wrote:
> While creating a log parser for fairly large logs, we have run into an
> issue where the time to process was unacceptable (upwards of 5 minutes
> for 1-2 million lines of logs). In contrast, the Linux tool grep would
> complete the same search in a matter of seconds.
>
> The search we used was a regex of 6 elements "or"ed together, with an
> exclusionary set of ~3 elements. Due to the size of the files, we
> decided to process them line by line, and due to the need for regex
> matching, we could not use more traditional string find methods.
Just guessing (since I haven't tested this): switching from line-by-line
processing to reading big chunks (whatever will fit in memory) should
help, although I don't think you can get close to the speed of grep.
Something like:

    while True:
        chunk = thefile.read(100000000)
        if not chunk:
            break
        for x in theRE.findall(chunk):
            ...

Function calls in Python are expensive.
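Here is a fuller, self-contained sketch of that idea (untested; the
pattern and file name are invented for illustration). One caveat with
plain chunked reads is that a match can straddle a chunk boundary, so
this version holds back the trailing partial line between reads:

    import re

    # Several alternatives "or"ed together, as in the original post;
    # this particular pattern is made up for illustration.
    theRE = re.compile(r"ERROR|WARN|CRITICAL|FATAL|TIMEOUT|PANIC")

    matches = []
    leftover = ""
    with open("big.log") as thefile:        # hypothetical file name
        while True:
            data = thefile.read(100000000)  # ~100 MB per read
            if not data:
                break
            chunk = leftover + data
            # Hold back the trailing partial line so a match is never
            # split across a chunk boundary.
            chunk, _, leftover = chunk.rpartition("\n")
            matches.extend(theRE.findall(chunk))
    # The final line may lack a trailing newline; scan what is left.
    matches.extend(theRE.findall(leftover))

    print(len(matches), "matches")

Scanning one big string with a single findall call per chunk keeps the
per-line Python function-call overhead out of the inner loop, which is
where most of the time goes.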