[EMAIL PROTECTED] wrote:

> While creating a log parser for fairly large logs, we ran into an
> issue where the processing time was unacceptable (upwards of
> 5 minutes for 1-2 million lines of logs). In contrast, the Linux
> tool grep completes the same search in a matter of seconds.
> 
> The search we used was a regex of 6 alternatives OR'ed together,
> with an exclusionary set of ~3 elements. Due to the size of the
> files, we decided to process them line by line, and because we
> needed regular expressions, we could not use the more traditional
> string find methods.
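
For reference, a minimal sketch of the line-by-line approach being
described; the patterns here are invented for illustration, since the
post doesn't show the real ones:

    import re

    # Hypothetical patterns: six alternatives OR'ed together, plus
    # a set of exclusions. The originals aren't shown in the post.
    include_re = re.compile(r"ERROR|FATAL|WARN|timeout|refused|denied")
    exclude_re = re.compile(r"heartbeat|keepalive|DEBUG")

    with open("big.log") as thefile:
        for line in thefile:                # 1-2 regex calls per line
            if include_re.search(line) and not exclude_re.search(line):
                print(line, end="")         # stand-in for real handling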

Just guessing (since I haven't tested this), switching from doing it line
by line to big chunks (whatever will fit in memory) at a time would help,
but I don't think you can get close to the speed of grep, e.g.:

    while True:
        chunk = thefile.read(100000000)
        if not chunk:
            break
        for x in theRE.findall(chunk):
            ...
Function calls in Python are expensive, and the line-by-line version
makes at least one regex call per line instead of a handful per chunk.
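
For what it's worth, here is a fuller sketch of that chunked idea. The
tail-carrying is my addition, since a plain read() boundary can split a
would-be match across two chunks; the pattern is again invented, and the
names thefile/theRE follow the snippet above:

    import re

    theRE = re.compile(r"ERROR|FATAL|WARN|timeout|refused|denied")

    with open("big.log") as thefile:
        tail = ""
        while True:
            chunk = thefile.read(100000000)       # ~100 MB per read
            if not chunk:
                break
            chunk = tail + chunk
            # Hold back the last partial line so a match can't be cut
            # in half at a read boundary.
            cut = chunk.rfind("\n") + 1
            if cut == 0:                          # no newline seen yet
                tail = chunk
                continue
            tail = chunk[cut:]
            for x in theRE.findall(chunk[:cut]):  # a few calls per chunk
                pass                              # ... handle each match
        if tail:                                  # final unterminated line
            for x in theRE.findall(tail):
                pass

With a handful of findall() calls per 100 MB instead of a search() per
line, most of the per-call overhead should disappear.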


