Re: re.search much slower then grep on some regular expressions

Kris Kennaway Mon, 07 Jul 2008 12:29:20 -0700

Paddy wrote:

On Jul 4, 1:36 pm, Peter Otten <[EMAIL PROTECTED]> wrote:

Henning_Thornblad wrote:

What can be the cause of the large difference between re.search and
grep?

grep uses a smarter algorithm ;)

This script takes about 5 min to run on my computer:
#!/usr/bin/env python
import re
row=""
for a in range(156000):
    row+="a"
print re.search('[^ "=]*/',row)
While doing a simple grep:
grep '[^ "=]*/' input                  (input contains 156.000 a in
one row)
doesn't even take a second.
Is this a bug in python?

You could call this a performance bug, but it's not common enough in real
code to get the necessary brain cycles from the core developers.
So you can either write a patch yourself or use a workaround.

re.search('[^ "=]*/', row) if "/" in row else None

might be good enough.

Peter


It is not a smarter algorithm that is used in grep. Python RE's have
more capabilities than grep RE's which need a slower, more complex
algorithm.
You could argue that if the costly RE features are not used then maybe
simpler, faster algorithms should be automatically swapped in but ....


I can and do :-)

It's a major problem that regular expression parsing in python hasexponential complexity when polynomial algorithms (for a subset ofregexp expressions, e.g. excluding back-references) are well-known.

It rules out using python for entire classes of applications whereregexp parsing is on the critical path.


Kris
--
http://mail.python.org/mailman/listinfo/python-list

Re: re.search much slower then grep on some regular expressions

Reply via email to