samwyse wrote:
> On Jul 8, 11:01 am, Kris Kennaway <[EMAIL PROTECTED]> wrote:
>> samwyse wrote:
>>> You might want to look at Plex.
>>> http://www.cosc.canterbury.ac.nz/greg.ewing/python/Plex/
>>>
>>> "Another advantage of Plex is that it compiles all of the regular
>>> expressions into a single DFA. Once that's done, the input can be
>>> processed in a time proportional to the number of characters to be
>>> scanned, and independent of the number or complexity of the regular
>>> expressions. Python's existing regular expression matchers do not have
>>> this property."
>>
>> Hmm, unfortunately it's still orders of magnitude slower than grep in
>> my own application, which involves matching lots of strings and regexps
>> against large files (I killed it after 400 seconds, compared to 1.5 for
>> grep), and that's leaving aside the much longer compilation time (over
>> a minute). If the matching were fast then I could possibly pickle the
>> lexer, though (but it's not).
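
For reference, here is a minimal sketch of the kind of Plex setup
under discussion. The patterns and token names are invented for
illustration (neither poster's actual lexicon appears in the thread),
and Plex is Python 2-era software, so the example follows Python 2
conventions:

import time
from Plex import Lexicon, Scanner, Str, Range, Rep1, Any, IGNORE

letter = Range("AZaz")
digit = Range("09")

# Building the Lexicon compiles every pattern below into a single
# DFA; this is the compilation step whose cost is debated above.
start = time.time()
lexicon = Lexicon([
    (Str("ERROR", "FAIL"), 'alert'),    # any of a set of literal strings
    (Rep1(digit),          'number'),   # one or more digits
    (Rep1(letter),         'word'),     # one or more letters
    (Rep1(Any(" \t\n")),   IGNORE),     # skip whitespace between tokens
])
print "compile time: %.2f s" % (time.time() - start)

# Scanning is then a single pass over the input: one DFA transition
# per character, independent of the number of patterns.
f = open("input.txt")
scanner = Scanner(lexicon, f, "input.txt")
while 1:
    token, text = scanner.read()    # next (token, matched text) pair
    if token is None:               # (None, '') signals end of file
        break
    print token, text
f.close()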

> That's funny, the compilation is almost instantaneous for me.

My lexicon was quite a bit bigger, containing about 150 strings and regexps.

> However, I just tested it against several files, the first containing
> 4875*'a', the rest each twice the size of the previous. And you're
> right: for each doubling of the file size, the match takes four times
> as long, meaning O(n^2). 156000*'a' would probably take 8 hours.
> Here are my results:

The docs say it is supposed to be linear in the file size ;-) ;-(
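
Here is a rough sketch of the doubling test described above, for
anyone who wants to reproduce it. The single-pattern lexicon is a
made-up stand-in (the posters' real lexicons aren't shown in the
thread), and StringIO replaces the on-disk test files:

import time
from StringIO import StringIO
from Plex import Lexicon, Scanner, Str, Rep1

# Trivial stand-in lexicon: one pattern matching runs of 'a'.
lexicon = Lexicon([(Rep1(Str('a')), 'run')])

size = 4875
for _ in range(6):
    data = 'a' * size                  # 4875, 9750, 19500, ... chars
    scanner = Scanner(lexicon, StringIO(data), "<test>")
    start = time.time()
    while scanner.read()[0] is not None:   # scan to end of input
        pass
    elapsed = time.time() - start
    # Linear scanning should roughly double the time at each step;
    # the behaviour reported above quadruples instead, i.e. O(n^2).
    print "%8d chars: %6.2f s" % (size, elapsed)
    size *= 2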
Kris