Re: body speedups using new features in perl 5.9.x

David Landgren Wed, 12 Jul 2006 10:30:10 -0700

Justin Mason did write:

David Landgren writes:

[...]

That is, the assembled approach runs in under 1/500th of the time oflooping through the list of REs. A_C is even worse, but given that thepattern is over twice as long, and chock full of metacharacters Isuppose this shouldn't come as a surprise but it does seem odd. I'llcheck back with Yves and see if my methodology is sane there.
"perl5.9.4" -- that's using bleadperl, right?

Yes. I wanted to check out the Aho-Corasick matching. I haven't beenfollowing super closely, but I'm pretty sure Yves said that it doesn'tdeal well with meta-characters. These results tend to bear that out.

Can you post the output of "use re qw(debug); /....regexp..../"?  I'd
like to see what the structure looks like from the R::A regexp -- 500
times is a stunning speedup.

The uncompressed output is 1.7Mb, so I've bzipped it. You can pick it uphere:


  http://www.landgren.net/ re.debug.txt.bz2

(remove the space in the middle: I've done it that way to stop robotsfrom stumbling over it via a list archive and wasting bandwidth).


What may help you follow what's going on more easily is looking at

  http://www.landgren.net/ re.indented.txt.bz2

(again, lose the space) which is the result of the assembly, indented.You can put it directly into a m/.../x and it will work; it's legal Perl.

The speedup comes from the fact that once you hit the engine, for allthe legit hostnames like 'mail.example.com', only the first fewcharacters will be examined and then you hit a fail state. There is nobacktracking left on the stack, so the entire match fails at that point,and 99% of the regop tree is never consulted. Whereas theloop-over-a-list approach will try the pattern, and the next, and thenext...

To put it another way, it's not so much that it matches quickly, butrather it fails quickly. Of course, patterns that contain .*? sequenceswill require backtracking, but that's just the nature of the beast.


David
--
"It's overkill of course, but you can never have too much overkill."

Re: body speedups using new features in perl 5.9.x

Reply via email to