Optimizations in the regexp engine have been a long-standing TODO for perl5. Perl 5.10 introduced some improvements natively (but I don't think those affect your case). There's also Regexp::Optimizer on CPAN, which unfortunately doesn't address your case either.
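To check whether a given perl still shows the slowdown described in the quoted message below, a rough timing sketch along these lines might help. The patterns are taken from that message; the 1 MB input size, the fixed seed, and the %variant names are assumptions made for illustration, not part of the original test.

```perl
use strict;
use warnings;
use Time::HiRes qw(time);

# Patterns from the quoted message, plus the hand-merged class.
my $chars    = qr/[\-_+%a-z0-9]/;     # some chars we allow
my $charsdot = qr/\.|$chars/;         # composed: dot OR $chars (alternation)
my $merged   = qr/[\-_+%a-z0-9.]/;    # dot added inside the square brackets

# Assumption: 1 MB of random bytes stands in for the 10 MB random string.
srand(42);
my $input = join '', map { chr(int(rand(256))) } 1 .. 1_000_000;

# Hypothetical names for the two variants being compared.
my %variant = (
    alternation => qr/(?:$charsdot){20000}/,
    merged      => qr/(?:$merged){20000}/,
);

# Time one scan of the input per variant and report wall-clock seconds.
for my $name (sort keys %variant) {
    my $start   = time();
    my $matched = ($input =~ $variant{$name}) ? 'yes' : 'no';
    printf "%-12s matched=%s  %.3fs\n", $name, $matched, time() - $start;
}
```

Both variants accept exactly the same set of characters, so any gap in the reported times comes from how the engine executes the alternation rather than from what is being matched.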
See also this writeup: http://swtch.com/~rsc/regexp/regexp1.html which claims a fundamental algorithmic overhaul of the engine is due. The good news is that at least in principle, newer perls allow you to plug in an alternate engine, now it's just a simple matter of writing it...

On Thu, Jan 1, 2009 at 1:57 PM, Eli Billauer <[email protected]> wrote:
> Gaal Yahas wrote:
>
>> I couldn't find other mention of this, so I'd say this behavior is a
>> bit underspecced, but unlikely to change in Perl 5 -- too many things
>> would break otherwise.
>>
>>
> Thanks. I suppose that's the best answer one can get...
>
> In the meanwhile, I found out that it may not always be such a good idea
> to be a mathematician about regular expressions. Consider, for example,
> this:
>
> $chars = qr/[\-_+%a-z0-9]/; # Some chars we allow
> $charsdot = qr/\.|$chars/; # Dot allowed as well
>
> Cute, isn't it? $charsdot is everything $chars is, only with the dot
> allowed as well. Now we can use it in regular expressions, such as
>
> print "Matched\n" if ($x =~ /$charsdot{20000}/);
>
> Well, not such a good idea. Trying this on Perl 5.8.8 makes the matching
> above run 10 times slower (5 whole seconds for a 10MBytes random string)
> compared with simply adding the dot to the square brackets.
>
> Which shouldn't come as a surprise, if we run "print $charsdot;" just to
> find out that it gives:
> (?-xism:\.|(?-xism:[\-_+%a-z0-9]))
>
> Lesson learned: This regular expression is not optimized. Not the
> slightest bit. This isn't a qr// issue, since the same thing happens
> when $chars' content is written in explicitly.
>
> -----------------------------
>
> As for inline comments with /x, I don't think that makes the code more
> readable, but that's a matter of taste. I kind-of lose the continuity,
> and it's pretty difficult to get really useful comments in there.
> On the
> other hand, if a parentheses get wrongly placed, then the comments
> convince the readers what he or she should read, which makes the code
> even more difficult to maintain.
>
> What I liked about the qr// is that the regular expression can be broken
> down to its pieces with meaningful names. But as the example above
> shows, that could have a cost.
>
> Eli
>
> --
> Web: http://www.billauer.co.il
>
> _______________________________________________
> Perl mailing list
> [email protected]
> http://perl.org.il/mailman/listinfo/perl
>

--
Gaal Yahas <[email protected]>
http://gaal.livejournal.com/
_______________________________________________
Perl mailing list
[email protected]
http://perl.org.il/mailman/listinfo/perl
