* updated the answer so that it no longer implies perl 5.005 is the newest version of perl
* cleaned up the examples to give them parallel structure and to generalize the method. (What was popstates anyway?)
* added another example
* mentioned Mastering Regular Expressions to motivate people to find out how regexes work and that they can be tuned.

diff -u -d -r1.28 perlfaq6.pod
--- perlfaq6.pod 3 Jan 2005 18:43:37 -0000 1.28
+++ perlfaq6.pod 26 Jan 2005 18:48:13 -0000
@@ -518,32 +518,58 @@
 
 =head2 How do I efficiently match many regular expressions at once?
 
-The following is extremely inefficient:
+( contributed by brian d foy )
 
-    # slow but obvious way
-    @popstates = qw(CO ON MI WI MN);
-    while (defined($line = <>)) {
-        for $state (@popstates) {
-            if ($line =~ /\b$state\b/i) {
-                print $line;
-                last;
-            }
-        }
-    }
+Avoid asking Perl to compile a regular expression every time
+you want to match it. In this example, perl must recompile
+the regular expression for every iteration of the foreach()
+loop since it has no way to know what $pattern will be.
 
-That's because Perl has to recompile all those patterns for each of
-the lines of the file. As of the 5.005 release, there's a much better
-approach, one which makes use of the new C<qr//> operator:
+    @patterns = qw( foo bar baz );
+
+    LINE: while( <> )
+        {
+        foreach $pattern ( @patterns )
+            {
+            next unless /\b$pattern\b/i;
+            print; next LINE;
+            }
+        }
 
-    # use spiffy new qr// operator, with /i flag even
-    use 5.005;
-    @popstates = qw(CO ON MI WI MN);
-    @poppats = map { qr/\b$_\b/i } @popstates;
-    while (defined($line = <>)) {
-        for $patobj (@poppats) {
-            print $line if $line =~ /$patobj/;
-        }
-    }
+The qr// operator showed up in perl 5.005. It compiles a
+regular expression, but doesn't apply it. When you use the
+pre-compiled version of the regex, perl does less work. In
+this example, I inserted a map() to turn each pattern into
+its pre-compiled form. The rest of the script is almost the
+same, but faster.
+
+    @patterns = map { qr/\b$_\b/i } qw( foo bar baz );
+
+    LINE: while( <> )
+        {
+        foreach $pattern ( @patterns )
+            {
+            next unless /$pattern/;
+            print; next LINE;
+            }
+        }
+
+In some cases, you may be able to make several patterns into
+a single regular expression. Beware of situations that require
+backtracking though.
+
+    $regex = join '|', qw( foo bar baz );
+
+    LINE: while( <> )
+        {
+        print if /\b(?:$regex)\b/i;
+        }
+
+For more details on regular expression efficiency, see Mastering
+Regular Expressions by Jeffrey Friedl. He explains how regular
+expression engines work and why some patterns are surprisingly
+inefficient. Once you understand how perl applies regular
+expressions, you can tune them for individual situations.
 
 =head2 Why don't word-boundary searches with C<\b> work for me?
 
-- 
brian d foy, [EMAIL PROTECTED]