RFC: extend study to produce a fast grep through many regexes

2000-08-21 Thread David L. Nicol



title: study a list of regexes
David Nicol.
Aug 21
version 1
[EMAIL PROTECTED]


Sometimes I have a group of regexen, and I would like to know
which ones will match.

Current practice is to "study" $situation and then grep them:


example a:

study $situation;
@matches = @rules{ grep {$situation =~ m/$_/} keys %rules};


What if there were a faster way to do this, a way to Cstudy a
group of regexes and have the computer determine which ones had
parts in common, so that if $situation =~ m/^foo/ is true, the
fifty rules that start ^bar don't waste any time.  At all.

example b:

$matchcode = study @regexen;

will generate an anonymous subroutine (it's called $matchcode in
the example line) with a tree based on required parts of the
regexes, to minimize the number of match attempts to determine
which regexes will match.  The subroutine will take an array
argument and will return the subset of the rules (as stated in
the original array, either string or compiled qr// references)
that match on all the arguments.

The code in example a could then be replaced

$matchcode = study keys %rules;
@matches = @rules{ $matchcode $situation }


This ability could speed "dirty matching" which currently cannot
take advantage of constant-time hash lookups without standardizing
the dirty parts.




-- 
  David Nicol 816.235.1187 [EMAIL PROTECTED]
  Does despair.com sell a discordian calendar?



Re: RFC: extend study to produce a fast grep through many regexes

2000-08-21 Thread Larry Wall

David L. Nicol writes:
: What if there were a faster way to do this, a way to Cstudy a
: group of regexes and have the computer determine which ones had
: parts in common, so that if $situation =~ m/^foo/ is true, the
: fifty rules that start ^bar don't waste any time.  At all.

Perl 4 did this sort of thing automatically without a study, at least
with respect to the first character that could match.  Of course, it
didn't do it with regular expressions in an array, but rather in
a "switch" structure.  And you had to bunch your tests right.  If
your regular expressions were in an array, you had to use eval.

So certainly there's room for an interface that can take multiple regex
objects and turn them into a single super regex.  I don't think the
code to do it necessarily belongs in the core, but it would certainly
have to be somewhat incestuous with regex innards.

Larry