I've been lurking a few days now, and RFC 72 piqued my interest. I see the
motivation for a backwards-moving regexp engine, but am uncomfortable with
the details.
My first worry is the proposed syntax. I cringe when I see the regexp being
expressed such that "(?r)EDCB" matches "BCDE". That and the jumping between
the left-end of the match and the right-end of the match make for a
near-unreadable regexp.
> As a frivolous illustration, the string
> ABCDEFGHIJKLM
> would be matched by:
> m/FG(?r)EDCB(?f)HIJK(?r)A^(?f)LM$/
Can this be repackaged in such a way that it is a more natural extension of
the existing regexp language?
The RFC notes that the look-behind construct (?<= pattern) can almost be
used. Two issues: (1) as currently implemented, the pattern must be of
fixed length; (2) it is a zero-width assertion.
My guess is that the fixed-length limitation exists because it made for a
relatively quick hack: a fixed-length pattern lets the engine step back that
many characters in the matched-against string and match the pattern forwards.
If the regexp engine could "go backwards", then the fixed-length restriction
could be lifted.
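To make that guess concrete, here is a rough sketch of the "step back and
match forwards" trick done by hand. It is only my reading of how a
fixed-length look-behind could work, not a description of the real engine;
lookbehind_ok is a made-up helper name.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Perl): emulate a fixed-length
# look-behind by stepping back $len characters from $pos and matching
# the pattern forwards over exactly that slice.
sub lookbehind_ok {
    my ($str, $pos, $pat, $len) = @_;
    return 0 if $pos < $len;                  # not enough text behind us
    my $slice = substr($str, $pos - $len, $len);
    return $slice =~ /\A(?:$pat)\z/ ? 1 : 0;  # forward match on the slice
}

my $data = 'GCAAGAATTGAACTGTAG';
my $pos  = index($data, 'GAAC');              # where GAAC starts

print "GAAC at $pos is preceded by GAATT\n"
    if lookbehind_ok($data, $pos, 'GAATT', 5);
```

The helper has to be told the length up front, which is exactly the
restriction in question: with a variable-length pattern there is no single
place to step back to.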
The zero-width assertion might be an issue. The RFC's example doesn't
really get into this.
> Imagine a very long input string containing data such as this:
> ... GCAAGAATTGAACTGTAG ...
> If you want to match text that includes the string GAAC, but only when it
> follows GAATT or any one of a large number of other different
> possibilities,
If it is important to be able to do both:
$large = join '|', @possible;
$data =~ / (?<= $large) GAAC /x; # Don't care which @possible?
and
$data =~ m/ ($large) GAAC /x; # Need $1 to say which @possible
Then perhaps a back-reference-setting look-behind could be implemented?
Don't have an obvious syntax to use (back-tick == back-reference?), but
something like:
$data =~ m/ (?`<= $large) GAAC /x; # Need $1 to say which @possible
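For comparison, this is how the two existing forms behave today on the
RFC's sample data. It is a small sketch only: @possible is filled in with
GAATT from the RFC plus one made-up sequence of the same length (CCGGA),
since the current look-behind needs a fixed length.

```perl
use strict;
use warnings;

my $data     = 'GCAAGAATTGAACTGTAG';
my @possible = ('GAATT', 'CCGGA');    # CCGGA is invented for illustration
my $large    = join '|', @possible;

my ($lb_start, $lb_text, $cap_start, $cap_which);

# Zero-width look-behind: the match itself is only GAAC.
if ($data =~ / (?<= $large) GAAC /x) {
    $lb_start = $-[0];
    $lb_text  = substr($data, $-[0], $+[0] - $-[0]);
}

# Consuming capture: $1 says which alternative matched, but the
# match now starts at the prefix rather than at GAAC.
if ($data =~ m/ ($large) GAAC /x) {
    $cap_start = $-[0];
    $cap_which = $1;
}

print "look-behind: matched '$lb_text' at offset $lb_start\n";
print "capture:     prefix '$cap_which', match starts at offset $cap_start\n";
```

The first form reports GAAC at offset 9; the second reports GAATT with the
match starting at offset 4.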
Does this enhanced look-behind satisfy the RFC's needs?
= mike "looking for a sig" mulligan