Hugo wrote:
> The difficulty with variable-length lookbehind (let's call it
> VLLB) is this: suppose that we want to match "abcdef...xyz" =~
> /(?<=x+)y/. In theory, to check the possible /x+/ matches in
> the right order [0] we need to check whether there we can match
> 0 characters at offset 0 (no), then check whether we can match
> 1 character at offset 0 or 0 characters from offset 1 (no), then
> check whether we can match 2 characters from offset 0, or 1 from
> 1, or 0 from 2 ... and so it goes on. So we'd be trying a complete
> subpattern, of arbitrary complexity, against O(n^2) strings.

You're assuming that the regex engine will always match things forward,
but the subject of this thread says it should be able to go backward. 
You're interpreting (?<=ab*) to mean "try to match /ab*$/ against what
we've already scanned".  But what if it meant "try to match /^b*a/
against reverse(what we've already scanned)"?

Notice, the pattern is reversed too.  There are probably some gotchas
associated with reversing patterns.  Such as $ matching \n at
end-of-string, while ^ doesn't match \n at the beginning.  I'll be
optimistic and hope these are solvable.

The implementation could do the reverse match by making the regex engine
scan the string in reverse, as was suggested.  That seems like a huge,
and possibly quite inefficient, change to the code, as several people
have pointed out.  Alternatively, it might be possible to actually make
a reversed copy of the string (once, before starting the match at all)
and do the reverse match against that.  This should have negligible
impact on code that doesn't use variable length lookbehind.  It probably
wouldn't play too well with the "incremental pattern matching" proposals
that are being tossed around, though.  Ah, well.

Doing the matching in reverse would have some subtle effects.  For
example, in (?<=([ab]+)([bc]+)), the second greedy match would gobble up
b's before the first one saw them.  I don't see that this is a problem,
as long as this behavior is documented.

What about nesting lookbehinds inside of lookbehinds?  Ouch...

-- 
Robert Mathews
Software Engineer
Excite@Home

Reply via email to