On Thu, Sep 06, 2007 at 01:25:12PM -0500, Patrick R. Michaud wrote:
: > Were we using the procedural conjunction:
: > 
: >     "foobar" ~~ / <[a..z]>+ && [ ... ] /;
: > 
: > I would guess that the LHS matches as much as it can ("foobar"), then
: > the RHS matches "foo" [...and then backtracks the LHS until a 
: > conjunctional match is found...]
: >
: > Or it's much simpler than that and both of the regexes above just fail
: > because of the greediness of C<+> and there is no intra-conjunction
: > backtracking.
: 
: I think we definitely allow intra-conjunction backtracking.
: PGE implements it that way.

That's what I think.
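
To make the backtracking reading concrete, here is a minimal sketch in Python rather than Perl 6. It only emulates the behaviour: it tries LHS spans from longest to shortest and accepts the first span that the RHS also matches exactly. The function name conjunction_match and the RHS pattern fo+ are hypothetical, since the RHS in the quoted example is elided.

```python
import re

def conjunction_match(text, lhs, rhs, start=0):
    """Emulate `lhs && rhs`: both patterns must match the same span.

    Spans are tried longest first, which mimics a greedy LHS giving up
    characters one at a time until the RHS agrees on the same span.
    """
    lhs_re = re.compile(lhs)
    rhs_re = re.compile(rhs)
    for end in range(len(text), start - 1, -1):
        # fullmatch(text, pos, endpos) requires the pattern to cover text[pos:endpos]
        if lhs_re.fullmatch(text, start, end) and rhs_re.fullmatch(text, start, end):
            return text[start:end]
    return None  # no span satisfies both sides

# Hypothetical RHS: the LHS first covers "foobar", then backs off to "foo".
print(conjunction_match("foobar", r"[a-z]+", r"fo+"))
```

Without the backing-off loop, the greedy LHS would stay on "foobar" and the conjunction would simply fail, which is the other reading raised in the question.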

: On a somewhat similar question, what happens with a pattern
: such as
: 
:     "foobar" ~~ / foo.+? | fooba /
: 
: The LHS initially matches "foob", but with backtracking could
: eventually match "foobar".  Do the longest-token semantics
: in this case cause the RHS to be dispatched first, even
: though the token declaration of the LHS _could_ match a 
: longer token prefix?  

Yow.  ICATBW.  Non-greedy matching is somewhat antithetical to
longest-token matching.  But basically it boils down to this:
Does the longest-token matcher ignore the ?  and do

    foo.+ | fooba

or is there an implicit ordering above and beyond the DFA engine of

    foob | fooba ||
    fooba | fooba ||
    foobar | fooba ||

I think longest-token semantics have to trump minimal matching here,
and my argument is this.  Most uses of *? have additional information
on what terminates it, either implicitly in what it is matching, or
explicitly in the next bit of regex.  That is, you'd typically see
either

    foo\w+? | fooba

or

    foo.+? <wb> | fooba

In either case, the clear intent is to match foobar over fooba.
Therefore I think the DFA matcher just strips ? and does its ordinary
character by character match, relying on that extra info to match
the real extent of the quantifier.
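
A rough sketch of that strategy in Python, meant as an illustration and not as how PGE or any other Perl 6 engine actually does it. Python's re module uses ordered alternation, so the longest-token choice is emulated by hand: each alternative is measured with its ? stripped, and the winner is then run in its original minimal form. The name longest_token_match is hypothetical, \b stands in for <wb>, and the ? stripping is a naive textual substitution that assumes no escaped + or * in the pattern.

```python
import re

def longest_token_match(text, alternatives):
    """Choose an alternative by greedy prefix length, then match it as written."""
    best_length = -1
    chosen = None
    for pattern in alternatives:
        # Strip the minimal-matching "?" after + or * (naive, assumes no escapes).
        greedy = re.sub(r"([+*])\?", r"\1", pattern)
        m = re.match(greedy, text)
        # Strictly longer wins; ties keep the earlier alternative.
        if m is not None and m.end() > best_length:
            best_length = m.end()
            chosen = pattern
    if chosen is None:
        return None
    # Run the winner with its original quantifier, so the text that follows
    # the quantifier (here the word boundary) decides where it really stops.
    real = re.match(chosen, text)
    matched = real.group(0) if real else None
    return (chosen, matched)

print(longest_token_match("foobar", [r"foo.+?\b", r"fooba"]))
print(longest_token_match("foobar baz", [r"fooba", r"foo.+?\b"]))
```

In the second call the stripped form runs to the end of the string, which is enough to beat fooba, but the minimal form that is finally executed stops at the first word boundary.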

Larry
