On Sun, May 29, 2005 at 12:52:25PM -0400, Jeff 'japhy' Pinyan wrote:
> I'm curious if <commit> and <cut> "capture" anything. They don't start
> with '?', so following the guidelines, it would appear they capture, but
> that doesn't make sense. Should they be written as <?commit> and <?cut>,
> or is the fact that they capture silently ignored because they're not
> consuming anything?
>
> Same thing with <null> and <prior>. And with <after P> and <before P>.
> It should be assumed that <!after P> doesn't capture because it can only
> capture if P matches, in which case <!after P> fails.
>
> So, what's the deal?

I'm not the language designer, but FWIW here is my interpretation.
First, we have to remember that "capture" now means more than just
grabbing characters from a string -- it also generates a successful
match and a corresponding match object. Thus, even though <after>,
<before>, <commit>, <cut>, and <null> are zero-width assertions,
maybe they should still produce a corresponding match object
indicating a successful match. This might end up being useful in
alternations or other rule structures:

    m/ [ abc <commit> def | ab ]/;
    if $<commit> { say "we found 'abcdef'"; }

    m/ [ abc | def <null> ]/;
    if $<null> { say "we found 'def'"; }

I don't *know* that this would be useful, and certainly there are
other ways to achieve the same results, but keeping the same
capture semantics for zero-length assertions seems to work
out okay. Of course, to avoid the generation of the match objects
one can use <?commit>, <?cut>, <?null>, etc. I suspect that for the
majority of cases the choice of <commit> vs. <?commit> isn't going to
make a whole lot of difference, and for the places where it does make
a difference it's nice to preserve the interpretation being used by
other subrules.
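
A rough analogy outside Perl 6 may make the idea concrete. The sketch
below uses Python's re module, which has no <commit> or <null>; an empty
named group (called "marker" here, a hypothetical name) stands in for a
zero-width subrule that still records a capture. It illustrates only the
"zero-width but still produces a match record" point, not the backtracking
control that <commit> provides, and it says nothing about how PGE
implements captures.

```python
import re

# Assumption: an empty named group is only an analogy for a capturing
# zero-width subrule such as <commit> or <null>; it consumes no input.
PATTERN = re.compile(r"(?:abc(?P<marker>)def|ab)")


def branch_taken(text):
    """Report which alternative matched, using only the zero-width capture."""
    m = PATTERN.match(text)
    if m is None:
        return None
    # group("marker") is "" when the first alternative matched and
    # None when the group did not take part in the match.
    return "abcdef" if m.group("marker") is not None else "ab"
```

The empty group consumes nothing, yet its presence or absence in the
match result is enough to tell the two alternatives apart.
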
Things could be a bit interesting from a performance/optimization
perspective; conceivably an optimizer could do a lot better for the
common case if we somehow declared that <null>, <commit>, <cut>, etc.
never capture. But I think the execution cost of capturing vs.
non-capturing in PGE is minimal relative to other considerations,
so it's a bit premature to try to optimize there. Overall I think
we'll be better off keeping things consistent for programmers at
the language level, and then building better/smarter optimizers into
the pattern matching engine to handle the common cases.

Pm