Re: pattern alternation (was Re: How are ...)

Aaron Sherman Thu, 05 Aug 2010 10:28:53 -0700

On Thu, Aug 5, 2010 at 11:09 AM, Patrick R. Michaud <pmich...@pobox.com> wrote:


> On Thu, Aug 05, 2010 at 10:27:50AM -0400, Aaron Sherman wrote:
> > On Thu, Aug 5, 2010 at 7:55 AM, Carl Mäsak <cma...@gmail.com> wrote:
> > > I see this particular thinko a lot, though. Maybe some Perl 6 lint
> > > tool or another will detect when you have a regex containing ^ at its
> > > start, $ at the end, | somewhere in the middle, and no [] to
> > > disambiguate.
> >
> > You know, this problem would go away, almost entirely, if we had a :f[ull]
> > adverb for regex matching that imposed ^[...]$ around the entire match. Then
> > your code becomes:
> >
> >   m:f/<[A..Z]>+|<[a..z]>+/
>
> There's a version of this already.  Matching against an explicit 'regex',
> 'token', or 'rule' automatically anchors it on both ends.  Thus:
>
>    $string ~~ regex { <[A..Z]>+ | <[a..z]>+ }
>
> is equivalent to
>
>    $string ~~ regex { ^ [ <[A..Z]>+ | <[a..z]>+ ] $ }
>
>
While that's a nifty special case (I'm sure it will surprise me someday, and
I'll spend a half hour debugging before I remember this mail), it doesn't
help in the general case (see my example grammar, below).
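
For anyone who hasn't been bitten by the thinko Carl describes, here is a
minimal sketch of it in Python rather than Perl 6 (my assumption being that the
precedence of "|" relative to the anchors behaves the same way in both). The
names "loose" and "strict" are just illustrative:

```python
import re

# Unbracketed alternation: "^" binds only to the first branch and "$" only
# to the second, so this means (^[A-Z]+) or ([a-z]+$).
loose = re.compile(r"^[A-Z]+|[a-z]+$")

# Grouped alternation: both anchors apply to whichever branch matches.
strict = re.compile(r"^(?:[A-Z]+|[a-z]+)$")

samples = ["ABC", "abc", "ABC123", "123abc"]
results = {s: (bool(loose.search(s)), bool(strict.search(s))) for s in samples}

for text, (loose_hit, strict_hit) in results.items():
    print(f"{text!r}: loose={loose_hit} strict={strict_hit}")
```

The loose pattern accepts "ABC123" and "123abc" because each string satisfies
one anchored branch; the grouped pattern rejects both.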

After doing some more thinking and comparing this to other languages
(Python, for example, has "match", which matches only at the start of a
string), it seems to me that there is a need for a more general, out-of-band
way to specify anchoring at match time. Here's my second-pass suggestion:

 m:r / m:rooted -- Match is rooted on both ends ("^...$")
 m:rs / m:rootedstart -- Match is rooted at the start of string ("^", à la
     Python re.match)
 m:re / m:rootedend -- Match is rooted at the end of string ("$")
 m:rn / m:rootednone -- Match is not rooted (default)
 m:o / m:oneline -- Modify :r and friends to use ^^/$$
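
To make the intended semantics concrete, here is a rough emulation of the modes
above in Python. This is only a sketch: the helper "rooted" and its mode names
are hypothetical and simply mirror the proposed adverbs. I'm assuming \A and \Z
stand in for Perl 6's ^ and $, and that re.MULTILINE's ^ and $ are close enough
to ^^ and $$ for illustration.

```python
import re

def rooted(pattern, mode="rooted", oneline=False):
    """Wrap `pattern` in anchors according to the proposed rooting mode.

    Hypothetical helper: not part of Python's re module or of Perl 6.
    """
    modes = ("rooted", "rootedstart", "rootedend", "rootednone")
    if mode not in modes:
        raise ValueError(f"unknown mode: {mode}")
    # oneline: anchor at line boundaries; otherwise at string boundaries.
    start, end = ("^", "$") if oneline else (r"\A", r"\Z")
    prefix = start if mode in ("rooted", "rootedstart") else ""
    suffix = end if mode in ("rooted", "rootedend") else ""
    flags = re.MULTILINE if oneline else 0
    # The non-capturing group keeps "|" inside the pattern from escaping
    # the anchors.
    return re.compile(f"{prefix}(?:{pattern}){suffix}", flags)

pat = "[A-Z]+|[a-z]+"
full = rooted(pat)                          # like m:r
at_start = rooted(pat, "rootedstart")       # like m:rs
at_end = rooted(pat, "rootedend")           # like m:re
anywhere = rooted(pat, "rootednone")        # like m:rn
per_line = rooted(pat, oneline=True)        # like m:r:o

print(full.search("ABC123"))                # no match: not rooted at the end
print(at_start.search("ABC123").group())    # matches the leading letters
print(per_line.findall("ABC\nabc\n123"))    # one match per all-letter line
```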

Here's one way I can see that being routinely used:

 # Simplistic shell scripts
 rule TOP :r {<stmt>*} # Match the whole script
 rule stmt :r :o { <cmd> <arg>* } # One statement per line

The other way to go about that would be with parameterized adverbs. I'm not
sure how comfy people are with those, but they're in the spec. So this:

 m:r / m:rooted -- Match is rooted (default is ^...$)
    Parameters:
    :s / :start -- Match is rooted only at start ("^")
    :e / :end -- Match is rooted only at end ("$")
    [note: :s :e should produce a warning]
    :n / :none -- Match is not rooted (null modifier)
    [note: combining :n with :s or :e should warn]
    :o / :oneline -- Use ^^ and $$ instead of ^ and $
    [note: combining :o with :n should warn?]
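
The warning notes above amount to a small validation pass over the parameters.
A sketch of that check in Python, where the function name, keyword names, and
message wording are all made up for illustration (nothing here is specced):

```python
import warnings

def check_rooted_params(start=False, end=False, none=False, oneline=False):
    """Return (and emit) warnings for the parameter combinations flagged above.

    Hypothetical: the keywords stand for :s, :e, :n and :o respectively.
    """
    problems = []
    if start and end:
        problems.append(":s combined with :e")
    if none and (start or end):
        problems.append(":n combined with :s or :e")
    if none and oneline:
        # The note for this combination carries a question mark, so whether
        # it should warn at all is still open.
        problems.append(":o combined with :n")
    for message in problems:
        warnings.warn(message)
    return problems

ok = check_rooted_params(oneline=True)              # like :r(:o)
both = check_rooted_params(start=True, end=True)    # like :r(:s, :e)
clash = check_rooted_params(none=True, start=True, oneline=True)
print(ok, both, clash)
```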

So our statement matching grammar becomes:

 rule TOP :r {<stmt>*}
 rule stmt :r(:o) { <cmd> <arg>* }

The clown nose is just a side benefit ;-)

Seriously, though, I prefer :r(:o) because :r:o looks like it should be the
opposite of :rw (there is no :ro, as far as I know).

PS: I see no reason that any of this is needed for 6.0.0.

-- 
Aaron Sherman
Email or GTalk: a...@ajs.com
http://www.ajs.com/~ajs
