Re: A rule by any other name...

Allison Randal Wed, 10 May 2006 03:43:36 -0700

On Wed, 10 May 2006, Damian Conway wrote:
> Allison wrote:
> 
> I've never met anyone who *voluntarily* added
> the 'p'. ;-)


You've spent too much time in the U.S. ;)

> >  and the fact that everyone knows 'regex(p)'
> > means "regular expression" no matter how many times we say it doesn't.
> 
> Sure. But almost nobody knows what "regular" actually means, and of
> those few only a tiny number of pedants actually *care* anymore. So
> does it matter?

Picking names that mean what they say is important in Perl. It's why we have
'given'/'when' instead of 'switch'/'case'. We don't have to use the same old
name for things just because everyone else is doing it (even if we started it).

There's nothing about 'regex' that says "backtracking enabled".

> Then don't. I teach regexes all the time and I *never* explain what
> "regular" means, or why it doesn't apply to Perl (or any other
> commonly used) regexes any more.

But isn't it appealing to stop using an archaic word that has now become
meaningless?

> > Maybe 'match' is a better keyword.
> 
> I don't think so. "Match" is a better word for what comes back from
> a regex match (what we currently refer to as a Capture, which is
> okay too).

I agree there. I still prefer 'rule'.

> That's pretty much the Perl 5 argument for using "sub" for both subroutines
> and methods, which we've definitively rejected in Perl 6.

Subs and methods have a number of distinguishing characteristics. If the only
distinction between them was one small characteristic change, I might argue
against using different keywords there too. (I think the choice of using only
'sub' made sense for Perl 5 with its simplistic OO semantics, but Perl 6
provides more intelligent defaults for methods so the separation makes sense
here.)

Rules inside and outside grammars are the same class. They have the same
behaviour aside from :ratchet, and :ratchet can be set without the keyword
change. More than that, the current 'rule' and 'regex' can both be used inside
and outside a grammar. If we were to take the 'sub'/'method' pattern, then
'rule' should never be allowed outside a grammar, and 'regex' should either not
be allowed inside a 'grammar', or should express some distinctive feature
inside the grammar (like "non-inherited" or "doesn't operate on the match
object", but there are better words for those concepts than 'regex').
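
For readers unfamiliar with the one behavioural difference mentioned above, here is a rough illustration of what :ratchet changes. It is a Python sketch, not Perl 6, and the function name match_as_then_a is made up for this example. It hand-matches the pattern "a* a": with backtracking the greedy "a*" gives characters back so the final "a" can match, and with ratchet it never gives anything back.

```python
def match_as_then_a(text, ratchet):
    """Match the pattern  a* a  at the start of `text`.

    Returns the number of characters matched, or None on failure.
    (Hypothetical helper, for illustration only.)
    """
    # Greedy a*: take every leading 'a'.
    taken = 0
    while taken < len(text) and text[taken] == "a":
        taken += 1

    # Try the final literal 'a', giving characters back only if allowed.
    while True:
        if text[taken:taken + 1] == "a":
            return taken + 1
        if ratchet or taken == 0:
            return None  # ratchet: never revisit what a* already consumed
        taken -= 1       # backtracking: a* gives back one character


print(match_as_then_a("aaa", ratchet=False))
print(match_as_then_a("aaa", ratchet=True))
```

The matching machinery is identical in both calls; only one flag differs, which is the sense in which the two kinds of rule are "the same class".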

> If we use "rule" for both kinds of regexes, we force the reader to constantly
> check surrounding context in order to understand the behaviour of the
> construct. :-(

Context is a Perlish concept. :)

It's worse to force the writer and reader to distinguish between two keywords
when they don't have a sharp difference in meaning, and when the names of the
two keywords don't provide any clues to what the difference is.

Making different things different is an important design principle, but so is
making similar things similar.

> True. "Token" is the wrong word for another reason: a token is a
> segmented component of the input stream, *not* a rule for matching
> segmented components of the input stream. The correct term for that is
> "terminal". So a suitable keyword might well be "term".

I do like 'term' better.

> Whitespace skipping (for suitable values of "whitespace") is a critical
> feature of parsers. I'd go so far as to say that it's *the* killer feature of
> Parse::RecDescent.
>
> What you want is *whitespace* skipping (where comments are a special form of
> whitespace). What you *really* want is whitespace skipping where you get
> to define what constitutes whitespace in each context where whitespace might
> be skipped.

That really isn't "whitespace" skipping, though. Calling it whitespace skipping
conflates two concepts that are only slightly related. I agree that skipping is
an important feature in parsers.

> But the defining characteristic of a "terminal" is that you try to match
> it exactly, without being smart about what to ignore. That's why I like the
> fundamental rule/token distinction as it is currently specified.

Can you give me some additional characteristics for 'term' beyond just "turn
off :skip"? Grammars also need to turn off skipping in rules that aren't
terminals, and the different keyword is entirely inappropriate in those cases.
Since you'd need to use ':!skip' (or whatever syntax) on other rules anyway, it
doesn't make sense to use 'term' anywhere unless it provides some additional
intelligent defaults for terminals.

> > I also suggest a new modifier for comment skipping (or skipping in
> > general) that's separate from :words, with semantics much closer to
> > Parse::RecDescent's 'skip'.
> 
> Note, however, that the recursive nature of Parse::RecDescent's <skip>
> directive is a profound nuisance in practice, because you have to
> remember to turn it off in every one of the terminals.

And in the current form you have to remember to use 'token' for all the
terminals. Not really a significant difference in mental effort.

> In light of all that, perhaps :words could become :skip, which defaults to
> :skip(/<ws>/) but allows you to specify :skip(/whatever/).

Including :skip(/<someotherrule>/). Yes, agreed, it's a huge improvement. I'd
be more comfortable if the default rule to use for skipping was named <skip>
instead of <ws>. (On IRC <sep> was also proposed, but the connection between
:skip and <skip> is more immediately obvious.)
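
A configurable skip pattern of the kind discussed above might behave roughly like this. This is a Python sketch under stated assumptions, not proposed Perl 6 semantics: the function match_terminals and its skip parameter are invented for the example, and skipping is applied only before each terminal.

```python
import re


def match_terminals(terminals, text, skip=r"\s+"):
    """Match each terminal pattern in order, discarding one `skip` match
    before each terminal. Pass skip=None to turn skipping off.

    Returns the list of matched strings, or None if any terminal fails.
    (Hypothetical helper, for illustration only.)
    """
    skip_re = re.compile(skip) if skip is not None else None
    pos = 0
    found = []
    for pattern in terminals:
        if skip_re is not None:
            skipped = skip_re.match(text, pos)
            if skipped:
                pos = skipped.end()
        m = re.compile(pattern).match(text, pos)
        if m is None:
            return None
        found.append(m.group())
        pos = m.end()
    return found


assignment = [r"\w+", r"=", r"\d+"]

# Default skipping: plain whitespace.
print(match_terminals(assignment, "x = 42"))

# Skipping redefined in one place so that '#' comments count too.
print(match_terminals(assignment, "x # note\n= 42", skip=r"(?:\s+|#[^\n]*)+"))

# Skipping turned off: the space after 'x' now makes the match fail.
print(match_terminals(assignment, "x = 42", skip=None))
```

Nothing in the second call is specific to whitespace, which is why a name like "skip" describes the feature better than "ws".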

> As for the keywords and behaviour, I think the right set is:
> 
>                                     Default           Default
>      Keyword        Where         Backtracking        Skipping
> 
>       regex         anywhere       :!ratchet          :!skip
>        rule         grammars       :ratchet           :skip
>        term         grammars       :ratchet           :!skip

And I think the right set is:

         rule         anywhere       :!ratchet          :!skip
         rule         grammars       :ratchet           :!skip

(Assuming that the universal base grammar class has :ratchet set, and anyone
can unset it with :!ratchet on their grammar or on individual rules. Also
assuming that we make it easy to turn on :skip for a grammar.)
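
The inheritance of defaults assumed above can be sketched as a simple layered lookup. This is Python for illustration only; the names BASE_DEFAULTS, BASE_GRAMMAR and effective_modifiers are hypothetical, and nothing here claims to be how a Perl 6 implementation would store modifiers.

```python
# Defaults for a rule declared outside any grammar (first row of the table).
BASE_DEFAULTS = {"ratchet": False, "skip": False}

# Assumption: the universal base grammar class turns :ratchet on.
BASE_GRAMMAR = {"ratchet": True}


def effective_modifiers(grammar=None, rule=None):
    """Resolve modifiers: rule-level settings override grammar-level
    settings, which override the global defaults."""
    settings = dict(BASE_DEFAULTS)
    settings.update(grammar or {})
    settings.update(rule or {})
    return settings


# A rule outside any grammar.
print(effective_modifiers())

# A rule inside a grammar that inherits from the base grammar.
print(effective_modifiers(grammar=BASE_GRAMMAR))

# The same rule with :!ratchet set on it individually.
print(effective_modifiers(grammar=BASE_GRAMMAR, rule={"ratchet": False}))

# A grammar that turns skipping on for all of its rules.
print(effective_modifiers(grammar={**BASE_GRAMMAR, "skip": True}))
```

With this layering a single keyword is enough: where a rule is declared determines its defaults, and any rule can still override them.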

> I do agree that a rule should inherit properties from its grammar, so
> you can write:
> 
>     grammar Perl6 is skip(/[<ws>+ | \# <brackets> | \# \N]+/) {
>         ...
>     }
> 
> to allow your grammar to redefine in one place what its rules skip.

To quote a friend: Yay! :)

Allison
