On Apr 20, 2006, at 1:32 PM, Damian Conway wrote:
    Keyword    Implicit adverbs    Behaviour
    regex      (none)              Ignores whitespace, backtracks
    token      :ratchet            Ignores whitespace, no backtracking
    rule       :ratchet :words     Skips whitespace, no backtracking
[...and following threads...]
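The following is a minimal Python model of the backtracking column in the table. It is not Perl 6 syntax, and the `match` function and its tuple-based pattern format are hypothetical. A pattern is a sequence of literal characters, and each one may be starred. The pattern is matched against the whole string. With backtracking, `a*` can give a character back so that the trailing `a` still matches. Under a :ratchet-style rule, `a*` keeps everything it grabbed and the match fails. Real :ratchet also stops backtracking into subrules and alternations, which this sketch does not model.

```python
from typing import List, Tuple

# Hypothetical pattern format: (literal_char, starred). starred=True means "zero or more".
Item = Tuple[str, bool]

def match(pattern: List[Item], text: str, ratchet: bool) -> bool:
    """Anchored match of `pattern` against the whole of `text` (a toy model)."""
    def go(i: int, pos: int) -> bool:
        if i == len(pattern):
            return pos == len(text)
        ch, starred = pattern[i]
        if not starred:
            return pos < len(text) and text[pos] == ch and go(i + 1, pos + 1)
        # Greedy step: find the longest run of `ch` starting at `pos`.
        end = pos
        while end < len(text) and text[end] == ch:
            end += 1
        if ratchet:
            # :ratchet-like: commit to the greedy choice and never give characters back.
            return go(i + 1, end)
        # Backtracking: try the longest run first, then progressively shorter ones.
        for stop in range(end, pos - 1, -1):
            if go(i + 1, stop):
                return True
        return False
    return go(0, 0)

pattern = [("a", True), ("a", False)]        # roughly /a* a/
print(match(pattern, "aaa", ratchet=False))  # True
print(match(pattern, "aaa", ratchet=True))   # False
```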
I'm comfortable with the semantic distinction between 'rule' as "thingy
inside a grammar" and 'regex' as "thingy outside a grammar". But, I
think we can find a better name than 'regex'. The problem is both the
'regex' vs. 'regexp' battle, and the fact that everyone knows 'regex(p)'
means "regular expression" no matter how many times we say it doesn't.
(I'm not fond of the idea of spending the next 20 years explaining that
over and over again.) Maybe 'match' is a better keyword.
Then again, from a practical perspective, it seems likely that we'll
want something like ":ratchet is set by default in all rules" turned on
in some grammars and off in other grammars. In which case, the real
distinction is that rules inside a grammar pull default attributes from
their grammar class, while rules outside a grammar have no default
attributes. Which brings us back to a single keyword 'rule' making sense.
I'm not comfortable with the semantic distinction between 'rule' and
'token'. Whitespace skipping is not the defining difference between a
rule and a token in general use of the terms, so the names are misleading.
More importantly, whitespace skipping isn't a very significant option in
grammars in general, so creating two keywords that distinguish between
skipping and no skipping is linguistically infelicitous. It's like
creating two different words for "shirts with horizontal stripes" and
"shirts with vertical stripes". Sure, they're different, but the
difference isn't particularly significant, so it's better expressed by a
modifier on "shirt" than by a different word.
From a practical perspective, both the Perl 6 and Punie grammars have
ended up using 'token' in many places (for things that aren't tokens),
because :words isn't really the semantics you want for parsing computer
languages. (Though it is quite useful for parsing natural language and
other things.) What you want is comment skipping, which isn't the same
as whitespace skipping.
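A toy Python tokenizer can sketch the difference between skipping whitespace and skipping comments. The skipper patterns and the `tokens` helper are hypothetical. The sketch also assumes a comment syntax that runs from `#` to the end of the line. A skipper that only handles whitespace passes the comment text to the parser as tokens. A skipper that also consumes comments does not.

```python
import re
from typing import List

# Hypothetical skippers: whitespace only, versus whitespace plus '#'-to-end-of-line comments.
WS = re.compile(r"\s*")
WS_AND_COMMENTS = re.compile(r"(?:\s+|#[^\n]*)*")
# A token is an identifier, an integer, or any single non-space character.
TOKEN = re.compile(r"[A-Za-z_]\w*|\d+|\S")

def tokens(text: str, skip: re.Pattern) -> List[str]:
    """Alternate between running the skipper and reading one token."""
    out: List[str] = []
    pos = 0
    while True:
        pos = skip.match(text, pos).end()  # both skippers can match the empty string
        if pos >= len(text):
            return out
        m = TOKEN.match(text, pos)         # always succeeds: text[pos] is non-space here
        out.append(m.group())
        pos = m.end()

src = "x = 1  # set x\ny = 2"
print(tokens(src, WS))               # ['x', '=', '1', '#', 'set', 'x', 'y', '=', '2']
print(tokens(src, WS_AND_COMMENTS))  # ['x', '=', '1', 'y', '=', '2']
```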
I suggest making whitespace skipping a default setting on the grammar
class, so the grammars that need whitespace skipping most of the time
can turn it on by default for their rules. That means 'token' and 'rule'
collapse into just 'rule'.
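The proposal to hang skipping and :ratchet defaults on the grammar class might look roughly like the Python class attributes below. `Grammar`, `ProgramGrammar`, `defaults`, and `rule_options` are hypothetical names used only for illustration. In this sketch there is a single kind of rule. It starts from its grammar's defaults, and a per-rule modifier overrides them.

```python
from typing import Dict

class Grammar:
    """Hypothetical base: a bare grammar turns no adverbs on by default."""
    defaults: Dict[str, bool] = {"ratchet": False, "skip": False}

    @classmethod
    def rule_options(cls, **modifiers: bool) -> Dict[str, bool]:
        # Start from this grammar's defaults (copied, so the class dict is untouched),
        # then apply any per-rule modifiers on top.
        opts = dict(cls.defaults)
        opts.update(modifiers)
        return opts

class ProgramGrammar(Grammar):
    # Hypothetical grammar for a programming language: ratchet and skip by default.
    defaults = {"ratchet": True, "skip": True}

print(Grammar.rule_options())                   # {'ratchet': False, 'skip': False}
print(ProgramGrammar.rule_options())            # {'ratchet': True, 'skip': True}
print(ProgramGrammar.rule_options(skip=False))  # {'ratchet': True, 'skip': False}
```

In this model, the bare `Grammar` stands in for rules outside any grammar, which get no default adverbs.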
I also suggest a new modifier for comment skipping (or skipping in
general) that's separate from :words, with semantics much closer to