Re: grammar: difference between rule, token and regex

2006-06-02 Thread jerry gay

On 6/2/06, Rene Hangstrup Møller [EMAIL PROTECTED] wrote:

> Hi
>
> I am toying around with Parrot and the compiler tools. The documentation
> of Perl 6 grammars that I have been able to find only describes rule. But
> the grammars in Parrot 0.4.4 for punie and APL use rule, token and regex
> elements.
>
> Can someone please clarify the difference between these three types, and
> when you should use one or the other?


i'm forwarding this to p6l, as it's a language question and probably
best asked there. that said, the regex/token/rule change is a recent
one, and is documented in S05
(http://dev.perl.org/perl6/doc/design/syn/S05.html)

in particular, see the "Regexes really are regexes now" section, which
describes the differences. also, there are some recent threads on p6l
with regard to this topic, which you may find enlightening. you can
find these via google groups, or some other nntp archive.
~jerry


Re: grammar: difference between rule, token and regex

2006-06-02 Thread Patrick R. Michaud
On Fri, Jun 02, 2006 at 01:56:55PM -0700, jerry gay wrote:
> On 6/2/06, Rene Hangstrup Møller [EMAIL PROTECTED] wrote:
> > I am toying around with Parrot and the compiler tools. The documentation
> > of Perl 6 grammars that I have been able to find only describes rule. But
> > the grammars in Parrot 0.4.4 for punie and APL use rule, token and regex
> > elements.
> >
> > Can someone please clarify the difference between these three types, and
> > when you should use one or the other?
>
> i'm forwarding this to p6l, as it's a language question and probably
> best asked there. that said, the regex/token/rule change is a recent
> one, and is documented in S05
> (http://dev.perl.org/perl6/doc/design/syn/S05.html)

Jerry is correct that S05 is the place to look for information
on this.  But to summarize an answer to your question:

   - a 'regex' is a normal regular expression

   - a 'token' is a regex with the :ratchet modifier set.  The
     :ratchet modifier disables backtracking by default, so that
     a plain quantifier such as '*' or '+' will greedily match whatever
     it can but won't backtrack if the remainder of the match fails.

   - a 'rule' is a regex with both the :ratchet and :sigspace
     modifiers set.  The :sigspace modifier indicates that whitespace
     in the rule should be replaced by an intertoken separator rule
     such as <?ws> (a whitespace matching rule).

So,

rule { a* c b+ }

is the same as

token { <?ws> a* <?ws> c <?ws> b+ <?ws> }

is the same as

regex { <?ws>: a*: <?ws>: c <?ws>: b+: <?ws> }
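
For readers who want to experiment without a Perl 6 implementation at hand, the last form can be roughly emulated in Python's `re` module. This is only a sketch under stated assumptions: the helper names `no_backtrack` and `rule` are made up for illustration, the ':' (no backtracking) behaviour is imitated with the lookahead-plus-backreference idiom, and `\s*` is an assumed stand-in for <?ws>, whose real definition may be stricter.

```python
import re
from itertools import count

_ids = count()  # supplies unique group names for the helper below

WS = r"\s*"  # assumed stand-in for <?ws>; the real rule may be stricter


def no_backtrack(pattern):
    """Imitate 'pattern:' by matching greedily inside a lookahead and then
    consuming exactly the captured text, so the match is never shortened."""
    name = f"r{next(_ids)}"
    return f"(?=(?P<{name}>{pattern}))(?P={name})"


def rule(*atoms):
    """Build the 'regex { <?ws>: a*: <?ws>: ... }' form from a list of atoms.

    Every atom is wrapped, including literals such as 'c'; that is harmless
    because a literal has nothing to give back anyway.
    """
    parts = [no_backtrack(WS)]
    for atom in atoms:
        parts.append(no_backtrack(atom))
        parts.append(no_backtrack(WS))
    return re.compile("".join(parts))


abc = rule("a*", "c", "b+")  # hypothetical counterpart of: rule { a* c b+ }

for text in ["aa c bbb", "  c b  ", "aa c"]:
    print(repr(text), bool(abc.fullmatch(text)))
```

The first two strings match and the third does not, because the trailing b+ finds nothing to match.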


To answer your other question, about when to use each, here are
some rules of thumb (sorry for the pun):

  - If the quantifiers in the rule need to do backtracking, use 'regex'

  - If backtracking isn't needed, use 'token'

  - If the components of the regex can have intertoken separators
    between them, use 'rule' (and perhaps define a custom ws rule
    that matches the language's idea of intertoken separator).
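
A small illustration of the first two rules of thumb, again as a Python sketch rather than real Perl 6: a quoted string written as "quote, any characters, quote" needs backtracking, because the greedy middle part swallows the closing quote and has to give it back. The `no_backtrack` helper is hypothetical and imitates ratcheting with a lookahead plus backreference.

```python
import re


def no_backtrack(pattern, name):
    """Imitate a ratcheted atom: capture greedily in a lookahead, then
    consume exactly that capture so it cannot be shortened later."""
    return f"(?=(?P<{name}>{pattern}))(?P={name})"


# 'regex' style: '.*' may hand characters back, so the closing quote is found.
with_backtracking = re.compile(r'".*"')

# 'token' style: '.*' keeps everything it took, including the closing quote.
ratcheted = re.compile('"' + no_backtrack(".*", "body") + '"')

# 'token' style again, but the body excludes the quote character, so no
# backtracking is needed in the first place.
quote_free_body = re.compile('"' + no_backtrack('[^"]*', "body") + '"')

text = '"hi"'
for label, pattern in [
    ("with backtracking", with_backtracking),
    ("ratcheted .*", ratcheted),
    ("ratcheted, no quote in body", quote_free_body),
]:
    print(label, bool(pattern.fullmatch(text)))
```

Only the middle pattern fails. In practice that usually means either declaring such a pattern as a 'regex' or rewriting it so that it works as a 'token'.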

Here's a quick contrived example to illustrate the difference:

token identifier { <alpha> \w* }

token integer { \d+ }

token value { <identifier> | <integer> }

token operator { \+ | - | \* | / }

rule expression { <value> [ <operator> <value> ]* }

rule assignment { <identifier> \:= <expression> }

The token declarations all define regexes that do not match
any whitespace.  Thus, "abc" is a valid identifier but " abc "
is not.

The rule declarations, however, allow for whitespace to occur
between each of the elements.  Thus, each of the following
is a valid assignment in the above language, as the use of
'rule' tells us where whitespace is allowed in the match:

 b:=3+a*4
 b := 3 + a * 4
 b   :=3   +a*   4
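
The same toy language can be approximated with Python's `re` module to check these strings. This is a sketch, not a translation: `\s*` is an assumed stand-in for the intertoken separator, `[A-Za-z]` is an ASCII-only stand-in for the alpha rule, and backtracking control is left out because it does not change the outcome for these inputs.

```python
import re

WS = r"\s*"  # assumed stand-in for the intertoken separator

# The 'token' declarations: no whitespace allowed inside.
IDENTIFIER = r"[A-Za-z]\w*"  # ASCII-only stand-in for the alpha rule
INTEGER = r"\d+"
VALUE = rf"(?:{IDENTIFIER}|{INTEGER})"
OPERATOR = r"[-+*/]"

# The 'rule' declarations: whitespace in the rule body becomes WS.
EXPRESSION = rf"{WS}{VALUE}{WS}(?:{OPERATOR}{WS}{VALUE}{WS})*"
ASSIGNMENT = rf"{WS}{IDENTIFIER}{WS}:={WS}{EXPRESSION}{WS}"

identifier = re.compile(IDENTIFIER)
assignment = re.compile(ASSIGNMENT)

samples = ["b:=3+a*4", "b := 3 + a * 4", "b   :=3   +a*   4", "b := 3 +"]
for text in samples:
    print(repr(text), bool(assignment.fullmatch(text)))

# Tokens reject surrounding whitespace; rules are what let it in.
print(bool(identifier.fullmatch("abc")), bool(identifier.fullmatch(" abc ")))
```

The three assignments from the list match, the incomplete "b := 3 +" does not, and the identifier pattern accepts "abc" but rejects " abc ".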

I can come up with more examples if desired, but that's the basics
behind each.

Hope this helps,

Pm


Re: grammar: difference between rule, token and regex

2006-06-02 Thread Rene Hangstrup Møller

Patrick R. Michaud wrote:

> Jerry is correct that S05 is the place to look for information
> on this.  But to summarize an answer to your question:
  
Thank you very much for the swift and thorough answer. It answered all 
my questions. Your reply was very pedagogical and deserves to go into 
the manual.


Have a nice weekend
/Rene Hangstrup Møller