On May 26, Patrick R. Michaud said:

On Tue, May 24, 2005 at 08:25:03PM -0400, Jeff 'japhy' Pinyan wrote:
I have looked through the latest
revisions of Apo05 and Syn05 (from Dec 2004) and come up with the
following list:


I'll review the list below, but it's also worthwhile to read


which is Larry's latest missive on character classes, and


which describes the capturing semantics (but be sure to note
the lengthy threads that follow concerning changes in the
indexing from $1, $2, ... to $0, $1, ... ).

I'll check them out. Right now, I'm really only concerned with syntax rather than implementation. Perl6::Rule::Parser will only parse the rule into a tree structure.

        &       a&b             N       conjunction
                &var            N       subroutine

I'm not sure that "&var" means subroutine anymore.  A05 does mention

Ok.  If it goes away, I'm fine with that.

        x**{n..m}       N       previous atom n..m times

Keeping in mind that the "n..m" can actually be any sort of closure

Yeah, I know.

        (       (x)             Y       capture 'x'
        )                       Y       must match opening '('

It may be worth noting that parens not only capture, they also
introduce a new scope for any nested subpattern and subrule captures.

Ok.  I don't think that'll affect me right now.

        :ignorecase     N       case insensitivity :i
        :global         N       match globally :g
        :continue       N       start scanning after previous match :c

I'm not sure these are "tokens" in the sense of "single unit of purpose"
in your original message.  I think these are all adverbs, and the "token"
is just the initial C<:> at the beginning of a group.

I understand, but that set is particularly important to me, because as far as I am concerned, the rule

  /abc/

is the object Perl6::Rule::Parser::exact->new('abc'), whereas the rule

  /:i abc/

is the object Perl6::Rule::Parser::exactf->new('abc') -- this is using node terminology from Perl 5, where "exactf" means "exact with case folding".
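
To make the exact/exactf distinction concrete, here is a minimal sketch, written in Python rather than Perl purely for illustration. It is not the Perl6::Rule::Parser code; the class names and the literal_node helper are hypothetical, and it only handles a rule body that is a bare literal with an optional leading :i or :ignorecase.

```python
from dataclasses import dataclass


@dataclass
class Exact:
    """Literal text that must match exactly."""
    text: str


@dataclass
class ExactF(Exact):
    """Literal text matched with case folding (Perl 5's "exactf")."""


def literal_node(rule_body: str) -> Exact:
    """Pick the node type for a bare literal (hypothetical helper).

    A leading :i or :ignorecase selects the case-folding node;
    anything else yields a plain exact-match node.
    """
    parts = rule_body.split()
    if parts and parts[0] in (":i", ":ignorecase"):
        return ExactF(" ".join(parts[1:]))
    return Exact(rule_body.strip())
```

With this sketch, literal_node("abc") yields an Exact node and literal_node(":i abc") yields an ExactF node, so the modifier changes the kind of node rather than being a node of its own.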

        :keepall        N       all rules and invoked rules remember everything

That's now ":parsetree" according to Damian's proposed capture rules.

Ok.  I haven't seen those yet.

        <commit>        N       backtracking fails completely
        <cut>           N       remove what matched up to this point from the
        <after P>       N       we must be after the pattern P
        <!after P>      N       we must NOT be after the pattern P
        <before P>      N       we must be before the pattern P
        <!before P>     N       we must NOT be before the pattern P

As with ':words', etc., I'm not sure that these qualify as "tokens"
when parsing the regex -- the tokens are actually "<" or "<!" and

I understand. Luckily this new syntax will enable me to abstract things in the parser.

  my $obj = $S->object(assertion => $name, $neg);
  # where $name is the part after the < or <!
  # and $neg is a boolean denoting the presence of !

Since there are no longer different prefixes for every type of assertion, I no longer need to make specific classes of objects.
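
A rough sketch of that abstraction, in Python for illustration only (the Assertion type, the regular expression, and parse_assertion are hypothetical names, not part of Perl6::Rule::Parser). It assumes the simple case of a name followed by an optional flat argument, and it deliberately ignores nested angle brackets such as an assertion that contains another assertion.

```python
import re
from typing import NamedTuple, Optional


class Assertion(NamedTuple):
    name: str  # the part after "<" or "<!"
    neg: bool  # True when the "!" prefix is present


# "<", an optional "!", a name, optional argument text, then ">".
# Assumption: the argument contains no ">" (no nesting).
_ASSERTION = re.compile(r"<(!?)(\w+)(?:\s+[^>]*)?>")


def parse_assertion(src: str) -> Optional[Assertion]:
    """Split an assertion into its name and negation flag."""
    m = _ASSERTION.fullmatch(src)
    if m is None:
        return None
    return Assertion(name=m.group(2), neg=bool(m.group(1)))
```

Here parse_assertion("<!before P>") gives the name "before" with neg set, and parse_assertion("<commit>") gives "commit" with neg unset, so one generic node type can cover every named assertion.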

        <?ws>           N       match whitespace by :w rules
        <?sp>           N       match a space character (chr 32 ONLY)

Here the token is "<?", indicating a non-capturing subrule.


        <$rule>         N       indirect rule
        <::$rulename>   N       indirect symbolic rule
        <@rules>        N       like '@rules'
        <%rules>        N       like '%rules'
        <{ code }>      N       code produces a rule
        <&foo()>        N       subroutine returns rule
        <( code )>      N       code must return true or backtracking ensues

Here the leading tokens are actually "<$", "<::$", "<@", "<%", "<{", "<&",
and "<(", and I suspect we have "<?$", "<?::$", "<?@", and "<!$", "<!::$",
"<!@", etc. counterparts.

Per your second message, <!@rules> would mean <!before <@rules>>,

Of course, one could claim that these are
really separated as in "<", "?", and "$" tokens, but PGE's parser currently
treats them as a unit to make it easier to jump directly into the correct
handler for what follows.

Yes, so does mine. :)
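
For illustration, treating the leading tokens as units amounts to a longest-prefix lookup. The sketch below is in Python, not Perl, and is not taken from PGE or from Perl6::Rule::Parser; the table contents, the labels, and the leading_token function are assumptions made up for the example.

```python
# Leading tokens mentioned in this thread, mapped to a handler label.
# The labels are illustrative, not official terminology.
LEADING_TOKENS = {
    "<::$": "indirect symbolic rule",
    "<$": "indirect rule",
    "<@": "array of rules",
    "<%": "hash of rules",
    "<{": "code producing a rule",
    "<&": "subroutine returning a rule",
    "<(": "code assertion",
    "<?": "non-capturing subrule",
    "<!": "negated assertion",
    "<": "capturing subrule",
}

# Try longer tokens first so "<::$" wins over "<".
_BY_LENGTH = sorted(LEADING_TOKENS, key=len, reverse=True)


def leading_token(src: str):
    """Return (token, label) for the longest leading token, or None."""
    for tok in _BY_LENGTH:
        if src.startswith(tok):
            return tok, LEADING_TOKENS[tok]
    return None
```

Checking the longest candidates first is what lets the parser jump straight to the right handler: "<::$rulename>" is recognised as "<::$" and never as a plain "<" followed by stray punctuation.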

        <[a-z]>         N       character class
        <+alpha>        N       character class
        <-[a-z]>        N       complemented character class

The tokens for character class manipulation are currently "<[", "<+",
and "<-", although that's not officially documented in A05 or S05 yet.
Also, ranges are now <[a..z]> -- an unescaped hyphen appearing in an
enumerated character class generates a warning.
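
A small sketch of what handling the new range syntax might look like, in Python for illustration. The function name and the exact warning text are hypothetical, and the treatment of backslash escapes is an assumption of mine rather than anything stated in A05 or S05.

```python
import warnings


def expand_enumerated_class(body: str) -> set:
    """Expand the inside of <[...]>, e.g. "a..z_", into a set of characters."""
    chars = set()
    i = 0
    while i < len(body):
        ch = body[i]
        # Assumption: a backslash makes the next character literal.
        if ch == "\\" and i + 1 < len(body):
            chars.add(body[i + 1])
            i += 2
            continue
        if ch == "-":
            warnings.warn("unescaped hyphen in enumerated character class")
        # A range is written lo..hi rather than lo-hi.
        if body[i + 1:i + 3] == ".." and i + 3 < len(body):
            lo, hi = ch, body[i + 3]
            chars.update(chr(c) for c in range(ord(lo), ord(hi) + 1))
            i += 4
            continue
        chars.add(ch)
        i += 1
    return chars
```

In this sketch "a..e" expands to the five letters a through e, while the old-style "a-z" produces just the three characters a, hyphen, and z, along with a warning.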

        <+\w-[0-9]>       N       character class "arithmetic"

I'm not sure that it's been decided/documented that \w, \s, etc.
can appear in character class arithmetic (although it seems like it

The new character class idiom is going to confuse me for a while. I'll have to read the above URL in which Larry sheds light.

        <prop:X>        N       Unicode property match
        <-prop:X>       N       complemented Unicode property match

Here "prop" is just a subrule (or character class) similar to
<+alpha>, <+digit>, etc.  Also, note that <prop:X> is a capturing
subrule, while <+prop:X> would be a character class match (and presumably
not capture).

I think I'll wait to handle Unicode properties until a syntax has been agreed upon... <prop:X>, <X>, <prop(X)>, etc.

        <rule>          N       match rule (and capture to $rule)
        <?rule>         N       match rule (don't capture)
        <<rule>>        N       match rule (don't capture)

Do we still have the <<rule>> syntax, or was that abandoned in
favor of <?rule> ?  (I know there are still some remnants of <<...>>
in S05 and A05, but I'm not sure they're intentional.)

I saw <<...>> in A/S 05, but if they're accidental, then I just won't deal with it.

And, what's the deal with <RULE> capturing? Does that mean I have to write <?digit> everywhere instead of <digit> unless I want a capture? Eh, I guess \d exists for that reason...

Thanks for your help.  Unless you're difficult.

   "You're welcome"  unless $Pm ~~ /<?difficult>/;

Difficulty nonexistent.

