Re: comprehensive list of perl6 rule tokens

2005-06-01 Thread Jeff 'japhy' Pinyan

Further woes, arguments, questions:

With regard to <@array>, A5 says "A leading @ matches like a bare array..." 
but this is an over-generalization.  A leading '@' merely indicates the 
rule is found in an array.  <@array[3]> would be the same as 
<$fourth_element_of_array>, assuming those two values are identical.


Next, about <before RULE> and <after RULE>.  What is the justification for 
that syntax?  There is no other example of a <-sequence with whitespace, 
at least that I can see.  It would appear "RULE" is an argument of sorts 
to the 'before' and 'after' rules, but how do they access that argument? 
How do I write a rule that takes an argument?


--
Jeff "japhy" Pinyan %  How can we ever be the sold short or
RPI Acacia Brother #734 %  the cheated, we who for every service
http://japhy.perlmonk.org/  %  have long ago been overpaid?
http://www.perlmonks.org/   %-- Meister Eckhart


Re: comprehensive list of perl6 rule tokens

2005-05-29 Thread Jeff 'japhy' Pinyan

On May 26, Patrick R. Michaud said:


  <commit>     N   backtracking fails completely
  <cut>        N   remove what matched up to this point from the string
  <after P>    N   we must be after the pattern P
  <!after P>   N   we must NOT be after the pattern P
  <before P>   N   we must be before the pattern P
  <!before P>  N   we must NOT be before the pattern P

As with ':words', etc., I'm not sure that these qualify as "tokens"
when parsing the regex -- the tokens are actually "<" or "

I'm curious if <commit> and <cut> "capture" anything.  They don't start 
with '?', so following the guidelines, it would appear they capture, but 
that doesn't make sense.  Should they be written as <?commit> and <?cut>, 
or is the fact that they capture silently ignored because they're not 
consuming anything?


Same thing with  and .  And with  and . 
It should be assumed that  doesn't capture because it can only 
capture if P matches, in which case  fails.


So, what's the deal?



Re: comprehensive list of perl6 rule tokens

2005-05-27 Thread Jeff 'japhy' Pinyan

With regard to http://www.nntp.perl.org/group/perl.perl6.language/21120 
which discusses character class syntax in Perl 6, I have some comments to 
make.


First, I've been very interested in seeing proper set notation for char 
classes in Perl 5.  I was pretty vocal about it during TPC in 2002, I 
think, and have since added some features that are in Perl 5 now that 
allow you to define your own Unicode properties with not only + and - and 
! but & as well.


If we want to treat character classes as sets, then we should try to use 
notation that reads properly.  I don't see how '+' and '|' are any 
different in this case: <+Foo +Bar> and  should produce the 
same results always.  I suppose the + is helpful in distinguishing a 
character class assertion from any other, though.  To *complement* a 
character class, I think the character ~ is appropriate.  Intersection 
should be done with &.  Subtraction can be provided with -, although it's 
really just a shorthand:  A - B is really A & ~B... but I suppose huffman 
encoding tells us we should provide the - sign.


Here are some examples, then:

  <+alpha -vowels>     all alphabetic characters except vowels
  <+alpha & ~vowels>   same thing
  <[a..z] -[aeiou]>    all characters 'a' through 'z' minus vowels
  <[a..z] & ~[aeiou]>  same thing
  <~(X & Y) | Z>       all characters not in X-and-Y, or in Z
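
The set identities behind these examples can be checked with a short Python sketch. This is only a model, not anything from the Perl 6 design: it assumes a character class is a plain set of characters over a small finite universe (printable ASCII), and the names UNIVERSE and complement, along with the stand-in sets X, Y and Z, are made up for illustration.

```python
import string

# Assumption: the "universe" of characters is printable ASCII only.
UNIVERSE = frozenset(string.printable)


def complement(cls):
    """~A: every character of the universe that is not in A."""
    return UNIVERSE - cls


alpha = frozenset(string.ascii_letters)
vowels = frozenset("aeiouAEIOU")
lower = frozenset(string.ascii_lowercase)

# <+alpha -vowels> and <+alpha & ~vowels> describe the same set.
by_subtraction = alpha - vowels
by_intersection = alpha & complement(vowels)

# <~(X & Y) | Z> with hypothetical stand-ins for X, Y and Z.
X, Y, Z = lower, vowels, frozenset("ae")
not_xy_or_z = complement(X & Y) | Z
```

Here X & Y is the lowercase vowels, so the last class is everything except 'i', 'o' and 'u': the complement removes all five lowercase vowels and Z puts 'a' and 'e' back.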

The last example shows <~ which is currently unclaimed as far as 
assertions go.  Since I'd be advocating the removal of a unary - in 
character classes (to be replaced by ~), I think this would be ok.  The 
allowance for a unary + in character classes has already been justified.


For the people who are really going to use it, the notation won't be 
foreign.  And I'd expect most people who'd use it would actually abstract 
a good portion of it away into their own property definitions, so that


  <~(X & Y) | Z>

would actually just be

  <+My_XYZ_Property>

which would be defined elsewhere.

What say you?



Re: comprehensive list of perl6 rule tokens

2005-05-26 Thread Jeff 'japhy' Pinyan

On May 26, Patrick R. Michaud said:


On Tue, May 24, 2005 at 08:25:03PM -0400, Jeff 'japhy' Pinyan wrote:

I have looked through the latest
revisions of Apo05 and Syn05 (from Dec 2004) and come up with the
following list:

  http://japhy.perlmonk.org/perl6/rules.txt


I'll review the list below, but it's also worthwhile to read

  http://www.nntp.perl.org/group/perl.perl6.language/21120

which is Larry's latest missive on character classes, and

  http://www.nntp.perl.org/group/perl.perl6.language/20985

which describes the capturing semantics (but be sure to note
the lengthy threads that follow concerning changes in the
indexing from $1, $2, ... to $0, $1, ... ).


I'll check them out.  Right now, I'm really only concerned with syntax 
rather than implementation.  Perl6::Rule::Parser will only parse the rule 
into a tree structure.



&       a&b     N   conjunction
&var            N   subroutine

I'm not sure that "&var" means subroutine anymore.  A05 does mention


Ok.  If it goes away, I'm fine with that.


x**{n..m}   N   previous atom n..m times

Keeping in mind that the "n..m" can actually be any sort of closure


Yeah, I know.


(   (x) Y   capture 'x'
)   Y   must match opening '('

It may be worth noting that parens not only capture, they also
introduce a new scope for any nested subpattern and subrule captures.


Ok.  I don't think that'll affect me right now.


:ignorecase N   case insensitivity :i
:global N   match globally :g
:continue   N   start scanning after previous match :c
   ...etc

I'm not sure these are "tokens" in the sense of "single unit of purpose"
in your original message.  I think these are all adverbs, and the "token"
is just the initial C<:> at the beginning of a group.


I understand, but that set is particularly important to me, because as far 
as I am concerned, the rule


  /abc/

is the object Perl6::Rule::Parser::exact->new('abc'), whereas the rule

  /:i abc/

is the object Perl6::Rule::Parser::exactf->new('abc') -- this is using 
node terminology from Perl 5, where "exactf" means "exact with case 
folding".
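
The idea that a modifier changes which node type the parser builds can be sketched in Python. This is not the Perl6::Rule::Parser code: the classes Exact and ExactF and the helper parse_literal are hypothetical, and the helper handles only an optional leading ':i' (or ':ignorecase') followed by a literal.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Exact:
    """Literal text that must match exactly."""
    text: str

    def matches_at(self, subject, pos=0):
        return subject.startswith(self.text, pos)


@dataclass(frozen=True)
class ExactF:
    """Literal text matched with case folding."""
    text: str

    def matches_at(self, subject, pos=0):
        # Simplification: assumes folding does not change the length.
        chunk = subject[pos:pos + len(self.text)]
        return chunk.casefold() == self.text.casefold()


def parse_literal(rule_body):
    """Hypothetical helper: pick the node type from a leading modifier."""
    body = rule_body.strip()
    head, _, rest = body.partition(" ")
    if head in (":i", ":ignorecase"):
        return ExactF(rest.strip())
    return Exact(body)
```

With this sketch, parse_literal("abc") yields an Exact node and parse_literal(":i abc") yields an ExactF node, so the modifier is folded into the tree at parse time rather than carried as a separate token.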



:keepall    N   all rules and invoked rules remember everything

That's now  ":parsetree" according to Damian's proposed capture rules.


Ok.  I haven't seen those yet.


  <commit>     N   backtracking fails completely
  <cut>        N   remove what matched up to this point from the string
  <after P>    N   we must be after the pattern P
  <!after P>   N   we must NOT be after the pattern P
  <before P>   N   we must be before the pattern P
  <!before P>  N   we must NOT be before the pattern P

As with ':words', etc., I'm not sure that these qualify as "tokens"
when parsing the regex -- the tokens are actually "<" or "

I understand.  Luckily this new syntax will enable me to abstract things 
in the parser.


  my $obj = $S->object(assertion => $name, $neg);
  # where $name is the part after the < or

Since there are no longer different prefixes for every type of assertion, I 
no longer need to make specific classes of objects.
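
A rough Python sketch of that abstraction follows: one node type carrying a name and a negation flag, instead of a class per assertion. The Assertion class and parse_assertion helper are hypothetical, and the sketch assumes the negated form is written with a leading '!' inside the angle brackets, as the discussion of '!' elsewhere in this thread suggests.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Assertion:
    """One node type for every named assertion."""
    name: str
    negated: bool = False


def parse_assertion(token):
    """Hypothetical helper: turn '<name>' or '<!name>' into an Assertion."""
    if not (token.startswith("<") and token.endswith(">")):
        raise ValueError("not an assertion: %r" % token)
    inner = token[1:-1]
    negated = inner.startswith("!")  # assumed spelling of negation
    if negated:
        inner = inner[1:]
    return Assertion(inner, negated)
```

The point of the design is that adding a new assertion name needs no new class; only the name string differs.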



  <ws>  N   match whitespace by :w rules
  <sp>  N   match a space character (chr 32 ONLY)

Here the token is "

Right.


<$rule>        N   indirect rule
<::$rulename>  N   indirect symbolic rule
<@rules>       N   like '@rules'
<%rules>       N   like '%rules'
<{ code }>     N   code produces a rule
<&foo()>       N   subroutine returns rule
<( code )>     N   code must return true or backtracking ensues

Here the leading tokens are actually "<$", "<::$", "<@", "<%", "<{", "<&",
and "<(", and I suspect we have "

Per your second message, <[EMAIL PROTECTED]> would mean >, 
right?


   Of course, one could claim that these are
really separated as in "<", "?", and "$" tokens, but PGE's parser currently
treats them as a unit to make it easier to jump directly into the correct
handler for what follows.


Yes, so does mine. :)
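
Treating the leading characters as one unit amounts to a longest-prefix lookup before dispatching to a handler. The Python sketch below illustrates that idea only; it is not PGE's or Perl6::Rule::Parser's code. The prefix list is assembled from the tokens mentioned in this thread, "<!" is included on the assumption that negated assertions start that way, and the name leading_token is hypothetical.

```python
# Prefixes named in this thread, plus the bare "<" as a fallback.
ASSERTION_PREFIXES = [
    "<::$", "<$", "<@", "<%", "<{", "<&", "<(",
    "<[", "<+", "<-", "<!", "<",
]

# Try longer prefixes first so "<::$" wins over "<".
_BY_LENGTH = sorted(ASSERTION_PREFIXES, key=len, reverse=True)


def leading_token(rule, pos=0):
    """Return the longest known prefix at pos, or None if there is none."""
    for prefix in _BY_LENGTH:
        if rule.startswith(prefix, pos):
            return prefix
    return None
```

Under this scheme "<[a-z]>" starts with the single token "<[", whereas "< [a-z] >" starts with a bare "<", which is why whitespace after the opening bracket changes how the assertion is classified.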


<[a-z]>   N   character class
<+alpha>  N   character class
<-[a-z]>  N   complemented character class

The tokens for character class manipulation are currently "<[", "<+",
and "&

Re: comprehensive list of perl6 rule tokens

2005-05-25 Thread Jeff 'japhy' Pinyan

On May 25, Mark A. Biggar said:


Jonathan Scott Duff wrote:

On Tue, May 24, 2005 at 11:24:50PM -0400, Jeff 'japhy' Pinyan wrote:

I wish  was allowed.  I don't see why  has to be confined to 
zero-width assertions.


I don't either actually. One thing that occurred to me while responding
to your original email was that  might have slightly wrong
huffmanization.  Is zero-width the common case?  If not, we could use
character doubling for emphasis:   consumes, while  is
zero-width. 


Now  is a character class just like <+digit> and so
under the new character class syntax, would probably be written
<+prop X> or if the white space is a problem, then maybe <+prop:X>
(or <+prop(X)> as Larry gets the colon :-), but that is a pretty
adverbial case so ':' maybe okay) with the complemented case being
<-prop:X>.  Actually the 'prop' may be unnecessary at all, as we know
we're in the character class sub-language because we saw the '<+', '<-'
or '<[', so we could just define the various Unicode character property
codes (I.e., Lu, Ll, Zs, etc) as pre-defined character class names just
like 'digit' or 'letter'.


Yeah, that was going to be my next step, except that the unknowing person 
might make a sub-rule of their own called, say, "Zs", and then which would 
take precedence?  Perhaps  is a good way of writing it.



BTW, as a matter of terminology, <-digit> should probably be called the
complement of <+digit> instead of the negation so as not to confuse it with 
the  negative zero-width assertion case.


Yeah, I just wrote that in my recent reply to Scott.  I realized the 
nomenclature would be a point of confusion.




Re: comprehensive list of perl6 rule tokens

2005-05-25 Thread Jeff 'japhy' Pinyan

On May 25, Jonathan Scott Duff said:


On Tue, May 24, 2005 at 11:24:50PM -0400, Jeff 'japhy' Pinyan wrote:

I wish  was allowed.  I don't see why  has to be confined
to zero-width assertions.


I don't either actually. One thing that occurred to me while responding
to your original email was that  might have slightly wrong
huffmanization.  Is zero-width the common case?  If not, we could use
character doubling for emphasis:   consumes, while  is
zero-width.


But that's not even the point.  The ! in <!after P> is not what makes 
<!after P> a zero-width assertion, it's the 'after' that does that.  All 
the ! does is negate the boolean sense of the assertion, which seems like 
a useful thing to have.


Hrm, but I think I see the problem.  How does one define "negation" for an 
arbitrary assertion?  Is <!prop X> saying "if <prop X> matches, fail"?  Because 
then <!prop X> doesn't mean the same as <-prop X>.  We don't want 
negation, we want complement.


I guess '!' is only well-defined for zero-width assertions.  When you want 
to say , I guess > or > is the proper way 
to go.
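
The difference between negating an assertion and complementing a character class can be seen with Python's re module, used here purely as an analogy (this is Python regex syntax, not Perl 6 rule syntax). A negative lookahead stands in for the negated zero-width case and \D stands in for the complemented class.

```python
import re

# Negation: zero-width, succeeds where the next character is not a digit.
negated = re.compile(r"(?!\d)")

# Complement: consumes exactly one character that is not a digit.
complemented = re.compile(r"\D")

subject = "a1"
m_neg = negated.match(subject)        # matches, but consumes nothing
m_comp = complemented.match(subject)  # matches and consumes "a"

# At the end of the string the two behave differently: the negated
# assertion still succeeds, while the complemented class has nothing
# left to consume and fails.
end_neg = negated.match("")
end_comp = complemented.match("")
```

So "fail if it matches" and "match one character outside the set" agree on whether a digit comes next, but they disagree on how much input is consumed and on what happens at the end of the string.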




Re: comprehensive list of perl6 rule tokens

2005-05-24 Thread Jeff 'japhy' Pinyan

On May 24, Jonathan Scott Duff said:


On Tue, May 24, 2005 at 08:25:03PM -0400, Jeff 'japhy' Pinyan wrote:

  http://japhy.perlmonk.org/perl6/rules.txt


That looks completish to me.  (At least I didn't think, "hey! where's
such and such?")


Oh, frabjous day!


One thing that I noticed and had to look up was

<-prop X>

though.  Because ...


I wish  was allowed.  I don't see why  has to be confined 
to zero-width assertions.



The part which needs a bit of clarification right now, in my opinion, is
character classes.  From what I can gather, these are character classes:

  <[a-z] +<digit>>
  <+<alpha> -[aeiouAEIOU]>


I believe that Larry blessed Pm's idea to allow

<[a..z]+digit>
<+alpha-[aeiouAEIOU]>


Ok, that's news to me.  (I have yet to peruse the archives.)  That's nice, 
not requiring you to <>-ize property names inside a character class 
assertion.  I'd think whitespace would be permitted in between parts of a 
character class, but perhaps I'm wrong.  That would kinda go against the 
whole "whitespace for readability" idea of Perl 6 rules, though.



which implies to me that assertions starting with one of "<[",
"<-" or "<+" should be treated as character classes.  This doesn't
seem to play well with <-prop X>.  Maybe it does though.


Considering the Unicode properties are like char class macro-things (like 
\w and \d), I don't see a problem, except for the fact that there's more 
than one "word" (chunk of non-whitespace) associated with them.  Maybe 
Unicode properties retain their enclosing <>'s?



Also, I think that it's [a..z] now rather than [a-z] but I'm not
entirely sure.  At least that's how PGE implements it.


Ok.  I'll wait for a message from On High about that.  It's a minor 
detail.



but I want to be sure.  I'm also curious about whitespace.  Is "<[" one
token, or can I write "< [a-z] >" and have it be a character class?


I think you need to write "<["


I expected as much.



comprehensive list of perl6 rule tokens

2005-05-24 Thread Jeff 'japhy' Pinyan

I'm working on a Perl 5 module that will allow for the parsing of a Perl 6 
rule into a tree structure -- specifically, I'm subclassing/extending 
Regexp::Parser into Perl6::Rule::Parser.  This module is designed ONLY to 
PARSE the contents of a rule; it is not concerned with the implementation 
of all the new things Perl 6 rules will offer, merely their syntax.  Once 
this module is done, I'll work on a slightly broader one which will 
concern itself with the exterior of the rule (the m:xyz:abc('def')/.../ 
part, rather than the contents of the rule itself).


To do this effectively, I need an exhaustive list of all tokens that can 
appear in a Perl 6 rule.  By "token", I mean a single unit of purpose, 
such as ^^ and  and **{3..6}.  I have looked through the latest 
revisions of Apo05 and Syn05 (from Dec 2004) and come up with the 
following list:


  http://japhy.perlmonk.org/perl6/rules.txt

The list is split up by leading character.  I think it's complete, but I'm 
probably wrong, which is why I need more eyes to look over it and tell me 
what I've missed.


I just got an email back from Damian which will help me move in the right 
direction, but I'd like this to be open to as many knowledgeable minds as 
possible.


The part which needs a bit of clarification right now, in my opinion, is 
character classes.  From what I can gather, these are character classes:


  <[a-z] +<digit>>
  <+<alpha> -[aeiouAEIOU]>

but I want to be sure.  I'm also curious about whitespace.  Is "<[" one 
token, or can I write "< [a-z] >" and have it be a character class?


Thanks for your help.  Unless you're difficult.



explicit laws about whitespace in rules

2005-05-23 Thread Jeff 'japhy' Pinyan

I'd like to know where EXACTLY whitespace is permitted in rules.  Is it 
legal to write


  \c [CHARACTER NAME]

or must I write

  \c[CHARACTER NAME]
