Re: some newbie questions about synopsis 5

2006-02-20 Thread Damian Conway

Patrick clarified:


At any rate, I find that having a subpattern capture base its
index on the highest index of all of the previous alternation
branches is easy to understand and works well in practice.  It can
also be easily changed with another alias if needed.


I strongly agree, and would be unhappy to see it work any other way.



* If a subrule appears two (or more) times in the same lexical scope
 (i.e. twice within the same subpattern and alternation), or if the
 subrule is quantified anywhere within the entire rule, then its
 corresponding hash entry is always assigned a reference to an array
 of Match objects, rather than a single Match object.

Maybe you're not the right person to ask, but is there a particular
reason for the entire rule bit?

/ (foo|None) foo (foo) /

Here we get three Matches $0foo (possibly undefined), $foo, and
$1foo. At least, I think so.

/ (foo?) foo (foo) /

Now, we suddenly get three more or less unrelated arrays with lengths
0..1, 1, and 1. Of course, I admit this example is a bit artificial.



Oh, I hadn't caught that particular clause (or hadn't read it as
you just did).  PGE certainly doesn't implement things that way.
I think the entire rule clause was intended to cover cases like

/ [ foo ]* /

where foo is indirectly quantified and therefore is an array of
match objects.  We should probably reword it, or get a clarification
of what is intended.  (Damian, @Larry:  can you confirm or clarify
this for us?)


Sorry, you're correct that it's not what was intended. I was specifically 
trying to address the case where the same subrule appears with different 
quantifications in different alternations in the same scope.


That is, the difference between:

m/ bar foo | baz foo / # $foo always contains a scalar

and:

m/ bar foo | baz foo* / # $foo always contains an array ref


Is this clearer:

* If a subrule appears two (or more) times in any branch of a
  lexical scope (i.e. twice within the same subpattern and
  alternation), or if the subrule is quantified anywhere within a
  given scope, then its corresponding hash entry is always assigned
  a reference to an array of Match objects, rather than a single
  Match object.


???

If so, I'd be happy if someone wanted to update the Synopsis that way.

Note, however, that this question suggests that we need a more overt statement 
about what constitutes a scope within a regex. I'll work on providing that 
when I take my next pass through the Synopses (probably next week).





Furthermore, I think within the same subpattern and alternation is
not quite correct, at least it wouldn't apply to something like

/ (foo [ foo | ... ]) /

unless we consider the (...) sequence as a kind of single branch
alternation. And why are alternation branches considered to be
lexical scopes, anyway? 


In the example you give, $0foo is indeed an array of match objects.
The same alternation in this case is the subpattern... compare to

   / (foo [ foo | ... ]) | foo /

$0foo is an array, $foo is a single match object.

Alternation branches don't create new lexical scopes, they just
affect quantification and subpattern numbering.  In both of the 
following examples


/ abc foo def foo /

/ ghi foo | jkl foo /

each foo has the same lexical scope ($foo), but in the abc
example $foo is an array of match objects, while in the ghi
example $foo is a single match object.


Patrick is spot-on here.

In simplest terms, the only things that create a scope are the regex 
delimiters (which delimit the outermost lexical scope), and any pair of 
capturing parentheses (which delimit some nested scope).
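
As a rough illustration only (not PGE's implementation), the reworded clause can be sketched in Python for a single scope. The node tags ("lit", "sub", "quant", "seq", "alt") and the helper names max_counts and capture_kinds are hypothetical, and treating bar, baz, abc and the like as literal text is an assumption. A subrule is reported as an array if it is quantified anywhere in the scope, or if it can occur twice along any one branch.

```python
from collections import Counter

def max_counts(node):
    """Return (max occurrences of each subrule on any path, quantified names)."""
    kind = node[0]
    if kind == "lit":                      # literal text captures nothing
        return Counter(), set()
    if kind == "sub":                      # a subrule call such as foo
        return Counter({node[1]: 1}), set()
    if kind == "quant":                    # anything inside is (indirectly) quantified
        counts, quantified = max_counts(node[1])
        return counts, quantified | set(counts)
    if kind == "seq":                      # a sequence adds up occurrences
        total, quantified = Counter(), set()
        for child in node[1]:
            counts, q = max_counts(child)
            total += counts
            quantified |= q
        return total, quantified
    if kind == "alt":                      # an alternation takes the per-name maximum
        best, quantified = Counter(), set()
        for branch in node[1]:
            counts, q = max_counts(branch)
            best |= counts                 # Counter union keeps the larger count
            quantified |= q
        return best, quantified
    raise ValueError(f"unknown node kind: {kind!r}")

def capture_kinds(scope):
    """Map each subrule name in one scope to 'array' or 'single'."""
    counts, quantified = max_counts(scope)
    return {name: "array" if name in quantified or n >= 2 else "single"
            for name, n in counts.items()}

foo = ("sub", "foo")

# / abc foo def foo /
twice_in_sequence = ("seq", [("lit", "abc"), foo, ("lit", "def"), foo])
# / ghi foo | jkl foo /
once_per_branch = ("alt", [("seq", [("lit", "ghi"), foo]),
                           ("seq", [("lit", "jkl"), foo])])
# m/ bar foo | baz foo* /
quantified_in_one_branch = ("alt", [("seq", [("lit", "bar"), foo]),
                                    ("seq", [("lit", "baz"), ("quant", foo)])])
# / [ foo ]* /
indirectly_quantified = ("quant", foo)

print(capture_kinds(twice_in_sequence))         # {'foo': 'array'}
print(capture_kinds(once_per_branch))           # {'foo': 'single'}
print(capture_kinds(quantified_in_one_branch))  # {'foo': 'array'}
print(capture_kinds(indirectly_quantified))     # {'foo': 'array'}
```

Nested capturing parentheses would each be passed to capture_kinds separately, since they delimit their own scope.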




My second question is why adding a ? or ?? to an unquantified
subrule which would otherwise result in a single Match object should
result in an array, rather than a single (possibly undefined) Match.


The specification was originally this way but was later changed
to the current definition.  I think people found the idea of
? producing a single match object confusing, so for consistency
we ended up with all quantifiers producing arrays of match objects.


That's my recollection too. And I certainly agree with the decision, even 
though I proposed it the other way originally.


Damian


Re: some newbie questions about synopsis 5

2006-02-17 Thread H. Stelling

Patrick R. Michaud wrote:


In the following,

/ (a) [ (b) (c) | $5 := (d) $0 := (e) ] (f) /

does the first alias have any effect on where the f's will go
(probably not)?
   



I'll defer to @Larry on this one, but my initial impression is
that the (f) capture would go into $6.


I think that sequences should behave exactly as single branch
alternations (only that there is no such thing, although we
can write [foo|fail]). So I would rather opt for $1.


- Which rules do apply to repeated captures with the same alias? For
example, the second array aliasing example

m:w/ Mr?s? @names := ident W\. @names := ident
  | Mr?s? @names := ident
  /;

seems to suggest that by using $names, the lower branch would have
resulted in a single Match object instead of an array (like the array we
would have gotten if we hadn't used the aliases in the first place). Is
this right? 
   



Yes, that's correct.


But wouldn't it be nice if the same rules applied to aliases and
subrule invocations, that is, recursion put aside, to think of

/ foo /

simply as a shorter way to say

/ $foo := ([definition of foo]:) /?

And I've got two more somewhat related questions:

The synopsis says:

* If a subrule appears two (or more) times in the same lexical scope
  (i.e. twice within the same subpattern and alternation), or if the
  subrule is quantified anywhere within the entire rule, then its
  corresponding hash entry is always assigned a reference to an array
  of Match objects, rather than a single Match object.

Maybe you're not the right person to ask, but is there a particular
reason for the entire rule bit?

/ (foo|None) foo (foo) /

Here we get three Matches $0foo (possibly undefined), $foo, and
$1foo. At least, I think so.

/ (foo?) foo (foo) /

Now, we suddenly get three more or less unrelated arrays with lengths
0..1, 1, and 1. Of course, I admit this example is a bit artificial.

Furthermore, I think within the same subpattern and alternation is
not quite correct, at least it wouldn't apply to something like

/ (foo [ foo | ... ]) /

unless we consider the (...) sequence as a kind of single branch
alternation. And why are alternation branches considered to be
lexical scopes, anyway? Just because of subpattern numbering?

My second question is why adding a ? or ?? to an unquantified
subrule which would otherwise result in a single Match object should
result in an array, rather than a single (possibly undefined) Match.
That is, why doesn't foo? rather behave like [foo|null]?
This would save us the trouble to create all these tiny arrays, or
having to write [...|null] all the time. Or maybe one could
define one's own quantifiers?






Re: some newbie questions about synopsis 5

2006-02-17 Thread Patrick R. Michaud
On Fri, Feb 17, 2006 at 02:33:12PM +0100, H. Stelling wrote:
 Patrick R. Michaud wrote:
 In the following,
 
 / (a) [ (b) (c) | $5 := (d) $0 := (e) ] (f) /
 
 does the first alias have any effect on where the f's will go
 (probably not)?
 
 I'll defer to @Larry on this one, but my initial impression is
 that the (f) capture would go into $6.
 
 I think that sequences should behave exactly as single branch
 alternations (only that there is no such thing, although we
 can write [foo|fail]). So I would rather opt for $1.

The current implementation is that a capturing subpattern
is indexed based on the largest index in all of the alternation
branches.  I'm not sure it makes sense to base it on aliases of 
the last alternation branch.  

Here are some examples we can chew on:

/ (a) [ (b) (c) | (d) ] (f) / # (f) is $3 or $2?  (currently $3)

/ (a) [ (b) (c) | $1 := (d) ] (f) /   # (f) is $3 or $2?

Since the second example is essentially saying the same as the first,
the (f) capture ought to go to the same place in each case.  If we
say that the existence of the $1 causes the (f) to go into $2, it
also becomes the case that $2 is an array of match objects, which
isn't technically problematic but it might be a bit surprising for
many.

Some other examples to consider:

/ (a) [ (b) (c) | $0 := (d) ] (f) /   # (f) is $3 or $1?  

/ (a) [ (b) (c) | $0 := (d) (3) ] (f) /   # (f) is $3 or $2? 

At any rate, I find that having a subpattern capture base its
index on the highest index of all of the previous alternation
branches is easy to understand and works well in practice.  It can
also be easily changed with another alias if needed.
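
A rough Python sketch of that numbering rule, offered as an illustration rather than as PGE's actual code: every branch of an alternation starts from the same index, and the capture after the alternation continues from the highest index any branch reached. The tuple-based pattern tree and the name number_captures are hypothetical, explicit aliases such as $0 := (...) are not modelled, and captures nested inside a capture (which would start a new scope) are ignored.

```python
def number_captures(node, start=0, out=None):
    """Assign indices to captures; return (next free index, {name: index})."""
    if out is None:
        out = {}
    kind = node[0]
    if kind == "cap":                      # a capturing subpattern such as (a)
        out[node[1]] = start
        return start + 1, out
    if kind == "seq":                      # items in a sequence count upwards
        nxt = start
        for child in node[1]:
            nxt, _ = number_captures(child, nxt, out)
        return nxt, out
    if kind == "alt":                      # each branch restarts at the same index
        ends = [number_captures(branch, start, out)[0] for branch in node[1]]
        return max(ends, default=start), out
    raise ValueError(f"unknown node kind: {kind!r}")

# /(a) [ (b) (c) (d) | (e) (f) ] (g)/
pattern = ("seq", [
    ("cap", "a"),
    ("alt", [
        ("seq", [("cap", "b"), ("cap", "c"), ("cap", "d")]),
        ("seq", [("cap", "e"), ("cap", "f")]),
    ]),
    ("cap", "g"),
])

_, indices = number_captures(pattern)
print(indices)
# {'a': 0, 'b': 1, 'c': 2, 'd': 3, 'e': 1, 'f': 2, 'g': 4}
```

Under the same assumptions, / (a) [ (b) (c) | (d) ] (f) / puts (f) in $3, which agrees with the current behaviour noted above.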

 But wouldn't it be nice if the same rules applied to aliases and
 subrule invocations, that is, recursion put aside, to think of
 
 / foo /
 
 simply as a shorter way to say
 
 / $foo := ([definition of foo]:) /?

First, is that colon following [definition of foo] intentional or
a typo?  Currently we can backtrack into subrules -- there's no cut
assumed after them.

But secondly, I'm not sure we can casually toss recursion
aside when thinking about this, since it's really a driving force 
behind having named subrules.  :-)  There's also a difference in
that subrules can take arguments, as in foo('args'), or can come
from another grammar, as in Rule::foo, which seems to argue that 
foo is really something other than an alias shorthand.

 The synopsis says:
 
 * If a subrule appears two (or more) times in the same lexical scope
   (i.e. twice within the same subpattern and alternation), or if the
   subrule is quantified anywhere within the entire rule, then its
   corresponding hash entry is always assigned a reference to an array
   of Match objects, rather than a single Match object.
 
 Maybe you're not the right person to ask, but is there a particular
 reason for the entire rule bit?
 
 / (foo|None) foo (foo) /
 
 Here we get three Matches $0foo (possibly undefined), $foo, and
 $1foo. At least, I think so.
 
 / (foo?) foo (foo) /
 
 Now, we suddenly get three more or less unrelated arrays with lengths
 0..1, 1, and 1. Of course, I admit this example is a bit artificial.

Oh, I hadn't caught that particular clause (or hadn't read it as
you just did).  PGE certainly doesn't implement things that way.
I think the entire rule clause was intended to cover cases like

/ [ foo ]* /

where foo is indirectly quantified and therefore is an array of
match objects.  We should probably reword it, or get a clarification
of what is intended.  (Damian, @Larry:  can you confirm or clarify
this for us?)

 Furthermore, I think within the same subpattern and alternation is
 not quite correct, at least it wouldn't apply to something like
 
 / (foo [ foo | ... ]) /

 unless we consider the (...) sequence as a kind of single branch
 alternation. And why are alternation branches considered to be
 lexical scopes, anyway? 

In the example you give, $0foo is indeed an array of match objects.
The same alternation in this case is the subpattern... compare to

   / (foo [ foo | ... ]) | foo /

$0foo is an array, $foo is a single match object.

Alternation branches don't create new lexical scopes, they just
affect quantification and subpattern numbering.  In both of the 
following examples

/ abc foo def foo /

/ ghi foo | jkl foo /

each foo has the same lexical scope ($foo), but in the abc
example $foo is an array of match objects, while in the ghi
example $foo is a single match object.

 My second question is why adding a ? or ?? to an unquantified
 subrule which would otherwise result in a single Match object should
 result in an array, rather than a single (possibly undefined) Match.

The specification was originally this way but was later changed
to the current definition.  I think people found the idea of
? producing a single match object confusing, so for consistency
we ended up with all quantifiers producing arrays of match objects.

(Note also that even if ? produced 

Re: some newbie questions about synopsis 5

2006-02-17 Thread Larry Wall
On Fri, Feb 17, 2006 at 08:32:18AM -0600, Patrick R. Michaud wrote:
:  The synopsis says:
:  
:  * If a subrule appears two (or more) times in the same lexical scope
:(i.e. twice within the same subpattern and alternation), or if the
:subrule is quantified anywhere within the entire rule, then its
:corresponding hash entry is always assigned a reference to an array
:of Match objects, rather than a single Match object.
:  
:  Maybe you're not the right person to ask, but is there a particular
:  reason for the entire rule bit?
:  
:  / (foo|None) foo (foo) /
:  
:  Here we get three Matches $0foo (possibly undefined), $foo, and
:  $1foo. At least, I think so.
:  
:  / (foo?) foo (foo) /
:  
:  Now, we suddenly get three more or less unrelated arrays with lengths
:  0..1, 1, and 1. Of course, I admit this example is a bit artificial.
: 
: Oh, I hadn't caught that particular clause (or hadn't read it as
: you just did).  PGE certainly doesn't implement things that way.
: I think the entire rule clause was intended to cover cases like
: 
: / [ foo ]* /
: 
: where foo is indirectly quantified and therefore is an array of
: match objects.  We should probably reword it, or get a clarification
: of what is intended.  (Damian, @Larry:  can you confirm or clarify
: this for us?)

I believe that was the intent, but I'll defer to Damian on the wordsmithing
because I'm a bit out of sorts at the moment and it'd probably come out
all sideways.

Larry


some newbie questions about synopsis 5

2006-02-15 Thread H. Stelling

Hello,

I stumbled upon Perl6 a couple of weeks ago and I'm really looking forward
to seeing the finished product. Currently, I'm trying to implement a
perl-like rules module for Python, and I've got some questions which I
think aren't covered in the Synopsis or anywhere else I looked, mostly
concerning captures and aliases:

- Capture numbering:

/(a) [ (b) (c) (d) | (e) (f) ] (g)/    capture.t suggests something like
 $0    $1  $2  $3    $1  $2    $4,     but I'm only guessing about the
                                       bit.

In the following,

/ (a) [ (b) (c) | $5 := (d) $0 := (e) ] (f) /

does the first alias have any effect on where the f's will go
(probably not)?

- Which rules do apply to repeated captures with the same alias? For
example, the second array aliasing example

m:w/ Mr?s? @names := ident W\. @names := ident
   | Mr?s? @names := ident
   /;

seems to suggest that by using $names, the lower branch would have
resulted in a single Match object instead of an array (like the array we
would have gotten if we hadn't used the aliases in the first place). Is
this right? And could the same effect have been achieved by something
like

/ $names := indent**{1} / ?

- More array aliasing:

is  / mv  @files := [...]*  /
just (slightly) shorter for / mv [$files := [...]]* / ?

Likewise, could        / @pairs := ( (\w+) \: (\N+) )+ /
have also been written / [ $pairs :=   (\w+) \: $pairs := (\N+) ]+ / ?

- Array and hash aliasing of quantified subpatterns or subrules: what
happens to the named captures?

/ @foo := ( ... $bar := (...) ... )* /

And if the subpattern or subrule ends with an alternation, can the
number of array elements to be appended (or hashed) vary depending on
which branch is taken?

- Which of the following constructs could possibly be ok (I hope, none)?

/ $foo := ...  $foo := ... /
/ $foo := ...   %foo := ... /
/ $foo := ... | %foo := ... /
/ $foo := $foo := ... /

- Do aliases bind right-to-left, as do assignments?

/ $2 := $5 := ... /   # next should be $3, not $6

- Which kind of escape sequences are allowed (or required) in enumerated
character classes?

Thanks in advance for any answers!






Re: some newbie questions about synopsis 5

2006-02-15 Thread Patrick R. Michaud
On Wed, Feb 15, 2006 at 10:09:05AM +0100, H. Stelling wrote:
 - Capture numbering:
 
 /(a) [ (b) (c) (d) | (e) (f) ] (g)/    capture.t suggests something like
  $0    $1  $2  $3    $1  $2    $4,     but I'm only guessing about the
                                        bit.

Yes.


 In the following,
 
 / (a) [ (b) (c) | $5 := (d) $0 := (e) ] (f) /
 
 does the first alias have any effect on where the f's will go
 (probably not)?

I'll defer to @Larry on this one, but my initial impression is
that the (f) capture would go into $6.

 - Which rules do apply to repeated captures with the same alias? For
 example, the second array aliasing example
 
 m:w/ Mr?s? @names := ident W\. @names := ident
| Mr?s? @names := ident
/;
 
 seems to suggest that by using $names, the lower branch would have
 resulted in a single Match object instead of an array (like the array we
 would have gotten if we hadn't used the aliases in the first place). Is
 this right? 

Yes, that's correct.

 And could the same effect have been achieved by something
 like
 
 / $names := indent**{1} / ?

Yes, a quantified capturing subrule or subpattern results in an
array of Match objects (even if the quantification is 1).

 - More array aliasing:
 
 is  / mv  @files := [...]*  /
 just (slightly) shorter for / mv [$files := [...]]* / ?

I think so.

 Likewise, could        / @pairs := ( (\w+) \: (\N+) )+ /
 have also been written / [ $pairs :=   (\w+) \: $pairs := (\N+) ]+ / ?

Seems like it would work.

 - Array and hash aliasing of quantified subpatterns or subrules: what
 happens to the named captures?
 
 / @foo := ( ... $bar := (...) ... )* /

Presuming you meant $<bar> there instead of $bar, I have no idea
what would happen.  (With $bar it's an external alias and would
capture an array of matches into the scope in which the rule was
declared.)

 And if the subpattern or subrule ends with an alternation, can the
 number of array elements to be appended (or hashed) vary depending on
 which branch is taken?

Again I have to refer this to @Larry, but my initial impression is
yes, it would vary.

 - Which of the following constructs could possibly be ok (I hope, none)?
 
 / $foo := ...  $foo := ... /

I think this one is okay.  $foo is an array of Match objects, and
each Match is likely repeated within the array.

 / $foo := ...   %foo := ... /

I hope this is not okay.  It's certainly not going to be okay anytime
soon in the PGE implementation of Perl 6 rules.  :-)

 / $foo := ... | %foo := ... /

Since the two aliases are in separate alternation branches, I think
this is okay.  The argument would be similar to

/ $foo := ... | @foo := .../

in which $foo is either a single Match object or an array of
Match objects depending on the branch matched.

 / $foo := $foo := ... /

While my instinctual reaction is to say that this ought to be okay,
upon thinking about it a bit more I think I'd prefer to say that
it's not.  At least initially, if nothing else.  In particular, I 
wonder about something like

/ @foo := $bar := [...]+ /

If we say that an alias always requires a subpattern or subrule
(and not another alias), then we avoid a lot of ambiguity, and the
above could be written as

/ @foo := [ $bar := [...]+ ] /
/ @foo := [ $bar := [...] ]+ /

depending on what is desired.

 - Do aliases bind right-to-left, as do assignments?
 / $2 := $5 := ... /   # next should be $3, not $6

Assuming we allow chained aliases such as this (see above note),
I'd still argue for $6 instead of $3.

 - Which kind of escape sequences are allowed (or required) in enumerated
 character classes?

AFAIK, this hasn't been completely decided or specified yet.  

Pm