Re: Regex stuff...

2002-08-31 Thread Damian Conway

Piers Cawley wrote:


> If I replace C<< ($key, $val) >> with 
> 
> @ary = m/<$pattern>/
> 
> and the match succeeds, how many elements are there in @ary? 

Zero. No explicit captures in that pattern.


> Suppose you want to use a hypothetical variable to bind a name to
> a capture:
> 
> / (\S+) { let $x := $1 } /
> 
> A shorthand for that is:
>   
> / $x:=(\S+) /
> 
> The parens are numbered independently of any name, so $x is an alias
> for $1.
> 
> And it's that last sentence that's important here. So, it looks like
> C<< +@ary >> is going to be 4. 

No. Only explicit paren captures add to the array attribute of the match object.
Implicit or explicit named captures add to the *hash* attribute of the match object.


> m:w / $2:=(\S+) = $1:=(\S+) /
> 
> Note that, left to their own devices, those grouping parentheses would
> generate the $1 and $2 in the order given.
> Now, assignment to hypotheticals doesn't happen all at once, it
> happens when the matching engine reaches that point in the
> string. Unless I'm very much mistaken, the order of execution will
> look like:
> 
>   $2:=$1; $1:=$2;
> 
> And it seems to me that we'll end up with $1 and $2 both bound to the
> same string; which isn't working quite how I expect. Or do we special
> case the occasions when we bind the result of a capturing group to a
> numeric match variable?

That's my understanding. If you *explicitly* bind a captured group to a
numbered hypothetical, then the capture doesn't also implicitly bind to
a numbered hypothetical.

Damian





Re: Regex stuff...

2002-08-31 Thread Ken Fox

Piers Cawley wrote:
>  Unless I'm very much mistaken, the order of execution will
> look like:
> 
>   $2:=$1; $1:=$2;

You're not binding $2:=$1. You're binding $2 to the first
capture. By default $1 is also bound to the first capture.

Assuming that numbered variables aren't special, the order
of execution is:

   $2:=$1:=first; $1:=$2:=second;

That doesn't make any sense though, so numbered variables
must be treated specially -- an explicit numbered binding
replaces the default numbered binding. So, the order of
execution is really:

   $2:=first; $1:=second;

I think this solves both of your puzzles.
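The two readings above can be compared with a small simulation. This is only a sketch in Python, not Perl 6: the function name `bind_captures` and the `special` flag are hypothetical, and the model assumes nothing more than what is described above (a default binding by position, plus an optional explicit `$N := (...)` binding).

```python
def bind_captures(groups, special):
    """Simulate numbered bindings for capture groups, in match order.

    groups:  list of (explicit_number_or_None, captured_text).
    special: if True, an explicit numbered binding replaces the
             default positional binding (assumed rule, per the thread).
    """
    numbered = {}
    for position, (explicit, text) in enumerate(groups, start=1):
        if explicit is None or not special:
            numbered[position] = text   # default binding by position
        if explicit is not None:
            numbered[explicit] = text   # explicit $N := (...) binding
    return numbered


# Models / $2:=(\S+) = $1:=(\S+) / with placeholder capture texts.
groups = [(2, "first"), (1, "second")]

naive = bind_captures(groups, special=False)    # numbered variables not special
replaced = bind_captures(groups, special=True)  # explicit replaces default

print(naive)
print(replaced)
```

Under the non-special reading both $1 and $2 end up on the second capture, which is the problem Piers describes; under the special reading $2 holds the first capture and $1 the second.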

One last thing though. Binding might be done at compile-time
because it changes variables, not the values of variables.
Thinking about binding as a compile-time declaration might
be easier than thinking about run-time execution order.

Thinking about binding as a compile-time thing, the rule

   / $2:=(\S+) = $1:=(\S+) /

becomes

   / [\S+] = [\S+] /

- Ken




Re: Regex stuff...

2002-08-31 Thread Markus Laire

On 31 Aug 2002 at 10:26, Piers Cawley wrote:


> > my $pattern = rx:w / $1:=(\S+) = $2:=(\S+) |
> >  $2:=(\S+) = $1:=(\S+) /;
> 
> Count the capturing groups. Looks like there's 4 of 'em to me. $1, $2,
> $3 and $4 are automatic variables which, according to the Apocalypse
> get set for every capturing group independent of any named variable to
> which they might also be bound.

Not if those capturing groups have been renumbered.
From A5:

> You can reorder paren groups by naming them with numeric variables:
> 
> / $2:=(.*?), \h* $1:=(.*) /
> 
> If you use a numeric variable, the
> numeric variables will start renumbering from that point, so
> subsequent captures can be of a known number (which clobbers any
> previous association with that number). So for instance you can
> reset the numbers for each alternative:
> 
> / $1 := (.*?) (\:)  (.*) { process $1, $2, $3 }
> | $1 := (.*?) (=\>) (.*) { process $1, $2, $3 }
> | $1 := (.*?) (-\>) (.*) { process $1, $2, $3 }
> /

So binding to $1 etc. is a special case. Your example never captures 
to $1..$4, only to $1 and $2, because of the renumbering.

Note that for numeric variables A5 actually calls this 
'reordering' or 'renumbering', not 'binding'.
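As a rough illustration of the renumbering rule quoted from A5, here is a sketch in Python rather than Perl 6. The helper `number_groups` is hypothetical, and it assumes the simplest reading of the quote: an explicit `$N :=` sets the counter to N, and each following unnamed group takes the next number.

```python
def number_groups(explicit_numbers):
    """Assign a number to each capture group, in source order.

    explicit_numbers: one entry per group; an int for `$N := (...)`,
    or None for a plain `(...)`.
    """
    assigned = []
    next_number = 1
    for explicit in explicit_numbers:
        if explicit is not None:
            next_number = explicit  # numbering restarts at the explicit number
        assigned.append(next_number)
        next_number += 1
    return assigned


# One alternative of the A5 example: $1 := (.*?) (\:) (.*)
one_alternative = number_groups([1, None, None])

# Both alternatives of Piers's pattern, flattened in source order.
piers_pattern = number_groups([1, 2, 2, 1])

print(one_alternative)
print(sorted(set(piers_pattern)))
```

With this reading, the four groups in Piers's pattern only ever use the numbers 1 and 2, so nothing is captured to $3 or $4.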

-- 
Markus Laire 'malaire'





Re: Regex stuff...

2002-08-31 Thread Markus Laire

On 31 Aug 2002 at 0:17, Piers Cawley wrote:

> my $pattern = rx:w / $1:=(\S+) = $2:=(\S+) |
>  $2:=(\S+) = $1:=(\S+) /;
> 
> @ary = m/<$pattern>/
> 
> how many elements are there in @ary? I can
> make a case for 4 quite happily. Certainly that's what A5 seems to
> imply:
> 
> Suppose you want to use a hypothetical variable to bind a name to
> a capture:
> 
> / (\S+) { let $x := $1 } /
> 
> A shorthand for that is:
>   
> / $x:=(\S+) /
> 
> The parens are numbered independently of any name, so $x is an alias
> for $1.
> 
> And it's that last sentence that's important here. So, it looks like
> C<< +@ary >> is going to be 4. 

How could it be 4? If the example had been

> my $pattern = rx:w / $a:=(\S+) = $b:=(\S+) |
>  $b:=(\S+) = $a:=(\S+) /;

then there would be 4 variables to speak of ($1, $2, $a, $b), and the
question would arise of which of these are returned.

In the original example, however, we only have 2 variables ($1, $2), so
it can't really return anything other than those 2.


> m:w / $2:=(\S+) = $1:=(\S+) /
> 
> Now, assignment to hypotheticals doesn't happen all at once, it
> happens when the matching engine reaches that point in the
> string. Unless I'm very much mistaken, the order of execution will
> look like:
> 
>   $2:=$1; $1:=$2;
> 
> And it seems to me that we'll end up with $1 and $2 both bound to the
> same string; which isn't working quite how I expect. Or do we special
> case the occasions when we bind the result of a capturing group to a
> numeric match variable?

As I understand it, binding to $1 etc. is a special case. Also, I
don't see any problem in your example:

m:w / $2:=(\S+) = $1:=(\S+) /

The first () is captured and assigned to $2 (instead of $1).
Then the second () is captured and assigned to $1.

-- 
Markus Laire 'malaire'