Re: Regex - Accessing captured subrules could be problematic

2008-12-05 Thread Patrick R. Michaud
On Thu, Dec 04, 2008 at 07:00:55PM +0100, Moritz Lenz wrote:
> GW wrote:
> > I found something that could be problematic (haven't yet found out if it
> > should be a special case) in Synopsis 5. More precisely it is under the
> > chapter "Accessing captured subrules" in the test case
> > t/regex/from_perl6_rules/capture.t lines 67–71:
> > 
> > ok(eval(' "bookkeeper" ~~ m/ ($/)/ '), 'Named backref',
> > :todo);
> > 
> > How can the parser know what you mean by $/? Maybe you want $/
> > followed by  or maybe $/?

I suspect $/ would parse as a single variable.  If you want
the separate subrule one can use whitespace (as noted in other posts)
or brackets (if whitespace is an issue):

[$/]

> The same rule applies for interpolation in strings:
> 
> "my big $house.uc" is parsed as "my big { $house.uc }", i.e. $house.uc is
> taken as one token, even though a valid interpretation would be to
> interpolate $house first and then append .uc to that string.

Alas, this is not exactly the case here.  In strings, anything beyond
a plain variable (such as a method call) is interpolated only when it
ends with a postcircumfix (parens, braces, brackets), thus

my $house = 'building';

say "a big $house.uc";      # "a big building.uc"
say "a big { $house.uc }";  # "a big BUILDING"
say "a big $house.uc()";    # "a big BUILDING"
say "a big { $house }.uc";  # "a big building.uc"

> The Perl 6 solution is that you disambiguate with whitespace if you
> don't want to follow the LTM rule (i.e. you'd say '$/ ' in the regex).

I think I would tend to recommend disambiguating with brackets
instead of whitespace -- it's slightly more explicit (similar to
how we recommend disambiguating with parens in other types of expressions).

Pm


Re: Regex - Accessing captured subrules could be problematic

2008-12-04 Thread Jonathan Scott Duff
On Wed, Dec 3, 2008 at 6:19 PM, GW wrote:

> Hi,
>
> I found something that could be problematic (haven't yet found out if it
> should be a special case) in Synopsis 5. More precisely it is under the
> chapter "Accessing captured subrules" in the test case
> t/regex/from_perl6_rules/capture.t lines 67–71:
>
> ok(eval(' "bookkeeper" ~~ m/ ($/)/ '), 'Named backref',
> :todo);
>
> How can the parser know what you mean by $/? Maybe you want $/
> followed by  or maybe $/?


If you wanted $/ followed by   then you would introduce some
intervening whitespace.
This is just like interpolation into double quoted strings. When you say

my @what = ;
my $str = "This is a @what[2]";

you always get the 3rd item from the @what array interpolated, not the
string '@what[2]'.  If you want the latter, you have to use some other
technique (concatenation, single quotes, etc.).
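
A minimal sketch of the difference, assuming made-up array contents
(the values below are hypothetical and not taken from the original
example):

```raku
# Hypothetical contents, chosen only for illustration.
my @what = <zero one two>;

# Double quotes: the subscripted array is interpolated.
my $interpolated = "This is a @what[2]";

# Single quotes: nothing is interpolated.
my $literal = 'This is a @what[2]';

# Backslash-escaping the sigil also keeps the text literal.
my $escaped = "This is a \@what[2]";

say $interpolated;   # This is a two
say $literal;        # This is a @what[2]
say $escaped;        # This is a @what[2]
```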


> A rewrite of this to $ would solve this specific problem, but
> not situations like: $/. Variants like $/. are also
> ambiguous.


Same for these.

And if you're doing this in a context where whitespace has meaning (e.g.
:sigspace is in effect), but you don't want the significant whitespace, you
can turn that off temporarily (or again, use some other technique).

HTH,

-Scott
-- 
Jonathan Scott Duff
[EMAIL PROTECTED]


Re: Regex - Accessing captured subrules could be problematic

2008-12-04 Thread Moritz Lenz
Hello,

GW wrote:
> I found something that could be problematic (haven't yet found out if it
> should be a special case) in Synopsis 5. More precisely it is under the
> chapter "Accessing captured subrules" in the test case
> t/regex/from_perl6_rules/capture.t lines 67–71:
> 
> ok(eval(' "bookkeeper" ~~ m/ ($/)/ '), 'Named backref',
> :todo);
> 
> How can the parser know what you mean by $/? Maybe you want $/
> followed by  or maybe $/?

I don't know if this is the answer to your particular question, but
these questions are usually answered by "Longest Token Matching" (LTM).
This principle says that every grammar rule that parses the source code
eats up as many characters as possible.

So I think this means here that $/ will be parsed as one long
token instead of two separate tokens.
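
As a rough illustration of the principle (this is not the parser's
actual grammar; the input string and the alternatives are made up),
regexes themselves follow the same longest-token rule for "|"
alternations, as far as I understand the spec, while "||" simply takes
the first alternative that matches:

```raku
# Made-up input, for illustration only.
my $input = 'foobar';

# '|' uses longest-token matching: the longer alternative wins
# even though the shorter one is listed first.
my $ltm = ~($input ~~ / foo | foobar /);

# '||' tries the alternatives in order and keeps the first match.
my $first = ~($input ~~ / foo || foobar /);

say $ltm;     # foobar
say $first;   # foo
```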

The same rule applies for interpolation in strings:

"my big $house.uc" is parsed as "my big { $house.uc }", i.e. $house.uc is
taken as one token, even though a valid interpretation would be to
interpolate $house first and then append .uc to that string.

> A rewrite of this to $ would solve this specific problem, but
> not situations like: $/. Variants like $/. are also
> ambiguous.
> 
> A solution could be something like the Perl5 style: ${/}

The Perl 6 solution is that you disambiguate with whitespace if you
don't want to follow the LTM rule (i.e. you'd say '$/ ' in the regex).

For string interpolation, embedded closures ("my big {$house}.uc") can be
used for disambiguation.

Cheers,
Moritz