Re: generating grammars, capturing in regex interpolation, etc.

2015-04-20 Thread Nathan Gray
> On 17.04.2015 at 04:34, Nathan Gray wrote:
> > # Call it if it is a routine. This will capture if requested.
> > return (var)(self) if nqp::istype(var,Callable);
> >
> > This seems to indicate that captures in the embedded regexes
> > should capture.

On Fri, Apr 17, 2015 at 09:47:22AM +0200, Tobias Leich wrote:
> The comment in INTERPOLATE is about subcaptures... but if you do not
> capture the interpolated regex itself, you break that chain.

Is there a way to specify several captures?

For instance, if I build a data structure with information about
how to parse pieces from a string, and want to be able to
construct a regex from those pieces (which pieces I include, and
what order they are in will be specified as late as possible),
what is the best way to capture those pieces by name?

# Static data structure.
my %date_parts = (
    year => {
        regex => rx/\d**4/,
    },
    month => {
        regex => rx/\d\d?/,
    },
    day => {
        regex => rx/\d\d?/,
    },
);

# Some made up routine that illustrates building a regex that captures.
my $regex = build_capturing_regex('year', rx/'-'/, 'month', rx/'-'/, 'day');

# The generated $regex looks like this (or matches the same as this):
rx/
    $<year>=[\d**4]   # The name comes from the key in %date_parts,
                      # the value from the regex value.
    '-'
    $<month>=[\d\d?]
    '-'
    $<day>=[\d\d?]
/;

# Compare a date string to the regex.
my $date_string = '2015-04-20';
my $match = $date_string ~~ $regex;

# The $match contains:
~$match<year>  # '2015'
~$match<month> # '04'
~$match<day>   # '20'
~$match        # '2015-04-20'

Is there built-in functionality that does what build_capturing_regex()
illustrates?  For instance, if the %date_parts data structure is
re-written as a grammar, is there a way to dynamically specify
how regex TOP is defined?  Or is there a different way I could
approach this problem that is easier, or fits better with Perl 6?
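
For comparison, a static grammar version of %date_parts could look like
the sketch below. The grammar name DateParts is made up for illustration,
and this form fixes both the pieces and their order when the grammar is
compiled, so it does not yet address choosing them as late as possible.

```raku
# Hypothetical grammar; the name DateParts is an assumption for illustration.
grammar DateParts {
    token TOP   { <year> '-' <month> '-' <day> }
    token year  { \d ** 4 }
    token month { \d\d? }
    token day   { \d\d? }
}

# Each subrule call in TOP captures under its own name.
my $match = DateParts.parse('2015-04-20');
say ~$match<year>;
say ~$match<month>;
say ~$match<day>;
```

Here the capture names come from the token names rather than from hash keys.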

If I need to write something like build_capturing_regex(), what
is the syntax to combine several pre-existing regexes into a
single regex, in a way that allows for capturing to occur?

If you've been following this thread, you know that I've tried
every syntax I could think of, plus any others that have been
suggested to me.  Matching always works.  Capturing a single
value works.  I have not been able to figure out how to capture
more than one value from a generated/interpolated/constructed
regex (unless I use strings instead of regexes and then EVAL the
string into a regex, but I think I should avoid that, unless
there's no better way).

-kolibrie




Re: generating grammars, capturing in regex interpolation, etc.

2015-04-17 Thread Tobias Leich

The comment in INTERPOLATE is about subcaptures... but if you do not
capture the interpolated regex itself, you break that chain.




Re: generating grammars, capturing in regex interpolation, etc.

2015-04-16 Thread Nathan Gray
On Wed, Apr 15, 2015 at 09:45:39PM -0400, Nathan Gray wrote:
> I had given up on using regexes embedded within regexes, because
> I could not get capturing to work.

I did a backtrace on one of the test cases that fails, which led
me to

  src/core/Cursor.pm

in

  method INTERPOLATE(\var, $i = 0, $s = 0, $a = 0)

with this comment:

  # Call it if it is a routine. This will capture if requested.
  return (var)(self) if nqp::istype(var,Callable);

This seems to indicate that captures in the embedded regexes
should capture.

When the capture does not happen, is this a bug?

The test cases I came up with, that illustrate capturing from
embedded regexes (some which work, some which do not), are
included below.

-kolibrie

use v6;
use Test;

my $year = '2015';
my $month = '04';
my $day = '11';
my $date_string = "$year-$month-$day";

my $year_regex = rx/$<year>=[\d**4]/;
my $month_regex = rx/$<month>=[\d\d?]/;
my $day_regex = rx/$<day>=[\d\d?]/;
my $separator = rx/'-'/;

# Single named capture.
{
    ok($date_string ~~ $year_regex, 'date string matches year regex');
    is(~$/, $year, 'matched is year string');
    is(~$/<year>, $year, 'year is captured');
}

# Single named capture in slashes.
{
    ok($date_string ~~ /$year_regex/, 'date string matches year regex when in slashes');
    is(~$/, $year, 'matched is year string when in slashes');
    is(~$/<year>, $year, 'year is captured when in slashes');  # Fails
}

# Single named capture embedded in named capture.
{
    ok($date_string ~~ /$<pattern>=$year_regex/, 'date string matches year regex when embedded');
    is(~$/, $year, 'matched is year string when embedded');
    is(~$/<pattern>, $year, 'pattern is captured when embedded');
    is(~$/<pattern><year>, $year, 'year is captured when embedded');
}

# Single named capture embedded in named capture with brackets.
{
    ok($date_string ~~ /$<pattern>=[$year_regex]/, 'date string matches year regex when embedded with brackets');
    is(~$/, $year, 'matched is year string when embedded with brackets');
    is(~$/<pattern>, $year, 'pattern is captured when embedded with brackets');
    is(~$/<pattern><year>, $year, 'year is captured when embedded with brackets');  # Fails
}

# Multiple named captures.
{
    ok($date_string ~~ /$year_regex $separator $month_regex $separator $day_regex/, 'date string matches multiple regexes in slashes');
    is(~$/, $date_string, 'matched is date string in multi regex with slashes');
    is(~$/<year>, $year, 'year is captured in multi regex with slashes');  # Fails
    is(~$/<month>, $month, 'month is captured in multi regex with slashes');  # Fails
    is(~$/<day>, $day, 'day is captured in multi regex with slashes');  # Fails
}

# Multiple named captures embedded in named capture.
{
    ok($date_string ~~ /$<pattern>=$year_regex $separator $month_regex $separator $day_regex/, 'date string matches multiple regexes when embedded');
    is(~$/, $date_string, 'matched is date string in multi regex when embedded');
    is(~$/<pattern>, $year, 'pattern is captured in multi regex when embedded');
    is(~$/<pattern><year>, $year, 'year is captured in multi regex when embedded');
    is(~$/<month>, $month, 'month is captured in multi regex when embedded');  # Fails
    is(~$/<day>, $day, 'day is captured in multi regex when embedded');  # Fails
}

# Multiple named captures embedded in named capture with brackets.
{
    ok($date_string ~~ /$<pattern>=[$year_regex $separator $month_regex $separator $day_regex]/, 'date string matches multiple regexes when embedded with brackets');
    is(~$/, $date_string, 'matched is date string in multi regex when embedded with brackets');
    is(~$/<pattern>, $date_string, 'pattern is captured in multi regex when embedded with brackets');
    is(~$/<pattern><year>, $year, 'year is captured in multi regex when embedded with brackets');  # Fails
    is(~$/<pattern><month>, $month, 'month is captured in multi regex when embedded with brackets');  # Fails
    is(~$/<pattern><day>, $day, 'day is captured in multi regex when embedded with brackets');  # Fails
}






Re: generating grammars, capturing in regex interpolation, etc.

2015-04-14 Thread Patrick R. Michaud
On Tue, Apr 14, 2015 at 08:58:27PM -0400, Nathan Gray wrote:
> I've run into a snag, in that my strptime processing in Perl 5
> relies on building a string that looks like a regex with named
> captures, and then interpolating that into a real regex.
[...]
> my $pattern = Q/$<greeting>=[hello]/;
> my $string = Q/hello/;
[...]

Just an idea: instead of building strings to be interpolated into 
a regex, could you just build regexes directly?

  my $pattern = rx/$<greeting>=[hello]/;
  my $match = "hello" ~~ / <pattern=$pattern> /;

The resulting string is captured into $match<pattern><greeting>.
The second statement can also be written as:

  # captures into $match<pattern><greeting>
  my $match = "hello" ~~ / $<pattern>=$pattern /;

  # captures into $match[0]<greeting>
  my $match = "hello" ~~ / $0=$pattern /;
 
Hope this is useful, or at least illustrative.

> Of course, there may be a better way, since regex interpolation
> seems frowned upon in Perl 6.

I think it's more that we treat regexes as first class components (actually
closures)...  rather than EVALing strings with metacharacters, we just 
build regex expressions and interpolate them directly.
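
A minimal sketch of how that might extend to several pieces, assuming
each interpolated regex is given its own named capture in the outer
regex. The variable names ($year_part and so on) are hypothetical, and
the capture names are chosen in the outer regex rather than looked up
dynamically.

```raku
# Hypothetical piece regexes; the names are illustrative only.
my $year_part  = rx/ \d ** 4 /;
my $month_part = rx/ \d\d? /;
my $day_part   = rx/ \d\d? /;

# Wrap every interpolated regex in a named capture of the outer regex,
# so that each piece is captured rather than merely matched.
my $date = rx/ $<year>=$year_part '-' $<month>=$month_part '-' $<day>=$day_part /;

my $match = '2015-04-20' ~~ $date;
say ~$match<year>;
say ~$match<month>;
say ~$match<day>;
```

This keeps the pieces as regex objects and avoids EVAL, at the cost of
writing the capture names out in the outer regex.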

Pm