Re: generating grammars, capturing in regex interpolation, etc.
On 17.04.2015 at 04:34, Nathan Gray wrote:
> # Call it if it is a routine. This will capture if requested.
> return (var)(self) if nqp::istype(var,Callable);
>
> This seems to indicate that captures in the embedded regexes should capture.

On Fri, Apr 17, 2015 at 09:47:22AM +0200, Tobias Leich wrote:
> The comment in INTERPOLATE is about subcaptures... but if you do not capture the interpolated regex itself, you break that chain.

Is there a way to specify several captures?

For instance, if I build a data structure with information about how to parse pieces from a string, and want to be able to construct a regex from those pieces (which pieces I include, and what order they are in will be specified as late as possible), what is the best way to capture those pieces by name?

    # Static data structure.
    my %date_parts = (
        year  => { regex => rx/\d**4/, },
        month => { regex => rx/\d\d?/, },
        day   => { regex => rx/\d\d?/, },
    );

    # Some made up routine that illustrates building a regex that captures.
    my $regex = build_capturing_regex('year', rx/'-'/, 'month', rx/'-'/, 'day');

    # The generated $regex looks like this (or matches the same as this):
    rx/
        $<year>=[\d**4]    # The name comes from the key in %date_parts,
                           # the value from the regex value.
        '-'
        $<month>=[\d\d?]
        '-'
        $<day>=[\d\d?]
    /;

    # Compare a date string to the regex.
    my $date_string = '2015-04-20';
    my $match = $date_string ~~ $regex;

    # The $match contains:
    ~$match<year>     # '2015'
    ~$match<month>    # '04'
    ~$match<day>      # '20'
    ~$match           # '2015-04-20'

Is there built-in functionality that does what build_capturing_regex() illustrates? For instance, if the %date_parts data structure is re-written as a grammar, is there a way to dynamically specify how regex TOP is defined?

Or is there a different way I could approach this problem that is easier, or fits better with Perl 6?

If I need to write something like build_capturing_regex(), what is the syntax to combine several pre-existing regexes into a single regex, in a way that allows for capturing to occur?

If you've been following this thread, you know that I've tried every syntax I could think of, plus any others that have been suggested to me. Matching always works. Capturing a single value works. I have not been able to figure out how to capture more than one value from a generated/interpolated/constructed regex (unless I use strings instead of regexes and then EVAL the string into a regex, but I think I should avoid that, unless there's no better way).

-kolibrie
Re: generating grammars, capturing in regex interpolation, etc.
The comment in INTERPOLATE is about subcaptures... but if you do not capture the interpolated regex itself, you break that chain.

On 17.04.2015 at 04:34, Nathan Gray wrote:
> On Wed, Apr 15, 2015 at 09:45:39PM -0400, Nathan Gray wrote:
> > I had given up on using regexes embedded within regexes, because I could not get capturing to work.
>
> I did a backtrace on one of the test cases that fails, which led me to src/core/Cursor.pm in
>
>     method INTERPOLATE(\var, $i = 0, $s = 0, $a = 0)
>
> with this comment:
>
>     # Call it if it is a routine. This will capture if requested.
>     return (var)(self) if nqp::istype(var,Callable);
>
> This seems to indicate that captures in the embedded regexes should capture. When the capture does not happen, is this a bug?
>
> The test cases I came up with, that illustrate capturing from embedded regexes (some which work, some which do not), are included below.
>
> -kolibrie
>
> [...]
Re: generating grammars, capturing in regex interpolation, etc.
On Wed, Apr 15, 2015 at 09:45:39PM -0400, Nathan Gray wrote:
> I had given up on using regexes embedded within regexes, because I could not get capturing to work.

I did a backtrace on one of the test cases that fails, which led me to src/core/Cursor.pm in

    method INTERPOLATE(\var, $i = 0, $s = 0, $a = 0)

with this comment:

    # Call it if it is a routine. This will capture if requested.
    return (var)(self) if nqp::istype(var,Callable);

This seems to indicate that captures in the embedded regexes should capture. When the capture does not happen, is this a bug?

The test cases I came up with, that illustrate capturing from embedded regexes (some which work, some which do not), are included below.

-kolibrie

    use v6;
    use Test;

    my $year = '2015';
    my $month = '04';
    my $day = '11';
    my $date_string = "$year-$month-$day";

    my $year_regex = rx/$<year>=[\d**4]/;
    my $month_regex = rx/$<month>=[\d\d?]/;
    my $day_regex = rx/$<day>=[\d\d?]/;
    my $separator = rx/'-'/;

    # Single named capture.
    {
        ok($date_string ~~ $year_regex,
            'date string matches year regex');
        is(~$/, $year, 'matched is year string');
        is(~$/<year>, $year, 'year is captured');
    }

    # Single named capture in slashes.
    {
        ok($date_string ~~ /$year_regex/,
            'date string matches year regex when in slashes');
        is(~$/, $year, 'matched is year string when in slashes');
        is(~$/<year>, $year, 'year is captured when in slashes'); # Fails
    }

    # Single named capture embedded in named capture.
    {
        ok($date_string ~~ /$<pattern>=$year_regex/,
            'date string matches year regex when embedded');
        is(~$/, $year, 'matched is year string when embedded');
        is(~$/<pattern>, $year, 'pattern is captured when embedded');
        is(~$/<pattern><year>, $year, 'year is captured when embedded');
    }

    # Single named capture embedded in named capture with brackets.
    {
        ok($date_string ~~ /$<pattern>=[$year_regex]/,
            'date string matches year regex when embedded with brackets');
        is(~$/, $year,
            'matched is year string when embedded with brackets');
        is(~$/<pattern>, $year,
            'pattern is captured when embedded with brackets');
        is(~$/<pattern><year>, $year,
            'year is captured when embedded with brackets'); # Fails
    }

    # Multiple named captures.
    {
        ok($date_string ~~ /$year_regex $separator $month_regex $separator $day_regex/,
            'date string matches multiple regexes in slashes');
        is(~$/, $date_string,
            'matched is date string in multi regex with slashes');
        is(~$/<year>, $year,
            'year is captured in multi regex with slashes'); # Fails
        is(~$/<month>, $month,
            'month is captured in multi regex with slashes'); # Fails
        is(~$/<day>, $day,
            'day is captured in multi regex with slashes'); # Fails
    }

    # Multiple named captures embedded in named capture.
    {
        ok($date_string ~~ /$<pattern>=$year_regex $separator $month_regex $separator $day_regex/,
            'date string matches multiple regexes when embedded');
        is(~$/, $date_string,
            'matched is date string in multi regex when embedded');
        is(~$/<pattern>, $year,
            'pattern is captured in multi regex when embedded');
        is(~$/<pattern><year>, $year,
            'year is captured in multi regex when embedded');
        is(~$/<month>, $month,
            'month is captured in multi regex when embedded'); # Fails
        is(~$/<day>, $day,
            'day is captured in multi regex when embedded'); # Fails
    }

    # Multiple named captures embedded in named capture with brackets.
    {
        ok($date_string ~~ /$<pattern>=[$year_regex $separator $month_regex $separator $day_regex]/,
            'date string matches multiple regexes when embedded with brackets');
        is(~$/, $date_string,
            'matched is date string in multi regex when embedded with brackets');
        is(~$/<pattern>, $date_string,
            'pattern is captured in multi regex when embedded with brackets');
        is(~$/<pattern><year>, $year,
            'year is captured in multi regex when embedded with brackets'); # Fails
        is(~$/<pattern><month>, $month,
            'month is captured in multi regex when embedded with brackets'); # Fails
        is(~$/<pattern><day>, $day,
            'day is captured in multi regex when embedded with brackets'); # Fails
    }
Re: generating grammars, capturing in regex interpolation, etc.
On Tue, Apr 14, 2015 at 08:58:27PM -0400, Nathan Gray wrote:
> I've run into a snag, in that my strptime processing in Perl 5 relies on building a string that looks like a regex with named captures, and then interpolating that into a real regex.
> [...]
>     my $pattern = Q/$<greeting>=[hello]/;
>     my $string = Q/hello/;
> [...]

Just an idea: instead of building strings to be interpolated into a regex, could you just build regexes directly?

    my $pattern = rx/$<greeting>=[hello]/;
    my $match = "hello" ~~ / <pattern=$pattern> /;

The resulting string is captured into $match<pattern><greeting>.

The second statement can also be written as:

    # captures into $match<pattern><greeting>
    my $match = "hello" ~~ / $<pattern>=$pattern /;

    # captures into $match[0]<greeting>
    my $match = "hello" ~~ / $0=$pattern /;

Hope this is useful, or at least illustrative.

> Of course, there may be a better way, since regex interpolation seems frowned upon in Perl 6.

I think it's more that we treat regexes as first class components (actually closures)... rather than EVALing strings with metacharacters, we just build regex expressions and interpolate them directly.

Pm
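
A minimal sketch of how the "build regexes directly and interpolate them" idea might extend to the date example from this thread. It assumes that each piece is a plain regex object with no named captures of its own, and that aliasing each interpolated piece at the point of use (the same aliasing form shown in this message) captures it under that name. Both points are assumptions, not behaviour confirmed anywhere in the thread, and the result may depend on the Rakudo version.

```raku
use v6;

# Pieces built as regex objects rather than strings. Unlike the test
# script elsewhere in the thread, they contain no named captures of
# their own (an assumption made to keep the sketch small).
my $year_regex  = rx/ \d ** 4 /;
my $month_regex = rx/ \d \d? /;
my $day_regex   = rx/ \d \d? /;
my $separator   = rx/ '-' /;

# Alias each interpolated piece where it is used, so the name lives in
# the outer regex instead of inside the interpolated one.
my $match = '2015-04-20' ~~ /
    $<year>=$year_regex   $separator
    $<month>=$month_regex $separator
    $<day>=$day_regex
/;

say $match ?? 'matched' !! 'no match';

# Report each named capture, or a placeholder if it was not captured.
for <year month day> -> $name {
    say $name, ': ', ~($match{$name} // '(not captured)');
}
```

This only covers a fixed sequence written out by hand. It does not address choosing the pieces and their order at run time, which is the harder part of the build_capturing_regex() question.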