Re: generating grammars, capturing in regex interpolation, etc.

2015-04-20 Thread Nathan Gray
> On 17.04.2015 at 04:34, Nathan Gray wrote:
> >   # Call it if it is a routine. This will capture if requested.
> >   return (var)(self) if nqp::istype(var,Callable);
> >
> > This seems to indicate that captures in the embedded regexes
> > should capture.

On Fri, Apr 17, 2015 at 09:47:22AM +0200, Tobias Leich wrote:
> The comment in INTERPOLATE is about "subcaptures"... but if you do not
> capture the interpolated regex itself, you break that chain.

Is there a way to specify several captures?

For instance, if I build a data structure with information about
how to parse pieces from a string, and want to be able to
construct a regex from those pieces (which pieces I include, and
what order they are in will be specified as late as possible),
what is the best way to capture those pieces by name?

# Static data structure.
my %date_parts = (
    year => {
        regex => rx/\d**4/,
    },
    month => {
        regex => rx/\d\d?/,
    },
    day => {
        regex => rx/\d\d?/,
    },
);

# Some made-up routine that illustrates building a regex that captures.
my $regex = build_capturing_regex('year', rx/'-'/, 'month', rx/'-'/, 'day');

# The generated $regex looks like this (or matches the same as this):
rx/
    $<year>=[\d**4]    # Name from the key in %date_parts, pattern from its regex value.
    '-'
    $<month>=[\d\d?]
    '-'
    $<day>=[\d\d?]
/;

# Compare a date string to the regex.
my $date_string = '2015-04-20';
my $match = $date_string ~~ $regex;

# The $match contains:
~$match<year>   # '2015'
~$match<month>  # '04'
~$match<day>    # '20'
~$match         # '2015-04-20'

Is there built-in functionality that does what build_capturing_regex()
illustrates?  For instance, if the %date_parts data structure is
re-written as a grammar, is there a way to dynamically specify
how regex TOP is defined?  Or is there a different way I could
approach this problem that is easier, or fits better with Perl 6?
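One possible alternative, sketched here under stated assumptions rather than as a built-in: skip the single combined regex and apply each piece in turn as an anchored match (the `:pos` adverb to `Str.match`), recording the text of each named piece in a hash. `match-in-order` is a hypothetical stand-in for build_capturing_regex(), and `%date-parts` is a simplified, hypothetical version of the structure above, with the regexes stored directly instead of under a `regex` key.

```raku
# Hypothetical simplified table: piece name => regex.
my %date-parts =
    year  => rx/ \d ** 4 /,
    month => rx/ \d\d? /,
    day   => rx/ \d\d? /;

# Hypothetical helper: match each piece at the current position (:pos anchors
# the match there) and record the text of every named piece in a hash.
sub match-in-order(Str $s, *@spec) {
    my %captured;
    my $pos = 0;
    for @spec -> $piece {
        # A Str names an entry in %date-parts; anything else is used as a regex.
        my $rx = $piece ~~ Str ?? %date-parts{$piece} !! $piece;
        my $m = $s.match($rx, :pos($pos)) or return Nil;
        %captured{$piece} = ~$m if $piece ~~ Str;
        $pos = $m.to;
    }
    # Only succeed if the pieces consumed the whole string.
    $pos == $s.chars ?? %captured !! Nil;
}

my $date = match-in-order('2015-04-20', 'year', rx/'-'/, 'month', rx/'-'/, 'day');
say $date<year>;    # 2015
say $date<month>;   # 04
say $date<day>;     # 20
```

Because the piece names are plain hash keys here, the order and selection of pieces can be decided at run time without EVAL; the trade-off is that the result is a Hash rather than a single Match object.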

If I need to write something like build_capturing_regex(), what
is the syntax to combine several pre-existing regexes into a
single regex, in a way that allows for capturing to occur?
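For the narrower syntax question, a minimal sketch, assuming the pre-built regexes are first copied out of the hash into lexical variables (`$year`, `$month` and `$day` are illustrative names): wrapping each interpolated `<$var>` in a named group `$<name>=[...]` captures the text that piece matched, which is the "capture the interpolated regex itself" point made earlier in the thread. The capture names are fixed in the source here, so this alone does not give the late choice of names that build_capturing_regex() asks for.

```raku
# Hypothetical simplified table: piece name => regex.
my %date-parts =
    year  => rx/ \d ** 4 /,
    month => rx/ \d\d? /,
    day   => rx/ \d\d? /;

# Pull the pre-built regexes into lexicals so they can be interpolated.
my $year  = %date-parts<year>;
my $month = %date-parts<month>;
my $day   = %date-parts<day>;

# <$var> interpolates a Regex object; $<name>=[...] captures what it matched.
my $regex = rx/
    $<year>=[ <$year> ]
    '-'
    $<month>=[ <$month> ]
    '-'
    $<day>=[ <$day> ]
/;

my $match = '2015-04-20' ~~ $regex;
say ~$match<year>;    # 2015
say ~$match<month>;   # 04
say ~$match<day>;     # 20
say ~$match;          # 2015-04-20
```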

If you've been following this thread, you know that I've tried
every syntax I could think of, plus any others that have been
suggested to me.  Matching always works.  Capturing a single
value works.  I have not been able to figure out how to capture
more than one value from a generated/interpolated/constructed
regex (unless I use strings instead of regexes and then EVAL the
string into a regex, but I think I should avoid that, unless
there's no better way).

-kolibrie





Re: Grammars

2015-04-20 Thread Larry Wall
On Sun, Apr 19, 2015 at 06:31:30PM +0200, mt1957 wrote:
: L.s.,
: 
: I found a small problem when writing a piece of grammar. A
: simplified part of it is shown here:
: ...
: token tag-body   { <body-start> ~ <body-end> <body-text> }
: token body-start { '[' }
: token body-end  { ']' }
: token body-text  { .*?  }
: ...
: 

A couple of things:

The ~ is intended primarily for literal delimiters, so you'd typically just
see something like:

token tag-body   { '[' ~ ']' <body-text> }
token body-text  { .*?  }

In this case there would be no body-end rule at all--which means you'd
hang the action routine somewhere else.  So you could just as easily
hang your action routine on tag-body or on body-text, depending on
whether you care about whether the match object includes the delimiters.
In either case, it doesn't have to attach to the final delimiter.
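A small sketch of that layout, with the action hung on tag-body and no body-end rule at all. The grammar and class names are made up, and body-text uses a negated character class instead of `.*?` because a token does not backtrack; that body-text pattern is an assumption, not the one from the original message.

```raku
grammar BracketTag {
    token TOP       { <tag-body> }
    # Match '[', then <body-text>, then expect the closing ']'.
    token tag-body  { '[' ~ ']' <body-text> }
    # Assumed pattern: everything up to, but not including, the closing bracket.
    token body-text { <-[ \] ]>* }
}

class TagActions {
    # The action attaches to tag-body, not to the final delimiter.
    method tag-body($/) {
        # ~$<body-text> excludes the brackets; ~$/ would include them.
        make ~$<body-text>;
    }
}

my $m = BracketTag.parse('[hello]', actions => TagActions.new);
say $m<tag-body>.made;   # hello
```

Hanging the same action on body-text instead would also work; the choice only changes whether `$/` in the action covers the delimiters.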

:  * Is there a way to give the method more information, in the
:    form of boolean flags saying, for example, that there was a lookahead
:    match? After all, the parser knows how it must seek!

One could always set a dynamic variable inside the "not really" rule:

token body-text {
:my $*NOT-REALLY = 1;
.*?

}

but it's easier to just move the reduction action.
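For comparison, a sketch of the dynamic-variable route, assuming (as the original question implies) that a lookahead `<?body-end>` inside body-text also runs body-end's action method. TOP sets a default of 0 so the variable exists during the real body-end call as well. `LookaheadTag`, `LookaheadActions` and the negated character class in body-text are all hypothetical.

```raku
grammar LookaheadTag {
    # Default value, so $*NOT-REALLY is defined during the real body-end call.
    token TOP        { :my $*NOT-REALLY = 0; <tag-body> }
    token tag-body   { <body-start> ~ <body-end> <body-text> }
    token body-start { '[' }
    token body-end   { ']' }
    token body-text  {
        :my $*NOT-REALLY = 1;     # visible to subrules called from here
        <-[ \] ]>* <?body-end>    # lookahead: peeks at body-end without consuming it
    }
}

class LookaheadActions {
    has $.count = 0;
    method body-end($/) {
        return if $*NOT-REALLY;   # skip any call made from inside body-text
        $!count++;
    }
}

my $actions = LookaheadActions.new;
my $m = LookaheadTag.parse('[hello]', :$actions);
say $actions.count;   # 1
```

The extra bookkeeping in TOP and in the action is the cost Larry alludes to; moving the action off body-end avoids it entirely.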

Larry