Re: Parsing data

Aaron Sherman Wed, 07 Oct 2009 16:18:39 -0700

Sorry, I accidentally took the thread off-list. Re-posting some of my
comments below:

On Wed, Oct 7, 2009 at 6:50 PM, Moritz Lenz <mor...@faui2k3.org> wrote:
> Aaron Sherman wrote:
>> One of the first things that's becoming obvious to me in playing with
>> Rakudo's rules is that parsing strings isn't always what I'm going to
>> want to do. The most common example of wanting to parse data that's
>> not in string form is the YACC scenario where you want to have a
>> function produce a stream of tokenized data that is then parsed into a
>> more complex representation. Similarly, there are transformers
>> like TGE that take syntax trees and transform them into alternative
>> representations.
>>
>> To that end, I'd like to suggest (for 6.1 or whatever comes after
>> initial stability) an extension to rules:
>
> Did you read
> http://perlcabal.org/syn/S05.html#Matching_against_non-strings already?

(I went off and read that, and then replied to Moritz):

OK, no. That proposal only does part of the work. It would suffice for
something like the lexer/parser connectivity, but transforming complex
data structures would be impossible. By contrast, what I've suggested
would work for both cases. It also preserves the existing <a b c> ~~
/b/ functionality that we have today, and it's not entirely clear to
me that the proposal that you linked to does.

So, to recap:

:data turns on data mode, which treats the input as an iterator and
matches each atom in the regex against the objects returned by that
iterator. (Must the iterator be rewindable, or do we cache when it's not?)
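On the rewind-or-cache question, here is a minimal sketch of the caching
option. It is written in Python rather than Perl 6, purely as an
illustration; CachedStream and get are hypothetical names and not part of
this proposal, S05, or Rakudo. The idea is that the matcher asks for
elements by position, so backtracking just means asking for an earlier
position again.

```python
class CachedStream:
    """Wrap a one-shot iterator and remember what it has yielded,
    so a backtracking matcher can revisit earlier positions."""

    def __init__(self, iterable):
        self._it = iter(iterable)
        self._cache = []

    def get(self, pos):
        """Return (True, item) for position pos, or (False, None)
        if the underlying iterator ends before reaching pos."""
        while len(self._cache) <= pos:
            try:
                self._cache.append(next(self._it))
            except StopIteration:
                return False, None
        return True, self._cache[pos]
```

The cost is that every element seen so far stays in memory, which is the
trade-off against requiring the caller to supply a rewindable iterator.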

Then, inside the regex we use <^...> to match complex data such as:

<^~ ... > - Match digits in a single element, with :data turned off
    (equivalent to <,> \d+ <,> in the proposal you linked)
<^{ ... }> - Smart match the return value of the closure against the
    current element
<^::identifier> - Smart match a type against the current element
<^[...]> - Descend into the current element, which should itself be
    iterable, and match its contents with :data turned on
<^ name> - Same as <^[<name>]>
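To make the intended semantics of these forms concrete, here is a rough
sketch in Python (not Perl 6). All of the names below (text, of_type,
closure, descend, find) are hypothetical helpers invented for this
illustration. It assumes no quantifiers or backtracking, approximates smart
matching with plain equality or a type check, and treats only lists and
tuples as iterable elements.

```python
import re

def text(pattern):
    """Roughly <^~ ...>: match a string regex against one element."""
    rx = re.compile(pattern)
    return lambda el: isinstance(el, str) and rx.fullmatch(el) is not None

def of_type(cls):
    """Roughly <^::identifier>: type check on the current element."""
    return lambda el: isinstance(el, cls)

def closure(fn):
    """Roughly <^{ ... }>: compare the closure's return value with the
    current element (equality stands in for smart matching here)."""
    return lambda el: el == fn()

def descend(*atoms):
    """Roughly <^[...]>: the element must itself be a sequence, and the
    atoms must match somewhere inside it."""
    return lambda el: isinstance(el, (list, tuple)) and find(atoms, el) is not None

def find(atoms, items):
    """Return the first offset at which the atoms match consecutive
    items (unanchored, like /b/), or None if there is no match."""
    items = list(items)
    for start in range(len(items) - len(atoms) + 1):
        if all(atom(el) for atom, el in zip(atoms, items[start:])):
            return start
    return None
```

For example, with tokens = ["x", 42, ["12", "+", "7"]], the call
find([of_type(int), descend(text(r"\d+"), text(r"\+"))], tokens) returns 1:
the integer at offset 1 is followed by a nested list in which a run of
digits is followed by a plus sign.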

This should be powerful enough to match any arbitrarily nested set of
iterable objects. I think it will be particularly useful against parse
trees (and similar structures such as XML/HTML DOMs) and scanner
productions, though I expect users will find many other uses for it,
much as they did for the original regular expressions.
