Regular expressions and closures (was: Perl 6 Summary for 2004-10-01 through 2004-10-17)

Aaron Sherman Tue, 26 Oct 2004 10:42:20 -0700

On Tue, 2004-10-26 at 05:17, Matthew Walton wrote:

> Also, climbing back out and shouting 'Eureka' would only really be 
> appropriate if you actually had experienced a moment of revelation about 
> something. I suspect you were too busy with the not drowning part for that.


Well, such moments of total-awareness triggered by danger are often the
origin of a revelation, but you never know how it's going to play out.
For my part, I can't recall what I was doing when this revelation
hit....

Larry, while you're feeling chatty, I have a question about Perl 6
regular expressions for you. You answered a question of mine, long ago
with a correction. I had said something like:

        /ab(c|b){$1 eq 'c'}/

If I recall correctly you had said something like, "there is no plan
(yet) to allow embedded closures to affect matching directly, other than
failing outright." This implied (to me) that you were thinking along
those lines, but were not sure if the regex engine would support it.

Given as how I'm working on a prototype regex engine for Parrot, into
which I intend to plug a Perl 6 rx parser I wanted to ask what your
thoughts on the language are there.

My work so far, and discussions with friends suggest that my scheme for
building an FSA with attached continuations will work as long as I place
some heavy restrictions on the behavior and side-effects of the closure.
I have some code now, and should have a functional prototype in a couple
of weeks.

The way I see this working is that you build an FSA like this:

 State 1:
                match(token=>'a', target => 2),
                match(token=>!'a', target => trap)
 State 2:
                match(token=>'b', target => 3),
                match(token=>!'b', target => trap)
        NOTE: Up this point might be optimized by a back-end to a match
        against the multi-character string, "ab", but that's not the
        FSA's concern.
 State 3 (start subexp mark, end subexp mark):
                match(token => 'c', target => 4),
                match(token => 'b', target => 4),
                match(token => !['c','b'], target => trap)
 State 4 (closure {$1 eq 'c'}):
                match(token => true, target => 5)
                match(token => false, target => trap)
 State 5 (terminal)

This representation is, of course, after minimization. Comparison of
states normally only takes into account if they are terminal or
non-terminal. In this case, there are 3 additional attributes of a
state:

        start subexp mark: boolean: start saving with this token
        end subexp mark: boolean: stop saving after this token
        closure: continuation: sub(RXContext $cx) returns(RXToken)

For a state to be equivalent to another state, it must satisfy:

        state1.terminal == state2.terminal
        state1.start_mark == state2.start_mark
        state1.end_mark == state2.end_mark
        state1.closure == state2.closure

and, of course, the normal recursive comparison of all transitions and
the states they target.

This allows for transformation and minimization of the FSA as well as
arbitrary code execution. The obvious drawback is something that you've
pointed out several times: the code may be executed many times, not at
all or may be executed and then backtracked over (and then re-executed
or not). This makes side-effects dangerous and unpredictable, but I
think that's OK.

I do think that in Perl, we'll want to add a backtrack API though so
that you can do this:

        rule newuser {
          user:(\w+) ssid:(\w+)
          {
                db_exec("begin transaction");
                let $uid = db_exec("create_user($1,$2)");
                BACKTRACK {
                        db_exec("rollback transaction")
                }
          }
          department:(\S+)
          {
                my $dpid = get_department($1) and
                        db_exec("assign_dept($uid, $3)") and
                        db_exec("commit transaction");
          }
        }

These examples only use a true/false return from the closure, which may
be all Perl allows for, but the FSA itself will allow a closure to
"match" any token (as if it had been in the input stream) by returning
it (the token is consumed, but is translated to the empty string for
purposes of sub-expressions and other "memory"). Its important to keep
this separate in concept from actually causing the FSA to transition to
a target state (which the closure cannot do). I can't think of a syntax
off the top of my head for taking advantage of that, but its certainly
something to bear in mind.

Larry et al., please let me know if this is in line with The Plan, as
I'd like the first pass to be sufficient to implement Perl 6 rxes, so
that even if the whole thing has to be re-written, at least we have a
framework and API.

-- 
â 781-324-3772
â [EMAIL PROTECTED] [EMAIL PROTECTED]
â http://www.ajs.com/~ajs

Regular expressions and closures (was: Perl 6 Summary for 2004-10-01 through 2004-10-17)

Reply via email to