Re: Regular expressions and closures (was: Perl 6 Summary for 2004-10-01 through 2004-10-17)

Aaron Sherman Wed, 27 Oct 2004 14:56:28 -0700

On Wed, 2004-10-27 at 14:01, Luke Palmer wrote:
> Aaron Sherman writes:

> > >   /ab(c|b) {fail unless $1 eq 'c'}/
> > 
> > Now, what does "fail" mean? I can think of two definitions:
> > 
> >     1. proceed to trap state (backtracking then happens)
> >     2. exit (probably using an exception) the rule?
> 
> The first one.


Great. In that case, that's what I expected. Number 2 might require my
having more control over exceptions than I really want to.
 
> > NDFAs (or NFAs as you say) 
> 
> Last time I checked, "nondeterministic" was a word all by itself. :-p

Tomato, tomatoe.... Yeah, I know, but I grew up hearing NDFA at school,
so I'm stuck. :-)

> > should be the same as DFAs for this purpose.  I think it's just a
> > difference in the way we look at the token-matching process. I see it
> > as part of the state, so to say "we both begin and end subexpression
> > storage here" is to say, "only the token matched here will be stored
> > as the subexpression." This restricts you, though, and I may find that
> > in order to say:
> > 
> >     (a)((b)c)
> > 
> > I need too much hair and want to put the markers elsewhere.... I'll know
> > that by next week.
> 
> I've given this a lot of thought in the past.  Way too much, actually.
> I think you'll find your DFA approach to fail when you deal with the
> generic case of perl 6 rules.

Hmmm, actually I thought Perl 6 rules would be trivial because almost
all of the non-regular-expression parts are so perfectly impossible to
model in an FSA. There's no ambiguity as to how that's done that I can
see, it simply has to be a call to external code.

Its in places where I think I have to try to cram difficult concepts
into the FSA that it gets hard.

> DFAs can't adequately model context free grammars.  I think it's best
> for your brain if you leave it at that.

Certainly. Let's just stop here if you thought I was saying anything
differently. I *am* talking about using the API for building an FSA to
also describe where the non-finite bits live, but that's just an API,
and I take no responsibility for deciding how you generate what code to
handle the rest, I'm just telling you what bag to drop it in once you
do. This, of course, branches off into compiler topics that I didn't
want to get into yet. Let me finish the FSA API and the trivial regex
parser plugin before we start talking about how a compiler will use this
too deeply (and we can have that chat on p6c... I really do have to
remember to subscribe to p6c).

> Perl 6 also allows conditionals within rules.  So if you have an
> alternation:
> 
>     / [ <( cond_1() )> | <( cond_2() )> | <( cond_3() )> ] /
> 
> You're aware that you need to make those conditions orthogonal, right?
> You can only proceed to one of those states if you're sure that it's the
> _only_ one that matches.  It turns out that this is going to take as
> much time _all the time_ as the NFA takes in its worst case.

Well, most of that's a matter for the back-end to deal with, and the
relatively optimization-free back end I'm planning for a start will
output code which does exactly what you describe. Obviously if Perl
places restrictions on how optimizations can be applied to such cases,
then it will need its own back-end or a generic back-end which has some
sort of tagging that turns off optimizations, all of which is fine. None
of that touches how we store and represent the FSA before it's compiled
to Parrot.

> DFAs are a bad idea for Perl 6. 

You seem to have taken a left turn somewhere. Let's get our assumptions
above cleared up before you start leaping to the conclusion that what
I'm in the middle of won't work.

> On the other hand, optimizing sufficiently simple parts of regular
> expressions to DFAs and embedding them in a more general system is a
> pretty good idea, IMO.

I think that's what I said.

-- 
â 781-324-3772
â [EMAIL PROTECTED]
â http://www.ajs.com/~ajs

Re: Regular expressions and closures (was: Perl 6 Summary for 2004-10-01 through 2004-10-17)

Reply via email to