On Wed, 28 Aug 2002, Deven T. Corzine wrote:
: I'm not saying we should dump the operators -- if we get more power by 
: assuming a backtracking implementation, maybe that's a worthwhile tradeoff.
: 
: On the other hand, if we can keep the implementation possibilities more 
: open, that's always a worthwhile goal, even if we're not sure if or when 
: we'll ever take advantage of those possibilities, or if we even could...

That is a worthy consideration, but expressiveness takes precedence
over it in this case.  DFAs are really only good for telling you
*whether* and *where* a pattern matches as a whole.  They are
relatively useless for telling you *how* a pattern matches.
For instance, a DFA can tell you that you have a valid computer
program, but can't hand you back the syntax tree, because it has
no way to decide between shifting and reducing.  It has to do both
simultaneously.
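
To make that concrete, here's a made-up Perl illustration (mine, not
from any particular engine).  A backtracking matcher reports *how* it
matched through its capture groups, which a bare recognizer cannot:

    # Both groups are greedy, so the engine backtracks the first one
    # to give the second its mandatory 'a'.  The captures record
    # exactly how the match was carved up.
    if ("aaa" =~ /^(a+)(a+)$/) {
        print "one=$1 two=$2\n";    # prints "one=aa two=a"
    }

A DFA can tell you that "aaa" matches as a whole, but by itself it
keeps no record of where group one ended.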

: It seems like backtracking is a Bad Thing, in that it leads to reprocessing 
: data that we've already looked at.  On the other hand, it seems to be a 
: Necessary Evil because of the memory costs of avoiding backtracking, and 
: because we might have to give up valuable features without backtracking.
: 
: It may be that backreferences already demand backtracking.  Or some other 
: feature might.  I don't know; I haven't thought it through.

I believe you are correct that backrefs require backtracking.  Maybe some smart
person will find a way to trace back through the states by which a DFA matched
to retrieve backref info, but that's probably worth a PhD or two.
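
The trouble is easy to see in a toy example (my sketch, not anyone's
implementation).  A pattern like /^(a*)b\1$/ matches a^n b a^n, which
is provably beyond any DFA, and even when the language happens to stay
regular the engine must remember -- and be prepared to retry -- exactly
where the capture ended:

    # Find a doubled word: \1 must repeat whatever (\w+) captured.
    if ("now is the the time" =~ /\b(\w+) \1\b/) {
        print "doubled word: $1\n";    # prints "doubled word: the"
    }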

Minimal matching is also difficult in a DFA, I suspect.
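
For reference, minimal matching is the /.*?/ sort of thing (again,
just a sketch):

    # .*? stops at the first '>', where a greedy .* would run to the last.
    my ($tag) = "<b>bold</b>" =~ /<(.*?)>/;
    print "$tag\n";    # prints "b", not "b>bold</b"

A DFA naturally reports the leftmost-longest match; shortest match per
quantifier is a different beast.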

: If we must have backtracking, so be it.  But if that's a tradeoff we're 
: making for more expressive and powerful patterns, we should still at least 
: make that tradeoff with our eyes open.  And if the tradeoff can be avoided, 
: that's even better.

I refer you to http://history.org/grandmother/teach/egg/suck.  :-)

That's a tradeoff I knowingly made in Perl 1.  I saw that awk had a
DFA implementation, and that it suffered for it as a useful tool.

And it's not just the backrefs.  An NFA can easily incorporate
strategies such as Boyer-Moore, which actually skips looking at many of
the characters, or the "scream" algorithm used by study, which can skip
even more.  All the DFAs I've seen have to look at every character,
albeit only once.  I suppose it's theoretically possible to compile
to a Boyer-Moore-ish state machine, but I've never seen it done.
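
For the curious, here's the skipping idea in miniature: a rough Perl
rendition of the Horspool simplification of Boyer-Moore.  This is
purely illustrative -- the real engines do this sort of thing in C,
and not necessarily this way:

    # Boyer-Moore-Horspool substring search, sketched for illustration.
    sub bmh_index {
        my ($text, $pat) = @_;
        my ($n, $m) = (length $text, length $pat);
        return -1 if $m == 0 || $m > $n;

        # Bad-character table: how far we may shift when the text
        # character under the pattern's last position is $c.
        my %skip;
        $skip{ substr($pat, $_, 1) } = $m - 1 - $_ for 0 .. $m - 2;

        my $i = 0;
        while ($i <= $n - $m) {
            my $j = $m - 1;                    # compare right to left
            $j-- while $j >= 0
                && substr($text, $i + $j, 1) eq substr($pat, $j, 1);
            return $i if $j < 0;               # fell off the left end: match
            my $c = substr($text, $i + $m - 1, 1);
            $i += exists $skip{$c} ? $skip{$c} : $m;
        }
        return -1;
    }

    print bmh_index("the quick brown fox", "brown"), "\n";   # prints 10

The interesting line is the last one in the loop: on a complete
mismatch the search leaps by the whole pattern length, never looking
at the characters it skips over.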

Add to that the fact that most real-life patterns don't generally do
much backtracking, because they're written to succeed, not to fail.
This pattern never backtracks, for instance:

    my ($num) = /^Items: (\d+)/;
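
(If you don't believe it, ask the engine.  This uses the standard
debugging pragma; the string is supplied inline so the snippet stands
alone:)

    use re 'debug';    # execution trace goes to STDERR
    my ($num) = "Items: 42" =~ /^Items: (\d+)/;
    # expect a single left-to-right pass -- nothing to undo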

I'm not against applying a DFA implementation where it's useful
and practical, but just because it's the "best" in some limited
theoretical framework doesn't cut it for me.  Humans do a great
deal of backtracking in real life, once the limits of their parallel
pattern matching circuits are exceeded.  Even in language we often
have to reparse sentences that are garden pathological.  Why should
computers be exempt? :-)

Larry
