On Wed, 28 Aug 2002, Deven T. Corzine wrote:

: I'm not saying we should dump the operators -- if we get more power by
: assuming a backtracking implementation, maybe that's a worthwhile tradeoff.
:
: On the other hand, if we can keep the implementation possibilities more
: open, that's always a worthwhile goal, even if we're not sure if or when
: we'll ever take advantage of those possibilities, or if we even could...
That is a worthy consideration, but expressiveness takes precedence over
it in this case.  DFAs are really only good for telling you *whether* and
*where* a pattern matches as a whole.  They are relatively useless for
telling you *how* a pattern matches.  For instance, a DFA can tell you
that you have a valid computer program, but it can't hand you back the
syntax tree, because it has no way to decide between shifting and
reducing.  It has to do both simultaneously.

: It seems like backtracking is a Bad Thing, in that it leads to reprocessing
: data that we've already looked at.  On the other hand, it seems to be a
: Necessary Evil because of the memory costs of avoiding backtracking, and
: because we might have to give up valuable features without backtracking.
:
: It may be that backreferences already demand backtracking.  Or some other
: feature might.  I don't know; I haven't thought it through.

I believe you are correct that backrefs require backtracking.  Maybe some
smart person will find a way to trace back through the states by which a
DFA matched to retrieve backref info, but that's probably worth a PhD or
two.  Minimal matching is also difficult in a DFA, I suspect.

: If we must have backtracking, so be it.  But if that's a tradeoff we're
: making for more expressive and powerful patterns, we should still at least
: make that tradeoff with our eyes open.  And if the tradeoff can be avoided,
: that's even better.

I refer you to http://history.org/grandmother/teach/egg/suck.  :-)

That's a tradeoff I knowingly made in Perl 1.  I saw that awk had a DFA
implementation, and suffered for it as a useful tool.  And it's not just
the backrefs.  An NFA can easily incorporate strategies such as
Boyer-Moore, which actually skips looking at many of the characters, or
the "scream" algorithm used by study, which can skip even more.  All the
DFAs I've seen have to look at every character, albeit only once.
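[Editorial aside: the point that backrefs demand backtracking can be seen in
a few lines.  This sketch uses Python's re module rather than Perl, purely
because it is another backtracking engine; the pattern and strings are
invented for illustration.]

```python
import re

# A backreference: \1 must match whatever the first group captured.
# The language { w + " " + w } of doubled words is not regular, so no
# DFA can recognize it; a backtracking engine handles it by trying each
# possible capture for the first group and rewinding on failure.
doubled = re.compile(r'^(\w+) \1$')

print(bool(doubled.match('hello hello')))  # True: \1 re-matches "hello"
print(bool(doubled.match('hello world')))  # False: no capture makes \1 fit
```

A DFA has no notion of "what the group captured" to consult later, which
is exactly why retrieving backref info from a DFA trace would be thesis
material.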
I suppose it's theoretically possible to compile to a Boyer-Moore-ish
state machine, but I've never seen it done.

Add to that the fact that most real-life patterns don't generally do much
backtracking, because they're written to succeed, not to fail.  This
pattern never backtracks, for instance:

    my ($num) = /^Items: (\d+)/;

I'm not against applying a DFA implementation where it's useful and
practical, but just because it's the "best" in some limited theoretical
framework doesn't cut it for me.

Humans do a great deal of backtracking in real life, once the limits of
their parallel pattern matching circuits are exceeded.  Even in language
we often have to reparse sentences that are garden pathological.  Why
should computers be exempt?  :-)

Larry
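[Editorial aside: the "written to succeed" observation is easy to
demonstrate.  Below is an assumed Python translation of the Perl pattern
above (Python's re is also a backtracking engine), plus an invented
counter-example that is written to fail.]

```python
import re

# The anchored pattern from the message.  On input that doesn't begin
# with "Items: " the first comparison fails and the match is abandoned
# at once; on input that does, \d+ grabs its digits greedily and nothing
# after it ever asks for them back.  Either way, no backtracking occurs.
items = re.compile(r'^Items: (\d+)')
print(items.match('Items: 42 widgets').group(1))  # 42
print(items.match('Total: 42'))                   # None

# A contrasting pattern built to fail: the nested quantifiers give the
# engine exponentially many ways to partition the a's, and it dutifully
# tries them all before giving up.
bad = re.compile(r'^(a+)+b$')
print(bad.match('a' * 16 + 'c'))                  # None, eventually
```

The second pattern is the pathological case; patterns people actually
write look like the first, which is why backtracking engines perform well
in practice despite their worst-case behavior.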