On Monday 09 September 2002 12:27 pm, Matthew Donadio wrote: > Lamar Owen wrote: > > Of course, if the existing one is DFA, and the replacement is NFA, > > then some regexes may break in subtle ways.... See the O'Reilly regex > > book for details on how deterministic finite automatons and > > non-deterministic finite automatons differ from the point of view of > > crafting regexes.
> Can you elaborate on this? I know the theory behind this, and am a bit > confused by the statment. It is pretty straightforward to convert a DFA > into an RE. The textbook method for converting an RE into a DFA is RE > -> NFA+e moves -> NFA -> DFA. You then can perform minimization of the > DFA to get the optimum one. But that's on the engine side. I'm referring to the difference between crafting the regex itself. Things such as 'Is alternation greedy' are dependent upon whether the regex engine is DFA, traditional NFA, or POSIX NFA. Plus, the exact form of the regex matters more with an NFA engine, thanks to its regex-directed nature -- you have 'wiggle' room in crafting your regexes. To the text-driven DFA, the exact from of the regex isn't nearly as important. The presence of backreferences pretty well tags the engine as being NFA. The Perl engine is decidedly NFA. Some might even say it is irregular.... When searching Bible phrases, the difference between the NFA's matching behavior and the DFA's 'longest leftmost' behavior might cause confusion in search results. That is, the same regex fed to an NFA might find a different set of matches than the same regex fed to a DFA engine. Of course, the POSIX NFA matches longest leftmost, too, further confusing the issue. For simple searching, where backreferences aren't important, a DFA might be a better way to go. AFAIK, a DFA engine can be found in original awk, Kernighan's nawk, and a few versions of GNU awk, as well as Aho's original egrep -- oddly enough, original grep was an NFA. But GNU grep is mostly DFA. So, is the existing GNU regex library DFA or NFA? (I don't know -- haven't looked at the code). If DFA, we would be subbing the Perl engine, which, due to the feature set Perl needs, is NFA. Summary: For the DFA, regex 'crafting' isn't so important -- but for an NFA proper regex crafting can mean substantial speed differences. For more information, read 'Mastering Regular Expressions' by Jeffrey E F Friedl from O'Reilly. And I know my usage of these terms (as well as Mr. Friedl's) is somewhat irregular, since the NFA and DFA concepts are used for much more than regular expressions. -- Lamar Owen WGCR Internet Radio 1 Peter 4:11