Re: Improving regexp performance

Nikolai Weibull Sun, 18 Mar 2007 05:07:46 -0800

On 3/18/07, Asiri Rathnayake <[EMAIL PROTECTED]> wrote:

I read several articles on possible NFA implementations ( took some time
to understand ). But I think implementing an NFA based approach is quite
possible ( Russ Cox has provided a sample implementation in his
article ). But I'm having a hard time understanding the existing
approach taken :'( .


The current implementation is also an NFA-based one.  What's more
interesting is how the NFA is simulated.  The current implementation
uses backtracking, that is, it tries a solution and checks if it
works.  If so, then accept.  Otherwise, try the next one.

Then there's the parallel one that keeps track of all the states that
we're currently in that accepts as soon as an accepting/final state is
reached.

And then there's the cached DFA implementation that builds the DFA
representation of the NFA as needed while traversing the input.  This
is the most difficult one to implement, but also the fastest for most
inputs.

Russ Cox's article is good, but doesn't show anyting new, even though
it has been received like as it would.  And he really skims over some
of the important issues that one has to consider when using regular
expressions for doing anything other than just accepting or rejecting
input as being part of a regular language.  Perhaps most important is
submatch reporting, which causes all kinds of difficulties that he
simply doesn't mention.

 nikolai

Re: Improving regexp performance

Reply via email to