On Sun, 2007-03-18 at 17:32 +0100, Nikolai Weibull wrote: > On 3/18/07, Asiri Rathnayake <[EMAIL PROTECTED]> wrote: > > > On Sun, 2007-03-18 at 14:07 +0100, Nikolai Weibull wrote: > > > > Then there's the parallel one that keeps track of all the states that > > > we're currently in that accepts as soon as an accepting/final state is > > > reached. > > > I think by parallel you are referring to the usual mechanism for > > simulating the NFA ( on the fly subset construction ? ). Simply, we have > > a set of states ( initially S0 ) and the we keep eating a symbol at a > > time, calculate the next possible set of states ( including E-closure ) > > and whenever the current set of states contain an accepting state of the > > original NFA, Voila! > > Yes. Quite elegant, wouldn't you say? ;-) > > > > And then there's the cached DFA implementation that builds the DFA > > > representation of the NFA as needed while traversing the input. This > > > is the most difficult one to implement, but also the fastest for most > > > inputs. > > > Sorry, but I'm in confusion here. Are you telling me that the current > > regxp engine has a cached DFA implementation ? ( I thought such an > > implementation was the aim of this project :( ) > > No. The current implementation is purely backtracking, with a bunch > of tricks to speed it up. > > > > Russ Cox's article is good, but doesn't show anyting new, even though > > > it has been received like as it would. And he really skims over some > > > of the important issues that one has to consider when using regular > > > expressions for doing anything other than just accepting or rejecting > > > input as being part of a regular language. Perhaps most important is > > > submatch reporting, which causes all kinds of difficulties that he > > > simply doesn't mention. > > > > I agree. Just out of curiosity, I referred to the book "Compilers - > > Principles, Techniques and Tools by Alfred V. Aho, Ravi Sethi, Jeffrey > > D. Ullman". It has a very nice explanation of NFA, NFA->DFA, Regxp->NFA > > and even Regxp->DFA ( I'm still trying to understand the last one ). > > I used the article by Russ Cox to verify what I have learnt from the > > above book. But I believe that above article is a good starter. > > Yes, it's a very good and short (which is a good thing!) summary. The > book you mention above is also very good, and covers a bit more > theory. I can highly recommend the following books and articles: > > John E. Hopcroft and Jeffrey D. Ullman, Introduction to Automata Theo- > ry, Languages, and Computation. (Addison–Wesley, Reading, Massachusetts, > 1979). > > This is really the reference work if you care about the theory. > Contains a lot of information on formal languages, of which only the > regular ones are really of interest for this project. > > Gonzalo Navarro, NR-grep: A fast and flexible pattern matching tool , > Software—Practice and Experience 31(13), pp. 1265–1312 (2001). > > A very fast grep implementation. The algorithm used uses bitfields to > keep track of the finite automatons state, making it very fast for > many common patterns. > > Alfred V. Aho, Algorithms for Finding Patterns in Strings. In Algorithms > and Complexiy, vol. A of Handbook of Theoretical Computer Science, chapter > 5, pp. 255–300 (Elsevier Science Publishers B.V., Cambridge, Massachusetts, > 1990). > > This last one covers the theory especially well for this project, I'd > say, and contains a very complete bibliography. > > I hope you won't drown in information ;-). More pointers can be found > in my thesis, but those are probably the most interesting ones. > > > Thank you so much for the feed back. :) > > No problem. I'm very interested to see where this project will go and > I hope I can be of help, should you need it.
I definitely need your help! :) - Asiri > nikolai