Re: Google Summer of Code 2007 : Improve regexp performance

Nikolai Weibull Thu, 15 Mar 2007 14:35:24 -0800

On 3/15/07, Asiri Rathnayake <[EMAIL PROTECTED]> wrote:

Hi all,


I went through the above idea presented for google summer of code 2007
and found it interesting. In my opinion, incorporating Thompson NFA into
regexp from the ground up is pretty cool. I also went through the
alternate TRE (http://laurikari.net/tre/index.html) but couldn't verify
whether its an implementation of Thompson NFA ( I will have to dig into
it's source and find it out ). I would like to know further details (if
any) regarding this project and the possible mentor(s).


It has its own finite automata, which is based on a Thompson NFA,
adding so called /tags/ in the mix for doing subgroup matching and
fuzzy matching.

It comes with three matchers/drivers: A backtracking one for when the
pattern requires it (backrefences), a multithreaded/parallel one, that
is, the standard two tables of active states one, and one for fuzzy
matching (approximate matching).

There's a nice paper about it as well (too actually, if you count the
first one he wrote about his first implementation for .

I actually wrote a simplification of his library, removing the
approximate matching stuff, as part of my master's, which is well
documented.  I still haven't had time to put up the PDF, though.

Sadly, both his and my implementation suffer from a bug in the
subgroup matching of "opposing" repetitions, which would have to be
fixed.  Neither of us have been able to do so yet.

Anyway, it would take an immense amount of work to turn Vim onto a new
regex implementation.  Vim has a whole range of its own stuff, like
matching cursor positions and so on, and is tightly bound to the
buffer implementation with its memlines and whatnot.  Not to
dishearten you, but I don't think this is a project that can be
completed over a summer (not that it has to be, but you may want to
keep that in mind).

Also, there's a TDFA (tagged, deterministic finite automaton)
implementation written in Haskell

http://www.mail-archive.com/glasgow-haskell-users@haskell.org/msg11442.html

Quite cool.

 nikolai

Re: Google Summer of Code 2007 : Improve regexp performance

Reply via email to