Re: Understanding regxp implementation

Nikolai Weibull Thu, 22 Mar 2007 00:26:19 -0800

On 3/22/07, Asiri Rathnayake <[EMAIL PROTECTED]> wrote:

As you might know, the reg_comp() method is called twice when compiling
a regular expression: first to determine the size of the compiled
expression and then to actually compile it. I was wondering whether this
could be used to our advantage: on the first pass, we look for
occurrences of special characters and set a flag in regprog_T
accordingly. If nothing of the sort is found, we branch off the second
pass into one of our own routines to compile the expression into our own
structures (say, a state diagram). We would also have to change other
functions a bit to check the above flag and call the new routines as
appropriate. What do you think?


That sounds like a good way of determining whether the old engine will
be required or whether a new one (with more "limited" functionality)
can be used.  One way of keeping this information as local as possible
would be to keep a set of function pointers with the compiled regex
that point to the appropriate functions for executing it on some input.

For example, you could have something like this:

typedef struct
{
    int                 (*exec)();      /* matcher chosen in vim_regcomp() */
    int                 regstart;
    char_u              reganch;
    char_u              *regmust;
    int                 regmlen;
    unsigned            regflags;
    char_u              reghasz;
    char_u              program[1];     /* actually longer.. */
} regprog_T;

and change vim_regexec() to call the exec() function of the regprog_T
in the regmatch_T that it gets passed.  You'd then set exec() to point
to either vim_old_regexec() or vim_new_regexec() (or similarly named
functions) in vim_regcomp() depending on the type of regex we have.  I
guess it could be some flag field as well, but this makes it possible
to add a third matcher, should we so desire...like a
Boyer-Moore-Horspool matcher for fixed strings.
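
To make the dispatch idea concrete, here is a rough, self-contained
sketch. It is not Vim code: every sketch_* name is made up for
illustration, the "new" matcher is just a literal substring search via
strstr(), and the check for "special" features (any backslash in the
pattern) is a placeholder assumption rather than the real criterion.

```c
#include <stddef.h>
#include <string.h>

typedef struct sketch_prog sketch_prog_T;

/* Signature shared by all matchers (hypothetical, not Vim's). */
typedef int (*sketch_exec_fn)(const sketch_prog_T *prog, const char *line);

struct sketch_prog
{
    sketch_exec_fn  exec;       /* matcher selected at compile time */
    const char      *pattern;   /* stands in for the compiled program */
};

/* Placeholder for the existing backtracking matcher; always fails here. */
static int sketch_old_exec(const sketch_prog_T *prog, const char *line)
{
    (void)prog;
    (void)line;
    return 0;
}

/* Placeholder for a new matcher: plain substring search. */
static int sketch_new_exec(const sketch_prog_T *prog, const char *line)
{
    return strstr(line, prog->pattern) != NULL;
}

/* Assumed stand-in for the first-pass scan: treat any backslash as a
 * feature that only the old engine handles. */
static int sketch_needs_old_engine(const char *pattern)
{
    return strchr(pattern, '\\') != NULL;
}

/* "Compile" step: record the pattern and pick the matcher once. */
static void sketch_regcomp(sketch_prog_T *prog, const char *pattern)
{
    prog->pattern = pattern;
    prog->exec = sketch_needs_old_engine(pattern)
                     ? sketch_old_exec
                     : sketch_new_exec;
}

/* Callers never test a flag; they just go through the pointer. */
static int sketch_regexec(const sketch_prog_T *prog, const char *line)
{
    return prog->exec(prog, line);
}
```

The point of the sketch is that only sketch_regcomp() knows which
matchers exist, so adding a third one touches the compile step and
nothing else.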

 nikolai
