On 3/20/07, Asiri Rathnayake <[EMAIL PROTECTED]> wrote:

> I went through the regexp code and have a few questions...
>
> First, why use this kind of coding scheme and encode patterns rather
> than using a state diagram?  (Performance/memory?)

Because that's how Henry Spencer did it.  I don't know his reasoning,
though.  It was probably a case of trying to limit memory use and
avoiding a conceptually more complex data structure, but I'm not
exactly sure.  You can find his original sources here:

http://arglist.com/regex/

> Secondly, is it a requirement that the new implementation follow the
> same method?  I mean, can't I use a state diagram (which is easy to
> implement, in my opinion) to simulate the NFA?

I thought the point was that you can use whatever method you want?
And that you might use an already existing library, like TRE.

In TRE 0.7.2 (I think - might have been 0.7.0), Ville added support
for more general input.  This means that you can provide an object
that feeds the regex matcher with input.  In Vim's case that would be
feeding it the contents of successive memlines until a match is found
or the buffer is "depleted".  That should be a fairly simple way of
using TRE in Vim.
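
To make the idea concrete, here is a minimal sketch in C of an input
object that hands out one byte at a time from successive lines until
something matches or the input is depleted.  It does not use TRE's
actual interface: line_source, line_source_next and find_byte are
names made up for this illustration, the array of strings is only a
stand-in for Vim's memlines, and the "matcher" just looks for a single
byte.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a buffer: an array of NUL-terminated lines. */
typedef struct {
    const char **lines;  /* line contents, without trailing newlines */
    size_t nlines;       /* number of lines */
    size_t lnum;         /* index of the line currently being read */
    size_t col;          /* byte offset within that line */
} line_source;

/* Store the next byte in *c and return 0, or return 1 once the source
 * is depleted.  A '\n' is synthesized at the end of every line. */
int line_source_next(line_source *src, unsigned char *c)
{
    assert(src != NULL && c != NULL);
    if (src->lnum >= src->nlines)
        return 1;
    const char *line = src->lines[src->lnum];
    if (line[src->col] == '\0') {
        /* End of this line: emit a newline and move to the next one. */
        *c = '\n';
        src->lnum++;
        src->col = 0;
    } else {
        *c = (unsigned char)line[src->col++];
    }
    return 0;
}

/* Toy consumer standing in for a real matcher: pull bytes from the
 * source until `want` is seen.  Returns its offset in the byte stream,
 * or -1 if the source runs out first. */
long find_byte(line_source *src, unsigned char want)
{
    unsigned char c;
    long pos = 0;
    while (line_source_next(src, &c) == 0) {
        if (c == want)
            return pos;
        pos++;
    }
    return -1;
}
```

The consumer never sees the whole buffer at once; it only asks for the
next byte, so a real matcher built this way could search across line
boundaries without the lines first being joined into one string.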

The main problem with using TRE, however, is that it only works with
"ASCII" or wide character input.  That means that it isn't well-suited
for Vim's internal buffer format of just keeping bytes around in
whatever encoding they may be.  To use TRE you'd have to transcode the
buffer's bytes, according to the buffer's encoding, to wchar_t's (wide
characters that are hopefully interpreted as Unicode characters by the
standard C library on the system we're running on).
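
As a rough illustration of what that transcoding step involves, here
is a sketch in C for two encodings, Latin-1 and UTF-8.  The function
names are hypothetical and come from neither Vim nor TRE.  It assumes
that wchar_t holds Unicode code points and is wide enough for them
(not the case where wchar_t is 16 bits), and the UTF-8 decoder checks
only the byte structure: it does not reject overlong forms or
surrogates.

```c
#include <assert.h>
#include <stddef.h>
#include <wchar.h>

/* Latin-1: every byte is its own code point.
 * `out` must have room for `len` wide characters. */
size_t latin1_to_wcs(const unsigned char *in, size_t len, wchar_t *out)
{
    assert(in != NULL && out != NULL);
    for (size_t i = 0; i < len; i++)
        out[i] = (wchar_t)in[i];
    return len;
}

/* UTF-8: decode 1- to 4-byte sequences into code points.
 * `out` must have room for `len` wide characters.
 * Returns the number of wide characters written, or (size_t)-1 if the
 * input is structurally malformed or truncated. */
size_t utf8_to_wcs(const unsigned char *in, size_t len, wchar_t *out)
{
    assert(in != NULL && out != NULL);
    size_t n = 0;
    size_t i = 0;
    while (i < len) {
        unsigned char b = in[i];
        unsigned long cp;
        size_t extra;  /* number of continuation bytes expected */

        if (b < 0x80)                { cp = b;        extra = 0; }
        else if ((b & 0xE0) == 0xC0) { cp = b & 0x1F; extra = 1; }
        else if ((b & 0xF0) == 0xE0) { cp = b & 0x0F; extra = 2; }
        else if ((b & 0xF8) == 0xF0) { cp = b & 0x07; extra = 3; }
        else return (size_t)-1;      /* stray continuation or bad lead byte */

        if (extra > len - i - 1)
            return (size_t)-1;       /* sequence cut off at end of input */

        for (size_t k = 1; k <= extra; k++) {
            unsigned char cb = in[i + k];
            if ((cb & 0xC0) != 0x80)
                return (size_t)-1;   /* not a continuation byte */
            cp = (cp << 6) | (cb & 0x3F);
        }
        out[n++] = (wchar_t)cp;
        i += extra + 1;
    }
    return n;
}
```

The same text ends up as the same wide characters from either
encoding: "caf\xE9" in Latin-1 and "caf\xC3\xA9" in UTF-8 both decode
to four wide characters ending in U+00E9.  Every other encoding a
buffer might be in would need its own routine of this kind.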

Perhaps working on a Vim-specific matcher for TRE would be valuable.

I would suggest that you subscribe to the TRE mailing list.  I haven't
asked Ville what editor he prefers, but considering his indentation
style, it's probably Emacs :-(.  But perhaps he's willing to help us
out anyway?

http://laurikari.net/mailman/listinfo/tre-general

 nikolai
