On 3/20/07, Asiri Rathnayake <[EMAIL PROTECTED]> wrote:
> I went through the regexp code and have a few questions... First, why use this kind of coding scheme and encode patterns rather than using a state diagram? (Performance/memory?)
Because that's how Henry Spencer did it. I don't know his reasoning, though. It was probably a way to limit memory use and avoid a conceptually more complex data structure, but I'm not exactly sure. You can find his original sources here: http://arglist.com/regex/
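For comparison, here is a rough sketch of what simulating an NFA directly from a state diagram could look like. It is not taken from Spencer's code or from Vim; the names (nfa_edge, nfa, nfa_match, example_nfa) and the fixed MAX_STATES limit are made up for illustration, and it only does whole-string matching on a hand-built graph for (ab|a)b.

```c
#include <stddef.h>
#include <string.h>

enum { MAX_STATES = 16 };   /* assumed upper bound on state indices */

/* One edge of the state diagram; ch == 0 marks an epsilon edge. */
typedef struct { int from; char ch; int to; } nfa_edge;

typedef struct {
    const nfa_edge *edges;
    size_t n_edges;
    int start;
    int accept;
} nfa;

/* Add every state reachable through epsilon edges to the set. */
static void closure(const nfa *m, unsigned char *set)
{
    int changed = 1;
    while (changed) {
        changed = 0;
        for (size_t i = 0; i < m->n_edges; i++) {
            const nfa_edge *e = &m->edges[i];
            if (e->ch == 0 && set[e->from] && !set[e->to]) {
                set[e->to] = 1;
                changed = 1;
            }
        }
    }
}

/* Return 1 if the whole input is accepted, 0 otherwise. */
int nfa_match(const nfa *m, const char *input)
{
    unsigned char cur[MAX_STATES] = {0}, next[MAX_STATES];

    cur[m->start] = 1;
    closure(m, cur);
    for (const char *p = input; *p != '\0'; p++) {
        memset(next, 0, sizeof next);
        /* Follow every edge labelled with *p out of any active state. */
        for (size_t i = 0; i < m->n_edges; i++) {
            const nfa_edge *e = &m->edges[i];
            if (e->ch == *p && cur[e->from])
                next[e->to] = 1;
        }
        closure(m, next);
        memcpy(cur, next, sizeof cur);
    }
    return cur[m->accept];
}

/* Hand-built diagram for (ab|a)b: state 0 is the start, state 4 accepts. */
static const nfa_edge example_edges[] = {
    {0, 'a', 1}, {1, 'b', 2}, {2, 0, 3},   /* the "ab" branch */
    {0, 'a', 3},                           /* the "a" branch  */
    {3, 'b', 4},                           /* the trailing b  */
};
const nfa example_nfa = {
    example_edges, sizeof example_edges / sizeof example_edges[0], 0, 4
};
```

Tracking a set of active states means no backtracking is needed, at the cost of keeping an explicit graph in memory, which is presumably the kind of trade-off the encoded program avoids.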
> Secondly, is it a requirement that the new implementation follow the same method? I mean, can't I use a state diagram (which is easy to implement, in my opinion) to simulate the NFA?
I thought the point was that you can use whatever method you want, and that you might use an already existing library, like TRE.

In TRE 0.7.2 (I think; it might have been 0.7.0), Ville added support for more general input. This means that you can provide an object that feeds the regex matcher with input. In Vim's case, that would mean feeding it the contents of successive memlines until a match is found or the buffer is "depleted". That should be quite a simple way of using TRE in Vim.

The main problem with using TRE, however, is that it only works with "ASCII" or wide-character input. That means it isn't well suited to Vim's internal buffer format of just keeping bytes around in whatever encoding they happen to be in. To use TRE you'd have to transcode the buffer's bytes to wchar_t's (wide characters that are hopefully interpreted as Unicode characters by the standard C library on the system we're running on), depending on the buffer's encoding. Perhaps working on a Vim-specific matcher for TRE would be valuable.

I would suggest that you subscribe to the TRE mailing list. I haven't asked Ville what editor he prefers, but considering his indentation style, it's probably Emacs :-(. But perhaps he's willing to help us out anyway?

http://laurikari.net/mailman/listinfo/tre-general

nikolai
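To make the "object that feeds the matcher with input" idea above a bit more concrete, here is a minimal sketch of a callback-driven input source that walks successive lines. It does not use TRE and does not reproduce TRE's actual interface or Vim's memline code; line_source, line_source_next and stream_find are hypothetical names, and the "matcher" is just a search for a single byte.

```c
#include <stddef.h>

/* Hypothetical input source: an array of NUL-terminated lines,
 * standing in for the successive lines of a buffer. */
typedef struct {
    const char *const *lines;
    size_t n_lines;
    size_t line;   /* index of the current line    */
    size_t col;    /* offset inside the current line */
} line_source;

/* Return the next byte, a '\n' between lines, or -1 once the
 * source is depleted. */
int line_source_next(void *context)
{
    line_source *src = context;

    if (src->line >= src->n_lines)
        return -1;

    char c = src->lines[src->line][src->col];
    if (c == '\0') {          /* end of this line: move to the next */
        src->line++;
        src->col = 0;
        return '\n';
    }
    src->col++;
    return (unsigned char)c;
}

/* Stand-in for a matcher: it only sees the callback, never the
 * underlying storage. Returns the byte offset of the first
 * occurrence of target, or -1 if the input runs out first. */
long stream_find(int (*next)(void *), void *context, int target)
{
    long offset = 0;
    int c;

    while ((c = next(context)) != -1) {
        if (c == target)
            return offset;
        offset++;
    }
    return -1;
}
```

With the lines "foo", "bar" and "baz", stream_find(line_source_next, &src, 'r') walks across the first line boundary and stops at offset 6. A real matcher would also need some way to back up in the input, which this sketch leaves out.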