Re: Summer of Code: Regexp

Ian Young Sun, 30 Mar 2008 15:54:52 -0700

Sorry to get back to you so late - here's what I can offer:

As far as I'm aware, the code in the vim71-ian branch of the
repository contains almost all of the stable work done by both myself
and Xiaozhou, so that's the best place to look.  There's a bunch of
testing code in that branch as well, but it isn't all documented
(sorry).  The tools I've been using are vgrep, regtest, and the
run_tests shell script (found in reg_test/).  Xiaozhou also wrote up a
test file for use with 'make test', but I'm not well acquainted with
its contents.

On Fri, Mar 28, 2008 at 5:53 AM, Andrei Aiordachioaie
<[EMAIL PROTECTED]> wrote:
>
>  From what I've seen of the test cases, it seems that the NFA
>  implementation is not greedy, as it should be. I will look more into
>  it.
It's greedy in its own way: IIRC, leftmost-first, with the exception
of ordered alternation (see
http://groups.google.com/group/vim_dev/browse_thread/thread/9db490f9c4297c8e
for a discussion of that feature).
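
To make the two disambiguation policies concrete, here is a small
Python sketch.  It is not code from the branch: the function names are
made up for illustration, and it only handles literal alternatives.  It
contrasts ordered alternation, where the first alternative that matches
wins, with a leftmost-longest rule, where the longest one wins:

```python
def ordered_match(alternatives, text):
    """Return the first alternative (in written order) that prefixes text."""
    for alt in alternatives:
        if text.startswith(alt):
            return alt
    return None


def longest_match(alternatives, text):
    """Return the longest alternative that prefixes text."""
    candidates = [alt for alt in alternatives if text.startswith(alt)]
    return max(candidates, key=len) if candidates else None


if __name__ == "__main__":
    alts = ["a", "ab"]
    print(ordered_match(alts, "abc"))  # a
    print(longest_match(alts, "abc"))  # ab
```

The two policies only disagree when an earlier alternative is a proper
prefix of a later one, which is the kind of pattern worth adding to the
test files.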

>  So for the project, I want to extend the test-suite to compare the way
>  regexps are handled in the old vs the new engine. Maybe this uncovers
>  other bugs. Then, the largest portion of the project would be fixing
>  the found bugs. And if that takes little time, I could work on the old
>  regexp engine bugs.

The largest batch of test cases is in reg_test/files/basic.dat, which
can be run with "./regtest --engine=nfa reg_test/files/basic.dat".

This file has been modified so that all tests succeed with the old Vim
matching engine, so any failures under --engine=nfa represent
differences between the old and new engines.  The --engine=[nfa,bt]
flag on regtest and vgrep controls which engine is used, so you can
compare easily.  There are a few lingering bugs to be ironed out, but
it seems like we're pretty close to a correct engine - more of the
work will probably go into making it faster.
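
One way to automate that comparison is sketched below in Python.  This
is only a sketch under assumptions: it assumes regtest writes its
per-test results to stdout in a stable order, which I have not
verified, and the helper names run_engine and diff_outputs are
hypothetical.  Only the --engine flag and the basic.dat path come from
the commands above.

```python
import difflib
import subprocess


def run_engine(engine, dat_file, regtest="./regtest"):
    """Run regtest with the given engine and return its stdout as text."""
    result = subprocess.run(
        [regtest, f"--engine={engine}", dat_file],
        capture_output=True,
        text=True,
        check=False,  # a run with failing tests may well exit non-zero
    )
    return result.stdout


def diff_outputs(bt_output, nfa_output):
    """Return unified-diff lines between the two engines' outputs."""
    return list(
        difflib.unified_diff(
            bt_output.splitlines(),
            nfa_output.splitlines(),
            fromfile="bt",
            tofile="nfa",
            lineterm="",
        )
    )
```

Calling diff_outputs(run_engine("bt", dat), run_engine("nfa", dat))
with dat set to "reg_test/files/basic.dat" would then list only the
lines on which the two engines disagree; an empty list means they
produced identical output.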

>  Do you have any other ideas? Would this be enough
>  for a 2.5 months project?

Here's what I wrote to another student who enquired about the project:

"The short answer is yes, there's more work to be done by another
student.  I've been slowly working on fixing a few lingering problems
in the code we wrote last summer (thus the commits you saw).  The code
is very close to running correctly. However, it's not super fast at
this point, so one big project might be optimizing the new code so
that it is more comparable to the speed of the old engine on
non-pathological cases.  There are also some more features that would
be great to add (off the top of my head, a couple are multibyte
characters and the \{n,m} construct).  And of course, there's a
non-trivial amount of work in just preparing the code for inclusion in
Vim's source.  I just haven't found the time this semester to do as
much as I had hoped, so again, yes, I think another summer on this
project would prove fruitful.  If you'd like a better idea of where
development left off, I suggest poking through the archives of the
group we used at <http://groups.google.com/group/vim-soc-regexp>.  The
last couple commits I've made are not yet documented, so don't worry
too much about those for the moment."
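
For the \{n,m} construct mentioned above, one textbook approach for
NFA-style engines is to rewrite the counted repetition into plain
concatenation plus nested optional groups before building the
automaton.  The Python sketch below shows that rewrite using Python's
own re syntax rather than Vim's; it is an illustration of the general
technique, not how the branch does or will do it, and expand_counted
is a made-up name.  It assumes the atom is a single character or an
already-grouped expression.

```python
import re


def expand_counted(atom, n, m):
    """Rewrite atom{n,m} using only concatenation and nested optionals."""
    if not 0 <= n <= m:
        raise ValueError("need 0 <= n <= m")
    optional = ""
    # Build (m - n) nested optional copies, innermost first.
    for _ in range(m - n):
        optional = f"(?:{atom}{optional})?"
    # n mandatory copies followed by the optional tail.
    return atom * n + optional


if __name__ == "__main__":
    pattern = expand_counted("a", 2, 4)
    print(pattern)  # aa(?:a(?:a)?)?
    print(bool(re.fullmatch(pattern, "aaa")))  # True
```

The expanded pattern grows linearly with m, so large bounds make the
automaton correspondingly larger; that trade-off is part of what makes
the feature non-trivial.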

Hope all this helps,
Ian

--~--~---------~--~----~------------~-------~--~----~
You received this message from the "vim_dev" maillist.
For more information, visit http://www.vim.org/maillist.php
-~----------~----~----~----~------~----~------~--~---
