Re: RFC 72 (v1) The regexp engine should go backward as well as forward.

Mark-Jason Dominus Mon, 11 Sep 2000 13:10:28 -0700

> As to your contention that "at best" (?r) will defeat many present
> optimizations, can you tell me why this will necessarily be so in the
> new engine? 

Let me explain my thinking along these lines.  I've made a number of
assumptions, which may not be correct, and certainly aren't obvious.

I have been supposing all along that the Perl 6 regex engine will
incorporate the Perl 5 regex engine directly.  This may turn out to be
wrong, but I did think it through.  I think this for several reasons:

1.  Writing even a simple regex engine is nontrivial.  Writing a regex
    engine as fast and as complicated as Perl's is very difficult.
    Even Perl's regex engine was not written from scratch; it was
    based on code supplied by Henry Spencer.

2.  Very few people are available who are capable of reimplementing
    Perl's regex engine.  The people on this list are clearly not
    going to do it.  According to someone on this list, some of the
    people here are not even competent to look at the regex engine
    code.

    More to the point, I don't know of anyone who has volunteered, and
    when I try to think of candidates, no likely one comes to mind.

3.  Regexes are one of Perl's most essential features.  If the regexes
    are slow, that is a big problem for Perl.  The existing regex
    engine is fast, partly because it has years of optimizations in
    it.  To start over would be to throw that all away.

4.  People have tried implementing regex engines along different
    principles before, and have not been able to find anything faster
    than the current strategy. 

    For example, in Perl regexes are compiled into fixed-size
    bytecodes; when a regex (such as /a(b|c)d/) contains a branch, the
    branch is expressed as a bytecode offset.

    It might seem that one could do better: Instead of using
    fixed-size bytecodes, compile each regex operator as a C structure
    with a pointer to the struct for the next opcode.  A branch
    operator will have pointers to two other structures, instead of to
    only one.

    People have tried this more than once.  It turns out that this is
    slower than the bytecode approach.

5.  Larry has already said that he expects that much of the initial
    Perl 6 code will actually be Perl 5 code, just as much of the
    initial Perl 5 code was actually Perl 4 code.  (See
    http://www.mail-archive.com/perl6-language@perl.org/msg01194.html) 
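
To make the two representations in point 4 concrete, here is a minimal C sketch that compiles /a(b|c)d/ both ways and runs each with a simple backtracking matcher: first as an array of fixed-size instructions where the branch is a pair of offsets, then as linked C structures where the branch node holds two pointers.  It is an illustration under stated assumptions, not Perl's engine: the opcode set (OP_CHAR, OP_SPLIT, OP_JMP, OP_MATCH), the types Inst and Node, and the functions bc_search and node_search are all hypothetical, and Perl's real compiled regex nodes are much richer.  The sketch does not measure speed; the claim that the bytecode form is faster rests on the experience described above.

```c
#include <stddef.h>

/* Hypothetical opcode set for illustration; not Perl's actual regex nodes. */
enum Op { OP_CHAR, OP_SPLIT, OP_JMP, OP_MATCH };

/* Representation 1: fixed-size instructions stored in an array.
   Branch targets are offsets relative to the current instruction. */
typedef struct {
    enum Op op;
    char c;     /* character to match (OP_CHAR only) */
    int x, y;   /* relative offsets: OP_JMP uses x; OP_SPLIT uses x, then y */
} Inst;

/* /a(b|c)d/ as bytecode; comments give each instruction's index. */
static const Inst prog[] = {
    {OP_CHAR,  'a', 0, 0},  /* 0 */
    {OP_SPLIT,  0,  1, 3},  /* 1: try index 2, else index 4 */
    {OP_CHAR,  'b', 0, 0},  /* 2 */
    {OP_JMP,    0,  2, 0},  /* 3: skip the 'c' alternative, go to 5 */
    {OP_CHAR,  'c', 0, 0},  /* 4 */
    {OP_CHAR,  'd', 0, 0},  /* 5 */
    {OP_MATCH,  0,  0, 0},  /* 6 */
};

/* Backtracking interpreter: returns 1 if the program matches at sp. */
static int bc_run(const Inst *p, int pc, const char *sp) {
    for (;;) {
        const Inst *in = &p[pc];
        switch (in->op) {
        case OP_CHAR:
            if (*sp != in->c) return 0;
            sp++;
            pc++;               /* next op is simply the next array slot */
            break;
        case OP_JMP:
            pc += in->x;
            break;
        case OP_SPLIT:          /* try first alternative, backtrack to second */
            if (bc_run(p, pc + in->x, sp)) return 1;
            pc += in->y;
            break;
        case OP_MATCH:
            return 1;
        }
    }
}

/* Representation 2: each operator is a C struct pointing at its successor;
   a branch node has two pointers instead of one. */
typedef struct Node {
    enum Op op;
    char c;
    const struct Node *next;  /* following node (first alternative for SPLIT) */
    const struct Node *alt;   /* second alternative (OP_SPLIT only) */
} Node;

/* /a(b|c)d/ as linked nodes, defined back to front. */
static const Node n_match = {OP_MATCH, 0, NULL, NULL};
static const Node n_d     = {OP_CHAR, 'd', &n_match, NULL};
static const Node n_b     = {OP_CHAR, 'b', &n_d, NULL};
static const Node n_c     = {OP_CHAR, 'c', &n_d, NULL};
static const Node n_split = {OP_SPLIT, 0, &n_b, &n_c};
static const Node n_a     = {OP_CHAR, 'a', &n_split, NULL};

static int node_run(const Node *n, const char *sp) {
    for (;;) {
        switch (n->op) {
        case OP_CHAR:
            if (*sp != n->c) return 0;
            sp++;
            n = n->next;        /* follow a pointer to the next op */
            break;
        case OP_JMP:            /* unused here: pointers make jumps implicit */
            n = n->next;
            break;
        case OP_SPLIT:
            if (node_run(n->next, sp)) return 1;
            n = n->alt;
            break;
        case OP_MATCH:
            return 1;
        }
    }
}

/* Unanchored search: try every start position, including the end. */
int bc_search(const char *s) {
    for (const char *start = s;; start++) {
        if (bc_run(prog, 0, start)) return 1;
        if (*start == '\0') return 0;
    }
}

int node_search(const char *s) {
    for (const char *start = s;; start++) {
        if (node_run(&n_a, start)) return 1;
        if (*start == '\0') return 0;
    }
}
```

For example, bc_search("xacd") and node_search("xacd") both return 1, and both return 0 for "abc".  Note that the pointer form needs no JMP at all, since the end of the "b" alternative simply points at the "d" node.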

Perhaps the Perl 6 engine will be a fresh reimplementation, but I do
not think that that is very likely, because there is no good reason to
do it and because it does not appear that there is anyone available
and qualified who wants to do it.

Even if the Perl 6 engine *is* a fresh reimplementation, it seems
likely that it will operate on the same principles as the Perl 5
engine.

So I have been supposing that the Perl 6 regex engine will probably
not be rewritten from scratch, and if it *is* rewritten from scratch,
it will probably still look a lot like the Perl 5 regex engine.

As I said, this might be mistaken, but I think that it's the way to bet.