> As to your contention that "at best" (?r) will defeat many present
> optimizations, can you tell me why this will necessarily be so in the
> new engine?
Let me explain my thinking along these lines. I've made a number of
assumptions, which may not be correct, and certainly aren't obvious.
I have been supposing all along that the Perl 6 regex engine will
incorporate the Perl 5 regex engine directly. This may turn out to be
wrong, but I did think it through. I believe it for several reasons:
1. Writing even a simple regex engine is nontrivial. Writing a regex
engine as fast and as complicated as Perl's is very difficult.
Even Perl's regex engine was not written from scratch; it was
based on code supplied by Henry Spencer.
2. Very few people are available who are capable of reimplementing
Perl's regex engine. The people on this list are clearly not
going to do it. According to someone on this list, some of the
people here are not even competent to look at the regex engine
code.
More to the point, I don't know of anyone who has volunteered, and
when I try to think of candidates, nobody likely comes to mind.
3. Regexes are one of Perl's most essential features. If the regexes
are slow, that is a big problem for Perl. The existing regex
engine is fast, partly because it has years of optimizations in
it. To start over would be to throw that all away.
4. People have tried implementing regex engines along different
principles before, and have not been able to find anything faster
than the current strategy.
For example, in Perl regexes are compiled into fixed-size
bytecodes; when a regex (such as /a(b|c)d/) contains a branch, the
branch is expressed as a bytecode offset.
It might seem that one could do better: Instead of using
fixed-size bytecodes, compile each regex operator as a C structure
with a pointer to the struct for the next opcode. A branch
operator will have pointers to two other structures, instead of to
only one.
People have tried this more than once. It turns out that this is
slower than the bytecode approach.
5. Larry has already said that he expects that much of the initial
Perl 6 code will actually be Perl 5 code, just as much of the
initial Perl 5 code was actually Perl 4 code. (See
http://www.mail-archive.com/perl6-language@perl.org/msg01194.html)
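To make point 4 concrete, here is a rough C sketch of the two representations for /a(b|c)d/. Everything in it is invented for illustration: the opcode names, the struct layouts, and the function names are assumptions, not Perl's actual regnode format or API. It only shows how a branch is expressed as a pair of array offsets in one design and as a pair of pointers in the other; it says nothing by itself about which is faster.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical opcodes for illustration; not Perl's real opcode set. */
enum op { OP_CHAR, OP_BRANCH, OP_MATCH };

/* Representation 1: fixed-size instructions stored in one array.
   Successors are named by offset (index) into that array. */
struct insn {
    unsigned char  op;
    char           ch;  /* OP_CHAR: the character to match           */
    unsigned short x;   /* next instruction, or first alternative    */
    unsigned short y;   /* OP_BRANCH only: second alternative        */
};

static const struct insn abcd_code[] = {
    { OP_CHAR,   'a', 1, 0 },   /* 0: a                      */
    { OP_BRANCH,  0,  2, 3 },   /* 1: try offset 2, then 3   */
    { OP_CHAR,   'b', 4, 0 },   /* 2: b                      */
    { OP_CHAR,   'c', 4, 0 },   /* 3: c                      */
    { OP_CHAR,   'd', 5, 0 },   /* 4: d                      */
    { OP_MATCH,   0,  0, 0 },   /* 5: success                */
};

static int run_bytecode(const struct insn *prog, unsigned pc, const char *s)
{
    for (;;) {
        const struct insn *i = &prog[pc];
        switch (i->op) {
        case OP_CHAR:
            if (*s != i->ch) return 0;   /* also fails at end of string */
            s++;
            pc = i->x;
            break;
        case OP_BRANCH:
            if (run_bytecode(prog, i->x, s)) return 1;
            pc = i->y;                   /* first alternative failed */
            break;
        default:                         /* OP_MATCH */
            return 1;
        }
    }
}

/* Representation 2: one C structure per operator, linked by pointers.
   A branch node points at two other structures instead of one. */
struct node {
    enum op            op;
    char               ch;
    const struct node *x;
    const struct node *y;
};

static const struct node n_match = { OP_MATCH,  0,  NULL,     NULL };
static const struct node n_d     = { OP_CHAR,  'd', &n_match, NULL };
static const struct node n_b     = { OP_CHAR,  'b', &n_d,     NULL };
static const struct node n_c     = { OP_CHAR,  'c', &n_d,     NULL };
static const struct node n_alt   = { OP_BRANCH, 0,  &n_b,     &n_c };
static const struct node n_a     = { OP_CHAR,  'a', &n_alt,   NULL };

static int run_nodes(const struct node *n, const char *s)
{
    for (;;) {
        switch (n->op) {
        case OP_CHAR:
            if (*s != n->ch) return 0;
            s++;
            n = n->x;
            break;
        case OP_BRANCH:
            if (run_nodes(n->x, s)) return 1;
            n = n->y;
            break;
        default:                         /* OP_MATCH */
            return 1;
        }
    }
}

/* Both matchers are anchored at the start of s and accept any suffix. */
int match_bytecode(const char *s)
{
    assert(s != NULL);
    return run_bytecode(abcd_code, 0, s);
}

int match_nodes(const char *s)
{
    assert(s != NULL);
    return run_nodes(&n_a, s);
}
```

Both match_bytecode("abd") and match_nodes("abd") succeed, and both reject "aad". The two interpreter loops are nearly identical; the difference is only in how the compiled program is laid out in memory, which is the design choice point 4 is about.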
Perhaps the Perl 6 engine will be a fresh reimplementation, but I do
not think that that is very likely, because there is no good reason to
do it and because it does not appear that there is anyone available
and qualified who wants to do it.
Even if the Perl 6 engine *is* a fresh reimplementation, it seems
likely that it will operate on the same principles as the Perl 5
engine.
So I have been supposing that the Perl 6 regex engine will probably
not be rewritten from scratch, and if it *is* rewritten from scratch,
it will probably still look a lot like the Perl 5 regex engine.
As I said, this might be mistaken, but I think that it's the way to bet.