The short story
---------------
I've just committed a new, rewritten version of PGE to the Parrot repository. It's still somewhat preliminary and many rule features are still missing. On the other hand, it's now written entirely in PIR (no more C compiling!) and provides a stronger base platform for developing the rest of the features. More details are below, and in compilers/pge/README. Comments, suggestions, and test cases welcomed.

The long story
--------------
(Or, "hey, what took you so long?")

Well, I was working along in PGE late last year, and discovered that the capturing semantics of P6 rules weren't very clear. So, a few messages to p6l and a few weeks later, the @Larry team proposed detailed semantics for capturing. (I'll present these in future messages.) Then Real Life intruded and kept me away from development for a few weeks.

When I finally got a chance to look at PGE in detail again, I started adding things in the C parser and discovered that I really wanted dynamic hashes and arrays to work with. Since we're going to have to handle Unicode someday anyway, I decided to take a step backwards (rebuild the basic engine from scratch) so that we can start taking larger leaps forward. This divorces PGE from compiler issues, which brings us to where we are today.

Appearing in this version of PGE:
- basic expressions, quantifiers, alternation
- subpatterns (groups), including nested capturing subpatterns
- \d, \s, \n, \w character classes

Not yet implemented, but coming soon (rough priority order):
- updated test harness/test suite
- cut operations (these don't always work properly yet)
- subrules
- other character classes
- interpolated variables
- conjunctive matches
- capture aliases
- many, many potential optimizations

The basic design remains the same: there's a rule parser, an expression tree, and a code generator. I briefly played with executing components of the expression tree directly to match strings, but decided that it was actually a bit messier than doing code generation (and likely a bit slower, since we would have to test expression characteristics at pattern-match time instead of compile time). So, the difference now is that we have PIR code generating PIR instead of a C function generating PIR. The code generator tries to produce only the code needed for each component of the match, but I'm sure there will be a lot of optimizations we can come up with.
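
To make the "expression tree, then code generator" split concrete, here is a toy sketch in Python. It is not PIR and not PGE's actual code: the tuple-shaped nodes ("lit", "seq", "alt", "star") and the names generate and compile_pattern are invented for this illustration. The point it tries to show is that the node type is inspected once, while generating code, so the emitted matcher contains no type tests at match time.

```python
import itertools

def generate(tree):
    """Return Python source with one generator function per tree node.

    Hypothetical node shapes (assumptions of this sketch):
      ("lit", text), ("seq", a, b), ("alt", a, b), ("star", a)
    Each emitted function yields every end offset at which its node can
    match starting from offset i, which gives backtracking for free.
    """
    counter = itertools.count()
    lines = []

    def emit(node):
        name = f"n{next(counter)}"
        kind = node[0]
        if kind == "lit":
            text = node[1]
            # The literal and its length are baked into the emitted code.
            body = [f"if s.startswith({text!r}, i):",
                    f"    yield i + {len(text)}"]
        elif kind == "seq":
            left, right = emit(node[1]), emit(node[2])
            body = [f"for j in {left}(s, i):",
                    f"    yield from {right}(s, j)"]
        elif kind == "alt":
            left, right = emit(node[1]), emit(node[2])
            body = [f"yield from {left}(s, i)",
                    f"yield from {right}(s, i)"]
        elif kind == "star":
            inner = emit(node[1])
            # Greedy: try longer repetitions first, then fall back to zero.
            # The j > i guard stops empty matches from looping forever.
            body = [f"for j in {inner}(s, i):",
                    "    if j > i:",
                    f"        yield from {name}(s, j)",
                    "yield i"]
        else:
            raise ValueError(f"unknown node kind: {kind}")
        lines.append(f"def {name}(s, i):")
        lines.extend("    " + line for line in body)
        return name

    top = emit(tree)
    lines.append("def match(s):")
    lines.append(f"    return next({top}(s, 0), None)")
    return "\n".join(lines) + "\n"

def compile_pattern(tree):
    """Compile the generated source and return its match(s) function."""
    namespace = {}
    exec(generate(tree), namespace)
    return namespace["match"]

# Hypothetical tree for the pattern a*(b|cd), anchored at offset 0.
tree = ("seq", ("star", ("lit", "a")), ("alt", ("lit", "b"), ("lit", "cd")))
match = compile_pattern(tree)
print(match("aaacd"))  # 5, the end offset of the match
print(match("aax"))    # None, no match
```

Printing generate(tree) shows the emitted source, which is also where a tree-level optimization pass would pay off: anything simplified in the tree never reaches the generated code.
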
Fortunately we can optimize in the expression tree before generating the code. I also continued using the bsr/ret scheme for calling components because it avoids much of the register-passing overhead and the need to recreate state information from one expression to the next. But it's easy enough for us to switch to Parrot subroutine semantics if we ever decide we want that.

The test suite is currently broken; I'll be fixing that next. As always we need more tests; the current tests are in t/p6rules and are fairly easy to write. Tests, patches, comments, questions, etc. welcomed.

Questions/comments about PGE installation and parrot issues probably belong on perl6-internals; modifications and questions about PGE execution and internals probably go on perl6-compiler. Unless I hear otherwise I will probably announce minor changes only to perl6-compiler.

Pm