The short story  
---------------

I've just committed a new, rewritten version of PGE to the Parrot 
repository.  It's still somewhat preliminary and many rule features 
are still missing.  On the other hand, it's now written entirely
in PIR (no more C compiling!) and provides a stronger base platform
for developing the rest of the features.  More details are below,
and in compilers/pge/README.  Comments, suggestions, and test cases
welcomed.

The long story
--------------

(Or, "hey, what took you so long?")  Well, I was working along in
PGE late last year and discovered that the capturing semantics of P6
rules weren't very clear.  So, after a few messages to p6l and a few
weeks, the @Larry team proposed detailed semantics for capturing.  (I'll
present these in future messages.)  Then Real Life intruded and 
kept me away from development for a few weeks.

When I finally got a chance to look at PGE in detail again, I started 
adding things in the C parser and discovered that I really wanted dynamic
hashes and arrays to work with.  Since we're going to have to handle
Unicode someday anyway, I decided to take a step backwards (rebuild
the basic engine from scratch) so that we can start taking larger leaps
forward.  This also divorces PGE from C compiler issues, which brings us
to where we are today.

Appearing in this version of PGE:

  - basic expressions, quantifiers, alternation
  - subpatterns (groups), including nested capturing subpatterns
  - \d, \s, \n, \w character classes

Not yet implemented or not yet fully working, but coming soon (rough
priority order):

  - updated test harness/test suite
  - cut operations don't always work properly
  - subrules
  - other character classes
  - interpolated variables
  - conjunctive matches
  - capture aliases
  - many, many potential optimizations

The basic design remains the same: there's a rule parser, an expression
tree, and a code generator.  I briefly played with executing components
of the expression tree directly to match strings, but decided that it was
actually a bit messier than doing code generation (and likely a bit slower,
since we would have to test expression characteristics at pattern-match
time instead of at compile time).  So the difference now is that we have
PIR code generating PIR instead of a C function generating PIR.
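As a rough illustration of the parser/tree/generator split (not PGE's code, which is PIR generating PIR), here is a minimal Python sketch.  The node classes Lit, Cls, Cat, Alt and Star and the functions compile_node and match_at_start are hypothetical.  Instead of emitting PIR text, it "generates code" by building nested closures once, so decisions such as which character-class test to use are made at compile time rather than at match time; backtracking is handled with continuations.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical expression-tree node types (illustrative, not PGE's).
@dataclass
class Lit:          # literal text
    text: str

@dataclass
class Cls:          # backslash class: "d", "s", "w", or "n"
    name: str

@dataclass
class Cat:          # concatenation of subexpressions
    items: list

@dataclass
class Alt:          # left | right
    left: object
    right: object

@dataclass
class Star:         # greedy zero-or-more repetition
    item: object

# A compiled matcher takes (text, pos, k) and calls the continuation k
# with each end position it can reach; a None result means "try another
# way", which is how later components backtrack into earlier ones.
Matcher = Callable[[str, int, Callable[[int], object]], object]

CLASS_TESTS = {
    "d": str.isdigit,
    "s": str.isspace,
    "w": lambda c: c.isalnum() or c == "_",
    "n": lambda c: c == "\n",
}

def compile_node(node) -> Matcher:
    """Build a matcher for node once, at compile time."""
    if isinstance(node, Lit):
        s = node.text
        def m(text, pos, k):
            return k(pos + len(s)) if text.startswith(s, pos) else None
        return m
    if isinstance(node, Cls):
        test = CLASS_TESTS[node.name]   # chosen now, not at match time
        def m(text, pos, k):
            if pos < len(text) and test(text[pos]):
                return k(pos + 1)
            return None
        return m
    if isinstance(node, Cat):
        parts = [compile_node(i) for i in node.items]
        def m(text, pos, k, i=0):
            if i == len(parts):
                return k(pos)
            return parts[i](text, pos, lambda p: m(text, p, k, i + 1))
        return m
    if isinstance(node, Alt):
        left, right = compile_node(node.left), compile_node(node.right)
        def m(text, pos, k):
            r = left(text, pos, k)
            return r if r is not None else right(text, pos, k)
        return m
    if isinstance(node, Star):
        inner = compile_node(node.item)
        def m(text, pos, k):
            # Greedy: try one more repetition first (refusing zero-width
            # repeats to avoid looping), then fall back to stopping here.
            r = inner(text, pos, lambda p: m(text, p, k) if p != pos else None)
            return r if r is not None else k(pos)
        return m
    raise TypeError(f"unknown node: {node!r}")

def match_at_start(tree, text):
    """Return the end of the first match anchored at position 0, or None."""
    return compile_node(tree)(text, 0, lambda p: p)
```

For example, match_at_start(Cat([Star(Cls("d")), Lit("5")]), "1235x") returns 4: the greedy \d* first consumes "1235", then gives back the "5" so that the literal can match.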

The code generator tries to produce only the code needed for each
component of the match, but I'm sure there will be a lot of optimizations
we can come up with.  Fortunately we can optimize in the expression
tree before generating the code.
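One example of the kind of tree-level optimization that could be done before code generation is sketched below in Python.  The function fuse_literals and the node classes (declared here so the sketch stands alone) are hypothetical, not part of PGE.  The pass merges adjacent literal nodes in a concatenation and flattens nested concatenations, so the generated code would perform one string comparison instead of several.

```python
from dataclasses import dataclass

# Hypothetical node types for this sketch.
@dataclass
class Lit:
    text: str

@dataclass
class Cls:
    name: str

@dataclass
class Cat:
    items: list

def fuse_literals(node):
    """Return an equivalent tree with adjacent Lit nodes in a Cat merged.

    Nodes other than Cat are returned unchanged; a fuller pass would also
    recurse into alternations, quantifiers, and groups.
    """
    if not isinstance(node, Cat):
        return node
    out = []
    for child in node.items:
        child = fuse_literals(child)
        # Flatten a nested concatenation into this one.
        parts = child.items if isinstance(child, Cat) else [child]
        for part in parts:
            if out and isinstance(out[-1], Lit) and isinstance(part, Lit):
                out[-1] = Lit(out[-1].text + part.text)
            else:
                out.append(part)
    # A concatenation of a single node is just that node.
    return out[0] if len(out) == 1 else Cat(out)
```

Because the rewrite happens on the tree, the code generator itself stays simple and never needs to know the optimization took place.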

I also continued using the bsr/ret scheme for calling components,
because it avoids much of the overhead of passing registers and of
recreating state information from one expression to the next.  But it's
easy enough for us to switch to Parrot subroutine semantics if we
ever decide we want that.
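To make the bsr/ret trade-off concrete, here is a toy interpreter in Python.  The instruction names, the run function, and the TWO_DIGITS program are hypothetical and greatly simplified; they are only loosely modeled on the idea behind Parrot's bsr/ret and are not PIR.  The point is that bsr saves nothing but a return address, so a component such as digit_sub reads and updates the shared pos and ok registers directly, instead of receiving them as arguments and handing them back as results.

```python
# Toy instruction set (hypothetical, not actual PIR):
#   ("label", name)   marks a branch target; a no-op at run time
#   ("bsr", name)     push the return address, then jump to name
#   ("ret",)          pop a return address and jump back to it
#   ("digit",)        if text[pos] is a digit, advance pos; set ok
#   ("iffail", name)  jump to name when the last test failed
#   ("end",)          stop and report (ok, pos)

def run(program, text):
    labels = {ins[1]: i for i, ins in enumerate(program) if ins[0] == "label"}
    pc, pos, ok = 0, 0, True   # registers shared by every "subroutine"
    ret_stack = []             # only return addresses are saved
    while True:
        op = program[pc]
        if op[0] == "label":
            pc += 1
        elif op[0] == "bsr":
            ret_stack.append(pc + 1)
            pc = labels[op[1]]
        elif op[0] == "ret":
            pc = ret_stack.pop()
        elif op[0] == "digit":
            ok = pos < len(text) and text[pos].isdigit()
            pos += 1 if ok else 0
            pc += 1
        elif op[0] == "iffail":
            pc = labels[op[1]] if not ok else pc + 1
        elif op[0] == "end":
            return ok, pos
        else:
            raise ValueError(f"bad instruction: {op!r}")

# Match two digits by calling the same component twice with bsr.
TWO_DIGITS = [
    ("bsr", "digit_sub"),
    ("iffail", "done"),
    ("bsr", "digit_sub"),
    ("label", "done"),
    ("end",),
    ("label", "digit_sub"),
    ("digit",),
    ("ret",),
]
```

Running run(TWO_DIGITS, "42x") gives (True, 2).  With full subroutine calls, each call to digit_sub would instead need pos passed in and returned, which is the overhead described above.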

The test suite is currently broken; I'll be fixing that next.
As always, we need more tests; the current tests are in t/p6rules
and are fairly easy to write.

Tests, patches, comments, questions, etc. are welcome.  Questions and
comments about PGE installation and Parrot issues probably belong on
perl6-internals; modifications and questions about PGE execution
and internals probably go on perl6-compiler.  Unless I hear otherwise,
I will probably announce minor changes only to perl6-compiler.

Pm
