Significant internal updates to PGE

Patrick R. Michaud Fri, 14 Oct 2005 16:33:42 -0700

I've just checked in (r9843) a new version of PGE (the grammar engine)
with some substantial changes to its internal calling sequences and
data structures.  For those who are using PGE through its defined
interfaces, things work largely the same; anyone who is developing
for PGE or making use of PGE's internals may see some differences,
as described below.


The biggest difference is that single-element captures in Match objects
are now internally represented with the same structure as seen by the
"outside world".  For example, with an expression like

    rule = p6rule(":w (mv) [ (\w+)]*")

$/[0] (aka $0) ends up with a single Match object, while $/[1] (aka $1)
is an array of Match objects because of the "*" quantifier.

In previous versions of PGE, the PGE::Match class internally stored
all captures (quantified or not) in arrays, and used an "isarray"
property on the array to indicate if it was to act as a single
Match object or an array of Match objects.

In the version I've just checked in, the "isarray" property is gone,
and the $0, $1, $2, ... captures are stored internally as single
Match objects (unquantified) or arrays as specified by the rule.
In particular, this means one can now use the "get_array" and
"get_hash" methods on Match objects and get exactly the correct
structure.
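
As a rough illustration of the change in capture storage, here is a
minimal sketch in Python. It is a model only, not PGE's code (PGE is
written in PIR). The Match class, the old_view helper, and the sample
input "mv foo bar" are all hypothetical names chosen for the example;
only "isarray", "get_array", and "get_hash" come from the description
above.

```python
# Illustrative Python model only; PGE itself is written in PIR.
class Match:
    """Hypothetical stand-in for PGE::Match: matched text plus captures."""

    def __init__(self, text, positional=None, named=None):
        self.text = text
        self._array = list(positional or [])   # $0, $1, ... captures
        self._hash = dict(named or {})         # named captures

    def get_array(self):
        return self._array

    def get_hash(self):
        return self._hash


# Old model: every capture sits in a list, and an "isarray" flag says
# whether the list should be read as one Match or as many.
old_captures = [
    {"isarray": False, "items": [Match("mv")]},
    {"isarray": True, "items": [Match("foo"), Match("bar")]},
]


def old_view(slot):
    """Translate an old-style slot into what the outside world expects."""
    return slot["items"] if slot["isarray"] else slot["items"][0]


# New model: captures are stored directly in their final shape, so no
# translation step is needed when handing out the array.
new_match = Match(
    "mv foo bar",
    positional=[Match("mv"), [Match("foo"), Match("bar")]],
)
captures = new_match.get_array()
```

In the old model a caller has to go through something like old_view to
get the right shape; in the new model, captures[0] is already a single
Match and captures[1] is already a list of Match objects.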

Other key differences in this new version:

- PGE's internal rule calling conventions (e.g., to PIR-coded 
  subrules such as <alpha>, <upper>, etc.) are now consistent
  with rules generated by PGE itself.  Thus, if one wants to
  call the <alpha> rule directly, it can be done with:

      .local pmc alpha
      alpha = find_global "PGE::Rule", "alpha"
      $P0 = alpha("Some string")

  and $P0 will be a Match object for the "S".  Note that many of
  PGE's built-in rules tend to act as if the :p modifier is
  set -- in this case anchoring the match to the beginning of
  the string.  

- The PIR code that PGE generates can now be stored externally
  and directly included by other PIR modules.  For example, when
  a previous version of PGE was loaded, the initialization code
  executed at load-time would dynamically compile and install
  <ident> and <name> subrules, thus slowing down program
  initialization.  In this new version, the PIR code for
  <ident> and <name> is generated as part of building PGE, so
  that PGE.pbc already contains the bytecode for these precompiled
  rules when it is loaded.

- PGE Match objects can now distinguish array keys from hash keys 
  that begin with a digit.  Previously Match objects assumed that
  any key starting with a digit was addressing solely the array
  component of the Match object.

- A number of performance enhancements and code cleanups, especially
  in the code that handles matching of quantified groups and
  subrules.
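
To make the array-key versus hash-key item above concrete, here is a
small Python sketch of the two lookup policies. It is a model only,
not PGE's code. The old policy follows the description above; the new
policy shown here assumes that the type of the key (integer versus
string) decides which component is addressed, which is an assumption
made for illustration and not something stated in this message. The
function names are hypothetical.

```python
# Illustrative Python model only; not PGE's actual implementation.

def old_component(key):
    """Old rule from the text: a leading digit always means the array."""
    return "array" if str(key)[:1].isdigit() else "hash"


def new_component(key):
    """ASSUMPTION: integer keys address the array, string keys the hash."""
    return "array" if isinstance(key, int) else "hash"


# Compare how each policy classifies a few sample keys.
for key in (1, "2nd", "name"):
    print(repr(key), old_component(key), new_component(key))
```

Under the old policy a named capture such as "2nd" could never be
reached through the hash; under the assumed new policy it can.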

Questions, comments, feedback welcomed as always.  My next area
of focus is on providing subrules that can match quoted and bracketed
constructs (similar to Text::Balanced), and on completing a
shift/reduce parser that integrates with PGE's rule matching 
capabilities.

Pm
