Lowell Thomas wrote:
> 1. I have removed all AST (and semantic action) information from the 
> grammar. The grammar is a pure statement of the language, nothing
> more and nothing less.
> 2. The nodes to appear on the AST are specified at run time just
> before the parse. A special function ulParseAstInit( list) is called
> with list being a list of true/false values, one for each rule name
> (it's even simpler in practice, but the details are unnecessary
> here.)
> 3. (Side Bar: all semantic actions are defined at run time in a
> similar fashion.)
> 4. Because each AST node captures the collected, concatenated phrase
> that it matches, there is no need for keeping any interior or
> terminal nodes.
> 5. Suppose we specify Additive, Atom and Operator in the
> ulParseAstInit list. The AST would look conceptually like this:

This seems to be a sweet spot for API design, as your description matches
my API point for point and it seems several of us have converged towards
similar approaches.
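For anyone following along, the idea can be sketched in a few lines of
Python. The names here (Parser.ast_init, on_match, etc.) are my own
invention for illustration, not the actual ulParseAstInit() API:

```python
# Sketch: choosing which rule names become AST nodes at run time,
# rather than annotating the grammar. All names are hypothetical.

class Node:
    def __init__(self, rule, text, children):
        self.rule = rule          # rule name that matched
        self.text = text          # the collected, concatenated phrase
        self.children = children  # only the sub-nodes we chose to keep

class Parser:
    def __init__(self, grammar):
        self.grammar = grammar
        self.ast_rules = set()

    def ast_init(self, rules):
        # 'rules' maps each rule name to True/False, one entry per rule
        self.ast_rules = {name for name, keep in rules.items() if keep}

    def on_match(self, rule, text, children):
        # Called whenever a rule matches. Rules flagged in ast_init
        # become AST nodes; all others pass their children straight up,
        # so no interior or terminal nodes need to be kept.
        if rule in self.ast_rules:
            return [Node(rule, text, children)]
        return children
```

With ast_init({"Additive": True, "Atom": True, "Operator": True, ...})
only those three rule names ever appear on the tree, and each node
carries its matched phrase, exactly as described above.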

> Speaking of debugging – this is another area I’ve put some effort
> into on my latest version. For debugging I do a complete, real-time
> print out of the entire syntax tree traversal, as it happens. This
> includes all states (pre-branch, post-branch – match, empty, nomatch)
> of all nodes (interior, non-terminal and terminal.) This includes
> even those alternate branches that fail and never show up in the
> parse tree. The problem with this is that it can run into hundreds of
> thousands, even millions of lines of output.

I have done something similar, by basically memoizing everything you try
in a packrat-style parser and then just dumping the whole table.  As you
point out, this works best on small inputs.
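In case it's useful, here is a toy version of what I mean: a
packrat-style parser that memoizes every (rule, position) attempt and
can dump the whole table afterwards. The grammar and names are
illustrative only, not my actual implementation:

```python
# Sketch: memoize every (rule, position) attempt in a packrat-style
# parser, then dump the whole table for debugging. Toy grammar:
#   sum <- num ('+' num)* ; num <- [0-9]+

import re

class PackratParser:
    def __init__(self, text):
        self.text = text
        self.memo = {}  # (rule, pos) -> (matched?, end_pos)

    def apply(self, rule, pos):
        key = (rule, pos)
        if key not in self.memo:
            self.memo[key] = getattr(self, rule)(pos)
        return self.memo[key]

    def sum(self, pos):
        ok, pos = self.apply("num", pos)
        if not ok:
            return (False, pos)
        while pos < len(self.text) and self.text[pos] == "+":
            ok2, end = self.apply("num", pos + 1)
            if not ok2:
                break
            pos = end
        return (True, pos)

    def num(self, pos):
        m = re.match(r"[0-9]+", self.text[pos:])
        return (True, pos + m.end()) if m else (False, pos)

    def dump_memo(self):
        # Every attempted (rule, position), including failures that
        # never show up in the final parse tree.
        for (rule, pos), (ok, end) in sorted(self.memo.items()):
            print(f"{rule}@{pos}: {'match' if ok else 'nomatch'} -> {end}")

p = PackratParser("1+23")
p.apply("sum", 0)
p.dump_memo()
```

On real grammars and inputs this table is exactly what blows up to
hundreds of thousands of lines, which is why querying it beats
printing it.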

I've experimented with querying this full table in various ways, and I
have taken a stab at an explainParseFailure() function with somewhat
human-readable output, but it's all very primitive.  The output is not
very helpful to someone who doesn't already know in intimate detail how
these parsers work.  I'll be interested to look at your take on these
ideas.
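To give a flavor of how primitive it is: the core of my
explainParseFailure() amounts to a query like the following over the
memo table (table shape assumed as above; the real function does a bit
more, but not much):

```python
# Sketch: explain a parse failure by querying a packrat memo table for
# the farthest position at which any rule failed -- a common, if
# primitive, heuristic. The memo shape is an assumption, not a real API.

def explain_parse_failure(text, memo):
    # memo: (rule, pos) -> (matched?, end_pos)
    failures = [(pos, rule) for (rule, pos), (ok, _) in memo.items() if not ok]
    if not failures:
        return "no failures recorded"
    far_pos = max(pos for pos, _ in failures)
    rules = sorted(rule for pos, rule in failures if pos == far_pos)
    return (f"parse failed; farthest failure at offset {far_pos} "
            f"(near {text[far_pos:far_pos + 10]!r}), while trying: "
            + ", ".join(rules))

# A hand-built table for the input "1+x":
memo = {("num", 0): (True, 1), ("num", 2): (False, 2), ("sum", 0): (True, 1)}
print(explain_parse_failure("1+x", memo))
```

As you can see, the output only makes sense if you already know what
"while trying: num" means in terms of the grammar, which is the
problem I mentioned.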

I think generating acceptable error messages from a parser alone is an
interesting hard problem.  It might be possible to do some statistical
analysis on a corpus of valid inputs and then derive heuristics to
suggest what the most likely error in the input string might be.

Regards,

-- 
Michaeljohn Clement
http://inimino.org/~inimino/blog/

_______________________________________________
PEG mailing list
PEG@lists.csail.mit.edu
https://lists.csail.mit.edu/mailman/listinfo/peg