Lowell Thomas wrote:

> 1. I have removed all AST (and semantic action) information from the
>    grammar. The grammar is a pure statement of the language, nothing
>    more and nothing less.
>
> 2. The nodes to appear on the AST are specified at run time just
>    before the parse. A special function ulParseAstInit(list) is called
>    with list being a list of true/false values, one for each rule name
>    (it's even simpler in practice, but the details are unnecessary
>    here.)
>
> 3. (Side bar: all semantic actions are defined at run time in a
>    similar fashion.)
>
> 4. Because each AST node captures the collected, concatenated phrase
>    that it matches, there is no need to keep any interior or
>    terminal nodes.
>
> 5. Suppose we specify Additive, Atom and Operator in the
>    ulParseAstInit list. The AST would look conceptually like this:
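To make the idea concrete, here is a rough Python sketch of the scheme described above. Only the name ulParseAstInit, the rule names, and the "node keeps its concatenated phrase" behavior come from Lowell's description; the Node and Parser classes, and the trick of splicing a suppressed rule's children upward, are my own illustrative assumptions, not his implementation.

```python
# Sketch: choose AST nodes at run time via a rule-name -> bool map
# (standing in for the ulParseAstInit list). Enabled rules produce
# nodes; each node stores the full phrase it matched; disabled rules
# vanish and hand their children to the nearest enabled ancestor.
import re

class Node:
    def __init__(self, rule, text, children):
        self.rule, self.text, self.children = rule, text, children
    def __repr__(self):
        kids = f"{self.children}" if self.children else ""
        return f"{self.rule}({self.text!r})" + kids

class Parser:
    # Grammar: Additive <- Atom (Operator Atom)*
    def __init__(self, ast_rules):
        self.ast_rules = ast_rules  # rule name -> keep a node for it?

    def parse(self, s):
        nodes, pos = self.additive(s, 0)
        assert pos == len(s), "sketch assumes well-formed input"
        return nodes

    def keep(self, rule, s, start, end, children):
        if self.ast_rules.get(rule):
            return [Node(rule, s[start:end], children)]
        return children  # rule suppressed: splice children upward

    def additive(self, s, pos):
        start = pos
        kids, pos = self.atom(s, pos)
        while pos < len(s) and s[pos] in "+-":
            op_kids, pos = self.operator(s, pos)
            atom_kids, pos = self.atom(s, pos)
            kids += op_kids + atom_kids
        return self.keep("Additive", s, start, pos, kids), pos

    def operator(self, s, pos):
        return self.keep("Operator", s, pos, pos + 1, []), pos + 1

    def atom(self, s, pos):
        end = pos + re.match(r"[a-z0-9]+", s[pos:]).end()
        return self.keep("Atom", s, pos, end, []), end

tree = Parser({"Additive": True, "Atom": True, "Operator": True}).parse("a+b+c")
print(tree[0])
# Additive('a+b+c')[Atom('a'), Operator('+'), Atom('b'), Operator('+'), Atom('c')]
```

Asking for only Atom, say, would return the flat node list [Atom('a'), Atom('b'), Atom('c')] with no interior nodes at all, which is point 4 above in action.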
This seems to be a sweet spot for API design: your description matches my API point for point, and it seems several of us have converged on similar approaches.

> Speaking of debugging – this is another area I've put some effort
> into on my latest version. For debugging I do a complete, real-time
> print out of the entire syntax tree traversal, as it happens. This
> includes all states (pre-branch, post-branch – match, empty, nomatch)
> of all nodes (interior, non-terminal and terminal.) This includes
> even those alternate branches that fail and never show up in the
> parse tree. The problem with this is that it can run into hundreds
> of thousands, even millions, of lines of output.

I have done something similar, by basically memoizing everything you try in a packrat-style parser and then just dumping the whole table. As you point out, this works best on small inputs.

I've experimented with querying this full table in various ways, and I have taken a stab at an explainParseFailure() function with somewhat human-readable output, but it's all very primitive. The output is not very helpful to someone who doesn't already know in intimate detail how these parsers work. I'll be interested to look at your take on these ideas.

I think generating acceptable error messages from a parser alone is an interesting, hard problem. It might be possible to do some statistical analysis on a corpus of valid inputs and then derive heuristics to suggest what the most likely error in the input string might be.

Regards,

--
Michaeljohn Clement
http://inimino.org/~inimino/blog/

_______________________________________________
PEG mailing list
PEG@lists.csail.mit.edu
https://lists.csail.mit.edu/mailman/listinfo/peg
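A minimal illustrative sketch of the "memoize everything, then dump the table" debugging approach described above (this is a toy written for this post, not the actual implementation; the Packrat class, rule names, and dump format are all assumptions):

```python
# Sketch: a packrat-style parser whose memo table records every
# (rule, position) attempt, including failures, so that dumping the
# table shows the complete parse history - matches and nomatches alike.
class Packrat:
    def __init__(self, text):
        self.text = text
        self.memo = {}  # (rule, pos) -> end position, or None on failure

    def attempt(self, rule, pos, fn):
        key = (rule, pos)
        if key not in self.memo:
            self.memo[key] = fn(pos)
        return self.memo[key]

    # Grammar: Sum <- Num ('+' Num)* ; Num <- [0-9]+
    def num(self, pos):
        def go(p):
            end = p
            while end < len(self.text) and self.text[end].isdigit():
                end += 1
            return end if end > p else None
        return self.attempt("Num", pos, go)

    def sum(self, pos):
        def go(p):
            end = self.num(p)
            if end is None:
                return None
            while end < len(self.text) and self.text[end] == "+":
                nxt = self.num(end + 1)
                if nxt is None:
                    return end  # trailing '+' with no Num: stop here
                end = nxt
            return end
        return self.attempt("Sum", pos, go)

    def dump(self):
        for (rule, pos), end in sorted(self.memo.items()):
            status = "nomatch" if end is None else f"match {self.text[pos:end]!r}"
            print(f"{rule:>4} @ {pos}: {status}")

p = Packrat("12+34+x")
p.sum(0)
p.dump()
```

On real grammars and inputs the table gets enormous, which is exactly the "millions of lines" problem: the dump is complete and faithful, but querying it (or summarizing it into something like explainParseFailure()) is where the hard work starts.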