So this is a bigger email than I realized and so I decided to make a new thread of it.
---------------------- The semantics of explicit tail-calls in a production are somewhat more complicated than I originally thought with respect to backtracking and cascading. For the time being I have settled on doing an optimized raise children '^' operation by not giving tail-called production rules new parse trees but instead giving them their caller's parse tree pointer to add their children to, thus eliminating any cascaded and inefficient raise tree operators. Here are some interesting things that I have been thinking of as I continue to work on my TPDL parser with the extended AST-building operators -, ^, and the automatic raise children rule. Suppose the following notation is allowed: in a production rule phrase, a '[' ... ']' can appear which signifies an inline definition of an anonymous production rule. An anonymous production rule can reference itself with $self. The effect of using an anonymous production rule is to make a specially named production rule in the grammar and call it in place of the anonymous rule. Also imagine a simpler macro system to C's preprocessor being used to add more semantic meaning to the TPDL. The effect of calling a macro is a direct text insertion into the grammar. Macros would be implemented by means of a pre-processor. Note: the two above additions to the grammar language are syntactic sugar and provide no more power as they simply represent text insertion into a grammar. Finally, I would add two things from Prolog: (fail) meta-production-rule to the grammar language that always fails and the cut '!' operator. ---------------------- Syntax and their meaning: juxtaposition/sequencing of symbols represents a conjunction of rules, in this context: concatenation of terminals and non-terminals. ! cut meta-production rule, after a cut, the production will commit to matching the current phrase and if it fails then the production rule fails and a cascading backtrack ensues. That is, no local backtracking within the application of the production rule occurs once a cut operator has been seen in the current phrase. (fail), fail meta-production rule, automatically fails to match. :, ordered choice, i.e. disjunction. -, non-excludable prefix operator, require a symbol (terminal or non-terminal) to appear as a node in the parse tree. ^, raise children prefix operator, raise branches of production rule into place of production rule in the current phrase in the parse tree. Raise children and non-excludable operator are mutually exclusive. <>, epsilon, match the empty string. This has an equivalent meaning to 'succeed'. <...>, terminal symbol, forgive me for not adding in string matching support into my parser just yet, I'm bootstrapping the parser from its simplest form at the moment. Foo, non-terminal symbol, represents the application of a production rule to the current position in the token stream [ ... ], anonymous production rule [ ... $self ... ], self-referencing anonymous production rule Foo(bar) = ..., macro definition Foo(...), macro call/expansion point Finally, terminals that do not have the non-excludable operator applied to them are not inserted into the parse tree. Non-terminals without the non-excludable and raise branch operator and with only one branch (child) are automatically replaced by their branch. ---------------------- We can now reason about PEG grammars in terms of our extended Prolog-like TPDL. The following macros can be defined: KleeneClosure(x) = ^[x ^$self : <>] PositiveClosure(x) = x KleeneClosure(x) Optional(x) = [x : <>] NotFollowedBy(x) = [x ! (fail) : <>] FollowedBy(x) = NotFollowedBy(NotFollowedBy(x)) This is cool! What can we say about this? First, ignoring the convenient macro and anonymous production rule syntax, the only additions to TPDL that need to be made to make it as powerful as PEG is the cut '!' operator and the (fail) meta-production rule. These operators are specifically useful to implement the NotFollowedBy and FollowedBy operators for PEGs as they are special insofar as they don't consume anything, or put another way: they backtrack over what they consumed. Second, in order to properly construct parse trees as PEGs do, the -, ^, operators and auto-raise rules are required in order to make lists of things. I would argue that these two operators and the rule actually give one more power than basic PEGs provide in terms of basic parse tree building. This argument, however, follows from the fact that at the moment I do not provide any rules to map the parse tree to the AST and instead work with a reduced parse tree as if it were an AST. Finally, the macros and anonymous production rules provide the syntactic sugar to make everything come together in a nice way without actually adding any complexity to the core of the parser (although their complexity is seen in the form of a pre-processor for a grammar). Again, this is given that my parser is an independent function+data structure that operators on a grammar data structure and so what actually performs the parsing is self-contained and distinct from the grammar and scanner. Thoughts? I really like the idea of adding the Prolog-like operators to the parser. Best Regards, Peter Goodman, http://ioreader.com 70 Winston Circle, Montreal, Quebec H9S 4X6 _______________________________________________ PEG mailing list PEG@lists.csail.mit.edu https://lists.csail.mit.edu/mailman/listinfo/peg