exp <- digit '*' digit EOF -> {(lambda (x y) (* x y))} This looks as step backwards How it is different from old yacc exp <- digit '*' digit EOF -> {$$ = (* $1 $3)}
My approach is different. I made tree structure explicit in rule by binding as in example tree = '(' number:value tree?:left tree?:rigth ')' result(Tree) result(Tree) creates Tree object and sets value,left,rigth fields to corresponding values On Fri, Dec 10, 2010 at 01:47:59PM +1300, Peter Cashin wrote: > Mathias: > A good plan to separate out the: > - grammar operators > - parse tree building > - parser action code > I agree with you, but in my case I want to have the grammar language > totally independent of the implementation programming language. > So the grammar has no parser action code, although this can be added later > in a particular programming language implementation. Parboiled has much > closer integration with the programming language, which has advantages, > but I really want grammar specifications that are totally independent. > So now for the grammar operators, and parse tree building annotations. For > many years I kept these separate: tree building annotations in the rule > head (ie they annotate the rule name) or definition syntax (= for interior > nodes, : for leaf nodes or terminals). The right hand side of the rule had > the grammar expression body withs grammar operations only. One more head > notation allows pruning the tree. > This works fine, and maybe I should have stuck to that separation, but I > introduced the `x prefix notation because I found that it was an advantage > to be able to see the children nodes that would be generated by looking at > the parent rule alone, and not having to refer to the children rule > definitions to see how they are annotated. > I have found that about half of many grammars turn out to be leaf nodes, > so the ":" rule definition notation has a huge payoff in tree size and > performance. > Because of this the `x rule is not vital, but over time I found it was > convenient, and a good trade-off for me. > Cheers, > Peter. > > On Fri, Dec 10, 2010 at 11:44 AM, Mathias <[1]math...@parboiled.org> > wrote: > > Over the course of developing parboiled (PEG parsing engine for Java and > Scala using internals DSLs for grammar specification) I found that it's > best to clearly separate the following three things: > - grammar operators > - parse tree building > - parser action code > > The operators, together with the rules they are applied to, define the > language your grammar recognizes. Nothing more, nothing less. > Whether your engine builds a parse tree for input of that language or > whether it doesn't should likely not be specified at the operator level, > but it some other way. > In "parboiled for Java" operators and rules are modeled as method calls > whereas parse tree building is controlled via annotations on those > methods. You can selectively enable or disable the creation of parse > tree nodes per grammar rule, which allows you to tweak parse tree > building exactly to your needs. > > The interface between the parsing engine and custom parser action code > is the thing that has changed the most over the course of parboileds > life cycle so far. > Initially parser action code had to access the parse tree nodes in order > to get to matched input. Creating a custom object structure during the > parsing run (e.g. an AST) was done by decorating the parse tree. So the > parse tree was the "work bench" of the parsing process. > Over time this heavy centering around the parse tree turned out to have > two problems: > > 1. Low Performance. > Always having to create a parse tree even when all you care about is the > AST is just wasteful. Especially since the parse tree can contain a huge > number of nodes for larger inputs (sometimes more nodes than input > characters). > > 2. Less room for automatic optimizations. > The structure of the parse tree is dictated by the structure of the > grammar rules. If your parser action code is built under the assumption > that the parse tree has a given structure there is little leeway for the > parsing engine to apply automatic rule optimizations. Decoupling the > parser action code from the parse tree opens up the possibility to apply > all kinds of automatic grammar tweaking before running the parser the > first time. The engine might decide to completely change the rule > structure, as long as all changes do not change the recognized language > and are transparent to the parser action code. > > Currently parboiled implements the interface between the parsing engine > and the action code in the following way: > 1. Actions can appear anywhere in a rule. > 2. Actions can access the matched input text of the sub rule immediately > preceding the action but not of any other rule (so there is no sub rule > labeling required). > 3. For working with custom objects (e.g. AST nodes) the engine provides > a "Value Stack", which is a simple stack structure that serves as a fast > work bench for a parsing run. Actions can push objects onto this stack, > pop them off, swap them around, and so on. > > This solution completely decouples the parse tree from everything else. > You can enable or disable parse tree building without any effect on the > rest of the parser. There is no need for addressing sub rules in action > expressions and given a somewhat efficient value stack implementation > the whole thing is quite fast. Additionally, in "parboiled for Scala", > the action code with its manipulations of the value stack can be > statically type-checked at compile time, which is a huge plus. > > In case you are interested in more details or broader explanations, the > parboiled documentation is quite complete. > > Cheers, > Mathias > > --- > [2]math...@parboiled.org > [3]http://www.parboiled.org > On 09.12.2010, at 21:01, Alan Post wrote: > > > I'm working on my PEG parser, in particular the interface between > > the parse tree and the code one can attach to productions that > > are executed on a successful parse. > > > > I've arranged for the two predicate operations, & and !, to not add > > any output to the parse tree. �That means that the following > > production: > > > > �rule <- &a !b "c" > > > > Produces the same parse tree as: > > > > �rule <- "c" > > > > Internally, this means that I recognize that the sequence operator > > (which contains the productions '&a', '!b', and '"c"' in this > > example) is being called with predicates in every position but one, > > and rather than returning a list containing that single element, > > I return just the single element. > > > > As I've been doing this, I've found that I want a new operator similar > > to '&'. �'&' matches the production it is attached to, but it does not > > advance the position of the input buffer. > > > > I'd like an operator that matches the production it is attached to, > > advances the input buffer, but doesn't add anything to the parse > > tree. > > > > Here's an example: > > > > �mulexp <- digit '*' digit EOF -> {(lambda (x y) (* x y))} > > > > the mulexp production is a sequence of four other rules, but only > > two of them are needed by the associated code. �It would be nice > > if I could write the code rule like it is above, rather than say > > this: > > > > �(lambda (x op y EOF) (* x y)) > > > > Having to account for all the rules in the sequence, but really > > only caring about two of them. �Here is the example rewritten > > with '^' expressing "match the rule, advance the input, but don't > > modify the parse tree": > > > > �mulexp <- digit ^'*' digit ^EOF -> {(lambda (x y) (* x y))} > > > > Before I go inventing syntax for this use case, will you tell me if > > this is already being done with other parsers? �Have any of you had > > this problem and already solved it, and if so, what approach did you > > take? > > > > -Alan > > -- > > .i ko djuno fi le do sevzi > > > > _______________________________________________ > > PEG mailing list > > [4]...@lists.csail.mit.edu > > [5]https://lists.csail.mit.edu/mailman/listinfo/peg > > _______________________________________________ > PEG mailing list > [6]...@lists.csail.mit.edu > [7]https://lists.csail.mit.edu/mailman/listinfo/peg > > References > > Visible links > 1. mailto:math...@parboiled.org > 2. mailto:math...@parboiled.org > 3. http://www.parboiled.org/ > 4. mailto:PEG@lists.csail.mit.edu > 5. https://lists.csail.mit.edu/mailman/listinfo/peg > 6. mailto:PEG@lists.csail.mit.edu > 7. https://lists.csail.mit.edu/mailman/listinfo/peg > _______________________________________________ > PEG mailing list > PEG@lists.csail.mit.edu > https://lists.csail.mit.edu/mailman/listinfo/peg -- Plasma conduit breach _______________________________________________ PEG mailing list PEG@lists.csail.mit.edu https://lists.csail.mit.edu/mailman/listinfo/peg