I intentionally elided my example to demonstrate the essence of my question. I support the following operations:
argument by position, with syntax to ignore nodes in the parse tree: expr <- digit `op digit -> {(lambda (x y) (* x y))} argument by keyword, if you want the calling function to have a signature not modelled after the parse tree (I've arbitrarily change the position of |right| and |left|, the caller can use the most natural ordering, rather than the order the parser is using): expr <- #:left digit `op #:right digit -> {(lambda (#!key right left) (* left right))} both arguments by position and arguments by keyword (notice the missing ` operator on op): expr <- #:left digit op #:right digit -> {(lambda (op #!key right left) ((eval (string->symbol (string op))) left right))} I don't require the code to be specified with the grammar: in parse_action.scm: (define (expr x y) (* x y)) in parse_grammar.scm: expr <- digit `op digit -> expr With you example, I could change all examples using '{}', like this one: expr <- #:left digit op #:right digit -> {(lambda (op #!key right left) ((eval (string->symbol (string op))) left right))} To generate the lambda procedure signature, shortening it to: expr <- #:left digit op #:right digit -> {((eval (string->symbol (string op))) left right)} Which is a very much a good suggestion. I've added this to my backlog for this project. Do you see anything else in these examples that you would do differently? Syntactically, I'm using the Racket-style #!key parameters (this is '#:left digit' vs. your example 'digit:left' and R5RS Scheme style strings ("foo\nbar") and characters (#\a or #\space), as that is my target language. I consider those localization or style issues. I very much appreciate comments on that decision, and I also appreciate comments on the design aspect of the interface between action and grammar. Unlike parboiled, I don't have any explicit parse tree construction and manipulation. I do have flexibility in the calling interface between the grammar and the parser actions, which is working on the same problem. I have insufficient data to know whether I'll be developing a more explicit parse tree API as I continue to develop this parser, but the data I do have (from Mathias) has suggested it will be a natural extension of building a parser generator. -Alan On Sat, Dec 11, 2010 at 07:24:28PM +0100, Ondřej Bílka wrote: > Sorry I was replaying to two things at once. My objectinion about your > example was first thing. > My approach how do that is something like > exp <- digit:x '*' digit:y EOF -> { (* x y)} > Then I started replaying to parboiled that we don't need build parse tree as > syntax tree suffices. > Also in my peg you can refer to subrule used only once by its name. > On Sat, Dec 11, 2010 at 10:14:36AM -0700, .alyn.post. wrote: > > Will you explain in this example what you do with the tree object > > afterwards? > > > > In the example I gave, the production is converted to a number. > > > > I think in your example, you're showing how to convert a parse tree > > to a syntax tree, and I assume you later operator over the syntax > > tree? > > > > Thank you, > > > > -Alan > > > > On Sat, Dec 11, 2010 at 10:46:40AM +0100, Ondřej Bílka wrote: > > > exp <- digit '*' digit EOF -> {(lambda (x y) (* x y))} > > > This looks as step backwards > > > How it is different from old yacc > > > exp <- digit '*' digit EOF -> {$$ = (* $1 $3)} > > > > > > My approach is different. I made tree structure explicit in rule by > > > binding as > > > in example > > > > > > tree = '(' number:value tree?:left tree?:rigth ')' result(Tree) > > > > > > result(Tree) creates Tree object and sets value,left,rigth fields to > > > corresponding values > > > > > > On Fri, Dec 10, 2010 at 01:47:59PM +1300, Peter Cashin wrote: > > > > Mathias: > > > > A good plan to separate out the: > > > > - grammar operators > > > > - parse tree building > > > > - parser action code > > > > I agree with you, but in my case I want to have the grammar language > > > > totally independent of the implementation programming language. > > > > So the grammar has no parser action code, although this can be added > > > > later > > > > in a particular programming language implementation. Parboiled has > > > > much > > > > closer integration with the programming language, which has > > > > advantages, > > > > but I really want grammar specifications that are totally > > > > independent. > > > > So now for the grammar operators, and parse tree building > > > > annotations. For > > > > many years I kept these separate: tree building annotations in the > > > > rule > > > > head (ie they annotate the rule name) or definition syntax (= for > > > > interior > > > > nodes, : for leaf nodes or terminals). The right hand side of the > > > > rule had > > > > the grammar expression body withs grammar operations only. One more > > > > head > > > > notation allows pruning the tree. > > > > This works fine, and maybe I should have stuck to that separation, > > > > but I > > > > introduced the `x prefix notation because I found that it was an > > > > advantage > > > > to be able to see the children nodes that would be generated by > > > > looking at > > > > the parent rule alone, and not having to refer to the children rule > > > > definitions to see how they are annotated. > > > > I have found that about half of many grammars turn out to be leaf > > > > nodes, > > > > so the ":" rule definition notation has a huge payoff in tree size > > > > and > > > > performance. > > > > Because of this the `x rule is not vital, but over time I found it > > > > was > > > > convenient, and a good trade-off for me. > > > > Cheers, > > > > Peter. > > > > > > > > On Fri, Dec 10, 2010 at 11:44 AM, Mathias <[1]math...@parboiled.org> > > > > wrote: > > > > > > > > Over the course of developing parboiled (PEG parsing engine for > > > > Java and > > > > Scala using internals DSLs for grammar specification) I found that > > > > it's > > > > best to clearly separate the following three things: > > > > - grammar operators > > > > - parse tree building > > > > - parser action code > > > > > > > > The operators, together with the rules they are applied to, define > > > > the > > > > language your grammar recognizes. Nothing more, nothing less. > > > > Whether your engine builds a parse tree for input of that language > > > > or > > > > whether it doesn't should likely not be specified at the operator > > > > level, > > > > but it some other way. > > > > In "parboiled for Java" operators and rules are modeled as method > > > > calls > > > > whereas parse tree building is controlled via annotations on those > > > > methods. You can selectively enable or disable the creation of > > > > parse > > > > tree nodes per grammar rule, which allows you to tweak parse tree > > > > building exactly to your needs. > > > > > > > > The interface between the parsing engine and custom parser action > > > > code > > > > is the thing that has changed the most over the course of > > > > parboileds > > > > life cycle so far. > > > > Initially parser action code had to access the parse tree nodes in > > > > order > > > > to get to matched input. Creating a custom object structure during > > > > the > > > > parsing run (e.g. an AST) was done by decorating the parse tree. > > > > So the > > > > parse tree was the "work bench" of the parsing process. > > > > Over time this heavy centering around the parse tree turned out to > > > > have > > > > two problems: > > > > > > > > 1. Low Performance. > > > > Always having to create a parse tree even when all you care about > > > > is the > > > > AST is just wasteful. Especially since the parse tree can contain > > > > a huge > > > > number of nodes for larger inputs (sometimes more nodes than input > > > > characters). > > > > > > > > 2. Less room for automatic optimizations. > > > > The structure of the parse tree is dictated by the structure of the > > > > grammar rules. If your parser action code is built under the > > > > assumption > > > > that the parse tree has a given structure there is little leeway > > > > for the > > > > parsing engine to apply automatic rule optimizations. Decoupling > > > > the > > > > parser action code from the parse tree opens up the possibility to > > > > apply > > > > all kinds of automatic grammar tweaking before running the parser > > > > the > > > > first time. The engine might decide to completely change the rule > > > > structure, as long as all changes do not change the recognized > > > > language > > > > and are transparent to the parser action code. > > > > > > > > Currently parboiled implements the interface between the parsing > > > > engine > > > > and the action code in the following way: > > > > 1. Actions can appear anywhere in a rule. > > > > 2. Actions can access the matched input text of the sub rule > > > > immediately > > > > preceding the action but not of any other rule (so there is no sub > > > > rule > > > > labeling required). > > > > 3. For working with custom objects (e.g. AST nodes) the engine > > > > provides > > > > a "Value Stack", which is a simple stack structure that serves as > > > > a fast > > > > work bench for a parsing run. Actions can push objects onto this > > > > stack, > > > > pop them off, swap them around, and so on. > > > > > > > > This solution completely decouples the parse tree from everything > > > > else. > > > > You can enable or disable parse tree building without any effect > > > > on the > > > > rest of the parser. There is no need for addressing sub rules in > > > > action > > > > expressions and given a somewhat efficient value stack > > > > implementation > > > > the whole thing is quite fast. Additionally, in "parboiled for > > > > Scala", > > > > the action code with its manipulations of the value stack can be > > > > statically type-checked at compile time, which is a huge plus. > > > > > > > > In case you are interested in more details or broader > > > > explanations, the > > > > parboiled documentation is quite complete. > > > > > > > > Cheers, > > > > Mathias > > > > > > > > --- > > > > [2]math...@parboiled.org > > > > [3]http://www.parboiled.org > > > > On 09.12.2010, at 21:01, Alan Post wrote: > > > > > > > > > I'm working on my PEG parser, in particular the interface between > > > > > the parse tree and the code one can attach to productions that > > > > > are executed on a successful parse. > > > > > > > > > > I've arranged for the two predicate operations, & and !, to not > > > > add > > > > > any output to the parse tree. �That means that the following > > > > > production: > > > > > > > > > > �rule <- &a !b "c" > > > > > > > > > > Produces the same parse tree as: > > > > > > > > > > �rule <- "c" > > > > > > > > > > Internally, this means that I recognize that the sequence > > > > operator > > > > > (which contains the productions '&a', '!b', and '"c"' in this > > > > > example) is being called with predicates in every position but > > > > one, > > > > > and rather than returning a list containing that single element, > > > > > I return just the single element. > > > > > > > > > > As I've been doing this, I've found that I want a new operator > > > > similar > > > > > to '&'. �'&' matches the production it is attached to, but it > > > > does not > > > > > advance the position of the input buffer. > > > > > > > > > > I'd like an operator that matches the production it is attached > > > > to, > > > > > advances the input buffer, but doesn't add anything to the parse > > > > > tree. > > > > > > > > > > Here's an example: > > > > > > > > > > �mulexp <- digit '*' digit EOF -> {(lambda (x y) (* x y))} > > > > > > > > > > the mulexp production is a sequence of four other rules, but only > > > > > two of them are needed by the associated code. �It would be nice > > > > > if I could write the code rule like it is above, rather than say > > > > > this: > > > > > > > > > > �(lambda (x op y EOF) (* x y)) > > > > > > > > > > Having to account for all the rules in the sequence, but really > > > > > only caring about two of them. �Here is the example rewritten > > > > > with '^' expressing "match the rule, advance the input, but don't > > > > > modify the parse tree": > > > > > > > > > > �mulexp <- digit ^'*' digit ^EOF -> {(lambda (x y) (* x y))} > > > > > > > > > > Before I go inventing syntax for this use case, will you tell me > > > > if > > > > > this is already being done with other parsers? �Have any of you > > > > had > > > > > this problem and already solved it, and if so, what approach did > > > > you > > > > > take? > > > > > > > > > > -Alan > > > > > -- > > > > > .i ko djuno fi le do sevzi > > > > > > > > > > _______________________________________________ > > > > > PEG mailing list > > > > > [4]...@lists.csail.mit.edu > > > > > [5]https://lists.csail.mit.edu/mailman/listinfo/peg > > > > > > > > _______________________________________________ > > > > PEG mailing list > > > > [6]...@lists.csail.mit.edu > > > > [7]https://lists.csail.mit.edu/mailman/listinfo/peg > > > > > > > > References > > > > > > > > Visible links > > > > 1. mailto:math...@parboiled.org > > > > 2. mailto:math...@parboiled.org > > > > 3. http://www.parboiled.org/ > > > > 4. mailto:PEG@lists.csail.mit.edu > > > > 5. https://lists.csail.mit.edu/mailman/listinfo/peg > > > > 6. mailto:PEG@lists.csail.mit.edu > > > > 7. https://lists.csail.mit.edu/mailman/listinfo/peg > > > > > > > _______________________________________________ > > > > PEG mailing list > > > > PEG@lists.csail.mit.edu > > > > https://lists.csail.mit.edu/mailman/listinfo/peg > > > > > > > > > -- > > > > > > Plasma conduit breach > > > > > > _______________________________________________ > > > PEG mailing list > > > PEG@lists.csail.mit.edu > > > https://lists.csail.mit.edu/mailman/listinfo/peg > > > > -- > > .i ko djuno fi le do sevzi > > -- > > Internet exceeded Luser level, please wait until a luser logs off before > attempting to log back on. > > _______________________________________________ > PEG mailing list > PEG@lists.csail.mit.edu > https://lists.csail.mit.edu/mailman/listinfo/peg -- .i ko djuno fi le do sevzi _______________________________________________ PEG mailing list PEG@lists.csail.mit.edu https://lists.csail.mit.edu/mailman/listinfo/peg