Mathias, This e-mail helped me immensely in understanding the design constraints of parboiled and to see much better how the program works, thank you.
I again really appreciate your design discussion, particularly in talking about what you had to change the most and how the design evolved to acknowledge that. I would not be as far along with my work without the benefit of your experience. -Alan On Sat, Dec 11, 2010 at 02:18:10PM +0100, Mathias wrote: > Alan, > sorry for not getting back to you ealier! > > > When I look at this code: > > > > https://github.com/sirthias/pegdown/blob/master/src/org/pegdown/Parser.java > > > > Am I looking at something hand-written, or am I looking at something > > compiled from a grammar definition? > > > Grammars in parboiled are written in internal DSLs. The markdown parser you > are referring to is an example of a complete (grammar and actions) "parboiled > for Java" parser. It's "hand-written" if you will, even though most people > would use an Java IDE for getting it done, which helps a lot in managing > Javas verbosity. > > However, since parboiled implements a "standard" recursive descent PEG parser > it's very easy to transcribe grammars defined using the "practical syntax for > PEGs" proposed by Bryan Ford in his 2004 paper (or any other custom format) > to parboileds internal DSL representation. > In fact the parboiled examples used to include a small tool > (https://github.com/sirthias/parboiled/blob/v0.9.6.0/examples/org/parboiled/examples/pegtranslator/PegTranslator.java) > which automatically did this transcription in order to provide a starting > point for further developments. > > Note that "parboiled for Scala" parsers look much cleaner, as Scala lends > itself much better for modeling internal DSLs. > This JSON parser example: > https://github.com/sirthias/parboiled/blob/master/src/examples/scala/org/parboiled/examples/json/JsonParser1.scala > (again complete with actions) is much more concise than an equivalent Java > implementation and as such much better to read. > > > My specific need is to automatically generate a parser > > from a PEG description, which I have assumed means I need to attach > > actions to the grammar, but that I should do so in a way that > > doesn't tightly couple the action code to the parse tree. Is this > > something that parboiled does? > > > One thing I tried to say in my previous message was that action code does not > necessarily have to be parse-tree based. There are ways to connect your > actions to the parsing engine without the need to build a full-blown parse > tree during a parsing run. In fact I think this is what most parser > generators (like ANTLR) do. Parse tree building should be optional and not a > prerequisite for being able to run action code. > > > And if so, may I see an example of > > the grammar that is used to compile a parser? > > parboiled does not as such "compile a parser". It currently "interprets" your > grammar rules against an input at run-time, even though the parsing engine is > quite efficient. The next major release however will contain a real "parser > generator". parboiled will then be able to compile your rules directly to JVM > byte code (again at run-time, but only once per JVM life cycle), which should > yield another nice performance increase. > > Cheers, > Mathias > > --- > math...@parboiled.org > http://www.parboiled.org > > On 09.12.2010, at 23:58, Alan Post wrote: > > > Will you help me understand something about parboiled? > > > > When I look at this code: > > > > https://github.com/sirthias/pegdown/blob/master/src/org/pegdown/Parser.java > > > > Am I looking at something hand-written, or am I looking at something > > compiled from a grammar definition? > > > > I've been influenced by your assertion that I should separate code > > and grammar. My specific need is to automatically generate a parser > > from a PEG description, which I have assumed means I need to attach > > actions to the grammar, but that I should do so in a way that > > doesn't tightly couple the action code to the parse tree. Is this > > something that parboiled does? And if so, may I see an example of > > the grammar that is used to compile a parser? > > > > Thank you, > > > > -Alan > > > > On Thu, Dec 09, 2010 at 11:44:44PM +0100, Mathias wrote: > >> Over the course of developing parboiled (PEG parsing engine for Java and > >> Scala using internals DSLs for grammar specification) I found that it's > >> best to clearly separate the following three things: > >> - grammar operators > >> - parse tree building > >> - parser action code > >> > >> The operators, together with the rules they are applied to, define the > >> language your grammar recognizes. Nothing more, nothing less. > >> Whether your engine builds a parse tree for input of that language or > >> whether it doesn't should likely not be specified at the operator level, > >> but it some other way. > >> In "parboiled for Java" operators and rules are modeled as method calls > >> whereas parse tree building is controlled via annotations on those > >> methods. You can selectively enable or disable the creation of parse tree > >> nodes per grammar rule, which allows you to tweak parse tree building > >> exactly to your needs. > >> > >> The interface between the parsing engine and custom parser action code is > >> the thing that has changed the most over the course of parboileds life > >> cycle so far. > >> Initially parser action code had to access the parse tree nodes in order > >> to get to matched input. Creating a custom object structure during the > >> parsing run (e.g. an AST) was done by decorating the parse tree. So the > >> parse tree was the "work bench" of the parsing process. > >> Over time this heavy centering around the parse tree turned out to have > >> two problems: > >> > >> 1. Low Performance. > >> Always having to create a parse tree even when all you care about is the > >> AST is just wasteful. Especially since the parse tree can contain a huge > >> number of nodes for larger inputs (sometimes more nodes than input > >> characters). > >> > >> 2. Less room for automatic optimizations. > >> The structure of the parse tree is dictated by the structure of the > >> grammar rules. If your parser action code is built under the assumption > >> that the parse tree has a given structure there is little leeway for the > >> parsing engine to apply automatic rule optimizations. Decoupling the > >> parser action code from the parse tree opens up the possibility to apply > >> all kinds of automatic grammar tweaking before running the parser the > >> first time. The engine might decide to completely change the rule > >> structure, as long as all changes do not change the recognized language > >> and are transparent to the parser action code. > >> > >> Currently parboiled implements the interface between the parsing engine > >> and the action code in the following way: > >> 1. Actions can appear anywhere in a rule. > >> 2. Actions can access the matched input text of the sub rule immediately > >> preceding the action but not of any other rule (so there is no sub rule > >> labeling required). > >> 3. For working with custom objects (e.g. AST nodes) the engine provides a > >> "Value Stack", which is a simple stack structure that serves as a fast > >> work bench for a parsing run. Actions can push objects onto this stack, > >> pop them off, swap them around, and so on. > >> > >> This solution completely decouples the parse tree from everything else. > >> You can enable or disable parse tree building without any effect on the > >> rest of the parser. There is no need for addressing sub rules in action > >> expressions and given a somewhat efficient value stack implementation the > >> whole thing is quite fast. Additionally, in "parboiled for Scala", the > >> action code with its manipulations of the value stack can be statically > >> type-checked at compile time, which is a huge plus. > >> > >> In case you are interested in more details or broader explanations, the > >> parboiled documentation is quite complete. > >> > >> Cheers, > >> Mathias > >> > >> --- > >> math...@parboiled.org > >> http://www.parboiled.org > >> > >> On 09.12.2010, at 21:01, Alan Post wrote: > >> > >>> I'm working on my PEG parser, in particular the interface between > >>> the parse tree and the code one can attach to productions that > >>> are executed on a successful parse. > >>> > >>> I've arranged for the two predicate operations, & and !, to not add > >>> any output to the parse tree. That means that the following > >>> production: > >>> > >>> rule <- &a !b "c" > >>> > >>> Produces the same parse tree as: > >>> > >>> rule <- "c" > >>> > >>> Internally, this means that I recognize that the sequence operator > >>> (which contains the productions '&a', '!b', and '"c"' in this > >>> example) is being called with predicates in every position but one, > >>> and rather than returning a list containing that single element, > >>> I return just the single element. > >>> > >>> As I've been doing this, I've found that I want a new operator similar > >>> to '&'. '&' matches the production it is attached to, but it does not > >>> advance the position of the input buffer. > >>> > >>> I'd like an operator that matches the production it is attached to, > >>> advances the input buffer, but doesn't add anything to the parse > >>> tree. > >>> > >>> Here's an example: > >>> > >>> mulexp <- digit '*' digit EOF -> {(lambda (x y) (* x y))} > >>> > >>> the mulexp production is a sequence of four other rules, but only > >>> two of them are needed by the associated code. It would be nice > >>> if I could write the code rule like it is above, rather than say > >>> this: > >>> > >>> (lambda (x op y EOF) (* x y)) > >>> > >>> Having to account for all the rules in the sequence, but really > >>> only caring about two of them. Here is the example rewritten > >>> with '^' expressing "match the rule, advance the input, but don't > >>> modify the parse tree": > >>> > >>> mulexp <- digit ^'*' digit ^EOF -> {(lambda (x y) (* x y))} > >>> > >>> Before I go inventing syntax for this use case, will you tell me if > >>> this is already being done with other parsers? Have any of you had > >>> this problem and already solved it, and if so, what approach did you > >>> take? > >>> > >>> -Alan > >>> -- > >>> .i ko djuno fi le do sevzi > >>> > >>> _______________________________________________ > >>> PEG mailing list > >>> PEG@lists.csail.mit.edu > >>> https://lists.csail.mit.edu/mailman/listinfo/peg > >> > > > > -- > > .i ko djuno fi le do sevzi > -- .i ko djuno fi le do sevzi _______________________________________________ PEG mailing list PEG@lists.csail.mit.edu https://lists.csail.mit.edu/mailman/listinfo/peg