Re: [PEG] and-predicate with slightly different behavior?

Alan Post Sat, 11 Dec 2010 09:18:45 -0800

Mathias,

This e-mail helped me immensely in understanding the design constraints
of parboiled and to see much better how the program works, thank
you.


I again really appreciate your design discussion, particularly in
talking about what you had to change the most and how the design
evolved to acknowledge that.  I would not be as far along with my
work without the benefit of your experience.

-Alan

On Sat, Dec 11, 2010 at 02:18:10PM +0100, Mathias wrote:
> Alan,
> sorry for not getting back to you ealier!
> 
> > When I look at this code:
> > 
> >  https://github.com/sirthias/pegdown/blob/master/src/org/pegdown/Parser.java
> > 
> > Am I looking at something hand-written, or am I looking at something
> > compiled from a grammar definition?
> 
> 
> Grammars in parboiled are written in internal DSLs. The markdown parser you 
> are referring to is an example of a complete (grammar and actions) "parboiled 
> for Java" parser. It's "hand-written" if you will, even though most people 
> would use an Java IDE for getting it done, which helps a lot in managing 
> Javas verbosity.
> 
> However, since parboiled implements a "standard" recursive descent PEG parser 
> it's very easy to transcribe grammars defined using the "practical syntax for 
> PEGs" proposed by Bryan Ford in his 2004 paper (or any other custom format) 
> to parboileds internal DSL representation.
> In fact the parboiled examples used to include a small tool 
> (https://github.com/sirthias/parboiled/blob/v0.9.6.0/examples/org/parboiled/examples/pegtranslator/PegTranslator.java)
>  which automatically did this transcription in order to provide a starting 
> point for further developments.
> 
> Note that "parboiled for Scala" parsers look much cleaner, as Scala lends 
> itself much better for modeling internal DSLs.
> This JSON parser example: 
> https://github.com/sirthias/parboiled/blob/master/src/examples/scala/org/parboiled/examples/json/JsonParser1.scala
> (again complete with actions) is much more concise than an equivalent Java 
> implementation and as such much better to read.
> 
> > My specific need is to automatically generate a parser
> > from a PEG description, which I have assumed means I need to attach
> > actions to the grammar, but that I should do so in a way that
> > doesn't tightly couple the action code to the parse tree.  Is this
> > something that parboiled does?
> 
> 
> One thing I tried to say in my previous message was that action code does not 
> necessarily have to be parse-tree based. There are ways to connect your 
> actions to the parsing engine without the need to build a full-blown parse 
> tree during a parsing run. In fact I think this is what most parser 
> generators (like ANTLR) do. Parse tree building should be optional and not a 
> prerequisite for being able to run action code.
> 
> > And if so, may I see an example of
> > the grammar that is used to compile a parser?
> 
> parboiled does not as such "compile a parser". It currently "interprets" your 
> grammar rules against an input at run-time, even though the parsing engine is 
> quite efficient. The next major release however will contain a real "parser 
> generator". parboiled will then be able to compile your rules directly to JVM 
> byte code (again at run-time, but only once per JVM life cycle), which should 
> yield another nice performance increase.
> 
> Cheers,
> Mathias
> 
> ---
> math...@parboiled.org
> http://www.parboiled.org
> 
> On 09.12.2010, at 23:58, Alan Post wrote:
> 
> > Will you help me understand something about parboiled?
> > 
> > When I look at this code:
> > 
> >  https://github.com/sirthias/pegdown/blob/master/src/org/pegdown/Parser.java
> > 
> > Am I looking at something hand-written, or am I looking at something
> > compiled from a grammar definition?
> > 
> > I've been influenced by your assertion that I should separate code
> > and grammar.  My specific need is to automatically generate a parser
> > from a PEG description, which I have assumed means I need to attach
> > actions to the grammar, but that I should do so in a way that
> > doesn't tightly couple the action code to the parse tree.  Is this
> > something that parboiled does?  And if so, may I see an example of
> > the grammar that is used to compile a parser?
> > 
> > Thank you,
> > 
> > -Alan
> > 
> > On Thu, Dec 09, 2010 at 11:44:44PM +0100, Mathias wrote:
> >> Over the course of developing parboiled (PEG parsing engine for Java and 
> >> Scala using internals DSLs for grammar specification) I found that it's 
> >> best to clearly separate the following three things:
> >> - grammar operators
> >> - parse tree building
> >> - parser action code
> >> 
> >> The operators, together with the rules they are applied to, define the 
> >> language your grammar recognizes. Nothing more, nothing less.
> >> Whether your engine builds a parse tree for input of that language or 
> >> whether it doesn't should likely not be specified at the operator level, 
> >> but it some other way.
> >> In "parboiled for Java" operators and rules are modeled as method calls 
> >> whereas parse tree building is controlled via annotations on those 
> >> methods. You can selectively enable or disable the creation of parse tree 
> >> nodes per grammar rule, which allows you to tweak parse tree building 
> >> exactly to your needs.
> >> 
> >> The interface between the parsing engine and custom parser action code is 
> >> the thing that has changed the most over the course of parboileds life 
> >> cycle so far.
> >> Initially parser action code had to access the parse tree nodes in order 
> >> to get to matched input. Creating a custom object structure during the 
> >> parsing run (e.g. an AST) was done by decorating the parse tree. So the 
> >> parse tree was the "work bench" of the parsing process.
> >> Over time this heavy centering around the parse tree turned out to have 
> >> two problems:
> >> 
> >> 1. Low Performance.
> >> Always having to create a parse tree even when all you care about is the 
> >> AST is just wasteful. Especially since the parse tree can contain a huge 
> >> number of nodes for larger inputs (sometimes more nodes than input 
> >> characters).
> >> 
> >> 2. Less room for automatic optimizations.
> >> The structure of the parse tree is dictated by the structure of the 
> >> grammar rules. If your parser action code is built under the assumption 
> >> that the parse tree has a given structure there is little leeway for the 
> >> parsing engine to apply automatic rule optimizations. Decoupling the 
> >> parser action code from the parse tree opens up the possibility to apply 
> >> all kinds of automatic grammar tweaking before running the parser the 
> >> first time. The engine might decide to completely change the rule 
> >> structure, as long as all changes do not change the recognized language 
> >> and are transparent to the parser action code.
> >> 
> >> Currently parboiled implements the interface between the parsing engine 
> >> and the action code in the following way:
> >> 1. Actions can appear anywhere in a rule.
> >> 2. Actions can access the matched input text of the sub rule immediately 
> >> preceding the action but not of any other rule (so there is no sub rule 
> >> labeling required).
> >> 3. For working with custom objects (e.g. AST nodes) the engine provides a 
> >> "Value Stack", which is a simple stack structure that serves as a fast 
> >> work bench for a parsing run. Actions can push objects onto this stack, 
> >> pop them off, swap them around, and so on.
> >> 
> >> This solution completely decouples the parse tree from everything else. 
> >> You can enable or disable parse tree building without any effect on the 
> >> rest of the parser. There is no need for addressing sub rules in action 
> >> expressions and given a somewhat efficient value stack implementation the 
> >> whole thing is quite fast. Additionally, in "parboiled for Scala", the 
> >> action code with its manipulations of the value stack can be statically 
> >> type-checked at compile time, which is a huge plus.
> >> 
> >> In case you are interested in more details or broader explanations, the 
> >> parboiled documentation is quite complete.
> >> 
> >> Cheers,
> >> Mathias
> >> 
> >> ---
> >> math...@parboiled.org
> >> http://www.parboiled.org
> >> 
> >> On 09.12.2010, at 21:01, Alan Post wrote:
> >> 
> >>> I'm working on my PEG parser, in particular the interface between
> >>> the parse tree and the code one can attach to productions that
> >>> are executed on a successful parse.
> >>> 
> >>> I've arranged for the two predicate operations, & and !, to not add
> >>> any output to the parse tree.  That means that the following
> >>> production:
> >>> 
> >>> rule <- &a !b "c"
> >>> 
> >>> Produces the same parse tree as:
> >>> 
> >>> rule <- "c"
> >>> 
> >>> Internally, this means that I recognize that the sequence operator
> >>> (which contains the productions '&a', '!b', and '"c"' in this
> >>> example) is being called with predicates in every position but one,
> >>> and rather than returning a list containing that single element,
> >>> I return just the single element.
> >>> 
> >>> As I've been doing this, I've found that I want a new operator similar
> >>> to '&'.  '&' matches the production it is attached to, but it does not
> >>> advance the position of the input buffer.
> >>> 
> >>> I'd like an operator that matches the production it is attached to,
> >>> advances the input buffer, but doesn't add anything to the parse
> >>> tree.
> >>> 
> >>> Here's an example:
> >>> 
> >>> mulexp <- digit '*' digit EOF -> {(lambda (x y) (* x y))}
> >>> 
> >>> the mulexp production is a sequence of four other rules, but only
> >>> two of them are needed by the associated code.  It would be nice
> >>> if I could write the code rule like it is above, rather than say
> >>> this:
> >>> 
> >>> (lambda (x op y EOF) (* x y))
> >>> 
> >>> Having to account for all the rules in the sequence, but really 
> >>> only caring about two of them.  Here is the example rewritten
> >>> with '^' expressing "match the rule, advance the input, but don't
> >>> modify the parse tree":
> >>> 
> >>> mulexp <- digit ^'*' digit ^EOF -> {(lambda (x y) (* x y))}
> >>> 
> >>> Before I go inventing syntax for this use case, will you tell me if
> >>> this is already being done with other parsers?  Have any of you had
> >>> this problem and already solved it, and if so, what approach did you
> >>> take?
> >>> 
> >>> -Alan
> >>> -- 
> >>> .i ko djuno fi le do sevzi
> >>> 
> >>> _______________________________________________
> >>> PEG mailing list
> >>> PEG@lists.csail.mit.edu
> >>> https://lists.csail.mit.edu/mailman/listinfo/peg
> >> 
> > 
> > -- 
> > .i ko djuno fi le do sevzi
> 

-- 
.i ko djuno fi le do sevzi

_______________________________________________
PEG mailing list
PEG@lists.csail.mit.edu
https://lists.csail.mit.edu/mailman/listinfo/peg

Re: [PEG] and-predicate with slightly different behavior?

Reply via email to