Re: [PEG] and-predicate with slightly different behavior?

Mathias Sat, 11 Dec 2010 05:18:41 -0800

Alan,
sorry for not getting back to you ealier!

> When I look at this code:
> 
>  https://github.com/sirthias/pegdown/blob/master/src/org/pegdown/Parser.java
> 
> Am I looking at something hand-written, or am I looking at something
> compiled from a grammar definition?



Grammars in parboiled are written in internal DSLs. The markdown parser you are 
referring to is an example of a complete (grammar and actions) "parboiled for 
Java" parser. It's "hand-written" if you will, even though most people would 
use an Java IDE for getting it done, which helps a lot in managing Javas 
verbosity.

However, since parboiled implements a "standard" recursive descent PEG parser 
it's very easy to transcribe grammars defined using the "practical syntax for 
PEGs" proposed by Bryan Ford in his 2004 paper (or any other custom format) to 
parboileds internal DSL representation.
In fact the parboiled examples used to include a small tool 
(https://github.com/sirthias/parboiled/blob/v0.9.6.0/examples/org/parboiled/examples/pegtranslator/PegTranslator.java)
 which automatically did this transcription in order to provide a starting 
point for further developments.

Note that "parboiled for Scala" parsers look much cleaner, as Scala lends 
itself much better for modeling internal DSLs.
This JSON parser example: 
https://github.com/sirthias/parboiled/blob/master/src/examples/scala/org/parboiled/examples/json/JsonParser1.scala
(again complete with actions) is much more concise than an equivalent Java 
implementation and as such much better to read.

> My specific need is to automatically generate a parser
> from a PEG description, which I have assumed means I need to attach
> actions to the grammar, but that I should do so in a way that
> doesn't tightly couple the action code to the parse tree.  Is this
> something that parboiled does?


One thing I tried to say in my previous message was that action code does not 
necessarily have to be parse-tree based. There are ways to connect your actions 
to the parsing engine without the need to build a full-blown parse tree during 
a parsing run. In fact I think this is what most parser generators (like ANTLR) 
do. Parse tree building should be optional and not a prerequisite for being 
able to run action code.

> And if so, may I see an example of
> the grammar that is used to compile a parser?

parboiled does not as such "compile a parser". It currently "interprets" your 
grammar rules against an input at run-time, even though the parsing engine is 
quite efficient. The next major release however will contain a real "parser 
generator". parboiled will then be able to compile your rules directly to JVM 
byte code (again at run-time, but only once per JVM life cycle), which should 
yield another nice performance increase.

Cheers,
Mathias

---
math...@parboiled.org
http://www.parboiled.org

On 09.12.2010, at 23:58, Alan Post wrote:

> Will you help me understand something about parboiled?
> 
> When I look at this code:
> 
>  https://github.com/sirthias/pegdown/blob/master/src/org/pegdown/Parser.java
> 
> Am I looking at something hand-written, or am I looking at something
> compiled from a grammar definition?
> 
> I've been influenced by your assertion that I should separate code
> and grammar.  My specific need is to automatically generate a parser
> from a PEG description, which I have assumed means I need to attach
> actions to the grammar, but that I should do so in a way that
> doesn't tightly couple the action code to the parse tree.  Is this
> something that parboiled does?  And if so, may I see an example of
> the grammar that is used to compile a parser?
> 
> Thank you,
> 
> -Alan
> 
> On Thu, Dec 09, 2010 at 11:44:44PM +0100, Mathias wrote:
>> Over the course of developing parboiled (PEG parsing engine for Java and 
>> Scala using internals DSLs for grammar specification) I found that it's best 
>> to clearly separate the following three things:
>> - grammar operators
>> - parse tree building
>> - parser action code
>> 
>> The operators, together with the rules they are applied to, define the 
>> language your grammar recognizes. Nothing more, nothing less.
>> Whether your engine builds a parse tree for input of that language or 
>> whether it doesn't should likely not be specified at the operator level, but 
>> it some other way.
>> In "parboiled for Java" operators and rules are modeled as method calls 
>> whereas parse tree building is controlled via annotations on those methods. 
>> You can selectively enable or disable the creation of parse tree nodes per 
>> grammar rule, which allows you to tweak parse tree building exactly to your 
>> needs.
>> 
>> The interface between the parsing engine and custom parser action code is 
>> the thing that has changed the most over the course of parboileds life cycle 
>> so far.
>> Initially parser action code had to access the parse tree nodes in order to 
>> get to matched input. Creating a custom object structure during the parsing 
>> run (e.g. an AST) was done by decorating the parse tree. So the parse tree 
>> was the "work bench" of the parsing process.
>> Over time this heavy centering around the parse tree turned out to have two 
>> problems:
>> 
>> 1. Low Performance.
>> Always having to create a parse tree even when all you care about is the AST 
>> is just wasteful. Especially since the parse tree can contain a huge number 
>> of nodes for larger inputs (sometimes more nodes than input characters).
>> 
>> 2. Less room for automatic optimizations.
>> The structure of the parse tree is dictated by the structure of the grammar 
>> rules. If your parser action code is built under the assumption that the 
>> parse tree has a given structure there is little leeway for the parsing 
>> engine to apply automatic rule optimizations. Decoupling the parser action 
>> code from the parse tree opens up the possibility to apply all kinds of 
>> automatic grammar tweaking before running the parser the first time. The 
>> engine might decide to completely change the rule structure, as long as all 
>> changes do not change the recognized language and are transparent to the 
>> parser action code.
>> 
>> Currently parboiled implements the interface between the parsing engine and 
>> the action code in the following way:
>> 1. Actions can appear anywhere in a rule.
>> 2. Actions can access the matched input text of the sub rule immediately 
>> preceding the action but not of any other rule (so there is no sub rule 
>> labeling required).
>> 3. For working with custom objects (e.g. AST nodes) the engine provides a 
>> "Value Stack", which is a simple stack structure that serves as a fast work 
>> bench for a parsing run. Actions can push objects onto this stack, pop them 
>> off, swap them around, and so on.
>> 
>> This solution completely decouples the parse tree from everything else. You 
>> can enable or disable parse tree building without any effect on the rest of 
>> the parser. There is no need for addressing sub rules in action expressions 
>> and given a somewhat efficient value stack implementation the whole thing is 
>> quite fast. Additionally, in "parboiled for Scala", the action code with its 
>> manipulations of the value stack can be statically type-checked at compile 
>> time, which is a huge plus.
>> 
>> In case you are interested in more details or broader explanations, the 
>> parboiled documentation is quite complete.
>> 
>> Cheers,
>> Mathias
>> 
>> ---
>> math...@parboiled.org
>> http://www.parboiled.org
>> 
>> On 09.12.2010, at 21:01, Alan Post wrote:
>> 
>>> I'm working on my PEG parser, in particular the interface between
>>> the parse tree and the code one can attach to productions that
>>> are executed on a successful parse.
>>> 
>>> I've arranged for the two predicate operations, & and !, to not add
>>> any output to the parse tree.  That means that the following
>>> production:
>>> 
>>> rule <- &a !b "c"
>>> 
>>> Produces the same parse tree as:
>>> 
>>> rule <- "c"
>>> 
>>> Internally, this means that I recognize that the sequence operator
>>> (which contains the productions '&a', '!b', and '"c"' in this
>>> example) is being called with predicates in every position but one,
>>> and rather than returning a list containing that single element,
>>> I return just the single element.
>>> 
>>> As I've been doing this, I've found that I want a new operator similar
>>> to '&'.  '&' matches the production it is attached to, but it does not
>>> advance the position of the input buffer.
>>> 
>>> I'd like an operator that matches the production it is attached to,
>>> advances the input buffer, but doesn't add anything to the parse
>>> tree.
>>> 
>>> Here's an example:
>>> 
>>> mulexp <- digit '*' digit EOF -> {(lambda (x y) (* x y))}
>>> 
>>> the mulexp production is a sequence of four other rules, but only
>>> two of them are needed by the associated code.  It would be nice
>>> if I could write the code rule like it is above, rather than say
>>> this:
>>> 
>>> (lambda (x op y EOF) (* x y))
>>> 
>>> Having to account for all the rules in the sequence, but really 
>>> only caring about two of them.  Here is the example rewritten
>>> with '^' expressing "match the rule, advance the input, but don't
>>> modify the parse tree":
>>> 
>>> mulexp <- digit ^'*' digit ^EOF -> {(lambda (x y) (* x y))}
>>> 
>>> Before I go inventing syntax for this use case, will you tell me if
>>> this is already being done with other parsers?  Have any of you had
>>> this problem and already solved it, and if so, what approach did you
>>> take?
>>> 
>>> -Alan
>>> -- 
>>> .i ko djuno fi le do sevzi
>>> 
>>> _______________________________________________
>>> PEG mailing list
>>> PEG@lists.csail.mit.edu
>>> https://lists.csail.mit.edu/mailman/listinfo/peg
>> 
> 
> -- 
> .i ko djuno fi le do sevzi


_______________________________________________
PEG mailing list
PEG@lists.csail.mit.edu
https://lists.csail.mit.edu/mailman/listinfo/peg

Re: [PEG] and-predicate with slightly different behavior?

Reply via email to