Re: [PEG] backtracking and code execution

Terence Parr Wed, 20 Oct 2010 10:28:54 -0700

On Oct 20, 2010, at 9:38 AM, Mathias wrote:
> However, my hypothesis is that there is no "real-world" grammar, i.e. a 
> grammar for some language actually used for business applications, that 
> actually benefits from a packrat parsing approach.


hiya.  A C grammar with GCC extensions is real world isn't it? I'm running 
37,000 lines of code through it. This is the Rats! C grammar that I manually 
converted to ANTLR format to see if ANTLR could generate a valid parser for it 
without manipulating the structure of the grammar (it can).  With memoization, 
it finishes in seconds. Without, it doesn't terminate. It's still running as I 
write this e-mail minutes later. The only difference is the option list in the 
grammar:

grammar RatsC;
//options {backtrack=true; memoize=true; output=AST;}
options {backtrack=true; output=AST;} // doesn't terminate

Grimm does some amazing static grammar analysis optimizations in Rats! as well 
as the packratting at runtime. His parsers are very efficient as a result. I'm 
assuming that the original Rats! C parser would not terminate without 
packratting though just like my version.

> Given a somewhat efficient parsing engine (the Scala Parser Combinators for 
> example are NOT a good example for this) the constant factors introduced by 
> the overhead of memoization kill any benefit from the reduction of rule 
> matches.

You are correct that memoization slows down the parse so you have to be careful 
about how much use it.  ANTLR lets you specify this on a decision by decision 
basis and also removes backtracking whenever possible to avoid the whole issue.

> I have some data to back this up:
> The parboiled PEG parsing library comes with a "ProfilingParseRunner" that, 
> among other things, counts rule matches and mismatches during the runtime of 
> the parser against a given input. It does this globally, for each individual 
> rule as well as for "rematches" and "remismatches", i.e. cases where the rule 
> application could have been "saved" by a packrat parser.

Well, yes, any grammar can be made more efficient. In fact, I'm finding in my 
empirical studies for paper that some commercial grammars a friend lent me work 
very well without memoization even though they backtrack. He worked very hard 
to make them that way.

One or two examples that you can make very efficient isn't super compelling I 
don't think. Building a grammar without concern for exponential landmines in 
nested backtracking is what I'm concerned with.  It's easier and faster to 
build a natural grammar without having to worry about avoiding exponential 
behavior.

Just passing on my experience...been working with only real-world grammars (I'm 
not much of an academic) that backtrack for almost 20 years since ANTLR 
introduced syntactic predicates.  You need memoization to make your job easier.

> When run against parboileds Java grammar example and some larger Java input 
> the profiler shows that only 10% of all rule applications could have been 
> saved by a packrat parser.

I can probably build you a natural Java grammar that needs packratting.  Of 
course, I would go optimize it later.

> As another example one might take a look at the Markdown parser grammar also 
> available for parboiled. This grammar is not optimized at all for speed and 
> depends a lot on PEG syntactic predicates, which are probably one of the main 
> sources for rule reapplications.

It all depends on the complexity of the language and the particular grammar 
implementation. There are infinite number of grammars for any single language. 
C is particularly nasty because everything looks the same from the left edge.

> Do you have any guess (or even data) on how many rule applications can be 
> saved by memoization in the case of the PEG C grammar you recently looked at? 
> What was the speedup of enabling memoization you experienced?

Sure.  As I mentioned above, it means the difference between finishing in 
seconds versus waiting many minutes and then having to kill it. I don't need to 
count to recognize that exponential landmine ;)

Ter
_______________________________________________
PEG mailing list
PEG@lists.csail.mit.edu
https://lists.csail.mit.edu/mailman/listinfo/peg

Re: [PEG] backtracking and code execution

Reply via email to