Terence,

Clearly I'm not in a position to argue with someone of your standing and 
experience.
I may have to soften my stance on the effectiveness of packratting after taking 
a closer look at the Rats! C grammar.
Thanks for pointing me in the right direction...

Cheers,
Mathias

---
math...@parboiled.org
http://www.parboiled.org

On 20.10.2010, at 19:28, Terence Parr wrote:

> 
> On Oct 20, 2010, at 9:38 AM, Mathias wrote:
>> However, my hypothesis is that there is no "real-world" grammar, i.e. a 
>> grammar for some language actually used for business applications, that 
>> actually benefits from a packrat parsing approach.
> 
> hiya.  A C grammar with GCC extensions is real world, isn't it? I'm running 
> 37,000 lines of code through it. This is the Rats! C grammar that I manually 
> converted to ANTLR format to see if ANTLR could generate a valid parser for 
> it without manipulating the structure of the grammar (it can).  With 
> memoization, it finishes in seconds. Without, it doesn't terminate. It's 
> still running as I write this e-mail minutes later. The only difference is 
> the option list in the grammar:
> 
> grammar RatsC;
> //options {backtrack=true; memoize=true; output=AST;}
> options {backtrack=true; output=AST;} // doesn't terminate
> 
> Grimm does some amazing static grammar analysis optimizations in Rats! as 
> well as the packratting at runtime. His parsers are very efficient as a 
> result. I'm assuming that the original Rats! C parser would not terminate 
> without packratting though just like my version.
> 
>> Given a somewhat efficient parsing engine (the Scala Parser Combinators for 
>> example are NOT a good example for this) the constant factors introduced by 
>> the overhead of memoization kill any benefit from the reduction of rule 
>> matches.
> 
> You are correct that memoization slows down the parse, so you have to be 
> careful about how much you use it.  ANTLR lets you specify this on a 
> decision-by-decision basis and also removes backtracking whenever possible to 
> avoid the whole issue.
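
A sketch of what that per-decision control looks like in ANTLR v3 notation, as I understand it — the rule name and its alternative below are invented for illustration, not taken from the Rats! grammar:

```antlr
grammar Sketch;
options {backtrack=true; output=AST;}   // note: no grammar-wide memoize

// Rule-level options override the grammar header, so only this
// backtracking decision pays the memoization overhead:
declaration
options {memoize=true;}
    : declSpecifiers initDeclarators? ';'
    ;
```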
> 
>> I have some data to back this up:
>> The parboiled PEG parsing library comes with a "ProfilingParseRunner" that, 
>> among other things, counts rule matches and mismatches during the runtime of 
>> the parser against a given input. It does this globally, for each individual 
>> rule as well as for "rematches" and "remismatches", i.e. cases where the 
>> rule application could have been "saved" by a packrat parser.
> 
> Well, yes, any grammar can be made more efficient. In fact, I'm finding in my 
> empirical studies for a paper that some commercial grammars a friend lent me 
> work very well without memoization even though they backtrack. He worked very 
> hard to make them that way.
> 
> One or two examples that you can make very efficient aren't super compelling, 
> I don't think. Being able to build a grammar without worrying about 
> exponential landmines in nested backtracking is what I care about.  It's 
> easier and faster to build a natural grammar when you don't have to design 
> around exponential behavior.
> 
> Just passing on my experience... I've been working only with real-world 
> grammars that backtrack (I'm not much of an academic) for almost 20 years, 
> ever since ANTLR introduced syntactic predicates.  You need memoization to 
> make your job easier.
> 
>> When run against parboiled's Java grammar example and some larger Java input 
>> the profiler shows that only 10% of all rule applications could have been 
>> saved by a packrat parser.
> 
> I can probably build you a natural Java grammar that needs packratting.  Of 
> course, I would go optimize it later.
> 
>> As another example one might take a look at the Markdown parser grammar also 
>> available for parboiled. This grammar is not optimized at all for speed and 
>> depends a lot on PEG syntactic predicates, which are probably one of the 
>> main sources for rule reapplications.
> 
> It all depends on the complexity of the language and the particular grammar 
> implementation. There are an infinite number of grammars for any single 
> language. C is particularly nasty because everything looks the same from the 
> left edge.
> 
>> Do you have any guess (or even data) on how many rule applications can be 
>> saved by memoization in the case of the PEG C grammar you recently looked 
>> at? What was the speedup of enabling memoization you experienced?
> 
> Sure.  As I mentioned above, it means the difference between finishing in 
> seconds versus waiting many minutes and then having to kill it. I don't need 
> to count to recognize that exponential landmine ;)
> 
> Ter
> _______________________________________________
> PEG mailing list
> PEG@lists.csail.mit.edu
> https://lists.csail.mit.edu/mailman/listinfo/peg

