Terence, clearly I am not in a position to argue with someone of your standing and experience. I might have to soften my stance on the effectiveness of packratting after taking a closer look at the Rats! C grammar. Thanks for pointing me in the right direction...
Cheers,
Mathias

---
math...@parboiled.org
http://www.parboiled.org

On 20.10.2010, at 19:28, Terence Parr wrote:

> On Oct 20, 2010, at 9:38 AM, Mathias wrote:
>> However, my hypothesis is that there is no "real-world" grammar, i.e. a
>> grammar for some language actually used for business applications, that
>> actually benefits from a packrat parsing approach.
>
> hiya. A C grammar with GCC extensions is real world, isn't it? I'm running
> 37,000 lines of code through it. This is the Rats! C grammar that I manually
> converted to ANTLR format to see if ANTLR could generate a valid parser for
> it without manipulating the structure of the grammar (it can). With
> memoization, it finishes in seconds. Without, it doesn't terminate. It's
> still running as I write this e-mail, minutes later. The only difference is
> the option list in the grammar:
>
> grammar RatsC;
> //options {backtrack=true; memoize=true; output=AST;}
> options {backtrack=true; output=AST;} // doesn't terminate
>
> Grimm does some amazing static grammar analysis optimizations in Rats! as
> well as the packratting at runtime. His parsers are very efficient as a
> result. I'm assuming that the original Rats! C parser would not terminate
> without packratting, though, just like my version.
>
>> Given a somewhat efficient parsing engine (the Scala Parser Combinators,
>> for example, are NOT a good example of this) the constant factors
>> introduced by the overhead of memoization kill any benefit from the
>> reduction of rule matches.
>
> You are correct that memoization slows down the parse, so you have to be
> careful about how much you use it. ANTLR lets you specify this on a
> decision-by-decision basis and also removes backtracking whenever possible
> to avoid the whole issue.
>
>> I have some data to back this up:
>> The parboiled PEG parsing library comes with a "ProfilingParseRunner"
>> that, among other things, counts rule matches and mismatches during the
>> runtime of the parser against a given input.
>> It does this globally, for each individual rule, as well as for
>> "rematches" and "remismatches", i.e. cases where the rule application
>> could have been "saved" by a packrat parser.
>
> Well, yes, any grammar can be made more efficient. In fact, I'm finding in
> my empirical studies for a paper that some commercial grammars a friend
> lent me work very well without memoization even though they backtrack. He
> worked very hard to make them that way.
>
> One or two examples that you can make very efficient isn't super
> compelling, I don't think. Building a grammar without concern for
> exponential landmines in nested backtracking is what I'm concerned with.
> It's easier and faster to build a natural grammar without having to worry
> about avoiding exponential behavior.
>
> Just passing on my experience... I've been working with only real-world
> grammars (I'm not much of an academic) that backtrack for almost 20 years,
> since ANTLR introduced syntactic predicates. You need memoization to make
> your job easier.
>
>> When run against parboiled's Java grammar example and some larger Java
>> input, the profiler shows that only 10% of all rule applications could
>> have been saved by a packrat parser.
>
> I can probably build you a natural Java grammar that needs packratting. Of
> course, I would go optimize it later.
>
>> As another example one might take a look at the Markdown parser grammar
>> also available for parboiled. This grammar is not optimized at all for
>> speed and depends a lot on PEG syntactic predicates, which are probably
>> one of the main sources of rule reapplications.
>
> It all depends on the complexity of the language and the particular
> grammar implementation. There are an infinite number of grammars for any
> single language. C is particularly nasty because everything looks the same
> from the left edge.
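The exponential landmine in nested backtracking can be made concrete with a toy sketch. The following Python is a hypothetical illustration (not ANTLR, Rats!, or parboiled code) of a backtracking parser for the PEG `E <- T '+' E / T`, `T <- '(' E ')' / '1'`: on input `(((...1...)))` the first alternative of E parses a full T, fails to find `'+'`, and the second alternative re-parses that same T from scratch, doubling the work at every nesting level unless results are memoized per (rule, position).

```python
calls = 0  # counts rule-body executions (i.e. actual parsing work)

def parse_E(s, pos, memo):
    # E <- T '+' E / T
    if memo is not None and ('E', pos) in memo:
        return memo[('E', pos)]
    global calls
    calls += 1
    result = None
    end = parse_T(s, pos, memo)                 # alt 1: T '+' E
    if end is not None and end < len(s) and s[end] == '+':
        result = parse_E(s, end + 1, memo)
    if result is None:
        result = parse_T(s, pos, memo)          # alt 2: backtrack, re-parse T
    if memo is not None:
        memo[('E', pos)] = result
    return result

def parse_T(s, pos, memo):
    # T <- '(' E ')' / '1'
    if memo is not None and ('T', pos) in memo:
        return memo[('T', pos)]
    global calls
    calls += 1
    result = None
    if pos < len(s) and s[pos] == '(':          # alt 1: '(' E ')'
        inner = parse_E(s, pos + 1, memo)
        if inner is not None and inner < len(s) and s[inner] == ')':
            result = inner + 1
    if result is None and pos < len(s) and s[pos] == '1':
        result = pos + 1                        # alt 2: '1'
    if memo is not None:
        memo[('T', pos)] = result
    return result

def run(depth, memoize):
    """Parse '('*depth + '1' + ')'*depth; return (success, rule invocations)."""
    global calls
    calls = 0
    s = '(' * depth + '1' + ')' * depth
    end = parse_E(s, 0, {} if memoize else None)
    return end == len(s), calls

print(run(16, memoize=False))  # → (True, 393213): ~6·2^depth, exponential
print(run(16, memoize=True))   # → (True, 34): 2·(depth+1), linear
```

Flipping `memoize` here is the moral equivalent of toggling the commented-out `memoize=true` option in the grammar above: same grammar, same input, exponential versus linear rule invocations.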
>
>> Do you have any guess (or even data) on how many rule applications can be
>> saved by memoization in the case of the PEG C grammar you recently looked
>> at? What speedup did you experience from enabling memoization?
>
> Sure. As I mentioned above, it means the difference between finishing in
> seconds versus waiting many minutes and then having to kill it. I don't
> need to count to recognize that exponential landmine ;)
>
> Ter
> _______________________________________________
> PEG mailing list
> PEG@lists.csail.mit.edu
> https://lists.csail.mit.edu/mailman/listinfo/peg
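The "saved rule applications" measurement discussed in this thread can be sketched generically. Assuming a parser instrumented to emit one (rule, position) event per rule application, the fraction a packrat parser would have skipped (counting rematches and remismatches together, since packrat caches failures too) is simply the share of duplicate events. `packrat_savings` is a hypothetical helper for illustration, not parboiled's actual ProfilingParseRunner API:

```python
from collections import Counter

def packrat_savings(events):
    """events: one (rule, position) pair per rule application, in any order.

    A packrat parser computes each (rule, position) result at most once,
    so every repeat of an already-seen pair is an application it would
    have answered from its memo table. Returns the saved fraction.
    """
    counts = Counter(events)
    total = sum(counts.values())   # all rule applications
    saved = total - len(counts)    # repeats = total minus distinct pairs
    return saved / total if total else 0.0

# Rule A applied twice at position 0 (one rematch), once at 3; B once at 0:
print(packrat_savings([('A', 0), ('A', 0), ('A', 3), ('B', 0)]))  # → 0.25
```

A result of 0.10 from such a tally would correspond to the "only 10% of all rule applications could have been saved" figure quoted above for the Java grammar.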