Terence, thanks for your comment on this. I understand the difference in worst-case complexity between a non-memoizing PEG parser and one performing memoization. Of course one can construct example grammars exhibiting an excessive amount of backtracking. However, my hypothesis is that there is no "real-world" grammar, i.e. a grammar for some language actually used for business applications, that actually benefits from a packrat parsing approach. Given a somewhat efficient parsing engine (the Scala Parser Combinators for example are NOT a good example for this) the constant factors introduced by the overhead of memoization kill any benefit from the reduction of rule matches.
I have some data to back this up: The parboiled PEG parsing library comes with a "ProfilingParseRunner" that, among other things, counts rule matches and mismatches during the runtime of the parser against a given input. It does this globally, for each individual rule as well as for "rematches" and "remismatches", i.e. cases where the rule application could have been "saved" by a packrat parser. When run against parboileds Java grammar example and some larger Java input the profiler shows that only 10% of all rule applications could have been saved by a packrat parser. This means that, given a perfect packrat engine without any overhead whatsoever, the parser would only run about 10% faster. Now, the Java example might be a special one, since its grammar is likely quite optimized. As another example one might take a look at the Markdown parser grammar also available for parboiled. This grammar is not optimized at all for speed and depends a lot on PEG syntactic predicates, which are probably one of the main sources for rule reapplications. However, even with this grammar the number of "packrattable" rule applications amounts to only about 50%. This means the parser would only run about twice as fast with perfect packratting. The cost of managing and querying the rule cache outweighs this potential benefit in all packrat implementations I have looked at so far... Admittedly, in my tests there were a few cases where one might achieve a small speedup of a few percent when limiting the memoization to rule mismatches (which usually greatly outnumber the "packrattable" rule _matches_) and crafting the cache lookup very carefully (e.g. by only memoizing the index of the last one or two mismatches). However, IMHO this does not count as the type of "packrat parsing" usually advertised on the feature list of a parsing engine. Do you have any guess (or even data) on how many rule applications can be saved by memoization in the case of the PEG C grammar you recently looked at? What was the speedup of enabling memoization you experienced? Cheers, Mathias --- math...@parboiled.org http://www.parboiled.org On 20.10.2010, at 17:59, Terence Parr wrote: > On Oct 20, 2010, at 12:05 AM, Mathias wrote: >> Concerning this issue with packrat parsing (rather than backtracking): >> parboiled does not do packrat parsing but rather concentrates on providing a >> really efficient backtracking engine, since I have yet to see a real-word >> example where packratting does actually increase parsing performance over an >> efficient parser without the overhead of creating and managing the rule >> match cache. >> If anyone has such an example I would be very interested... > > Hi Mathias, > > I have a number of large examples that won't terminate in my lifetime (though > I'm getting old!) w/o the memoize=true option in ANTLR in grammars that > backtrack a lot. Turning on/off means finishing quickly or not terminating. > Efficient parsers are a constant in O(...) whereas the memoize reduces an > exponential to a linear problem, with a bigger constant. In my experience, > memoization can be a necessity depending on the grammar. I think it was a > PEG C grammar I converted to ANTLR a few days ago that needed memoizing. > 'course one can always work on the grammar to reduce backtracking needs. > > ANTLR's approach to PEG parsing is to optimize away the speculation as much > as possible to attenuate this issue with actions. > > Ter > _______________________________________________ > PEG mailing list > PEG@lists.csail.mit.edu > https://lists.csail.mit.edu/mailman/listinfo/peg _______________________________________________ PEG mailing list PEG@lists.csail.mit.edu https://lists.csail.mit.edu/mailman/listinfo/peg