Terence,

thanks for your comment on this.
I understand the difference in worst-case complexity between a non-memoizing 
PEG parser and one performing memoization. Of course one can construct example 
grammars exhibiting an excessive amount of backtracking.
However, my hypothesis is that there is no "real-world" grammar, i.e. a grammar 
for a language actually used in business applications, that benefits from a 
packrat parsing approach. Given a reasonably efficient parsing engine (the 
Scala Parser Combinators, for example, are NOT one), the constant factors 
introduced by the overhead of memoization kill any benefit from the reduction 
in rule matches.

I have some data to back this up:
The parboiled PEG parsing library comes with a "ProfilingParseRunner" that, 
among other things, counts rule matches and mismatches while the parser runs 
against a given input. It does this globally as well as per rule, and it also 
counts "rematches" and "remismatches", i.e. cases where the rule application 
could have been "saved" by a packrat parser.
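
To make the terminology concrete, here is a minimal sketch of the kind of 
bookkeeping such a profiler performs. It is purely illustrative and not 
parboiled's actual implementation; the class and method names are made up for 
the example.

import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Illustrative only -- not parboiled's actual ProfilingParseRunner.
// Counts rule applications that hit a (rule, start index) pair which was
// already evaluated before, i.e. applications a packrat cache could skip.
class RuleApplicationStats {
    private final Map<String, Set<Integer>> matchedAt =
            new HashMap<String, Set<Integer>>();
    private final Map<String, Set<Integer>> mismatchedAt =
            new HashMap<String, Set<Integer>>();
    long matches, mismatches, rematches, remismatches;

    // To be called by the parser whenever a rule finishes at a start index.
    void record(String rule, int startIndex, boolean matched) {
        Map<String, Set<Integer>> seen = matched ? matchedAt : mismatchedAt;
        Set<Integer> indices = seen.get(rule);
        if (indices == null) {
            indices = new HashSet<Integer>();
            seen.put(rule, indices);
        }
        boolean seenBefore = !indices.add(startIndex);
        if (matched) { matches++; if (seenBefore) rematches++; }
        else         { mismatches++; if (seenBefore) remismatches++; }
    }

    // Fraction of all rule applications a perfect, zero-overhead cache could save.
    double packrattableFraction() {
        long total = matches + mismatches;
        return total == 0 ? 0.0 : (rematches + remismatches) / (double) total;
    }
}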

When run against parboiled's Java grammar example and some larger Java input, 
the profiler shows that only about 10% of all rule applications could have been 
saved by a packrat parser. This means that, given a perfect packrat engine 
without any overhead whatsoever, the parser would only run about 10% faster.
Now, the Java example might be a special case, since its grammar is already 
quite optimized.

As another example one might look at the Markdown parser grammar that is also 
available for parboiled. This grammar is not optimized for speed at all and 
relies heavily on PEG syntactic predicates, which are probably one of the main 
sources of rule reapplications.
However, even with this grammar the share of "packrattable" rule applications 
is only about 50%, which means the parser would run at most about twice as 
fast with perfect packratting.
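
For reference, the simple estimate behind these two numbers (assuming every 
rule application costs roughly the same): if a fraction f of all rule 
applications could be skipped by a perfect, zero-overhead cache, the ideal 
speedup is 1 / (1 - f). With f = 0.10 that gives about 1.11x, i.e. roughly 10% 
faster; with f = 0.50 it gives 2x.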

The cost of managing and querying the rule cache outweighs this potential 
benefit in all packrat implementations I have looked at so far...

Admittedly, in my tests there were a few cases where one can achieve a small 
speedup of a few percent by limiting the memoization to rule mismatches (which 
usually greatly outnumber the "packrattable" rule _matches_) and crafting the 
cache lookup very carefully (e.g. by memoizing only the index of the last one 
or two mismatches).
However, IMHO this does not count as the kind of "packrat parsing" usually 
advertised on the feature list of a parsing engine.
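
To illustrate what I mean by such a stripped-down cache, here is a small 
sketch. Again this is only illustrative, not parboiled's code, and the names 
are made up: each rule instance remembers just the input indices of its last 
two mismatches, so the "cache lookup" is two integer comparisons instead of a 
hash map query.

// Illustrative sketch of a minimal mismatch-only cache, not parboiled's code.
// Each rule instance remembers only the start indices of its last two
// mismatches; checking the cache is just two integer comparisons.
class LastMismatchCache {
    private int last = -1;
    private int secondLast = -1;

    // True if this rule is already known to fail at the given start index.
    boolean knownMismatch(int startIndex) {
        return startIndex == last || startIndex == secondLast;
    }

    // To be called whenever the rule fails to match at the given start index.
    void recordMismatch(int startIndex) {
        secondLast = last;
        last = startIndex;
    }
}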

Do you have any guess (or even data) on how many rule applications memoization 
can save in the case of the PEG C grammar you recently looked at? And what 
speedup did you see from enabling memoization?

Cheers,
Mathias

---
math...@parboiled.org
http://www.parboiled.org

On 20.10.2010, at 17:59, Terence Parr wrote:

> On Oct 20, 2010, at 12:05 AM, Mathias wrote:
>> Concerning this issue with packrat parsing (rather than backtracking):
>> parboiled does not do packrat parsing but rather concentrates on providing a 
>> really efficient backtracking engine, since I have yet to see a real-world 
>> example where packratting does actually increase parsing performance over an 
>> efficient parser without the overhead of creating and managing the rule 
>> match cache.
>> If anyone has such an example I would be very interested... 
> 
> Hi Mathias,
> 
> I have a number of large examples that won't terminate in my lifetime (though 
> I'm getting old!) w/o the memoize=true option in ANTLR in grammars that 
> backtrack a lot.  Turning on/off means finishing quickly or not terminating.  
> Efficient parsers are a constant in O(...) whereas the memoize reduces an 
> exponential to a linear problem, with a bigger constant.  In my experience, 
> memoization can be a necessity depending on the grammar.  I think it was a 
> PEG C grammar I converted to ANTLR a few days ago that needed memoizing.  
> 'course one can always work on the grammar to reduce backtracking needs.
> 
> ANTLR's approach to PEG parsing is to optimize away the speculation as much 
> as possible to attenuate this issue with actions. 
> 
> Ter


_______________________________________________
PEG mailing list
PEG@lists.csail.mit.edu
https://lists.csail.mit.edu/mailman/listinfo/peg
