On Sun, Jul 31, 2011 at 02:22:35PM -0700, Dustin Voss wrote:
> It is always safe to dump memos, so long as you are prepared to recreate them 
> or read directly from the input. Memoization just speeds things up and 
> probably makes algorithmic analysis more tractable. You ask about dumping 
> memos, but really what you want to do is reduce the memory footprint, and 
> there are several ways to accomplish that while keeping some of the speed-up 
> that memos give you.
> 
> The simplest option is to use a cache window. Drop results for earlier 
> positions as you examine later positions and drop results for later positions 
> if you have to backtrack to an earlier position. Consider it like this: 
> either your early matches are accurate, or they leave you on the wrong track. 
> If the former, you don’t need to keep information for earlier positions, and 
> if the latter, by the time you get on the right track the information for 
> later positions is likely to be inapplicable anyway.
> 
> Another option is to cache the results of only certain rules, skipping ones 
> that are not likely to be reexamined. There was a paper that indicated that 
> selective caching actually resulted in faster performance than ubiquitous 
> caching; I forget whether that paper included selection criteria.

Did you mean Ford's original paper (virtual inlining), or something else?
> 
> A third option is to employ some sort of sparse or lazy cache. Perhaps you 
> can initially assume that no results at any input position need to be cached, 
> but if you ever revisit a position, assume that you’ll revisit it again in 
> the future and start caching results for it.
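The promote-on-revisit policy above might look something like this (illustrative Python; the counter-based layout is my assumption, not from any existing parser):

```python
# Lazy memo table: cache nothing at a position until it is visited
# a second time, then start memoizing results there on the theory
# that a revisited position will be revisited again.

class LazyMemo:
    def __init__(self):
        self.visits = {}  # pos -> visit count
        self.table = {}   # (pos, rule) -> result

    def visit(self, pos):
        self.visits[pos] = self.visits.get(pos, 0) + 1

    def should_cache(self, pos):
        return self.visits.get(pos, 0) >= 2

    def store(self, pos, rule, result):
        if self.should_cache(pos):
            self.table[(pos, rule)] = result

    def lookup(self, pos, rule):
        return self.table.get((pos, rule))
```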
> 
> What I would like is some sort of probabilistic evaluation of a grammar. 
> Something that tells me, "how likely is it that the results of this rule at 
> such-and-such a position would be referenced during backtracking?" Is there 
> research along those lines?

Well, this is possible given profiling information and enough compile time.
Initially you memoize everything. At every step you run the parser on test
input and then drop the rule with the lowest memo hit ratio.
Repeat until the lowest ratio goes above, say, 2.
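A rough sketch of that loop (illustrative Python; `parse` is assumed here to return per-rule (hits, stores) counters, which a real parser would need to be instrumented to provide):

```python
# Greedy pruning of memoized rules driven by profiling data.
# `parse(test_input, memoize=...)` is a hypothetical instrumented
# parser returning {rule: (memo_hits, memo_stores)}.

def prune_memo_rules(parse, rules, test_input, threshold=2.0):
    """Start by memoizing every rule, then repeatedly drop the rule
    with the lowest memo hit ratio (hits / stores) until every
    remaining rule's ratio exceeds the threshold."""
    memoized = set(rules)
    while memoized:
        stats = parse(test_input, memoize=memoized)
        ratios = {r: hits / max(stores, 1)
                  for r, (hits, stores) in stats.items() if r in memoized}
        if not ratios:
            break
        worst = min(ratios, key=ratios.get)
        if ratios[worst] > threshold:
            break  # all remaining rules pay their way
        memoized.discard(worst)
    return memoized
```

Re-parsing the test input once per dropped rule is quadratic in the rule count, which is why this needs "enough time for compiling"; it is an offline tuning step, not something to do at parse time.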


> 
> 
> On Jul 2, 2011, at 12:50 PM, Francisco Tolmasky wrote:
> 
> > Hi,
> > 
> > I've only recently started working on PEG parsers (you can see the one I'm 
> > working on here: http://github.com/tolmasky/language/ ). I was wondering if 
> > there were any recommendations for good papers or other material relating 
> > to when it is safe to dump "memos" in a memoizing PEG parser. For example, 
> > in the following grammar:
> > 
> > start = A A
> > A = something / something_else
> > 
> > After successfully parsing through the first "A", it is safe to dump some 
> > entries from the memoization table since you know for sure that you will 
> > not be backtracking to them ever again. Unfortunately as far as I can tell 
> > this optimization can only be used when none of your ancestor rules are 
> > ordered choice rules (or I suppose any rule that may backtrack), so it can 
> > only be applied occasionally. I've read up a little and have a preliminary 
> > implementation for cut support ( 
> > http://www.ialab.cs.tsukuba.ac.jp/~mizusima/publications/paste513-mizushima.pdf
> >  ) but I am looking to see if there are any simpler heuristics like the one 
> > I showed above that don't require me to change the grammar.
> > 
> > Any help is greatly appreciated.
> > 
> > Thanks,
> > 
> > Francisco
> > 
> > 
> > _______________________________________________
> > PEG mailing list
> > PEG@lists.csail.mit.edu
> > https://lists.csail.mit.edu/mailman/listinfo/peg
> 
> 

-- 

It's union rules. There's nothing we can do about it. Sorry.

