> I'd be very interested to hear of your experiences if/when you try to  
> add memoization.

The main goal for metapeg is to keep the implementation very simple: the
Scheme code and grammar combined come to about 200 lines of code.

I may add memoization in the future, but given your experiences and
Redziejowski's paper, I think I'll start by looking at simpler optimizations
first. From the paper:

"The experiment confirmed the author’s expectations. The parser exhibits 
a moderate backtracking activity. The parsing "time", expressed as the 
number of procedure calls, grows linearly with the input size. A significant
improvement could be achieved by saving results of one or two most recent 
calls to each parsing procedure. (A thrifty mouse instead of a pack rat.) 
This modification could be comfortably hidden behind the scenes without 
destroying the clarity of the primitive parser."
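
Something like the following is what I have in mind for that one-call cache.
This is only a rough C sketch with every name invented for illustration; a
real version would thread the cache through the generated parser's state
rather than use globals:

  /* Sketch of a "thrifty mouse": each parsing procedure remembers
     only its most recent call.  An immediate retry at the same
     position after backtracking is then answered from the cache
     instead of being re-parsed. */

  int parse_expr_uncached(const char *input, int pos, int *end);

  static struct {
      int pos;       /* position of the cached call, -1 if empty */
      int matched;   /* did the rule match there?                */
      int end;       /* position after the match                 */
  } expr_cache = { -1, 0, 0 };

  int parse_expr(const char *input, int pos, int *end)
  {
      if (pos == expr_cache.pos) {          /* cache hit */
          *end = expr_cache.end;
          return expr_cache.matched;
      }
      expr_cache.matched = parse_expr_uncached(input, pos, end);
      expr_cache.pos = pos;                 /* remember this call */
      expr_cache.end = *end;
      return expr_cache.matched;
  }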

This sounds very much like your observations about using a simple inline
cache for method lookup in "Open, reusable object models".
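
In call-site form the same idea might look roughly like this (again just a
sketch, with lookup() and the type names made up for the example):

  /* One-entry inline cache at a single send site: the full method
     lookup runs only when the receiver's vtable differs from the
     one seen on the previous send through this site. */

  struct vtable;
  typedef void *(*method_t)(void *receiver);

  method_t lookup(struct vtable *vt, const char *selector);

  void *send_size(void *receiver, struct vtable *vt)
  {
      static struct vtable *cached_vt = 0;  /* vtable seen last time */
      static method_t cached_method;        /* method it resolved to */

      if (vt != cached_vt) {                /* miss: full lookup */
          cached_method = lookup(vt, "size");
          cached_vt = vt;
      }
      return cached_method(receiver);       /* hit: call directly */
  }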

My intuition is that for typical programming languages there will not be much
backtracking.
I have done a little work with the PEG grammar for the constructed language
Lojban. I didn't take notes, but I recall a fairly significant speedup when
adding memoization. Whether a similar gain could have been had by saving only
the most recent result of each parsing procedure remains to be checked.
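
For contrast, the full packrat version remembers every (rule, position)
result, roughly as below: guaranteed linear time, at the cost of
O(rules * input) space. The names here are invented, as before.

  #include <stdlib.h>

  int parse_expr_uncached(const char *input, int pos, int *end);

  typedef struct {
      int known;     /* has this (rule, pos) been computed yet? */
      int matched;
      int end;
  } Memo;

  static Memo *expr_memo;                   /* one slot per position */

  void memo_init(int input_len)
  {
      expr_memo = calloc(input_len + 1, sizeof(Memo));
  }

  int parse_expr(const char *input, int pos, int *end)
  {
      Memo *m = &expr_memo[pos];
      if (!m->known) {                      /* compute once, keep forever */
          m->matched = parse_expr_uncached(input, pos, end);
          m->end = *end;
          m->known = 1;
      }
      *end = m->end;
      return m->matched;
  }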

Unlike with your C parser, there are very few large datasets available for
testing the kinds of parsers I have been generating.

What do you plan to use the C parser for? Will it be invoked repeatedly,
like the BASIC interpreter?

John

> It's probably just my ineptitude at optimisation but when  
> implementing the peg/leg parser generators for C, every single  
> optimisation I tried to incorporate ended up slowing down the  
> generated parsers.  I paid some attention to making the input buffer,  
> backup, and the paths through conditionals as efficient as possible.   
> After that I think it came down to the speed at which I could match  
> strings and character classes.
> 
> I fed a C99 grammar (copied from the standard, with left-recursion  
> rewritten) through 'leg' and then pointed the generated parser at a  
> huge pile of C source; it parsed about 3 megabytes (4600+ function  
> definitions and external declarations, 95000 lines of code) per  
> second, with no optimisations whatsoever.  (2.16 GHz Core 1 Duo  
> running Darwin.)  The 'basic' example (included in the 'peg' source)  
> interprets Tiny BASIC by (re-)parsing every line as source as it  
> executes, putting the execution machinery in semantic actions, and it  
> manages over 1/2 million lines per second (same hardware as above).
> 
> Roman Redziejowski wrote a paper (linked from Bryan's packrat page)  
> about backtracking with little or no optimisation that I am tempted  
> to read for possible insights into my 'degrading experiences' with  
> optimisations.
> 
> Cheers,
> Ian

