Not a bad idea - quite simple and elegant.  A few potential issues:

1. There may be some immediate performance cost to trying to parse everything at least twice, to see if it changes after the first iteration, before entering it into the memo table. But presumably that cost could easily be minimized in a smart parser generator by performing static left recursion detection and only emitting loops in parse functions that are potentially left-recursive. (I expect Chris's Katahdin already does this...?)

2. As currently formulated, it's not obvious to me that with this modification the parser is guaranteed to terminate: although the "expected case" is obviously that the amount of text "consumed" by the looping construction should increase at each iteration, it should be easy to create a grammar that yields success if the prior iteration failed and vice versa, resulting in an infinite loop. Presumably this could be fixed trivially by adding a hard-coded restriction that the amount of text consumed actually does increase with each iteration - e.g., use 'cur != -1 && len(cur) > len(prev)' as the predicate of the 'while' loop.

3. Even after the above modification, it's not obvious to me that this modification would preserve the linear parse time guarantee of the basic PEG model, which has interesting implications that could be either positive or negative. There now appears to be an additional "dimension" of looping - each parse rule at each of the n text positions can potentially be evaluated O(n) times, so it looks like we're up to O(n^2) complexity in theory, the same as an Early parse of an unambiguous CFG for example. Which suggests the question: does this modification bring PEGs closer to unambiguous CFGs in some formal way? For example, with this modification can we come up with a "middle-finding" PEG corresponding to the CFG "S -> [ab] S [ab] | a"?

Cheers,
Bryan

On Sep 3, 2007, at 6:48 AM, Jeremy Bruestle wrote:

I seem to have run across a very simple way to add support for left
recursion in PEGS.  It looks like it works to me in all cases, but
perhaps I'm missing somthing.  Without further ado...

Presume we have your standard memorizing peg parser.  In psudo-code it
might look like:

// Lookup memorized result, or do actual parsing
// return -1 on failure, length of match on success
int check_parse(rule, position)
     key = tuple(rule, postion);
     if find(memory[key])
         return memory[key];
     else
         memory[key] = do_pase(rule, position);
         return memory[key];
     end
end

// Do actual work, to parse sub-components, call check_parse
// Return result in same format as above
int do_parse(rule, position);   // Actual code would be here

Now if we have any left recursion in our rules, we go into an infinite
recursion.

We can fix the infinite recursion, although not cause our left hand
rules to actually work by just setting the memory to failure when we
realize this is our first time trying a rule, but before we call
do_parse.  that is:
     ...
     else
         memory[key] = -1;
         memory[key] = do_pase(rule, position);
         return memory[key];
     end
     ...

Of course this just effectively makes the left recursive rules
worthless.  However if change check_parse to be:

int check_parse(rule, position)
     key = tuple(rule, postion);
     if find(memory[key])
         return memory[key];
     else
         cur = -1;
         do
             memory[key] = cur;
             prev = cur;
             cur = do_pase(rule, position);
         while(cur != -1 && cur != prev)
     end
end

We now cause the first parse to pick out only non-recursive rules,
then we try again with out current 'match' for the recursive
nonterminal at the position in question.  So long as it keeps working
we keep expanding at the same location.  Once we have expanded as much
as we can, we stop and return.  If we ever try to parse that same
location for the same terminal type, we just return the memorized
version as usual.  This of course presumes that our recursive rules
come before our non-recursive one for a given type.

Anyway, it seems to work, just curious on people's thoughts.

-Jeremy

_______________________________________________
PEG mailing list
PEG@lists.csail.mit.edu
https://lists.csail.mit.edu/mailman/listinfo/peg


_______________________________________________
PEG mailing list
PEG@lists.csail.mit.edu
https://lists.csail.mit.edu/mailman/listinfo/peg

Reply via email to