>> But I don't think the language x ^ n is expressible in a PEG.

Sorry. For some reason (I was in a hurry!) I'd decided that the CFG A
-> x | AA describes a language of 2, or 4, or 8, or 16, ... xs (I
think you could express this as x^2^n). Just at the moment, I'm not
inclined to pass comment on whether *that* language is expressible in
a PEG or CFG, because I'll probably get it wrong, and this is all a
bit of a digression from my original question.
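For what it's worth, a quick brute-force check settles it. This is my own sketch (the `derive` helper is hypothetical, not from the discussion): it enumerates strings derivable from A -> x | AA up to a depth bound and collects their lengths. Because AA can split unevenly (e.g. A => AA => AAA => xxx), every positive length appears, not just powers of two.

```python
# Brute-force sketch: which string lengths does A -> x | AA derive?
# derive(d) returns all strings reachable in at most d expansion rounds.
def derive(depth):
    if depth == 0:
        return set()
    shorter = derive(depth - 1)
    out = {'x'}                       # A -> x
    for a in shorter:
        for b in shorter:
            out.add(a + b)            # A -> AA
    return out

lengths = sorted({len(s) for s in derive(4)})
print(lengths)   # every length from 1 up appears, not just 1, 2, 4, 8, ...
```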

>> [ exponential time for recursive descent with backup ]
>
> It seems intuitively the case to me: if the parser matches all of
> the input but the last symbol, has to backtrack all the way to the
> root, tries something else and matches all of the input but the last
> symbol, has to backtrack to the root to try something else, etc.

That sounds like the case I described in my original message. For
example, S <- Ab / Ac; A <- a*. Without memoization, we must examine
every "a" twice. But it seems to me that the complexity in this case
is a mere O(n*r), where n is the size of the input, and r is the
number of rules / alternatives that can cause backtracking.
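To make that concrete, here is a minimal sketch (my own; the names `parse_S`, `parse_A`, and the `scans` counter are mine) of a backtracking recursive-descent parser for S <- Ab / Ac; A <- a*, instrumented to count how often each input position is examined:

```python
# Backtracking recursive descent for  S <- Ab / Ac ; A <- a*  with
# per-position scan counting, to show the work is linear, not exponential.
def make_parser(text):
    scans = [0] * (len(text) + 1)     # how often each position is examined

    def parse_A(pos):
        # A <- a* : greedily consume "a"s
        while pos < len(text):
            scans[pos] += 1
            if text[pos] != 'a':
                break
            pos += 1
        return pos

    def parse_S(pos):
        # S <- Ab / Ac : try "Ab"; on failure, backtrack and try "Ac"
        p = parse_A(pos)
        if p < len(text):
            scans[p] += 1
            if text[p] == 'b':
                return p + 1
        p = parse_A(pos)              # re-parse A from the same start
        if p < len(text):
            scans[p] += 1
            if text[p] == 'c':
                return p + 1
        return None                   # both alternatives failed

    return parse_S, scans

parse_S, scans = make_parser('aaaac')
end = parse_S(0)
print(end, scans)   # each "a" position is scanned exactly twice
```

Without memoization each "a" really is examined twice (once per alternative), but that is a constant factor per backtracking rule, matching the O(n*r) estimate above.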

I questioned "exponential" because I understand exponential to mean
O(2^n), which means completely, utterly infeasible. Suppose we had an
exponential algorithm that could (incredibly) parse an input of size
100 in 2 seconds. Then an input of size 110 would take about 34
minutes (each extra ten symbols multiplies the time by 2^10, roughly a
thousand); an input of size 120 would take about 24 days; and an input
of size 200 would take roughly 8e22 years, or several trillion times
the age of the Universe.
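A quick sanity check of that arithmetic, assuming the simple model that each extra input symbol doubles the runtime (the helper names here are mine):

```python
# If an O(2^n) parser handles n = 100 in 2 seconds, its per-unit cost
# is 2 / 2**100 seconds; runtime(n) scales that up to input size n.
PER_UNIT = 2 / 2**100

def runtime(n):
    return 2**n * PER_UNIT           # seconds for an input of size n

MINUTE, DAY, YEAR = 60, 86400, 365.25 * 86400
print(runtime(110) / MINUTE)         # about half an hour
print(runtime(120) / DAY)            # a few weeks
print(runtime(200) / YEAR)           # on the order of 1e22-1e23 years
```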

< time passes, when I really should be doing something more productive :-) >

OK, I've taken a slightly longer look at Earley's paper. He talks
about CFG "G4":

S -> AB
A -> a | Ab
B -> bc | bB | Bd

which generates strings of the form a b^n c d^m, and is exponential
for the "Top Down", "Selective Top Down", and "Bottom Up" parsers. (It
is O(n^3) for a "Selective Bottom Up" parser, and linear for Earley's
parser.)
But this grammar is ambiguous, and left-recursive, so it's not very
relevant to PEG parsing.

The CFG "G3", S -> ab | aSb, is apparently exponential for the "Bottom
Up" parser, but not the others. I can't see why, though. If the
"Bottom Up" parser is shift/reduce, surely we shift "a"s till we see a
"b", then start reducing.  (These parsers referenced by Earley are
described in a 44-year old paper, "On the relative efficiencies of
context-free grammar" by Griffiths and Petrick 1965. Unfortunately
this doesn't seem to have escaped outside the ACM paywall. I'm not
close to a library at the moment; could somebody email it to me?)

G3 can be written as a PEG: S <- ab / aSb. Ian Piumarta's "leg"
parser generator (which compiles backtracking recursive descent - not
packrat - parsers from PEGs) turns this into a decidedly
linear-looking parser.
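A rough way to see why it is linear (this is my own sketch, not leg's generated code): a direct backtracking recursive-descent recognizer for S <- ab / aSb only ever backtracks a bounded amount at each nesting level, so the total scan count grows linearly with the input:

```python
# Backtracking recursive descent for  S <- "ab" / "a" S "b",
# counting every character comparison to show linear growth.
def make_parser(text):
    scans = [0]                       # total character comparisons

    def match(pos, ch):
        scans[0] += 1
        return pos < len(text) and text[pos] == ch

    def parse_S(pos):
        # first alternative: "ab"
        if match(pos, 'a') and match(pos + 1, 'b'):
            return pos + 2
        # backtrack; second alternative: "a" S "b"
        if match(pos, 'a'):
            p = parse_S(pos + 1)
            if p is not None and match(p, 'b'):
                return p + 1
        return None

    return parse_S, scans

for n in (10, 100, 400):
    parse_S, scans = make_parser('a' * n + 'b' * n)
    parse_S(0)
    print(n, scans[0])    # scan count grows linearly (4n - 2 for this input)
```

Each level tries "ab", fails on the second character, re-matches the "a", and recurses; that is a constant amount of rework per level, hence O(n) overall.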

So my question still stands: can anybody show me a PEG that - for a
backtracking recursive descent parser - would be more complex than
O(n) to parse? Despite all the discussion of exponential complexity in
this post, exponential blow-up isn't required: a mere O(n^2) would be
great!

Thanks,

#t

_______________________________________________
PEG mailing list
PEG@lists.csail.mit.edu
https://lists.csail.mit.edu/mailman/listinfo/peg
