Hello Nicolas,

Thanks for your thoughtful reply!

> First of all, with full memoization, PEG is O(n) for all PEG languages.
> This is in the original Ford paper, but basically, at most you'll invoke
> every parsing expression at every position, so O(n*grammar_size), where
> grammar_size is a constant that depends on the grammar.

I have to revisit that paper, because my experience seems to indicate that
O(n*g) is likely for input that belongs to the language, but worse when the
input does not.

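For concreteness, here is the mechanism behind Ford's bound in a toy Python
sketch (all names are made up; this is not the code of any particular
generator). The memo is keyed on (rule, position), so each pair is computed
at most once, failures included:

    class PackratParser:
        def __init__(self, text):
            self.text = text
            self.memo = {}  # (rule_name, pos) -> None (failure) or (value, next_pos)

        def apply(self, rule, pos):
            # Each (rule, position) pair is computed at most once; any
            # backtracking that revisits the pair becomes a dict lookup.
            key = (rule.__name__, pos)
            if key not in self.memo:
                self.memo[key] = rule(self, pos)
            return self.memo[key]

        def name(self, pos):
            # name = 'x'
            if self.text[pos:pos + 1] == 'x':
                return 'x', pos + 1
            return None

        def call(self, pos):
            # call = name '(' ')'
            result = self.apply(PackratParser.name, pos)
            if result is not None and self.text[result[1]:result[1] + 2] == '()':
                return ('call', result[0]), result[1] + 2
            return None

        def expr(self, pos):
            # expr = call / name -- on input "x", call fails after reading
            # name, and the fallback re-reads name from the memo instead of
            # re-parsing it
            return self.apply(PackratParser.call, pos) or self.apply(PackratParser.name, pos)

    # PackratParser('x()').expr(0)  ->  (('call', 'x'), 3)
    # PackratParser('x').expr(0)    ->  ('x', 1)

Failures are memoized too, which is why the bound should hold in principle
even for rejected input; what I observe in practice may come down to how
real grammars and memo tables behave beyond a toy like this.
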
> Beware that full memoization has been reported to be slower than no
> memoization at all on most languages, with the best performance resulting
> from memoizing only a few critical rules.

You'd need to qualify "most languages". Again, in my experience, it depends
on how the grammar is written. I've opted for grammars that are easy to
write and maintain, and that are easy to work with for producing useful
ASTs, so they tend to contain many repeated expressions.

> Without memoization, an LL(1) language trivially admits an O(n) PEG
> grammar, which should generally look very much like the LL(1) grammar (I
> think they might even be provably identical, but I haven't thought this
> through, so don't quote me on this).

I won't quote you, because the eagerness in PEG makes choice ordering a
must, so some amount of tweaking is required before an LL(1) grammar can be
used with PEG. It should be provable, though, that the PEG grammar with
just choice ordering is still LL(1).

> A more interesting question would be if it's possible to derive an O(n)
> PEG grammar from any PEG grammar for an LL(1) language. I'm not sure. And
> if it is possible (via a left-factoring process), then I really doubt
> it's possible without risking getting a new grammar that is exponentially
> larger than the original (even if a small version exists!).

A trivial translation of LL(k) to PEG would be to place the LL(k)
lookaheads as PEG lookaheads in front of each choice, at O(k*g).

> - It's almost impossible to involuntarily write an exponential grammar in
> PEG. I had trouble coming up with examples of exponential grammars.
> Necessarily, these will look like puzzles. If you write one of these, you
> deserve whatever happens to you.

I think it depends on the language and the input. Remember that I've parsed
COBOL and NaturalAG with PEG. Incorrect input with many ambiguous
constructs makes the parsers take much longer than with correct input.
Sources of incorrect input? Variations in the language among vendors, for
example.

> - There exists a set of problematic grammar patterns that have
> non-exponential complexity but are very slow nonetheless. These all have
> to do with parsing things that look like binary operators.

Yes.

> Things are even worse when some kind of left-recursion support is
> present. Without entering the details (which I can supply if requested),
> you get into a bad backtracking pattern and you incur things like
> O(2^(number of operators)) to parse a "primitive expression". So to
> ground this, in Java this is ~2^10 parser invocations to parse a mere
> integer. It's easy to fall into these bad patterns, because some of them
> are the straight transposition of how operators are handled in CFGs. My
> own solution is to supply explicit parser combinators to enable parsing
> left-/right-associative expressions.

My experience is that it is best to avoid recursion in PEG grammars unless
the construct being parsed is recursive:

    addition = term '+' addition / term

behaves differently from:

    addition = term ('+' term)*

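To make the difference concrete, here are the two formulations as plain
recursive-descent procedures, without memoization (a toy sketch with
made-up names; `calls` counts invocations of term to expose the
re-parsing):

    calls = 0  # number of term invocations

    def term(tokens, pos):
        global calls
        calls += 1
        if pos < len(tokens) and tokens[pos] != '+':
            return tokens[pos], pos + 1
        return None

    def addition_recursive(tokens, pos):
        # addition = term '+' addition / term
        first = term(tokens, pos)
        if first is not None and first[1] < len(tokens) and tokens[first[1]] == '+':
            rest = addition_recursive(tokens, first[1] + 1)
            if rest is not None:
                return ('+', first[0], rest[0]), rest[1]
        # first alternative failed: fall back to plain term, re-parsing it
        # at the same position (a memo would turn this into a lookup)
        return term(tokens, pos)

    def addition_iterative(tokens, pos):
        # addition = term ('+' term)*
        result = term(tokens, pos)
        if result is None:
            return None
        value, pos = result
        while pos < len(tokens) and tokens[pos] == '+':
            nxt = term(tokens, pos + 1)
            if nxt is None:
                break  # the ('+' term) group fails as a whole; keep what we have
            value, pos = ('+', value, nxt[0]), nxt[1]
        return value, pos

    tokens = ['a', '+', 'a', '+', 'a']
    print(addition_recursive(tokens, 0), calls)  # (('+', 'a', ('+', 'a', 'a')), 5) 4
    calls = 0
    print(addition_iterative(tokens, 0), calls)  # (('+', ('+', 'a', 'a'), 'a'), 5) 3

The right-recursive form also nests its result to the right, while the
iterative form pairs naturally with a left-associative fold. One redundant
call on this input looks harmless, but with operator rules layered by
precedence the retries compound layer by layer, which is the bad pattern
described above.
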
> What I'm getting at here is that it seems that if you map out the
> potential pitfalls that give bad performance and supply tools to replace
> them, the risk of accidentally incurring catastrophic performance reduces
> a lot.
>
> Static (and potentially heuristic) analysis of grammars could also help a
> lot here.

Yes.

> As for Python: there is absolutely no reason to doubt it's feasible to do
> a Python PEG. The real question is about the performance of the resulting
> parser compared to LL(1) and LL(k): not in terms of complexity, but
> rather in terms of raw metal speed. Table-based lookup does tend to be
> faster than trying out every alternative (even if the PEG is written in
> an LL(1) style, and if not then it will almost certainly lose out). But
> then again, LL(1) is ugly and hard to maintain, so that's a trade-off.

That is so. I'm exploring PEG for Python because the maintainers of its
LL(1) parser expressed concerns about the inconvenience of maintaining an
LL(1) grammar. But bare-metal speed is a requirement, so
LL(not-too-small-k) may be the best solution (a problem is that most LL
parser generators for C seem abandoned). I'll keep working on the PEG
grammar for Python just to be able to measure.

--
Juancarlo Añez

_______________________________________________
PEG mailing list
PEG@lists.csail.mit.edu
https://lists.csail.mit.edu/mailman/listinfo/peg