Re: [PEG] Bounds for PEG for a language known to be LL(1)

Nicolas Laurent Sat, 11 May 2019 14:10:13 -0700

Hey Juancarlo,

First of all, with full memoization, PEG is O(n) for all PEG languages.
This is in the original Ford paper, but basically, at most you'll invoke
every parsing expression at every position so O(n*grammar_size) where
grammar_size is a constant that depends on the grammar.

Beware that full-memoization has been reported to be slower than no
memoization at all on most language, with the best performance resulting
from memoizing only a few critical rules.

Without memoization, a LL(1) language trivially admits an O(n) PEG grammar,
which should generally look very much like the LL(1) grammar (I think they
might even be provably identical, but I haven't thought this through, so
don't quote me on this).

The issue is that writing grammars in this style is very very backward.
Writing LL(1) grammars is miserable, in CFG or in PEG.

A more interesting question would be if it's possible to derive a O(n) PEG
grammar for any PEG grammar for a LL(1) language.
I'm not sure. And if it is possible (via a left-factoring process), then I
really doubt it's possible without risking getting a new grammar that is
exponentially larger than the original (even if a small version exists!).

Let me offer another pragmatic angle, however. After pondering the issues
in performance for PEG for a while, I came to a few conclusions:

- It's almost impossible to involuntarily write an exponential grammar in
PEG. I had trouble coming up with examples of exponential grammars.
Necessarily, these will look like puzzles. If you write one of these, you
deserve whatever happens to you.

- There exists a set of problematic grammar patterns that have
non-exponential complexity but are very slow nonetheless. These all have to
do with parsing things that look like binary operators. Things are even
worse when some kind of left-recursion support is present. Without entering
the details (which I can supply if requested), you get into a bad
backtracking pattern and you incur things like O(2^(num of operators)) to
parse a "primitive expression". So to ground this, in Java this is ~ 2^10
parser invocation to parse a mere integer. It's easy to fall into these bad
patterns, because some of them are the straight transposition of how
operators are handled in CFGs. My own solution is to supply explicit parser
combinators to enable parsing left- / right-associative expressions.

What I'm getting at here is that it seems that if you map out the potential
pitfalls that give bad performance and supply tools to replace them, the
risk of accidentally incurring catastrophic performance reduces a lot.
Static (and potentially heuristic) analysis of grammars could also help a
lot here.

As for Python: there is absolutely no reason to doubt it's not feasible to
do a Python PEG. The real question is about the performance of the
resulting parser compared to LL(1) and LL(k) — not in terms of complexity,
but rather in terms of raw metal speed. Table-based lookup does tend to be
faster than trying out every alternative (even if the PEG is written in a
LL(1) style, and if not then it will almost certainly lose out). But then
again, LL(1) is ugly and hard to maintain so that's a trade-off.

Cheers,

Nicolas LAURENT

On Sat, 11 May 2019 at 13:23, Juancarlo Añez <apal...@gmail.com> wrote:

>
> My intention is not to spam this low-volume group. There are discussions
> taking place about next-generation parsing for Python, one of the
> mainstream programming languages of our time, and I'm summoning the
> expertise I have access to.
>
> https://discuss.python.org/t/preparing-for-new-python-parsing/
>
> It should be provable that PEG memoization is O(Output) for a language
> known to be LL(1). It should be the same for runtime complexity. Are there
> any papers that settle this?
>
> Thanks in advance,
>
> --
> Juancarlo *Añez*
> _______________________________________________
> PEG mailing list
> PEG@lists.csail.mit.edu
> https://lists.csail.mit.edu/mailman/listinfo/peg
>

_______________________________________________
PEG mailing list
PEG@lists.csail.mit.edu
https://lists.csail.mit.edu/mailman/listinfo/peg

Re: [PEG] Bounds for PEG for a language known to be LL(1)

Reply via email to