Hey Juancarlo, First of all, with full memoization, PEG is O(n) for all PEG languages. This is in the original Ford paper, but basically, at most you'll invoke every parsing expression at every position so O(n*grammar_size) where grammar_size is a constant that depends on the grammar.
Beware that full-memoization has been reported to be slower than no memoization at all on most language, with the best performance resulting from memoizing only a few critical rules. Without memoization, a LL(1) language trivially admits an O(n) PEG grammar, which should generally look very much like the LL(1) grammar (I think they might even be provably identical, but I haven't thought this through, so don't quote me on this). The issue is that writing grammars in this style is very very backward. Writing LL(1) grammars is miserable, in CFG or in PEG. A more interesting question would be if it's possible to derive a O(n) PEG grammar for any PEG grammar for a LL(1) language. I'm not sure. And if it is possible (via a left-factoring process), then I really doubt it's possible without risking getting a new grammar that is exponentially larger than the original (even if a small version exists!). Let me offer another pragmatic angle, however. After pondering the issues in performance for PEG for a while, I came to a few conclusions: - It's almost impossible to involuntarily write an exponential grammar in PEG. I had trouble coming up with examples of exponential grammars. Necessarily, these will look like puzzles. If you write one of these, you deserve whatever happens to you. - There exists a set of problematic grammar patterns that have non-exponential complexity but are very slow nonetheless. These all have to do with parsing things that look like binary operators. Things are even worse when some kind of left-recursion support is present. Without entering the details (which I can supply if requested), you get into a bad backtracking pattern and you incur things like O(2^(num of operators)) to parse a "primitive expression". So to ground this, in Java this is ~ 2^10 parser invocation to parse a mere integer. It's easy to fall into these bad patterns, because some of them are the straight transposition of how operators are handled in CFGs. My own solution is to supply explicit parser combinators to enable parsing left- / right-associative expressions. What I'm getting at here is that it seems that if you map out the potential pitfalls that give bad performance and supply tools to replace them, the risk of accidentally incurring catastrophic performance reduces a lot. Static (and potentially heuristic) analysis of grammars could also help a lot here. As for Python: there is absolutely no reason to doubt it's not feasible to do a Python PEG. The real question is about the performance of the resulting parser compared to LL(1) and LL(k) — not in terms of complexity, but rather in terms of raw metal speed. Table-based lookup does tend to be faster than trying out every alternative (even if the PEG is written in a LL(1) style, and if not then it will almost certainly lose out). But then again, LL(1) is ugly and hard to maintain so that's a trade-off. Cheers, Nicolas LAURENT On Sat, 11 May 2019 at 13:23, Juancarlo Añez <apal...@gmail.com> wrote: > > My intention is not to spam this low-volume group. There are discussions > taking place about next-generation parsing for Python, one of the > mainstream programming languages of our time, and I'm summoning the > expertise I have access to. > > https://discuss.python.org/t/preparing-for-new-python-parsing/ > > It should be provable that PEG memoization is O(Output) for a language > known to be LL(1). It should be the same for runtime complexity. Are there > any papers that settle this? > > Thanks in advance, > > -- > Juancarlo *Añez* > _______________________________________________ > PEG mailing list > PEG@lists.csail.mit.edu > https://lists.csail.mit.edu/mailman/listinfo/peg >
_______________________________________________ PEG mailing list PEG@lists.csail.mit.edu https://lists.csail.mit.edu/mailman/listinfo/peg