On Sat, May 11, 2019 at 6:09 PM Nicolas Laurent
<nicolas.laur...@uclouvain.be> wrote:

> Without memoization, an LL(1) language trivially admits an O(n) PEG
> grammar, which should generally look very much like the LL(1) grammar
> (I think they might even be provably identical, but I haven't thought
> this through, so don't quote me on this).
We have discussed (and proved) this correspondence here:
https://www.sciencedirect.com/science/article/pii/S0167642314000276
https://arxiv.org/abs/1304.3177

> - There exists a set of problematic grammar patterns that have
> non-exponential complexity but are very slow nonetheless. These all
> have to do with parsing things that look like binary operators. Things
> are even worse when some kind of left-recursion support is present.
> Without entering the details (which I can supply if requested), you get
> into a bad backtracking pattern and incur something like O(2^(num of
> operators)) work to parse a "primitive expression". To ground this: in
> Java, that is ~2^10 parser invocations to parse a mere integer. It's
> easy to fall into these bad patterns, because some of them are the
> straight transposition of how operators are handled in CFGs. My own
> solution is to supply explicit parser combinators for parsing left- and
> right-associative expressions.
>
> What I'm getting at here is that if you map out the potential pitfalls
> that give bad performance and supply tools to replace them, the risk of
> accidentally incurring catastrophic performance drops a lot. Static
> (and potentially heuristic) analysis of grammars could also help a lot
> here.

I think it would be valuable if you could present these problematic
patterns in more detail; that would help users of PEG-based tools to
avoid them.
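If I understand the pattern correctly, it can be reproduced with a
minimal sketch like the one below (Python; the names parse_num and
make_level are mine, just for illustration). Each precedence level is
written in the form one would transcribe from a CFG, Level <- Sub op
Level / Sub, so whenever the first alternative fails past Sub, the
ordered choice re-parses Sub from scratch and the work doubles at every
level:

    # Bad backtracking pattern: Level <- Sub op Level / Sub,
    # run by a recursive-descent PEG parser without memoization.
    calls = 0

    def parse_num(s, pos):
        # Prim <- [0-9]; counts how often the leading primary is parsed.
        global calls
        calls += 1
        return pos + 1 if pos < len(s) and s[pos].isdigit() else None

    def make_level(sub, op):
        # Builds Level <- sub op Level / sub for one operator level.
        def level(s, pos):
            p = sub(s, pos)                  # first alternative: sub op Level
            if p is not None and p < len(s) and s[p] == op:
                q = level(s, p + 1)
                if q is not None:
                    return q
            return sub(s, pos)               # backtrack: re-parse sub from scratch
        return level

    # Stack a few precedence levels, as in the ~10 levels you mention for Java.
    expr = parse_num
    for op in "*+-<&^|":                     # 7 levels here
        expr = make_level(expr, op)

    expr("1", 0)
    print(calls)                             # 128, i.e. 2^7 calls for the token "1"

Memoizing the sub-parsers makes this linear again; the usual
grammar-level fix is iteration, Level <- Sub (op Sub)*, which parses the
leading Sub exactly once per level. I assume that is roughly what your
explicit associativity combinators provide.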
--
Sérgio