[Python-Dev] Re: PEP 617: New PEG parser for CPython

Guido van Rossum Thu, 02 Apr 2020 19:30:10 -0700

 Le ven. 3 avr. 2020 à 02:58, Greg Ewing <greg.ew...@canterbury.ac.nz> a
écrit :
> On 3/04/20 10:33 am, Victor Stinner wrote:
> > I also like the fact that PEG is deterministic, whereas
> > LL(1) parsers are not.
>
> Where do you get that LL(1) parsers are not deterministic?
> That's news to me!

On Thu, Apr 2, 2020 at 6:15 PM Victor Stinner <vstin...@python.org> wrote:

> Sorry, I was referring to *ambiguous* grammar rules. Extract of the PEP:
>
> "Unlike LL(1) parsers PEG-based parsers cannot be ambiguous: if a
> string parses, it has exactly one valid parse tree. This means that a
> PEG-based parser cannot suffer from the ambiguity problems described
> in the previous section."
>

Maybe we need to rephrase this a bit. It's more that the LL(1) and PEG
formalisms deal very different with ambiguous *grammars*. An example of an
ambiguous grammar would be:

start: X | Y
X: expr
Y: expr
expr: NAME | NAME '+' NAME

There are probably better examples of ambiguous grammars (see
https://en.wikipedia.org/wiki/Ambiguous_grammar) but I think this will do
to explain the problem.

This is a fine context-free grammar (it accepts strings like "a" and "a+b")
but the LL(1) formalism will reject it because it sees an overlap in FIRST
sets between X and Y -- not surprising because they have the same RHS.
Also, even a more powerful formalism would have to make a choice whether to
choose X or Y, which may matter if the derivation is used to build a parse
tree (like Python's pgen does).

OTOH a PEG parser generator will always take the X alternative -- it
doesn't care that there's more than one derivation, since its '|' operator
is not symmetrical: X|Y and Y|X are not the same, as they are in LL(1) and
most other formalisms. (In fact, the common notation for PEG uses '/' to
emphasize this, but it looks ugly to me so I changed it to '|'.)

That PEG (by definition) always uses the first matching alternative is
actually a blessing as well as a curse. The downside is that PEG can't tell
you when have a real ambiguity in your grammar. But the upside is that it
works like a programmer would write a (recursive descent) parser. Thus it
"solves" the problem of ambiguous grammars by choosing the first
alternative. This allows more freedom in designing a grammar. For example,
it would let a language designer solve the "dangling else" problem from the
Wikipedia page, by writing the form including the "else" clause first .
(Python doesn't have that problem due to the use of indentation, but it
might appear in another disguise.)

I should probably refine this argument and include it in the PEP as one of
the reasons to prefer PEG over LR or LALR (but I need to think more about
that -- it was a very early choice).

-- 
--Guido van Rossum (python.org/~guido)
*Pronouns: he/him **(why is my pronoun here?)*
<http://feministing.com/2015/02/03/how-using-they-as-a-singular-pronoun-can-change-the-world/>

_______________________________________________
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/3LHZNXC2MJJ2RUMDGRH7L2ZOPH5ZE6QK/
Code of Conduct: http://python.org/psf/codeofconduct/

[Python-Dev] Re: PEP 617: New PEG parser for CPython

Reply via email to