On 4/24/06, Michael Foord <[EMAIL PROTECTED]> wrote: > Hello all, > > I'm working on a parser for part of the Python language (expressions but > not statements basically). I'm using PLY to generate the parser and it's > mostly done. > > I've hit on what looks like a fundamental ambiguity in the Python > grammar which is difficult to get round with PLY; and I'm wondering > *why* the grammar is defined in this way. It's possible there is a > reason that I've missed, which means I need to rethink my workaround. > > List displays (list comprehensions) are defined as (from > http://docs.python.org/ref/lists.html ) : >
Have you checked Grammar/Grammar to make sure it is the same? The grammar in the language ref is not always the most up-to-date version of the grammar. For instance, the grammar rule for 'test' is actually ``test: or_test ['if' or_test 'else' test] | lambdef``. -Brett > > test ::= and_test ( "or" and_test )* | lambda_form > testlist ::= test ( "," test )* [ "," ] > list_display ::= "[" [listmaker] "]" > listmaker ::= expression ( list_for | ( "," expression )* [","] ) > list_iter ::= list_for | list_if > list_for ::= "for" expression_list "in" testlist [list_iter] > list_if ::= "if" test [list_iter] > > The problem is that list_for is defined as : > > "for" expression_list "in" testlist > > This allows arbitrary expressions in the 'assignment' part of a list > comprehension. > > As a result, the following is valid syntax according to the grammar : > > [x for x + 1 in y] > > Obviously it isn't valid ! This parses to an ast, but the syntax error > is thrown when you compile the resulting ast. > > The problem is that for the basic case of a list comprehension ( ``[x > for x in y]``), ``x in y`` is a valid expression. That makes it > extremely hard to disambiguate the grammar so that the ``in`` is treated > correctly, and not part of an expression. > > My question is, why are arbitrary expressions allowed here in the > grammar ? As far as I can tell, only identifiers (nested in parentheses > or brackets) are valid here. I've got round the problem by creating a > new node 'identifier_list' and just having that return the expected > syntax tree (actually an expression list). This gets round the ambiguity > [#]_. > > It worries me that there might be a valid expression allowed here that I > haven't thought of. My current rules allow anything that looks like > ``(a, [b, c, (d, e)], f)`` - any nested identifier list. Would anything > else be allowed ? > > If not, why not modify the grammar so that the compiler has less > possible invalid syntax trees to work with ? > > (Also the grammar definition of string conversion is wrong as it states > that a trailing comma is valid, which isn't the case. As far as I can > tell that is necessary to allow nesting string conversions.) > > Fuzzyman > http://www.voidspace.org.uk/python/index.shtml > > > .. [#] If I could make precedence work in PLY I could also solve it I > guess. However I can't. :-) > _______________________________________________ > Python-Dev mailing list > Python-Dev@python.org > http://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: > http://mail.python.org/mailman/options/python-dev/brett%40python.org > _______________________________________________ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com