On Tue, 14 Jan 2020 at 20:43, Guido van Rossum <gu...@python.org> wrote:
>
> On the subject of replacing the current parser, I am actively working on 
> that. See GitHub.com/gvanrossum/pegen.

Sounds interesting!
I could open a ticket for you, if something like that is not yet
implemented or in your plans.

>
> On Tue, Jan 14, 2020 at 10:32 Andrew Barnert via Python-ideas 
> <python-ideas@python.org> wrote:
>>
>> On Jan 14, 2020, at 05:22, Σταύρος Ντέντος <stde...@gmail.com> wrote:
>> >
>> > Hello there,
>> >
>> > If I have simply missed the colon that ends a for loop header
>> >
>> >  File "./bbq.py", line 160
>> >    for config_file in config_files
>> >                                  ^
>> > SyntaxError: invalid syntax
>> >
>> > the message is not as straightforward.
>>
>> I think almost everyone would prefer it if the compiler could say 
>> “SyntaxError: missing colon at end of a compound statement header” or 
>> something more useful.
>>
>> And that probably goes even more for this case:
>>
>>     spam = eggs(cheese, (foo, bar)
>>     cheese = spam*2
>>
>> The problem is to come up with a rule that could be applied to detect these 
>> cases given the information the simple LL(1) parser has available at the 
>> time of failure. I suspect there’s no way to do that without radically 
>> changing the parser architecture, keeping track of a lot more state, or 
>> partially re-parsing things in the error handler. (If it were easy, Guido 
>> would have done it back in 1.x.)

In full disclosure, parsers are a very distant memory from my
university courses.
A dark one, but intriguing nonetheless.

It appears to me that your case is slightly different from mine
(equally annoying, for sure):
# python3 test.py
  File "test.py", line 6
    cheese = spam*2
         ^
SyntaxError: invalid syntax

In this case, the parser would also have to backtrack a symbol, and
then "attempt to guess" what could be wrong, from a handful of cases.
Not terribly complicated (given that you would, at most, backtrack one
symbol), but less straightforward than being exactly at the point of
violation.
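
To make the "guess from a handful of cases" idea concrete for the
unclosed-bracket example, here is a minimal sketch that runs outside
the compiler. It is only an illustration under my own assumptions, not
how CPython works: the function name first_unclosed_bracket is
hypothetical, and it relies on the standard tokenize module yielding
the tokens it has seen before it gives up at end of input.

```python
import io
import tokenize

_OPENERS = {"(", "[", "{"}
_PAIRS = {")": "(", "]": "[", "}": "{"}


def first_unclosed_bracket(source):
    """Return (char, line, column) of the outermost bracket left open, or None.

    Hypothetical helper: a heuristic for error reporting, not a parser.
    """
    stack = []
    try:
        for tok in tokenize.generate_tokens(io.StringIO(source).readline):
            if tok.type != tokenize.OP:
                continue
            if tok.string in _OPENERS:
                # Remember where each opener was seen (line is 1-based, column 0-based).
                stack.append((tok.string, tok.start[0], tok.start[1]))
            elif tok.string in _PAIRS and stack and stack[-1][0] == _PAIRS[tok.string]:
                stack.pop()
    except (tokenize.TokenError, SyntaxError):
        # The tokenizer stops with an error when input ends inside a bracket;
        # whatever is still on the stack is what we want to report.
        pass
    return stack[0] if stack else None


if __name__ == "__main__":
    bad = "spam = eggs(cheese, (foo, bar)\ncheese = spam*2\n"
    print(first_unclosed_bracket(bad))
```

With the example above, the reported position is the opening
parenthesis of eggs( on line 1, which is where the reader actually
needs to look, rather than the innocent assignment on the next line.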

>> But maybe there’s a way to heuristically detect that these problems are 
>> _likely_ causes of the error (without having to be as ridiculously 
>> complicated as what Clang does with C++ code)? If you could find a way to 
>> make the error say “SyntaxError: invalid syntax (possibly missing colon at 
>> end of compound statement header)” in most simple “forgot the colon” cases 
>> and very few other cases, without massively disrupting everything, I think 
>> people would be happy with that.
>>
>> You might even be able to take advantage of re-parsing without having to 
>> solve all the problems that go with that. For example, technically, you 
>> can’t even access the last logical line to reparse; practically, you can get 
>> it in the same cases the traceback can print it, and those are probably the 
>> only cases you need to heuristically improve the error handling. You could 
>> even maybe do a quick & dirty proof of concept in Python in an import hook, 
>> if you don’t want to dive into the middle of the C compiler code.
>>
>> As an alternative, there are lots of projects to use more powerful parser 
>> algorithms on Python. There’s not much call to replace CPython’s parser, 
>> because there aren’t any benefits to offset the costs. (At least assuming 
>> that the language is going to stay LL(1), to make it easy to parse in your 
>> head.) But if you could improve most of the most annoying error handling 
>> cases, that might be a different story. And these might also be easier to 
>> play with. (Some have pure Python implementations, and even the ones in C 
>> aren’t embedded in the middle of the compiler code.) IIRC, early Java did 
>> something clever with a GLR parser that has LR(1) performance on all valid 
>> code and strictly bounded complexity on error recovery (so it may get as bad 
>> as worst-case cubic, but cubic on N<=5 so who cares) so they could usually 
>> produce error messages as good as most C compilers without the horrible mess 
>> of parsing that most C compilers need.

Precisely because I am probably not the only one who has thought about
it (I would guess the Python community is a bit older than me), and
considering that my parsing knowledge is very limited, I did not try
to suggest anything.
Dumb heuristics, limited scope, and suggestive language ("maybe
missing colon?") would all be welcome improvements over the plain
"SyntaxError: invalid syntax" message.
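
As a rough illustration of such a heuristic, here is a minimal sketch
that works from outside the compiler: it catches the SyntaxError
raised by the built-in compile(), looks at the offending line, and
appends a guess when that line starts like a compound statement header
but does not end with a colon. The function name hint_for and the
keyword list are my own assumptions, and the check is deliberately
naive (a one-line "if x: y = (" would produce a false hint).

```python
import re

# Keywords that can open a compound statement header (assumed list).
_HEADER = re.compile(
    r"^\s*(?:async\s+)?"
    r"(?:if|elif|else|for|while|try|except|finally|with|def|class)\b"
)

_HINT = " (possibly missing colon at end of compound statement header)"


def hint_for(source, filename="<string>"):
    """Return the syntax error message for source, or None if it compiles.

    Hypothetical helper: appends a guess when a colon looks missing.
    """
    try:
        compile(source, filename, "exec")
    except SyntaxError as exc:
        line = (exc.text or "").rstrip()
        # Crudely drop a trailing comment; a '#' inside a string fools this.
        code = line.split("#", 1)[0].rstrip()
        message = f"SyntaxError: {exc.msg}"
        if _HEADER.match(code) and not code.endswith(":"):
            message += _HINT
        return message
    return None


if __name__ == "__main__":
    print(hint_for("for config_file in config_files\n    pass\n"))
```

The exact wording of exc.msg differs between Python versions, so the
sketch only appends to it instead of replacing it.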

I do recognise that it was very much a "senior moment" of mine (one
that annoyed me enough to nudge others about it); however, small
things like this one can sometimes push people over the edge, even
though the solution is much more trivial and obvious than whatever
>>
>> _______________________________________________
>> Python-ideas mailing list -- python-ideas@python.org
>> To unsubscribe send an email to python-ideas-le...@python.org
>> https://mail.python.org/mailman3/lists/python-ideas.python.org/
>> Message archived at 
>> https://mail.python.org/archives/list/python-ideas@python.org/message/ILJNAN4E5VROSODWO2UWJDHP5DCVM56G/
>> Code of Conduct: http://python.org/psf/codeofconduct/
>
> --
> --Guido (mobile)

--
Ντέντος Σταύρος
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/FQZNCTH6YUTRVISSIR5LSYMXMOWRT5M6/
