On Tue, 14 Jan 2020 at 20:43, Guido van Rossum <gu...@python.org> wrote:
>
> On the subject of replacing the current parser, I am actively working on
> that. See GitHub.com/gvanrossum/pegen.

Sounds interesting! I could open a ticket for you, if something like that is
not already implemented or in your plans.

>
> On Tue, Jan 14, 2020 at 10:32 Andrew Barnert via Python-ideas
> <python-ideas@python.org> wrote:
>>
>> On Jan 14, 2020, at 05:22, Σταύρος Ντέντος <stde...@gmail.com> wrote:
>> >
>> > Hello there,
>> >
>> > If I have simply missed the colon at the end of a for loop header
>> >
>> >   File "./bbq.py", line 160
>> >     for config_file in config_files
>> >                                   ^
>> > SyntaxError: invalid syntax
>> >
>> > the message is not as straightforward.
>>
>> I think almost everyone would prefer it if the compiler could say
>> “SyntaxError: missing colon at end of a compound statement header” or
>> something more useful.
>>
>> And that probably goes even more for this case:
>>
>>     spam = eggs(cheese, (foo, bar)
>>     cheese = spam*2
>>
>> The problem is to come up with a rule that could be applied to detect these
>> cases given the information the simple LL(1) parser has available at the
>> time of failure. I suspect there’s no way to do that without radically
>> changing the parser architecture, keeping track of a lot more state, or
>> partially re-parsing things in the error handler. (If it were easy, Guido
>> would have done it back in 1.x.)

In full disclosure, parsers are a very distant memory from my university
courses. A dark one, but intriguing nonetheless.

It appears to me that your case is slightly different from mine (equally
annoying, for sure):

# python3 test.py
  File "test.py", line 6
    cheese = spam*2
         ^
SyntaxError: invalid syntax

In this case, the parser would also have to backtrack a symbol and then
"attempt to guess" what could be wrong from a handful of cases. Not
terribly complicated (given that you would, at most, backtrack one symbol),
but less straightforward than being exactly at the point of violation.
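
The "attempt to guess" step can be sketched in a few lines of Python. This
is only a rough illustration under stated assumptions, not how CPython
handles errors: `COMPOUND_KEYWORDS`, `unclosed_brackets`, `guess_hint` and
`explain` are hypothetical names, the keyword list is incomplete, and the
bracket counter ignores string literals. The line number reported for these
errors differs between Python versions, so `guess_hint` takes it as an
argument instead of assuming where the parser stopped.

```python
import re

# Hypothetical, incomplete list of keywords that start a compound statement.
COMPOUND_KEYWORDS = {"for", "while", "if", "elif", "else", "def", "class",
                     "try", "except", "finally", "with"}
OPENERS = "([{"
CLOSERS = ")]}"


def unclosed_brackets(line):
    """Naively count brackets opened but not closed on one line.

    Assumption: brackets inside string literals are not handled.
    """
    depth = 0
    for ch in line.split("#", 1)[0]:  # drop a trailing comment
        if ch in OPENERS:
            depth += 1
        elif ch in CLOSERS:
            depth -= 1
    return max(depth, 0)


def guess_hint(source, lineno):
    """Return a tentative hint for an error reported at 1-based lineno."""
    lines = source.splitlines()
    if not 1 <= lineno <= len(lines):
        return None
    current = lines[lineno - 1].split("#", 1)[0].strip()
    first_word = re.match(r"[A-Za-z_]\w*", current)
    # Case 1: the error sits exactly at the point of violation.
    if (first_word and first_word.group() in COMPOUND_KEYWORDS
            and not current.endswith(":")):
        return "possibly missing colon at end of compound statement header"
    # Case 2: step back one line and look for an unbalanced bracket.
    if lineno >= 2 and unclosed_brackets(lines[lineno - 2]) > 0:
        return "possibly unclosed bracket on the previous line"
    return None


def explain(source, filename="<string>"):
    """Compile source; on failure, return the message plus a hedged hint."""
    try:
        compile(source, filename, "exec")
    except SyntaxError as err:
        hint = guess_hint(source, err.lineno or 0)
        return "SyntaxError: " + str(err.msg) + (f" ({hint})" if hint else "")
    return None
```

Calling `guess_hint` with line 1 of the `for config_file in config_files`
example yields the missing-colon hint, and calling it with line 2 of the
`spam = eggs(...)` example yields the unclosed-bracket hint. Because the
guess only runs after a SyntaxError has already been raised, false positives
such as `if x: pass` are less of a concern, and the "possibly" wording keeps
the message suggestive instead of authoritative.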

>> But maybe there’s a way to heuristically detect that these problems are
>> _likely_ causes of the error (without having to be as ridiculously
>> complicated as what Clang does with C++ code)? If you could find a way to
>> make the error say “SyntaxError: invalid syntax (possibly missing colon at
>> end of compound statement header)” in most simple “forgot the colon” cases
>> and very few other cases, without massively disrupting everything, I think
>> people would be happy with that.
>>
>> You might even be able to take advantage of re-parsing without having to
>> solve all the problems that go with that. For example, technically, you
>> can’t even access the last logical line to reparse; practically, you can get
>> it in the same cases the traceback can print it, and those are probably the
>> only cases you need to heuristically improve the error handling. You could
>> even maybe do a quick & dirty proof of concept in Python in an import hook,
>> if you don’t want to dive into the middle of the C compiler code.
>>
>> As an alternative, there are lots of projects to use more powerful parser
>> algorithms on Python. There’s not much call to replace CPython’s parser,
>> because there aren’t any benefits to offset the costs. (At least assuming
>> that the language is going to stay LL(1), to make it easy to parse in your
>> head.) But if you could improve most of the most annoying error handling
>> cases, that might be a different story. And these might also be easier to
>> play with. (Some have pure Python implementations, and even the ones in C
>> aren’t embedded in the middle of the compiler code.)
>> IIRC, early Java did
>> something clever with a GLR parser that has LR(1) performance on all valid
>> code and strictly bounded complexity on error recovery (so it may get as bad
>> as worst-case cubic, but cubic on N<=5 so who cares) so they could usually
>> produce error messages as good as most C compilers without the horrible mess
>> of parsing that most C compilers need.

Precisely because I am probably not the only one who has thought about this
(I would guess the Python community is a bit older than me), and considering
that my parsing knowledge is very limited, I did not try to suggest anything.

Dumb heuristics, limited scope, and suggestive language ("maybe missing
colon?") would all be welcome improvements over a plain "SyntaxError: invalid
syntax" message.

I do recognise that it was very much a "senior moment" of mine (one that
bothered me enough to nudge me into nudging others); however, small nuisances
like this one can sometimes push people over the edge, whereas the solution
is much more trivial and obvious than whatever

>>
>> _______________________________________________
>> Python-ideas mailing list -- python-ideas@python.org
>> To unsubscribe send an email to python-ideas-le...@python.org
>> https://mail.python.org/mailman3/lists/python-ideas.python.org/
>> Message archived at
>> https://mail.python.org/archives/list/python-ideas@python.org/message/ILJNAN4E5VROSODWO2UWJDHP5DCVM56G/
>> Code of Conduct: http://python.org/psf/codeofconduct/
>
> --
> --Guido (mobile)

--
Ντέντος Σταύρος
_______________________________________________
Message archived at
https://mail.python.org/archives/list/python-ideas@python.org/message/FQZNCTH6YUTRVISSIR5LSYMXMOWRT5M6/