New submission from Serhiy Storchaka:

Here is a preliminary patch that refactors the lowest level of the Python tokenizer: reading and decoding the input. It splits the code into smaller, simpler functions, reduces the source size by 37 lines, and fixes several bugs, including issue14811 and issue18961. I added tests for most of the fixed bugs (except leaks and others that are hard to reproduce). Fixing the remaining bugs may be harder, especially the issues with null bytes (issue1105770, issue20115).
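As a rough illustration of what "reading and decoding" covers, here is a minimal Python-level sketch. It is not taken from the patch, which presumably changes the interpreter's own tokenizer code. The helper name `read_and_decode` is hypothetical, and for simplicity the sketch works on a complete byte buffer. It relies only on the standard `tokenize.detect_encoding`, and it rejects null bytes outright, which is just one possible policy.

```python
import io
import tokenize


def read_and_decode(source_bytes):
    """Hypothetical helper: detect the source encoding and decode the buffer."""
    # Null bytes are a known hard case; this sketch simply refuses them.
    if b"\0" in source_bytes:
        raise ValueError("source contains a null byte")
    # detect_encoding() looks for a UTF-8 BOM and a PEP 263 coding cookie
    # in the first two lines, and falls back to UTF-8.
    encoding, _ = tokenize.detect_encoding(io.BytesIO(source_bytes).readline)
    # Decoding with 'utf-8-sig' (returned when a BOM is present) strips the BOM.
    return encoding, source_bytes.decode(encoding)
```

For example, `read_and_decode(b"# -*- coding: latin-1 -*-\nx = 1\n")` reports the normalized encoding name `iso-8859-1` together with the decoded text.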
Many of these bugs could be fixed easily if the whole Python file were read into memory instead of being read line by line. I don't know whether that is acceptable.

----------
assignee: serhiy.storchaka
components: Interpreter Core
files: tokenize_input.patch
keywords: patch
messages: 254778
nosy: serhiy.storchaka
priority: normal
severity: normal
status: open
title: Python tokenizer rewriting
type: behavior
versions: Python 3.6

Added file: http://bugs.python.org/file41058/tokenize_input.patch

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue25643>
_______________________________________