On Tue, Nov 17, 2015 at 1:59 AM, M.-A. Lemburg <m...@egenix.com> wrote:
> On 17.11.2015 02:53, Serhiy Storchaka wrote:
>> I'm working on rewriting the Python tokenizer (in particular the part that
>> reads and decodes Python source files). The code is complicated. Currently
>> it handles these cases:
>>
>> * Reading from a string in memory.
>> * Interactive reading from a file.
>> * Reading from a file:
>>   - Raw reading, ignoring the encoding (in the parser generator).
>>   - Raw reading of a UTF-8 encoded file.
>>   - Reading and recoding to UTF-8.
>>
>> The file is read line by line. This makes it hard to check the correctness
>> of the first line if the encoding is specified in the second line. It also
>> causes hard-to-solve problems with null bytes and with desynchronization of
>> buffered C and Python files. All these problems can easily be solved by
>> reading the whole Python source file into memory and then parsing it as a
>> string. This would allow us to drop a large, complex and buggy part of the
>> code.
>>
>> Are there disadvantages to this solution? As for memory consumption, the
>> source text itself will consume only a small part of the memory consumed
>> by the AST and other structures. As for performance, reading and decoding
>> the whole file at once can be faster than doing it line by line.
>
> A problem with this approach is that you can no
> longer fail early and detect indentation errors et al. while
> parsing the data (which may well come from a pipe).
Oh, this use case I had forgotten about. I don't know how common or
important it is though.

But more important is the interactive REPL, which parses your input
fully each time you hit ENTER.

> Another related problem is that you have to wait for the full
> input data before you can start compiling the code.

That's always the case -- we don't start compiling before we have the
full parse tree.

> I don't think these situations are all that common, though,
> so reading in the full source code before compiling it
> sounds like a reasonable approach.
>
> We use the same simplification in eGenix PyRun's emulation of
> the Python command line interface and it has so far not
> caused any problems.

Curious how you do it? I'd actually be quite disappointed if the
amount of parsing done by the standard REPL went down.

>> [1] http://bugs.python.org/issue25643

--
--Guido van Rossum (python.org/~guido)
_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
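The two input modes debated in this thread can be illustrated with a rough Python-level sketch. It is not CPython's C tokenizer and not the patch from issue 25643, and `compile_source_bytes` and `feed_repl` are hypothetical helper names. The first helper follows the whole-file proposal: it takes the entire source as bytes, uses the stdlib `tokenize.detect_encoding()` to find a UTF-8 BOM or a PEP 263 coding declaration on line 1 or 2, decodes once, and compiles the resulting string. The second is modelled loosely on the stdlib `code.InteractiveConsole` loop: it re-compiles the accumulated buffer with `codeop.compile_command()` after every line and treats a `None` result as "need more input", which is the per-ENTER parsing the REPL relies on.

```python
import codeop
import io
import tokenize


def compile_source_bytes(data: bytes, filename: str = "<string>"):
    """Hypothetical helper: decode a whole in-memory source file, then compile it.

    tokenize.detect_encoding() reads at most two lines, honouring a UTF-8 BOM
    or a PEP 263 coding declaration and defaulting to UTF-8.
    """
    encoding, _ = tokenize.detect_encoding(io.BytesIO(data).readline)
    text = data.decode(encoding)  # 'utf-8-sig' also strips a leading BOM
    # compile() ignores the coding cookie when it is given a str.
    return compile(text, filename, "exec")


def feed_repl(lines):
    """Hypothetical helper: report how each typed line would be handled.

    codeop.compile_command() returns None while the buffered input is
    incomplete and a code object once it forms a complete statement;
    it raises SyntaxError (not handled here) for input that can never
    become valid.
    """
    buffer = []
    results = []
    for line in lines:
        buffer.append(line)
        code = codeop.compile_command("\n".join(buffer), "<input>", "single")
        if code is None:
            results.append("...")  # incomplete: the REPL shows a continuation prompt
        else:
            results.append("run")  # complete: the REPL would execute it
            buffer.clear()
    return results
```

In this sketch `compile_source_bytes` cannot report an indentation error until all of `data` is available, which is the pipe case raised above, while `feed_repl` re-parses the whole buffer on every line, so a compound statement such as `if True:` is only run after the terminating blank line.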