On 17.11.2015 02:53, Serhiy Storchaka wrote:
> I'm working on rewriting Python tokenizer (in particular the part that reads
> and decodes Python source file). The code is complicated. For now there are
> such cases:
>
> * Reading from the string in memory.
> * Interactive reading from the file.
> * Reading from the file:
>   - Raw reading ignoring encoding in parser generator.
>   - Raw reading UTF-8 encoded file.
>   - Reading and recoding to UTF-8.
>
> The file is read by the line. It makes hard to check correctness of the first
> line if the encoding is specified in the second line. And it makes very hard
> problems with null bytes and with desynchronizing buffered C and Python files.
> All this problems can be easily solved if read all Python source file in
> memory and then parse it as string. This would allow to drop a large complex
> and buggy part of code.
>
> Are there disadvantages in this solution? As for memory consumption, the
> source text itself will consume only small part of the memory consumed by AST
> tree and other structures. As for performance, reading and decoding all file
> can be faster then by the line.

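To make the proposal concrete, here is a minimal Python-level sketch of the "read everything, decode once, then parse as a string" idea. It only illustrates the approach and is not the C tokenizer code under discussion. The helper name compile_source_bytes is hypothetical, and the sketch assumes the standard library's tokenize.detect_encoding() for handling the BOM and the coding cookie.

```python
import io
import tokenize


def compile_source_bytes(data, filename="<string>"):
    """Decode a complete Python source buffer, then compile it in one step.

    Hypothetical helper for illustration; 'data' is the whole file as bytes.
    """
    # detect_encoding() looks at no more than the first two lines for a
    # BOM or a PEP 263 coding cookie, so a cookie on line two is handled
    # before any of the text is decoded.
    encoding, _ = tokenize.detect_encoding(io.BytesIO(data).readline)
    # Decode the entire buffer at once instead of line by line.
    text = data.decode(encoding)
    # Syntax and indentation errors surface here, after all input was read.
    return compile(text, filename, "exec")


if __name__ == "__main__":
    source = b"# first line\n# -*- coding: latin-1 -*-\nname = '\xe9'\n"
    namespace = {}
    exec(compile_source_bytes(source), namespace)
    print(repr(namespace["name"]))
```

Since the whole buffer is available before decoding starts, the first line is decoded with the encoding declared on the second line, which is the case that is awkward with line-by-line reading.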
A problem with this approach is that you can no longer fail early and detect
indentation errors et al. while parsing the data (which may well come from a
pipe). Another related problem is that you have to wait for the full input
data before you can start compiling the code.

I don't think these situations are all that common, though, so reading in the
full source code before compiling it sounds like a reasonable approach.

We use the same simplification in eGenix PyRun's emulation of the Python
command line interface and it has so far not caused any problems.

> [1] http://bugs.python.org/issue25643

-- 
Marc-Andre Lemburg
eGenix.com