On 17.11.2015 02:53, Serhiy Storchaka wrote:
> I'm working on rewriting the Python tokenizer (in particular the part that
> reads and decodes the Python source file). The code is complicated.
> Currently it has to handle the following cases:
> 
> * Reading from the string in memory.
> * Interactive reading from the file.
> * Reading from the file:
>   - Raw reading ignoring encoding in parser generator.
>   - Raw reading UTF-8 encoded file.
>   - Reading and recoding to UTF-8.
> 
> The file is read line by line. This makes it hard to check the correctness
> of the first line if the encoding is specified in the second line. It also
> causes very hard problems with null bytes and with desynchronization of the
> buffered C and Python files. All these problems can be easily solved by
> reading the whole Python source file into memory and then parsing it as a
> string. This would allow dropping a large, complex and buggy part of the
> code.
> 
> Are there disadvantages to this solution? As for memory consumption, the
> source text itself will consume only a small part of the memory consumed
> by the AST and other structures. As for performance, reading and decoding
> the whole file can be faster than doing so line by line.

A problem with this approach is that you can no
longer fail early and detect indentation errors et al. while
parsing the data (which may well come from a pipe).

Another related problem is that you have to wait for the full
input data before you can start compiling the code.

I don't think these situations are all that common, though,
so reading in the full source code before compiling it
sounds like a reasonable approach.

We use the same simplification in eGenix PyRun's emulation of
the Python command line interface and it has so far not
caused any problems.
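
To illustrate the idea at the Python level, here is a minimal sketch. It is
not the C tokenizer code and not necessarily how PyRun does it; the helper
name compile_source_bytes is made up for this example. It reads the complete
source as bytes, determines the encoding from the BOM or the coding cookie
in the first two lines, decodes everything at once and only then compiles:

```python
import io
import tokenize


def compile_source_bytes(data, filename="<memory>"):
    """Compile a complete in-memory source buffer (hypothetical helper)."""
    # detect_encoding() only inspects the BOM and the first two lines,
    # so both candidate lines for the coding cookie are checked together.
    encoding, _ = tokenize.detect_encoding(io.BytesIO(data).readline)
    # Decode the whole buffer in one step instead of line by line.
    text = data.decode(encoding)
    # compile() ignores the coding cookie when it is given a str.
    return compile(text, filename, "exec")


if __name__ == "__main__":
    source = "# -*- coding: latin-1 -*-\nname = 'caf\xe9'\n".encode("latin-1")
    namespace = {}
    exec(compile_source_bytes(source), namespace)
    print(namespace["name"])
```

Note that with this approach an indentation error is only reported after
the complete input has been read, which is the trade-off mentioned above.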

> [1] http://bugs.python.org/issue25643

-- 
Marc-Andre Lemburg
eGenix.com

