Re: [Python-Dev] Reading Python source file

Guido van Rossum Tue, 17 Nov 2015 07:26:14 -0800

On Tue, Nov 17, 2015 at 1:59 AM, M.-A. Lemburg <[email protected]> wrote:
> On 17.11.2015 02:53, Serhiy Storchaka wrote:
>> I'm working on rewriting the Python tokenizer (in particular the part that
>> reads and decodes the Python source file). The code is complicated.
>> Currently it handles these cases:
>>
>> * Reading from the string in memory.
>> * Interactive reading from the file.
>> * Reading from the file:
>>   - Raw reading ignoring encoding in parser generator.
>>   - Raw reading UTF-8 encoded file.
>>   - Reading and recoding to UTF-8.
>>
>> The file is read line by line. This makes it hard to check the correctness
>> of the first line if the encoding is specified in the second line. It also
>> causes very hard problems with null bytes and with desynchronized buffered
>> C and Python files. All these problems can be easily solved by reading the
>> whole Python source file into memory and then parsing it as a string. This
>> would allow dropping a large, complex, and buggy part of the code.
>>
>> Are there disadvantages to this solution? As for memory consumption, the
>> source text itself will consume only a small part of the memory consumed
>> by the AST and other structures. As for performance, reading and decoding
>> the whole file can be faster than doing so line by line.
>
> A problem with this approach is that you can no
> longer fail early and detect indentation errors et al. while
> parsing the data (which may well come from a pipe).


Oh, this use case I had forgotten about. I don't know how common or
important it is though.

But more important is the interactive REPL, which parses your input
fully each time you hit ENTER.
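
To illustrate the REPL behaviour, here is a minimal sketch using the
standard library's codeop module, which the code module uses to emulate
the interactive interpreter. The helper name feed_lines is hypothetical,
and this is not how the C-level REPL is implemented; it only shows the
accumulated input being re-parsed from scratch after every line.

```python
import codeop


def feed_lines(lines):
    """Re-parse the accumulated input after every line (hypothetical helper).

    Returns a list of (line, status) pairs, where status is
    'incomplete', 'complete', or 'error'.
    """
    buffer = []
    results = []
    for line in lines:
        buffer.append(line)
        source = "\n".join(buffer)
        try:
            # compile_command() returns None when more input is needed.
            code = codeop.compile_command(source, "<input>", "single")
        except (SyntaxError, ValueError, OverflowError):
            results.append((line, "error"))
            buffer.clear()
            continue
        if code is None:
            results.append((line, "incomplete"))
        else:
            results.append((line, "complete"))
            buffer.clear()
    return results
```

A compound statement stays incomplete until a blank line is entered,
which is why the whole buffer has to be parsed again on each ENTER.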

> Another related problem is that you have to wait for the full
> input data before you can start compiling the code.

That's always the case -- we don't start compiling before we have the
full parse tree.
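
As a rough Python-level sketch of the "read everything, then parse"
approach under discussion (not the tokenizer's actual C code), the
following assumes the source is already in memory as bytes. The name
compile_source_bytes is hypothetical; tokenize.detect_encoding, ast.parse
and compile are the standard library calls.

```python
import ast
import io
import tokenize


def compile_source_bytes(data, filename="<memory>"):
    """Decode a whole source file held in memory, parse it, then compile it."""
    # detect_encoding() looks at a BOM and at most the first two lines.
    encoding, _ = tokenize.detect_encoding(io.BytesIO(data).readline)
    text = data.decode(encoding)
    # The full tree is built before any bytecode is generated.
    tree = ast.parse(text, filename=filename)
    return compile(tree, filename, "exec")
```

With the whole file in memory, a coding declaration on the second line
is seen before any of the first line is decoded.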

> I don't think these situations are all that common, though,
> so reading in the full source code before compiling it
> sounds like a reasonable approach.
>
> We use the same simplification in eGenix PyRun's emulation of
> the Python command line interface and it has so far not
> caused any problems.

Curious how you do it? I'd actually be quite disappointed if the
amount of parsing done by the standard REPL went down.

>> [1] http://bugs.python.org/issue25643

-- 
--Guido van Rossum (python.org/~guido)
