I'm working on rewriting the Python tokenizer (in particular the part that reads and decodes a Python source file) [1]. The code is complicated; it currently handles the following cases:

* Reading from a string in memory.
* Interactive reading from a file.
* Reading from a file:
  - Raw reading, ignoring the encoding (in the parser generator).
  - Raw reading of a UTF-8 encoded file.
  - Reading and recoding to UTF-8.

The file is read line by line. This makes it hard to check the correctness of the first line when the encoding is specified on the second line, and it creates very hard problems with null bytes and with desynchronization between buffered C and Python files. All these problems could be easily solved by reading the whole Python source file into memory and then parsing it as a string. This would allow dropping a large, complex, and buggy part of the code.
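
As a rough illustration of the idea (a minimal sketch in Python rather than the tokenizer's C, with a hypothetical read_source() helper), the stdlib's tokenize.detect_encoding() already implements the PEP 263 cookie lookup, and it works fine over an in-memory buffer:

    import io
    import tokenize

    def read_source(path):
        # Hypothetical helper: read the whole file in one call, so
        # there is no buffered C/Python file pair to desynchronize.
        with open(path, "rb") as f:
            raw = f.read()
        if b"\x00" in raw:
            raise SyntaxError("source code contains null bytes")
        # detect_encoding() applies the PEP 263 rules: BOM first, then
        # a coding cookie on the first or second line.
        encoding, _ = tokenize.detect_encoding(io.BytesIO(raw).readline)
        return raw.decode(encoding)

With everything in memory, validating the first line against a cookie found on the second line is trivial, since both lines are available at once.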

Are there disadvantages to this solution? As for memory consumption, the source text itself will consume only a small part of the memory taken by the AST and other structures. As for performance, reading and decoding the whole file at once can be faster than doing it line by line.
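
For what it's worth, here is a quick self-contained way to sanity-check the performance claim (the file contents are synthetic padding rather than real source, and the numbers will of course vary by platform):

    import os
    import tempfile
    import timeit

    # Build a throwaway ~850 KB source-like file to measure against.
    fd, path = tempfile.mkstemp(suffix=".py")
    os.write(fd, b"x = 1  # padding\n" * 50000)
    os.close(fd)

    g = {"path": path}
    whole = timeit.timeit(
        "with open(path, encoding='utf-8') as f: f.read()",
        globals=g, number=200)
    per_line = timeit.timeit(
        "with open(path, encoding='utf-8') as f: list(f)",
        globals=g, number=200)
    print("whole file: %.2fs   line by line: %.2fs" % (whole, per_line))
    os.remove(path)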

[1] http://bugs.python.org/issue25643
