On 17.11.15 05:00, Guido van Rossum wrote:
If you free the memory used for the source buffer before starting code
generation you should be good.
Thank you. The buffer is freed right after AST generation finishes.
On Mon, Nov 16, 2015 at 5:53 PM, Serhiy Storchaka <storch...@gmail.com> wrote:
I'm working on rewriting the Python tokenizer (in particular the part that
reads and decodes a Python source file). The code is complicated. It
currently handles the following cases:
* Reading from a string in memory.
* Interactive reading from a file.
* Reading from a file:
  - Raw reading, ignoring the encoding, in the parser generator.
  - Raw reading of a UTF-8 encoded file.
  - Reading and recoding to UTF-8.
The file is read line by line. This makes it hard to check the correctness
of the first line if the encoding is specified in the second line. It also
causes very hard problems with null bytes and with desynchronization of
buffered C and Python files. All these problems can be easily solved by
reading the whole Python source file into memory and then parsing it as a
string. This would allow dropping a large, complex and buggy part of the
code.
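
To illustrate the idea at the Python level (the real tokenizer is written in
C, so this is only a rough sketch and not the proposed patch), the whole-file
approach could look like the code below. The function name
parse_source_bytes and the explicit null-byte check are assumptions made for
this example; tokenize.detect_encoding and ast.parse are the standard library
calls.

```python
import ast
import io
import tokenize


def parse_source_bytes(data, filename="<memory>"):
    """Parse Python source that has already been read fully into memory.

    Hypothetical helper for illustration only.
    """
    # With the whole buffer available, null bytes can be rejected up front
    # instead of being discovered in the middle of line-by-line reading.
    if b"\x00" in data:
        raise ValueError("source contains null bytes")
    # detect_encoding() checks for a BOM and for a coding declaration on
    # the first or the second line, so both lines are validated together.
    encoding, _ = tokenize.detect_encoding(io.BytesIO(data).readline)
    # Decode the entire buffer once, then parse it as a string.
    text = data.decode(encoding)
    return ast.parse(text, filename=filename)
```

Because the declared encoding is known before anything is decoded, a
declaration on the second line applies to the first line as well, and no
buffered file object has to be kept in sync with the parser.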
Are there disadvantages to this solution? As for memory consumption, the
source text itself will consume only a small part of the memory consumed by
the AST and other structures. As for performance, reading and decoding the
whole file at once can be faster than doing it line by line.
[1] http://bugs.python.org/issue25643
_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe:
https://mail.python.org/mailman/options/python-dev/guido%40python.org