If you free the memory used for the source buffer before starting code generation, you should be good.
On Mon, Nov 16, 2015 at 5:53 PM, Serhiy Storchaka <storch...@gmail.com> wrote:
> I'm working on rewriting the Python tokenizer (in particular the part that
> reads and decodes a Python source file). The code is complicated. For now
> there are these cases:
>
> * Reading from a string in memory.
> * Interactive reading from a file.
> * Reading from a file:
>   - Raw reading, ignoring the encoding, in the parser generator.
>   - Raw reading of a UTF-8 encoded file.
>   - Reading and recoding to UTF-8.
>
> The file is read line by line. That makes it hard to check the correctness
> of the first line if the encoding is specified in the second line, and it
> causes very hard problems with null bytes and with desynchronizing buffered
> C and Python files. All these problems can be solved easily if we read the
> whole Python source file into memory and then parse it as a string. This
> would let us drop a large, complex, and buggy part of the code.
>
> Are there disadvantages to this solution? As for memory consumption, the
> source text itself will consume only a small part of the memory consumed by
> the AST and other structures. As for performance, reading and decoding the
> whole file can be faster than doing it line by line.
>
> [1] http://bugs.python.org/issue25643
>
> _______________________________________________
> Python-Dev mailing list
> Python-Dev@python.org
> https://mail.python.org/mailman/listinfo/python-dev

--
--Guido van Rossum (python.org/~guido)
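The read-everything-then-decode approach Serhiy describes can be sketched at the Python level with the stdlib `tokenize` module, whose `detect_encoding()` implements the PEP 263 rules (it inspects at most the first two lines for a `coding:` declaration, or a UTF-8 BOM). This is only an illustrative sketch of the idea, not the proposed C implementation; the helper name `read_source` is made up for the example.

```python
import io
import tokenize

def read_source(path):
    """Read an entire Python source file into memory and decode it.

    Hypothetical helper illustrating the proposal: because the whole
    file is in memory, the encoding declaration on line 1 or 2 is
    always available before any decoding or tokenizing starts.
    """
    with open(path, "rb") as f:
        raw = f.read()  # the whole file at once, no line-by-line buffering
    # detect_encoding() looks at up to the first two lines for a
    # PEP 263 "coding:" declaration or a UTF-8 BOM; defaults to utf-8.
    encoding, _ = tokenize.detect_encoding(io.BytesIO(raw).readline)
    return raw.decode(encoding)
```

The decoded string can then be handed to `compile()` in one piece; none of the null-byte or buffered-file desynchronization issues arise, since no file object outlives the initial read.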