New issue 2715: Memory from JSON-deserialized objects isn't reclaimed https://bitbucket.org/pypy/pypy/issues/2715/memory-from-json-deserialized-objects-isnt
Artur Siekielski:

I encountered the issue while processing large numbers of JSON documents in batches. When processing of a batch is finished, i.e. the documents have been deserialized and all references to them are cleared, the memory isn't reclaimed and keeps growing.

I was able to reproduce the issue using the attached code. The code json-loads documents of varying size in batches of 100. When a batch has been processed, the loaded documents are cleared and gc.collect() is called. The script prints the amount of used memory. I get the following output using PyPy2 5.9/5.10:

```
#!python
initial 114984
before clearing 1546032
after clearing and gc 1546032
before clearing 1561080
after clearing and gc 1561344
before clearing 1561608
after clearing and gc 1522916
before clearing 1559608
after clearing and gc 1560664
```

CPython 2.7.14 gives the output:

```
#!python
initial 89212
before clearing 2303832
after clearing and gc 153352
before clearing 2304968
after clearing and gc 153352
before clearing 2305756
after clearing and gc 153352
before clearing 2306736
after clearing and gc 153352
before clearing 2307520
after clearing and gc 153352
```

The default function returning a JSON document in the code is gen_doc_1. It generates a random document with nested dicts and arrays. When it is replaced with gen_doc_2, which returns an array of ints, the issue isn't present. I tried disabling the JIT and played with controlling the GC via environment variables, but that didn't make any difference.

_______________________________________________
pypy-issue mailing list
pypy-issue@python.org
https://mail.python.org/mailman/listinfo/pypy-issue
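[Editor's note] The attached reproduction script is not included in this archive. A minimal sketch of the procedure the report describes (deserialize a batch, clear references, call gc.collect(), print resident memory) might look like the following; gen_doc_1, gen_doc_2, and the memory-reading helper are reconstructions under stated assumptions, not the original attachment, and the VmRSS readout is Linux-only:

```python
import gc
import json
import random


def gen_doc_1():
    # Hypothetical stand-in for the reporter's gen_doc_1: a random
    # document with nested dicts and arrays.
    return {
        "id": random.randint(0, 10**6),
        "tags": [str(random.random()) for _ in range(random.randint(1, 50))],
        "nested": {
            "values": [random.random() for _ in range(random.randint(1, 100))],
        },
    }


def gen_doc_2():
    # Variant returning only an array of ints; per the report, the
    # leak does not appear with this generator.
    return [random.randint(0, 10**6) for _ in range(100)]


def used_kb():
    # Current resident set size in kB, read from /proc (Linux only).
    with open("/proc/self/status") as f:
        for line in f:
            if line.startswith("VmRSS:"):
                return int(line.split()[1])
    return 0


def run(gen_doc, batches=5, batch_size=100):
    print("initial", used_kb())
    for _ in range(batches):
        # Deserialize a batch of JSON documents of varying size.
        docs = [json.loads(json.dumps(gen_doc())) for _ in range(batch_size)]
        print("before clearing", used_kb())
        # Drop all references and force a collection, as in the report.
        del docs
        gc.collect()
        print("after clearing and gc", used_kb())


if __name__ == "__main__":
    run(gen_doc_1)
```

Running this under PyPy versus CPython should let one compare the "after clearing and gc" figures, which is where the two outputs above diverge.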