New issue 2715: Memory from JSON-deserialized objects isn't reclaimed
https://bitbucket.org/pypy/pypy/issues/2715/memory-from-json-deserialized-objects-isnt

Artur Siekielski:

I encountered the issue while processing large numbers of JSON documents in 
batches. When processing a batch is finished, i.e. the documents have been 
deserialized and the references to them are cleared, the memory isn't reclaimed 
and keeps growing.

I was able to reproduce the issue using the attached code. The code json-loads 
documents of varying size in batches of 100. When a batch has been processed, 
the loaded documents are cleared and gc.collect() is called.
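
The attached script isn't reproduced in this message, so here is a minimal sketch of what the description above amounts to; the memory measurement (VmRSS from /proc/self/status, Linux only) and the exact document shape in gen_doc_1 are assumptions, not the original code:

```
from __future__ import print_function

import gc
import json
import random


def gen_doc_1():
    # Random nested document with dicts and arrays (the exact shape is an assumption).
    return {
        "id": random.randint(0, 10 ** 9),
        "tags": [random.choice("abcdefgh") * random.randint(1, 20)
                 for _ in range(random.randint(1, 50))],
        "nested": {
            "values": [random.random() for _ in range(random.randint(1, 200))],
            "child": {"flag": random.random() > 0.5},
        },
    }


def used_memory_kb():
    # Assumed measurement: resident set size (VmRSS) from /proc/self/status.
    with open("/proc/self/status") as f:
        for line in f:
            if line.startswith("VmRSS:"):
                return int(line.split()[1])
    return -1


print("initial", used_memory_kb())

for batch_no in range(5):
    # Deserialize a batch of 100 JSON documents of varying size.
    docs = []
    for _ in range(100):
        raw = json.dumps([gen_doc_1() for _ in range(random.randint(50, 500))])
        docs.append(json.loads(raw))
    print("before clearing", used_memory_kb())
    # Drop all references to the deserialized documents and force a collection.
    docs = None
    gc.collect()
    print("after clearing and gc", used_memory_kb())
```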

The script prints the amount of used memory. I get the following output using 
PyPy2 5.9/5.10:


```
initial 114984
before clearing 1546032
after clearing and gc 1546032
before clearing 1561080
after clearing and gc 1561344
before clearing 1561608
after clearing and gc 1522916
before clearing 1559608
after clearing and gc 1560664
```

CPython 2.7.14 gives the output:


```
initial 89212
before clearing 2303832
after clearing and gc 153352
before clearing 2304968
after clearing and gc 153352
before clearing 2305756
after clearing and gc 153352
before clearing 2306736
after clearing and gc 153352
before clearing 2307520
after clearing and gc 153352
```

The default function that generates a JSON document in the code is gen_doc_1. 
It generates a random document with nested dicts and arrays. When it's replaced 
with gen_doc_2, which returns an array of ints, the issue isn't present.
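
For comparison, a gen_doc_2 along the lines described might be as simple as the following (the array length is an assumption); swapping it in for gen_doc_1 in the sketch above is the substitution being made:

```
import random


def gen_doc_2():
    # Flat array of ints, no nested dicts or arrays (length is an assumption).
    return [random.randint(0, 10 ** 6) for _ in range(10000)]
```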

I tried disabling the JIT and playing with the GC-tuning environment variables, 
but that didn't make any difference.

