[issue26415] Out of memory, trying to parse a 35MB dict

2016-03-08 Thread STINNER Victor

STINNER Victor added the comment:

> So, apparently, it's not the nodes themselves taking up a disproportionate 
> amount of memory -- it's the heap getting so badly fragmented that 89% of its 
> memory allocation is wasted.

Yeah, the Python parser+compiler makes poor use of the memory allocator. See my 
"benchmark" for the memory allocator: python_memleak.py.

The classical pattern of memory fragmentation is:

* allocate a lot of small objects
* allocate a few objects
* allocate more small objects
* free *all* small objects

All objects must be allocated on the heap, not via mmap(). So the maximum size of a 
single object must stay below 128 KB (the usual threshold at which malloc() switches 
between heap memory and mmap).
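
For illustration only, here is a minimal sketch of that pattern (this is not the 
attached python_memleak.py; the object counts and sizes are made up, and the RSS 
helper assumes Linux's /proc):

def rss_kb():
    # Current resident set size in kB, read from /proc (Linux only).
    with open("/proc/self/status") as f:
        for line in f:
            if line.startswith("VmRSS:"):
                return int(line.split()[1])
    return -1

# 1) allocate a lot of small objects
small = [bytearray(100) for _ in range(1000000)]
# 2) allocate a few larger, longer-lived objects in between
big = [bytearray(50000) for _ in range(10)]
# 3) allocate more small objects
small += [bytearray(100) for _ in range(1000000)]
print("after allocations:", rss_kb(), "kB")

# 4) free *all* the small objects: the RSS usually stays high, because the
#    surviving objects keep partially used heap pages alive
del small
print("after freeing the small objects:", rss_kb(), "kB")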

We can try various hacks to reduce the fragmentation, but IMHO the only real 
fix is to use a different memory allocator for the compiler and then free 
everything allocated by the parser+compiler at once.

We already have an "arena" memory allocator: Include/pyarena.h, 
Python/pyarena.c. It is already used by the parser+compiler, but it's only used 
for AST objects in practice. The parser uses the PyMem allocator API (ex: 
PyMem_Malloc).

--
nosy: +haypo
Added file: http://bugs.python.org/file42091/python_memleak.py




[issue26415] Out of memory, trying to parse a 35MB dict

2016-03-08 Thread A. Skrobov

A. Skrobov added the comment:

OK, I've now looked into it with a fresh build of 3.6 trunk on Linux x64.

Peak memory usage is about 3KB per node:

$ /usr/bin/time -v ./python -c 'import ast; ast.parse("0,"*1000000, mode="eval")'
Command being timed: "./python -c import ast; ast.parse("0,"*1000000, mode="eval")"
...
Maximum resident set size (kbytes): 3015552
...
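
The same number can be computed from inside the process; a hedged sketch, assuming 
Linux, where resource.getrusage() reports ru_maxrss in kilobytes (on macOS it is 
bytes):

import ast
import resource

n = 1000000
ast.parse("0," * n, mode="eval")

peak_kb = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss  # kB on Linux
print("peak RSS: %d kB, roughly %d bytes per node" % (peak_kb, peak_kb * 1024 // n))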


Out of the 2945 MB total peak memory usage, only 330 MB are attributable to the 
heap use:

$ valgrind ./python -c 'import ast; ast.parse("0,"*1000000, mode="eval")'
==21232== ...
==21232== HEAP SUMMARY:
==21232== in use at exit: 3,480,447 bytes in 266 blocks
==21232==   total heap usage: 1,010,171 allocs, 1,009,905 frees, 348,600,304 
bytes allocated
==21232== ...


So, apparently, it's not the nodes themselves taking up a disproportionate 
amount of memory -- it's the heap getting so badly fragmented that 89% of its 
memory allocation is wasted.

gprof confirms that there are lots of mallocs/reallocs going on, up to 21 per 
node:

$ gprof python
Flat profile:

Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total
 time   seconds   seconds    calls   s/call   s/call  name
 17.82      0.31     0.31      220     0.00     0.00  PyParser_AddToken
 13.79      0.55     0.24        2     0.12     0.16  freechildren
 12.64      0.77     0.22 21039125     0.00     0.00  _PyMem_RawMalloc
  6.32      0.88     0.11 17000101     0.00     0.00  PyNode_AddChild
  5.75      0.98     0.10 28379846     0.00     0.00  visit_decref
  5.75      1.08     0.10      104     0.00     0.00  ast_for_expr
  4.60      1.16     0.08     2867     0.00     0.00  collect
  4.02      1.23     0.07 20023405     0.00     0.00  _PyObject_Free
  2.30      1.27     0.04  3031305     0.00     0.00  _PyType_Lookup
  2.30      1.31     0.04  3002234     0.00     0.00  _PyObject_GenericSetAttrWithDict
  2.30      1.35     0.04        1     0.04     0.05  ast2obj_expr
  1.72      1.38     0.03 28366858     0.00     0.00  visit_reachable
  1.72      1.41     0.03 12000510     0.00     0.00  subtype_traverse
  1.72      1.44     0.03     3644     0.00     0.00  list_traverse
  1.44      1.47     0.03  3002161     0.00     0.00  _PyObjectDict_SetItem
  1.15      1.49     0.02 20022785     0.00     0.00  PyObject_Free
  1.15      1.51     0.02 15000763     0.00     0.00  _PyObject_Realloc


So, I suppose what needs to be done is to try reducing the number of reallocs 
involved in handling an AST node; the representation of the nodes themselves 
doesn't need to change.
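
To make the "fewer reallocs" idea concrete, a toy sketch (pure Python, purely 
illustrative; the real change would be in the C code that grows a node's child 
array, e.g. Parser/node.c) comparing an exact-fit resize on every append with 
geometric over-allocation:

def grow_one_at_a_time(n_children):
    # Naive strategy: resize the child array to the exact size on every append,
    # i.e. one realloc per child added.
    reallocs, capacity = 0, 0
    for used in range(1, n_children + 1):
        capacity = used
        reallocs += 1
    return reallocs

def grow_geometrically(n_children):
    # Over-allocate: only resize when the buffer is full, growing by ~1.5x.
    reallocs, capacity = 0, 0
    for used in range(1, n_children + 1):
        if used > capacity:
            capacity = capacity + capacity // 2 + 1
            reallocs += 1
    return reallocs

print(grow_one_at_a_time(1000000))   # 1000000 resizes
print(grow_geometrically(1000000))   # a few dozen resizes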

--




[issue26415] Out of memory, trying to parse a 35MB dict

2016-02-27 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

I thought this might be a tokenizer issue, but this is just large memory 
consumption for the AST tree. A simpler example:

./python -c 'import ast; ast.parse("0,"*1000000, mode="eval")'

It takes over 450 MB of memory on a 32-bit system, over 450 bytes per tuple item. 
Increasing the multiplier to 10000000 leads to swapping and failing with a 
MemoryError.
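
A hedged way to reproduce the per-item figure from within Python, using 
tracemalloc (note that it only counts allocations routed through Python's traced 
allocator domains, so it can undercount what the parser allocates):

import ast
import tracemalloc

n = 1000000
tracemalloc.start()
ast.parse("0," * n, mode="eval")
current, peak = tracemalloc.get_traced_memory()
print("peak traced memory: %.1f MB (%d bytes per tuple item)" % (peak / 2.0 ** 20, peak // n))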

Of course it would be nice to decrease memory consumption, but this looks 
rather like a new feature.

--
assignee: serhiy.storchaka -> 
versions: +Python 3.6 -Python 2.7, Python 3.4




[issue26415] Out of memory, trying to parse a 35MB dict

2016-02-25 Thread A. Skrobov

A. Skrobov added the comment:

Yes, I understand that this is a matter of memory consumption, which is why I 
submitted this ticket as "resource usage".
What I don't understand is, what could possibly require gigabytes of memory for 
this task?

--




[issue26415] Out of memory, trying to parse a 35MB dict

2016-02-25 Thread Eryk Sun

Eryk Sun added the comment:

> My Python is 64-bit, but my computer only has 2GB physical RAM.

That explains why it takes half an hour to crash. It's thrashing on page 
faults. Adding another paging file or increasing the size of your current 
paging file should allow this to finish parsing... eventually in maybe an hour 
or two. 

The design of the parser isn't something I've delved into very much, but 
possibly the dynamic nature of Python prevents optimizing the memory footprint 
here. Or maybe no one has seen the need to optimize parsing containers (dicts, 
sets, lists, tuples) that have constant literals. This is an inefficient way to 
store 35 MiB of data, as opposed to XML, JSON, or a binary format.
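
As a hedged sketch of what the JSON route could look like (hypothetical file 
names; JSON object keys must be strings, so the int tuples need a round-trip 
encoding):

import json

def dump_table(table, path):
    # Encode each int-tuple key as a comma-separated string (JSON keys must be strings).
    with open(path, "w") as f:
        json.dump({",".join(map(str, k)): v for k, v in table.items()}, f)

def load_table(path):
    # Turn the string keys back into tuples of ints.
    with open(path) as f:
        return {tuple(int(x) for x in k.split(",")): v
                for k, v in json.load(f).items()}

table = {(0, 0, 0, 0, 0, 0, 1, 41, 61, 66, 89): 9}
dump_table(table, "holo_table.json")
assert load_table("holo_table.json") == table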

--




[issue26415] Out of memory, trying to parse a 35MB dict

2016-02-25 Thread A. Skrobov

A. Skrobov added the comment:

My Python is 64-bit, but my computer only has 2GB physical RAM.

--




[issue26415] Out of memory, trying to parse a 35MB dict

2016-02-25 Thread Eryk Sun

Eryk Sun added the comment:

I don't think this is Windows related. Are you using 32-bit Python? On Linux, 
if I limit the process address space to 2 gigs, it crashes almost immediately:

$ ulimit -v 2000000
$ python-dbg -c 'import crash'
Segmentation fault

It runs out of memory while parsing the file:

Program received signal SIGSEGV, Segmentation fault.
0x0048ecb4 in PyObject_Malloc (nbytes=72) at ../Objects/obmalloc.c:932
932 ../Objects/obmalloc.c: No such file or directory.
(gdb) bt
#0  0x0048ecb4 in PyObject_Malloc (nbytes=72) at ../Objects/obmalloc.c:932
#1  0x0048f8be in _PyObject_DebugMallocApi (id=111 'o', nbytes=40) at ../Objects/obmalloc.c:1469
#2  0x0048fa2b in _PyObject_DebugReallocApi (api=111 'o', p=0x0, nbytes=40) at ../Objects/obmalloc.c:1520
#3  0x0048f83c in _PyObject_DebugRealloc (p=0x0, nbytes=40) at ../Objects/obmalloc.c:1441
#4  0x00418a02 in PyNode_AddChild (n1=0x7fff85cbffb8, type=318, str=0x0, lineno=1, col_offset=6446977) at ../Parser/node.c:98
#5  0x00418f53 in push (s=0xa6b680, type=318, d=0x8bfc70, newstate=1, lineno=1, col_offset=6446977) at ../Parser/parser.c:126
#6  0x0041946c in PyParser_AddToken (ps=0xa6b680, type=262144, str=0x7fff85cba720 "11", lineno=1, col_offset=6446977, expected_ret=0x7fffd324) at ../Parser/parser.c:252
#7  0x00419f19 in parsetok (tok=0xa5f650, g=0x8c0ac0 <_PyParser_Grammar>, start=257, err_ret=0x7fffd300, flags=0x7fffd2ec) at ../Parser/parsetok.c:198
#8  0x00419cb6 in PyParser_ParseFileFlagsEx (fp=0xa19b70, filename=0xa53b00 "crash.py", g=0x8c0ac0 <_PyParser_Grammar>, start=257, ps1=0x0, ps2=0x0, err_ret=0x7fffd300, flags=0x7fffd2ec) at ../Parser/parsetok.c:106

--
components:  -Windows
nosy: +eryksun




[issue26415] Out of memory, trying to parse a 35MB dict

2016-02-25 Thread A. Skrobov

A. Skrobov added the comment:

Mine is on Windows. I've now installed both 2.7.10 and 3.4.3 to reconfirm, and 
it's still the same on both of them, except that on 3.4.3 it crashes with a 
MemoryError much faster (within a couple of minutes).

--
components: +Windows
nosy: +paul.moore, steve.dower, tim.golden, zach.ware
versions: +Python 3.4




[issue26415] Out of memory, trying to parse a 35MB dict

2016-02-25 Thread Christian Heimes

Christian Heimes added the comment:

It takes about 12 seconds to byte compile crash.py on my computer and less than 
half a second to import about 28 MB of byte code:


$ rm -rf crash.pyc __pycache__/
$ time python2.7 -c 'import crash'

real    0m11.930s
user    0m9.859s
sys     0m2.085s
$ time python2.7 -c 'import crash'

real    0m0.484s
user    0m0.401s
sys     0m0.083s
$ time python3.4 -c 'import crash'

real    0m12.327s
user    0m10.106s
sys     0m2.236s
$ time python3.4 -c 'import crash'

real    0m0.435s
user    0m0.367s
sys     0m0.069s

--
nosy: +christian.heimes




[issue26415] Out of memory, trying to parse a 35MB dict

2016-02-24 Thread Raymond Hettinger

Changes by Raymond Hettinger :


--
nosy: +rhettinger




[issue26415] Out of memory, trying to parse a 35MB dict

2016-02-24 Thread A. Skrobov

A. Skrobov added the comment:

A practical note: if, instead of importing crash.py, I do a json.loads with a 
few extra transformations:

with open("crash.py") as f: holo_table={tuple(int(z) for z in k.split(', ')):v for k,v in json.loads(f.readlines()[0][13:].replace('(','"').replace(')','"')).iteritems()}

--the whole data structure loads in a jiffy.

Makes me wonder why this roundabout approach is so much more efficient than the 
native parsing.
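
For readability, the same workaround unpacked over a few lines (kept as Python 2 
to match the original; on Python 3, .iteritems() becomes .items()):

import json

with open("crash.py") as f:
    line = f.readlines()[0]

# Strip the "holo_table = " prefix (13 characters) and turn the parenthesised
# tuple keys into JSON string keys.
payload = line[13:].replace('(', '"').replace(')', '"')

holo_table = {
    tuple(int(z) for z in key.split(', ')): value
    for key, value in json.loads(payload).iteritems()
}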

--




[issue26415] Out of memory, trying to parse a 35MB dict

2016-02-24 Thread Serhiy Storchaka

Changes by Serhiy Storchaka :


--
assignee:  -> serhiy.storchaka
nosy: +serhiy.storchaka




[issue26415] Out of memory, trying to parse a 35MB dict

2016-02-22 Thread A. Skrobov

New submission from A. Skrobov:

I have a one-line module that assigns a tuple->int dictionary:

holo_table = {(0, 0, 0, 0, 0, 0, 1, 41, 61, 66, 89): 9, (0, 0, 0, 70, 88, 98, 
103, 131, 147, 119, 93): 4, [35MB skipped], (932, 643, 499, 286, 326, 338, 279, 
200, 280, 262, 115): 5}

When I try to import this module, Python grinds 100% of my CPU for like half an 
hour, then ultimately crashes with a MemoryError.

How much memory does it need to parse 35 MB of data of a rather simple 
structure?

Attaching the module, zipped to 10MB.

--
components: Interpreter Core
files: crash.zip
messages: 260704
nosy: A. Skrobov
priority: normal
severity: normal
status: open
title: Out of memory, trying to parse a 35MB dict
type: resource usage
versions: Python 2.7
Added file: http://bugs.python.org/file42011/crash.zip
