2017-06-01 9:38 GMT+02:00 Larry Hastings <la...@hastings.org>:

> When CPython's small block allocator was first merged in late February
> 2001, it allocated memory in gigantic chunks it called "arenas". These
> arenas were a massive 256 KILOBYTES apiece.
The arena size defines the strict minimum memory usage of Python: with 256 kB arenas, the smallest possible memory usage is 256 kB.

> What would be the result of making the arena size 4mb?

A minimum memory usage of 4 MB. It also means that if you allocate 4 MB + 1 byte, Python takes 8 MB from the operating system.

The GNU libc malloc uses a variable threshold to choose between sbrk() (heap memory) and mmap(). It starts at 128 kB or 256 kB, and is then adapted depending on the workload (I don't know exactly how).

I would prefer an adaptive arena size: for example, start at 256 kB and then double the arena size as memory usage grows. The problem is that pymalloc is currently designed for a fixed arena size. I have no idea how hard it would be to make the size vary per allocated arena.

I have read that CPUs support "large pages" of 2 MB up to 1 GB, instead of just 4 kB. Using large pages can have a significant impact on performance. I don't know if we can do something to help the Linux kernel use large pages for our memory, nor how we could do that :-) Maybe allocating arenas with mmap() at sizes closer to large pages would help Linux merge them into a big page? (Linux has something magic to make applications use big pages transparently.)

More generally, I'm strongly in favor of making our memory allocator more efficient :-D

When I wrote my tracemalloc PEP 454, I measured that Python calls malloc(), realloc() or free() 270,000 times per second on average when running the Python test suite:
https://www.python.org/dev/peps/pep-0454/#log-calls-to-the-memory-allocator
(Now I don't recall if it was really malloc() or PyObject_Malloc(), but either way, we do a lot of memory allocations and deallocations ;-))

When I analyzed the timeline of CPython master performance, I was surprised to see that my change making PyMem_Malloc() use pymalloc was one of the most significant optimizations of Python 3.6!
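To make the arithmetic concrete, here is a small sketch (plain Python, not CPython code; the 256 kB starting size, the 4 MB cap, and the doubling policy are just the numbers discussed above, not an actual pymalloc design):

```python
def os_memory_for(payload_bytes, arena_size):
    """Memory taken from the OS: the payload rounded up to whole arenas."""
    arenas = -(-payload_bytes // arena_size)  # ceiling division
    return arenas * arena_size

KB = 1024
MB = 1024 * KB

# With 256 kB arenas, even 1 byte of small objects costs one full arena.
assert os_memory_for(1, 256 * KB) == 256 * KB

# With 4 MB arenas, 4 MB + 1 byte spills into a second arena: 8 MB total.
assert os_memory_for(4 * MB + 1, 4 * MB) == 8 * MB

# Adaptive sizing sketch: double the size of each newly allocated arena,
# starting at 256 kB and capping at 4 MB (both numbers are arbitrary).
def adaptive_arena_sizes(count, start=256 * KB, cap=4 * MB):
    size, sizes = start, []
    for _ in range(count):
        sizes.append(size)
        size = min(size * 2, cap)
    return sizes

print([s // KB for s in adaptive_arena_sizes(6)])
# [256, 512, 1024, 2048, 4096, 4096] (kB)
```

The doubling keeps the minimum footprint small for tiny programs while reducing the number of mmap() calls for large heaps.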
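On the large-pages point, one way to "help" the kernel is to ask for transparent huge pages explicitly. A hedged sketch using Python's mmap module (madvise() needs Python 3.8+, and MADV_HUGEPAGE is Linux-only, so the code guards for both; the 4 MB size is the hypothetical arena size from above):

```python
import mmap

length = 4 * 1024 * 1024                # one hypothetical 4 MB arena
buf = mmap.mmap(-1, length)             # anonymous mapping, like an arena

# Ask the kernel to back this range with transparent huge pages.
if hasattr(buf, "madvise") and hasattr(mmap, "MADV_HUGEPAGE"):
    try:
        buf.madvise(mmap.MADV_HUGEPAGE)
    except OSError:
        pass  # THP disabled or unsupported; the mapping still works
print(len(buf))  # 4194304
buf.close()
```

An arena size that is a multiple of the 2 MB huge-page size (like 4 MB) also makes it easier for the kernel to satisfy such a request.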
http://pyperformance.readthedocs.io/cpython_results_2017.html#pymalloc-allocator

CPython performance depends heavily on the performance of our memory allocator, or at least on the performance of pymalloc (the specialized allocator for blocks <= 512 bytes).

By the way, Naoki INADA also proposed a different idea:

"Global freepool: Many types have their own freepool. Sharing a freepool can increase memory and cache efficiency. Add PyMem_FastFree(void* ptr, size_t size) to store a memory block in the freepool, and PyMem_Malloc() can check the global freepool first."

http://faster-cpython.readthedocs.io/cpython37.html

IMHO it's worth investigating this change as well.

Victor
_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
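P.S. As a rough illustration of the global freepool idea above (a sketch in Python rather than C; PyMem_FastFree() is only a proposal, not an existing API, and plain lists stand in for the per-size free lists):

```python
from collections import defaultdict

_freepool = defaultdict(list)   # size -> blocks returned by fast_free()

def fast_free(block, size):
    """Analogue of the proposed PyMem_FastFree(ptr, size): the caller
    passes the size, so no header lookup is needed to pool the block."""
    _freepool[size].append(block)

def malloc(size):
    """Analogue of PyMem_Malloc(): check the shared freepool first and
    fall back to a real allocation only when the pool is empty."""
    pool = _freepool[size]
    if pool:
        return pool.pop()       # cache-friendly: most recently freed
    return bytearray(size)      # stand-in for an actual malloc()

b = malloc(64)
fast_free(b, 64)
print(malloc(64) is b)  # True: the block was reused from the freepool
```

Because the caller supplies the size, free() can skip the size lookup entirely, which is where the "fast" in the proposed name comes from.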