On Tue, Jul 9, 2019 at 9:46 AM Tim Peters <tim.pet...@gmail.com> wrote:
>
> > At last, all size classes have 1~3 used/cached memory blocks.
>
> No doubt part of it, but hard to believe it's most of it.  If the loop
> count above really is 10240, then there's only about 80K worth of
> pointers in the final `buf`.
You are right.  list.append is not the major memory consumer in the
"large" class (8KiB+1 ~ 512KiB).  There are several causes of large
allocations:

* bm_logging uses StringIO.seek(0) + StringIO.truncate() to reset the
  buffer, so from the 2nd loop on the internal buffer of StringIO
  becomes a Py_UCS4 array instead of a list of strings.  This buffer
  grows with the same over-allocation policy as list:
  `size + (size >> 3) + (size < 9 ? 3 : 6)`.  Actually, when I use the
  `-n 1` option, memory usage is only 9MiB.

* The intern dict.

* Many modules are loaded, and FileIO.readall() is used to read pyc
  files.  This creates and deletes bytes objects of various sizes.

* The logging module uses several regular expressions, and sre_compile
  allocates `b'\0' * 0xff00`:
  https://github.com/python/cpython/blob/master/Lib/sre_compile.py#L320

> But does it really matter? ;-)  mimalloc "should have" done MADV_FREE
> on the pages holding the older `buf` instances, so it's not like the
> app is demanding to hold on to the RAM (albeit that it may well show
> up in the app's RSS unless/until the OS takes the RAM away).

mimalloc doesn't call madvise for each free().  Each size class keeps a
64KiB "page", and several OS pages (4KiB) in that "page" are committed
but not used.

I dumped all "mimalloc page" stats:
https://paper.dropbox.com/doc/mimalloc-on-CPython--Agg3g6XhoX77KLLmN43V48cfAg-fFyIm8P9aJpymKQN0scpp#:uid=671467140288877659659079&h2=memory-usage-of-logging_format

For example:

bin  block_size  used  capacity  reserved
 29        2560     1        22        25  (14 pages committed, 2560 bytes in use)
 29        2560    14        25        25  (16 pages committed, 2560*14 bytes in use)
 29        2560    11        25        25
 31        3584     1         5        18  (5 pages committed, 3584 bytes in use)
 33        5120     1         4        12
 33        5120     2        12        12
 33        5120     2        12        12
 37       10240     3        11       409
 41       20480     1         6       204
 57      327680     1         2        12

The number of committed pages can be estimated roughly as
`ceil(block_size * capacity / 4096)`.  There are dozens of unused
memory blocks and committed pages in each size class.
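For what it's worth, the two rules of thumb above can be checked with a
short sketch.  The growth formula and the committed-page estimate are
taken from the message; the function names are mine, and the sample rows
come from the dumped stats:

```python
import math

# Over-allocation used by list (and, per the above, by StringIO's
# internal Py_UCS4 buffer): size + (size >> 3) + (3 if small else 6).
def overallocate(size):
    return size + (size >> 3) + (3 if size < 9 else 6)

# Committed 4KiB OS pages for one mimalloc size-class page, estimated
# as ceil(block_size * capacity / 4096).
def committed_pages(block_size, capacity, os_page=4096):
    return math.ceil(block_size * capacity / os_page)

# Sample rows from the dump: (block_size, used, capacity)
for block_size, used, capacity in [(2560, 1, 22), (2560, 14, 25), (3584, 1, 5)]:
    pages = committed_pages(block_size, capacity)
    print(f"block_size={block_size}: {pages} pages committed, "
          f"{block_size * used} bytes in use")
```

Running this reproduces the parenthetical notes in the table (14, 16,
and 5 committed pages), which is where the 10MiB+ of committed-but-unused
memory adds up.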
This caused 10MiB+ of memory overhead in the logging_format and
logging_simple benchmarks.

> I was more intrigued by your first (speed) comparison:
>
> > - spectral_norm: 202 ms +- 5 ms -> 176 ms +- 3 ms: 1.15x faster (-13%)
>
> Now _that's_ interesting ;-)  Looks like spectral_norm recycles many
> short-lived Python floats at a swift pace.  So memory management
> should account for a large part of its runtime (the arithmetic it does
> is cheap in comparison), and obmalloc and mimalloc should both excel
> at recycling mountains of small objects.  Why is mimalloc
> significantly faster?

[snip]

> obmalloc's `address_in_range()` is definitely a major overhead in its
> fastest `free()` path, but then mimalloc has to figure out which
> thread is doing the freeing (looks cheaper than address_in_range, but
> not free).  Perhaps the layers of indirection that have been wrapped
> around obmalloc over the years are to blame?  Perhaps mimalloc's
> larger (16x) pools and arenas let it stay in its fastest paths more
> often?  I don't know why, but it would be interesting to find out :-)

Totally agree.  I'll investigate this next.

Regards,
--
Inada Naoki  <songofaca...@gmail.com>
_______________________________________________
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/MXEE2NOEDAP72RFVTC7H4GJSE2CHP3SX/