On Sat, 15 Jun 2019 19:56:58 -0500
Tim Peters <tim.pet...@gmail.com> wrote:
>
> At the start, obmalloc never returned arenas to the system. The vast
> majority of users were fine with that. A relative few weren't. Evan
> Jones wrote all the (considerable!) code to change that, and I
> massaged it and checked it in - not because there was "scientific
> proof" that it was more beneficial than harmful (it certainly added
> new expenses!) overall, but because it seemed like a right thing to
> do, _anticipating_ that the issue would become more important in
> coming years.
>
> I'm still glad it was done, but no tests were checked in to _quantify_
> its presumed benefits - or even to verify that it ever returned arenas
> to the system. Best I can tell, nobody actually has any informed idea
> how well it does. Evan stared at programs that were important to him,
> and fiddled things until he was "happy enough".
We moved from malloc() to mmap() for allocating arenas because of user
requests to release memory more deterministically:

https://bugs.python.org/issue11849

And given the number of people who use Python for long-running
processes nowadays, I'm sure they would notice (and be annoyed) if
Python did not release memory after memory consumption spikes.

> I've looked at obmalloc stats in other programs at various stages, and
> saw nothing concerning. memchunk.py appears to model object lifetimes
> as coming from a uniform distribution, but in real life they appear to
> be strongly multi-modal (with high peaks at the "way less than an eye
> blink" and "effectively immortal" ends).

I agree they will certainly be multi-modal, with the number of modes,
their respective weights and their temporal distance varying widely
across use cases. (The fact that they're multi-modal is the reason why
generational GC is useful, btw.)

> We haven't been especially
> pro-active about giant machines, and are suffering from it:
>
> https://metarabbit.wordpress.com/2018/02/05/pythons-weak-performance-matters/

So you're definitely trying to solve a problem, right?

> Fixing the underlying cause put giant machines on my radar, and
> getting rid of obmalloc's pool size limit was the next obvious thing
> that would help them (although not in the same universe as cutting
> quadratic time to linear).

"Not in the same universe", indeed. So the question becomes: does
increasing the pool and arena sizes have a negative outcome on *other*
use cases? Not everyone has giant machines. Actually, a frequent usage
model is to have many small VMs or containers on a medium-size
machine.

> For example,
> it has to allocate at least 56 bytes of separate bookkeeping info for
> each arena. Nobody cares when they have 100 arenas, but when there
> are a million arenas (which I've seen), that adds up.

In relative terms, assuming that arenas are 50% full on average
(probably a pessimistic assumption?), that overhead is 0.08% of the
arena memory actually used. What's the point of worrying about that?

> > If the problem is the cost of mmap() and munmap() calls, then the
> > solution more or less exists at the system level: jemalloc and other
> > allocators use madvise() with MADV_FREE (see here:
> > https://lwn.net/Articles/593564/).
> >
> > A possible design is a free list of arenas on which you use MADV_FREE
> > to let the system know the memory *can* be reclaimed. When the free
> > list overflows, call munmap() on extraneous arenas.
>
> People can certainly pursue that if they like. I'm not interested in
> adding more complication that helps only one of obmalloc's slowest
> paths on only one platform.

MADV_FREE is available on multiple platforms (at least Linux, macOS,
FreeBSD). Windows seems to offer similar facilities:
https://devblogs.microsoft.com/oldnewthing/20170113-00/?p=95185

(A sketch of the free-list design is appended at the end of this
message.)

> The dead obvious, dead simple, way to reduce mmap() expense is to call
> it less often, which just requires changing a compile-time constant -
> which will also call VirtualAlloc() equally less often on Windows.

That's assuming the dominating term in mmap() cost is O(1) rather than
O(size). That's not a given. The system call cost is certainly O(1),
but the cost of reserving and mapping HW pages, and zeroing them out,
is most certainly O(# pages).
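A crude way to check that last claim is to time the map/touch/unmap
cycle at several sizes; if the cost were O(1), the per-cycle time would
stay flat as the size grows. The following is only an illustrative
sketch (the 4096-byte page size and the repetition counts are arbitrary
choices, not measured values). Since anonymous pages are zeroed lazily,
each page is touched once so the O(# pages) work actually happens:

    /* Rough micro-benchmark: does the mmap()/first-touch/munmap()
     * cycle scale with the mapping size?  Anonymous pages are zeroed
     * lazily, so each page is written once to force the fault-and-zero
     * work to really happen. */
    #define _DEFAULT_SOURCE
    #include <stddef.h>
    #include <stdio.h>
    #include <sys/mman.h>
    #include <time.h>

    #define PAGE 4096   /* assumed page size, for illustration only */

    static double cycle_seconds(size_t size, int reps)
    {
        struct timespec t0, t1;
        clock_gettime(CLOCK_MONOTONIC, &t0);
        for (int i = 0; i < reps; i++) {
            char *p = mmap(NULL, size, PROT_READ | PROT_WRITE,
                           MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
            if (p == MAP_FAILED)
                return -1.0;
            for (size_t off = 0; off < size; off += PAGE)
                p[off] = 1;             /* fault in every page */
            munmap(p, size);
        }
        clock_gettime(CLOCK_MONOTONIC, &t1);
        return (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
    }

    int main(void)
    {
        /* If mmap() cost were O(1), these times would stay flat. */
        for (size_t kib = 256; kib <= 16384; kib *= 4) {
            double s = cycle_seconds(kib * 1024, 100);
            printf("%6zu KiB: %.6f s per map/touch/unmap cycle\n",
                   kib, s / 100);
        }
        return 0;
    }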
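For reference, the compile-time constants in question live in
Objects/obmalloc.c; at the time of this thread they look roughly like
this (quoting from memory of the 3.8-era sources, the comments may
differ):

    #define SYSTEM_PAGE_SIZE        (4 * 1024)
    #define POOL_SIZE               SYSTEM_PAGE_SIZE   /* must be 2^N */
    #define ARENA_SIZE              (256 << 10)        /* 256 KiB */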
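And to make the free-list design sketched above concrete, here is a
minimal illustration in C. It is not obmalloc's actual code: the names
(arena_alloc, arena_free, FREELIST_MAX) and the flat-array free list
are invented for the example, and it assumes POSIX mmap()/madvise()
with obmalloc's historical 256 KiB arenas:

    /* Sketch only -- not obmalloc's actual code.  Retired arenas go on
     * a small free list after madvise(MADV_FREE), so the kernel *may*
     * reclaim their pages under pressure; only when the list overflows
     * is an arena really munmap()ed. */
    #define _DEFAULT_SOURCE
    #include <stddef.h>
    #include <sys/mman.h>

    #define ARENA_SIZE   (256 << 10)   /* obmalloc's historical 256 KiB */
    #define FREELIST_MAX 16            /* hypothetical tunable */

    static void *arena_freelist[FREELIST_MAX];
    static int   nfree;

    static void arena_free(void *arena)
    {
        if (nfree < FREELIST_MAX) {
    #ifdef MADV_FREE
            /* Pages stay mapped; contents may be discarded lazily. */
            madvise(arena, ARENA_SIZE, MADV_FREE);
    #else
            /* Older kernels: eager variant with the same net effect. */
            madvise(arena, ARENA_SIZE, MADV_DONTNEED);
    #endif
            arena_freelist[nfree++] = arena;
        }
        else {
            munmap(arena, ARENA_SIZE);
        }
    }

    static void *arena_alloc(void)
    {
        /* Reuse a cached arena if available; writing to it cancels the
         * pending lazy free, or faults in fresh zero pages if the
         * kernel already reclaimed them. */
        if (nfree > 0)
            return arena_freelist[--nfree];
        void *p = mmap(NULL, ARENA_SIZE, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        return p == MAP_FAILED ? NULL : p;
    }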
Regards

Antoine.