On Jan 24, 2017 3:35 AM, "Thomas Wouters" <tho...@python.org> wrote:
On Fri, Jan 20, 2017 at 1:40 PM, Christian Heimes <christ...@python.org> wrote:

> On 2017-01-20 13:15, INADA Naoki wrote:
> >>
> >> "this script counts static memory usage. It doesn’t care about dynamic
> >> memory usage of processing real request"
> >>
> >> You may be trying to optimize something which is only a very small
> >> fraction of your actual memory footprint. That said, the marshal
> >> module could certainly try to intern some tuples and other immutable
> >> structures.
> >>
> >
> > Yes. I hadn't think static memory footprint is so important.
> >
> > But Instagram tried to increase CoW efficiency of prefork application,
> > and got some success about memory usage and CPU throughput.
> > I surprised about it because prefork only shares static memory footprint.
> >
> > Maybe, sharing some tuples which code object has may increase cache
> > efficiency.
> > I'll try run pyperformance with the marshal patch.
>
> IIRC Thomas Wouters (?) has been working on a patch to move the ref
> counter out of the PyObject struct and into a dedicated memory area. He
> proposed the idea to improve cache affinity, reduce cache evictions and
> to make CoW more efficient. Especially modern ccNUMA machines with
> multiple processors could benefit from the improvement, but also single
> processor/multi core machines.
>

FWIW, I have a working patch for that (against trunk a few months back,
even though the original idea was for the gilectomy branch), moving just
the refcount and not PyGC_HEAD. Performance-wise, in the benchmarks it's a
small but consistent loss (2-5% on a noisy machine, as measured by
python-benchmarks, not perf), and it breaks the ABI as well as any code
that dereferences PyObject.ob_refcnt directly (the field was repurposed and
renamed, and exposed as a const* to avoid direct assignment).
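
For readers following the CoW argument, here is a minimal sketch (not part of the patch discussed above) of why an in-object refcount matters: in a default CPython build, merely binding another name to an object updates the counter stored in that object's header, which is a write to the page the object lives on. The class `Node` and the function `refcount_delta` are hypothetical names used only for illustration.

```python
import sys


class Node:
    """Hypothetical stand-in for any heap-allocated Python object."""


def refcount_delta():
    """Return how much the refcount changes when a second name is bound."""
    obj = Node()
    before = sys.getrefcount(obj)
    alias = obj  # no attribute of obj is modified, only a new reference is taken
    after = sys.getrefcount(obj)
    del alias  # drop the extra reference again
    return after - before


if __name__ == "__main__":
    # The absolute values reported by sys.getrefcount() include a temporary
    # reference held by the call itself, so only the difference is meaningful.
    print("refcount change from one extra reference:", refcount_delta())
```

On CPython the difference should be 1. In a forked worker that same update would be enough to trigger a copy of the shared page, which is the cost an external refcount area is meant to avoid.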
It also exposes the API awkwardness that CPython doesn't *require* objects
to go through a specific mechanism for object initialisation, even though
nearly all extension modules do so. (That same API awkwardness made life a
little harder when experimenting with BDW GC :P.) I don't believe external
refcounts can be made the default without careful redesigning of a new set
of PyObject API calls and deprecation of the old ones.


The thing I found most surprising about that blog post was that contrary to
common wisdom, refcnt updates per se had essentially no effect on the
amount of memory shared between CoW processes, and the problems were all
due to the cycle collector. (Though I guess it's still possible that part
of the problems caused by the cycle collector are due to it touching
ob_refcnt.)

It's promising too though, because the GC metadata is much less exposed to
extension modules than PyObject_HEAD is, and the access patterns are
presumably (?) much more bursty. It'd be really interesting to see how
things performed if packing just PyGC_HEAD but *not* ob_refcnt into a
dedicated region.

-n
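
As a rough illustration of the point about GC metadata, the sketch below uses the standard `gc.is_tracked()` call to show which kinds of objects the cycle collector tracks at all, and therefore which ones carry a PyGC_HEAD that a collection pass may write to. The function name `gc_tracking_report` and the sample values are assumptions made for this example, and the results describe CPython's behaviour only.

```python
import gc


def gc_tracking_report():
    """Map a few sample object kinds to whether the cycle collector tracks them."""
    samples = {
        "int": 42,          # atomic object: cannot be part of a reference cycle
        "str": "static",    # atomic object: cannot be part of a reference cycle
        "list": [1, 2],     # container: may participate in a reference cycle
    }
    return {name: gc.is_tracked(value) for name, value in samples.items()}


if __name__ == "__main__":
    for name, tracked in gc_tracking_report().items():
        print(f"{name:5s} tracked by the cycle collector: {tracked}")
```

Only the tracked container objects have GC bookkeeping next to them, so moving PyGC_HEAD into a dedicated region would affect those objects and leave atomic ones such as ints and strings untouched.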
_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com