On Mon, Sep 18, 2017 at 10:59 AM, Antoine Pitrou <anto...@python.org> wrote:
> On 18/09/2017 at 19:53, Nathaniel Smith wrote:
>>>
>>>> Why are reference cycles a problem that needs solving?
>>>
>>> Because sometimes they are holding up costly resources in memory when
>>> people don't expect them to. Such as large Numpy arrays :-)
>>
>> Do we have any reason to believe that this is actually happening on a
>> regular basis though?
>
> Define "regular" :-) We did get some reports on dask/distributed about it.

Caused by uncollected cycles involving tracebacks? I looked here:

  https://github.com/dask/distributed/issues?utf8=%E2%9C%93&q=is%3Aissue%20memory%20leak

and saw some issues with cycles causing delayed collection (e.g. #956)
or the classic memory leak problem of explicitly holding onto data you
don't need any more (e.g. #1209, bpo-29861), but nothing involving
traceback cycles. It was just a quick skim though.

>> If it is then it might make sense to look at the cycle collection
>> heuristics; IIRC they're based on a fairly naive count of how many
>> allocations have been made, without regard to their size.
>
> Yes... But just because a lot of memory has been allocated isn't a good
> enough heuristic to launch a GC collection.

I'm not an expert on GC at all, but intuitively it sure seems like
allocation size might be a useful piece of information to feed into a
heuristic. Our current heuristic is just: run a small collection after
every 700 net allocations of container objects (allocations minus
deallocations), and run a larger collection after every 10 smaller
collections.

> What if that memory is
> gonna stay allocated for a long time? Then you're frequently launching
> GC runs for no tangible result except more CPU consumption and frequent
> pauses.

Every heuristic has problematic cases, that's why we call it a
heuristic :-). But somehow every other GC language manages to do well
enough without refcounting... I think they mostly have more
sophisticated heuristics than CPython, though. Off the top of my head, I
know PyPy's heuristic involves the ratio of the size of nursery objects
versus the size of the heap, and JVMs do much cleverer things like
auto-tuning nursery size to make empirical pause times match some
target.

> Perhaps we could special-case tracebacks somehow, flag when a traceback
> remains alive after the implicit "del" clause at the end of an "except"
> block, then maintain some kind of linked list of the flagged tracebacks
> and launch specialized GC runs to find cycles across that collection.
> That sounds quite involved, though.

We already keep a list of recently allocated objects and have a
specialized GC that runs across just that collection. That's what
generational GC is :-).

-n

-- 
Nathaniel J. Smith -- https://vorpus.org
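
For readers following the thread, here is a minimal sketch of the two things under discussion, assuming CPython: the allocation-count thresholds that drive the generational collector, and a traceback cycle that keeps a large object alive until a collection runs. `Payload`, `fail_and_keep_exception` and `demo` are hypothetical names made up for this illustration; only the `gc` and `weakref` calls are real standard-library APIs.

```python
import gc
import weakref


class Payload:
    """Hypothetical stand-in for a costly resource (e.g. a large array)."""

    def __init__(self, size):
        self.data = bytearray(size)


def fail_and_keep_exception():
    """Leave behind a frame <-> traceback cycle that pins `payload`."""
    payload = Payload(10 * 1024 * 1024)
    ref = weakref.ref(payload)
    try:
        raise ValueError("boom")
    except ValueError as exc:
        # `exc` is deleted implicitly at the end of this block, but `saved`
        # is not. That creates the cycle:
        #   frame -> saved -> saved.__traceback__ -> frame
        # and the frame's locals (including `payload`) stay alive with it.
        saved = exc
    return ref


def demo():
    """Return (alive_before_collect, alive_after_collect) for the payload."""
    was_enabled = gc.isenabled()
    gc.disable()  # make sure no automatic collection runs in between
    try:
        ref = fail_and_keep_exception()
        alive_before = ref() is not None  # refcounting alone cannot free it
        gc.collect()                      # full collection breaks the cycle
        alive_after = ref() is not None
    finally:
        if was_enabled:
            gc.enable()
    return alive_before, alive_after


if __name__ == "__main__":
    # Per-generation thresholds and current counters; the exact numbers
    # vary between interpreter versions.
    print("thresholds:", gc.get_threshold())
    print("counts:    ", gc.get_count())
    print("payload alive before/after gc.collect():", demo())
```

On CPython, `demo()` should report the payload as still alive before the explicit `gc.collect()` and freed after it. Nothing in the counters printed above reflects that the pinned object is 10 MiB rather than a few bytes, which is the point about allocation size made in the message.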