On Mon, 18 Sep 2017 13:25:47 -0700
Nathaniel Smith <n...@pobox.com> wrote:
> >> If it is then it might make sense to look at the cycle collection
> >> heuristics; IIRC they're based on a fairly naive count of how many
> >> allocations have been made, without regard to their size.
> >
> > Yes... But just because a lot of memory has been allocated isn't a good
> > enough heuristic to launch a GC collection.
>
> I'm not an expert on GC at all, but intuitively it sure seems like
> allocation size might be a useful piece of information to feed into a
> heuristic. Our current heuristic is just, run a small collection after
> every 700 allocations, run a larger collection after 10 smaller
> collections.
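For reference, those are the thresholds the gc module exposes, and they
can be inspected or tuned at runtime.  A quick sketch using the
documented gc API:

    import gc

    # Defaults are (700, 10, 10): a generation-0 collection is triggered
    # after roughly 700 container allocations (net of deallocations), and
    # each older generation is collected after 10 collections of the
    # next younger generation.
    print(gc.get_threshold())

    # Raising threshold2 makes full collections rarer, for example:
    gc.set_threshold(700, 10, 100)
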
Ok, so there are two useful things a heuristic may try to guess:
1. how much CPU time a (full) collection will cost
2. how much memory it might help release

While #1 is quite realistically predictable (because a full collection
is basically O(number of allocated objects)), #2 is entirely
unpredictable and you can really only make out an upper bound
(trivially, the GC cannot release any more memory than has been
allocated :-)).

Our current heuristic is geared at estimating #1, so that the
proportional time taken by GC runs in any long-running program is
roughly bounded.

> Every heuristic has problematic cases, that's why we call it a
> heuristic :-). But somehow every other GC language manages to do
> well-enough without refcounting...

I don't know, define "well enough"? :-) For example, Java is well-known
for requiring manual tuning of GC parameters for heavy workloads (or at
least it was). I would be curious how well Java deals when someone does
numerical crunching on large chunks of data, then throws that data
away, for example.

Apparently R also uses a garbage collector, and ironically you can find
places on the Internet where people recommend you call gc() to avoid
thrashing memory:
https://stackoverflow.com/questions/8813753/what-is-the-difference-between-gc-and-rm

On CPython at least, once you lose the last reference to your
multi-gigabyte Numpy array, memory is returned immediately...

> I think they mostly have more
> sophisticated heuristics than CPython, though.

I'm sure some of them do. They also tend to have many more man-hours
available than we do to optimize the hell out of their GC.

> > Perhaps we could special-case tracebacks somehow, flag when a traceback
> > remains alive after the implicit "del" clause at the end of an "except"
> > block, then maintain some kind of linked list of the flagged tracebacks
> > and launch specialized GC runs to find cycles across that collection.
> > That sounds quite involved, though.
>
> We already keep a list of recently allocated objects and have a
> specialized GC that runs across just that collection. That's what
> generational GC is :-).

Well, you can cross your fingers that the traceback is still in the
youngest generations when it finally becomes collectable. But if the
traceback stays alive for too long, it ends up in the oldest generation
and a long time can pass before it gets collected.

Regards

Antoine.
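P.S. For a concrete (if contrived) illustration of the traceback
problem, here is roughly the pattern that defeats the implicit "del"
and leaves a cycle that only the cyclic GC can reclaim (leak(), big
and err are of course just made-up names):

    import gc

    def leak():
        big = bytearray(10**8)      # ~100 MB reachable only from this frame
        try:
            raise ValueError
        except ValueError as e:
            err = e                 # rebinding dodges the implicit "del e"
        # Cycle: frame -> err -> err.__traceback__ -> frame.  Since "big"
        # hangs off that frame, refcounting alone can never release it
        # once leak() returns.

    leak()
    # The buffer is still alive here; it is only freed once the cycle
    # collector gets around to whatever generation the cycle ended up in,
    # or if you force it:
    gc.collect()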