[Phillip J. Eby]
> I'm trying to figure out how to implement this now, and running into
> a bit of a snag. It's easy enough for gcmodule.c to check if an
> object is a generator, but I'm not sure how safe the dynamic check
> actually is, since it depends on the generator's state. In
> principle, running other finalizers could cause the generator's state
> to change from a finalizer being required to not being required, or
> vice versa. Could this mess up the GC process?
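To make the question concrete, here is a rough Python-level sketch, not the C check itself. `needs_finalization` is a hypothetical helper and a crude stand-in: it treats every suspended generator as needing finalization, whereas a real C-level check would presumably look at whether the suspended frame sits inside a try/finally. The point is only that the answer depends on the generator's current state, so any code that advances the generator (another finalizer, say) can flip it.

```python
import inspect

def needs_finalization(gen):
    """Hypothetical, crude stand-in for a dynamic "does this generator
    need finalizing?" check: only a suspended generator can have a
    pending finally block that hasn't run yet."""
    return inspect.getgeneratorstate(gen) == inspect.GEN_SUSPENDED

def worker(log):
    try:
        yield 1
    finally:
        log.append("cleanup")

log = []
g = worker(log)
before = needs_finalization(g)   # GEN_CREATED: nothing to clean up yet
next(g)                          # other code (e.g. a finalizer) advances g
after = needs_finalization(g)    # GEN_SUSPENDED inside try/finally
g.close()                        # raises GeneratorExit at the yield; finally runs
closed = needs_finalization(g)   # GEN_CLOSED: nothing left to do
print(before, after, closed, log)
```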
Yup, although the tricky question is whether it's possible for other finalizers to do such a thing.

> It seems to me that it's safe for a generator to say, "yes, I need
> finalization", because if it later turns out not to, it's just a waste.

Definitely safe. In effect, that's what happens right now (all generators say "I need finalization" now).

> But if the generator says, "no, I don't need finalization", and then later
> turns out to need it, doesn't that leave an opportunity to screw things up
> if the GC does anything other than immediately clear the generator?
>
> As best I can tell, the only things that could cause arbitrary Python
> code to run, that could possibly result in generator state changes,
> are structure traversal and weakref callback handling.

And __del__ methods. It's a common misconception that Python's cyclic gc won't ever clean up an object with a __del__ method. It can and routinely does. What it won't do is automagically break a _cycle_ containing an object with a __del__ method. It's quite possible to have any number of objects with __del__ methods reachable only _from_ a trash cycle containing no objects with __del__ methods, where those __del__-slinging objects are not themselves in a cycle. gc will break that cycle, and the __del__ methods on trash objects "hanging off" that cycle will get invoked as a normal side effect of their objects' refcounts falling to 0.

From a different POV, Python's gc never reclaims anything directly -- all it does is break cycles via calling tp_clear on trash objects, and whatever (if any) reclamation gets done happens as a side effect of Py_DECREF.
Specifically, this one in delete_garbage():

    if ((clear = op->ob_type->tp_clear) != NULL) {
        Py_INCREF(op);
        clear(op);
        Py_DECREF(op);
    }

If it wouldn't spoil the fun, I'd be tempted to add a comment pointing out that the entire purpose of gcmodule.c is to execute that Py_DECREF safely :-)

> It's probably not sane for anybody to have structure traversal run arbitrary
> Python code,

I'm not sure what you mean by "structure traversal". The only kind of traversing that should be going on during gc is running tp_traverse slots, and although I doubt it's written down anywhere, a tp_traverse slot shouldn't even do an incref, let alone call back into Python. A tp_traverse slot dare not release the GIL either. That last one is a subtlety that takes fixing a few critical bugs to fully appreciate: as soon as anything can call Python code, all bets are off, because any number of other threads can run then too, and do _almost_ anything whatsoever to the object graph. In particular, that Py_DECREF() above can trigger a chain of code that releases the GIL, so by the time we get to that loop it has to be impossible for any conceivable Python code to create any new problems for gc.

> so I'm going to ignore that for the sake of my own sanity. :) Weakref
> callbacks are tougher, though; it seems possible that you could have one
> of those cause a generator to be advanced to a point where it now needs
> finalization.

Not a _trash_ generator, though. While much of gc's behavior wrt weakref callbacks is more-than-less arbitrary, and so may change some day, for now a wr callback to a trash object is suppressed by gc if any trash objects are reachable from that callback.

> OTOH, if such a generator could be advanced by the callback, then
> wouldn't that mean the generator is reachable,

Yes, but you have to qualify "reachable" to "reachable from the callback".

> and ergo, not garbage?
If the callback is itself trash, no: G being reachable from the callback is not enough evidence to conclude that G is not garbage. The horrid bugs we've had come from things "just like that": messy interconnections among objects that _all_ look like trash. When they're in cycles, they can reach each other, and so their finalizers can see each other too, trash or not. We already endure lots of pain to ensure that a weakref callback that gets executed (not all do) can't see anything that looks like trash.

> That is, since only reachable weakref callbacks are run,

s/reachable/non-trash/ and that's true today.

> they must by definition be unable to access any generator that
> declared itself finalizer-free.

Any trash object, period.

> It does seem possible you could end up with a situation where an
> object with a finalizer is called after a generator it references is
> torn down, but that circumstance can occur in earlier versions of
> Python anyway, and in fact this behavior would be consistent.

That shouldn't be possible. Because the only reclamation done by gc is via Py_DECREF side effects, objects not in cycles are torn down in a topological-sort order of the "points-to" relation. If A points to B (B is directly reachable from A), and in the absence of cycles, and with everything driven by Py_DECREF, B's refcount can't fall to 0 before A's does. Therefore B is wholly intact when A's finalizer (if any) is invoked. That's a great "hidden" benefit of refcount-driven reclamation.

If A and B are in a cycle, and A has a finalizer, then gc refuses to call tp_clear on either of them, and neither refcount falls to 0, so A's finalizer doesn't run at all.

> Okay, I *think* I've convinced myself that a dynamic state check is
> OK, but I'm hoping somebody with more GC experience can check my
> reasoning here for holes.

Let's take a peek at __del__ methods:

    C1 <-> C2 -> D -> G -> A

C1 and C2 don't have finalizers and are in a cycle. D has a __del__ method.
G is a generator that says "I don't need finalization". Suppose they're all trash, and these are all the objects that exist.

gc moves D and everything transitively reachable from D to a special `finalizers` list. Only C1 and C2 are in the list of things gc will invoke tp_clear on. Say it does C1.tp_clear() first. That does Py_DECREF(C2) as a side effect. That in turn does Py_DECREF(D) as a side effect, and D.__del__() is invoked. Since G is reachable from D, D.__del__() may change G to a state where finalization is needed. But it doesn't seem to matter, since G and A weren't in the tp_clear candidate list to begin with.

More generally, gc will not invoke tp_clear on anything transitively reachable from any object with a __del__ method. So if a generator is reachable from a __del__, gc won't invoke tp_clear on anything reachable from the generator. If the generator gets cleaned up at all, it's via "ordinary" Py_DECREF side effects, so nothing reachable from the generator will vanish before the generator goes away either. If the generator decides it needs to finalize after all, it doesn't seem to matter.

Unless ...

    C1 <-> C2 -> D -> G <-> A    [G and A are also in a cycle now]

Now D.__del__ may or may not advance G to a "needs finalization" state, but D decref'ing G no longer drops G's refcount to 0 regardless. This _round_ of gc won't break the G<->A cycle regardless (since everything reachable from D is exempt from tp_clear). The G<->A cycle will end up in an older generation. Some number of gc rounds later (when the older generation containing G<->A gets collected), the cycle will be broken if G doesn't say it needs finalization, or G will be moved to gc.garbage if G says it does need finalization. In either case, D is long gone, so it can't make more trouble.
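The walkthrough above can be replayed in a toy Python model. Everything here (Obj, decref, tp_clear, tp_clear_candidates, delete_garbage, run) is a hypothetical illustration, not CPython's internals: it exempts anything reachable from an object with a finalizer, runs the incref/clear/decref loop only over what's left, and lets all reclamation happen as decref side effects.

```python
from collections import deque

# Hypothetical toy model; none of these names are CPython APIs.
class Obj:
    """Toy object: a refcount, outgoing references, optional finalizer."""
    def __init__(self, name, log, has_del=False):
        self.name, self.log, self.has_del = name, log, has_del
        self.refcnt, self.refs, self.freed = 0, [], False

    def point_to(self, other):
        self.refs.append(other)
        other.refcnt += 1

def decref(op):
    op.refcnt -= 1
    if op.refcnt == 0:
        # The only place anything is reclaimed: a side effect of decref.
        op.freed = True
        if op.has_del:
            # The finalizer runs while everything op points to is intact.
            op.log.append(f"{op.name}.__del__ sees {[r.name for r in op.refs]}")
        tp_clear(op)
        op.log.append(f"freed {op.name}")

def tp_clear(op):
    # Break op's outgoing references, decref'ing each one.
    refs, op.refs = op.refs, []
    for r in refs:
        decref(r)

def tp_clear_candidates(trash):
    # Objects with a finalizer, plus everything reachable from them, go to
    # the `finalizers` side and are exempt from tp_clear.
    exempt = {o for o in trash if o.has_del}
    todo = deque(exempt)
    while todo:
        for r in todo.popleft().refs:
            if r not in exempt:
                exempt.add(r)
                todo.append(r)
    return [o for o in trash if o not in exempt]

def delete_garbage(collectable):
    # Same shape as the C loop: incref, tp_clear, decref.
    for op in collectable:
        if op.freed:            # already reclaimed by an earlier cascade
            continue
        op.refcnt += 1
        tp_clear(op)
        decref(op)

def run(g_a_cycle):
    log = []
    c1, c2 = Obj("C1", log), Obj("C2", log)
    d = Obj("D", log, has_del=True)
    g, a = Obj("G", log), Obj("A", log)
    c1.point_to(c2); c2.point_to(c1)              # C1 <-> C2
    c2.point_to(d); d.point_to(g); g.point_to(a)  # C2 -> D -> G -> A
    if g_a_cycle:
        a.point_to(g)                             # G <-> A
    trash = [c1, c2, d, g, a]
    candidates = tp_clear_candidates(trash)
    delete_garbage(candidates)
    survivors = [o.name for o in trash if not o.freed]
    return [o.name for o in candidates], log, survivors

chain = run(g_a_cycle=False)
looped = run(g_a_cycle=True)
print(chain)
print(looped)
```

In both runs only C1 and C2 are tp_clear candidates, and D's finalizer sees G intact. In the second run D's decref no longer drops G to 0, so G and A survive this round, matching the delayed G<->A case.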
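The topological-order claim (B is wholly intact when A's finalizer runs) can be checked directly on CPython with plain refcounting and no cycles at all. This is a minimal sketch: the Node class is hypothetical, and it relies on CPython's immediate refcount-driven finalization; other Python implementations may finalize later.

```python
order = []

class Node:
    """Hypothetical class: records, from __del__, whether the object it
    points to had already been finalized."""
    def __init__(self, name, child=None):
        self.name, self.child = name, child
        self.finalized = False

    def __del__(self):
        child_ok = self.child is None or not self.child.finalized
        order.append((self.name, child_ok))
        self.finalized = True

a = Node("A", Node("B", Node("C")))   # A -> B -> C, no cycles
del a   # on CPython: A's refcount hits 0 first, then B's, then C's
print(order)
```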
No real harm there either -- although it may be surprising that G<->A collection (when possible) gets delayed, that's always been true of trash cycles hanging off a trash object with a __del__ method hanging off a reclaimable trash cycle. What is new is that G won't wind up in gc.garbage during the first round of gc if D.__del__() pushes G to a "needs finalization" state.

Looks safe to me ;-)
_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev