Re: garbage collection / cyclic references
On Mar 21, 11:59 am, andrew cooke and...@acooke.org wrote: Aaron Brady wrote: My point is, that garbage collection is able to detect when there are no program-reachable references to an object. Why not notify the programmer (the programmer's objects) when that happens? If the object does still have other unreachable references, s/he should be informed of that too. i think we're mixing python-specific and more general / java details, but, as far as i understand things, state of the art (and particularly generational) garbage collectors don't guarantee that objects will ever be reclaimed. this is a trade for efficiency, and it's a trade that seems to be worthwhile and popular. It's at best worthless, but so is politics. I take it back; you can reclaim memory in large numbers with a probabilistic finalizer. The expected value of reclaiming a KB with a 90% chance of call is .9 KB. The allocation structure I am writing will have a very long up-time. I can forcibly reclaim the memory of an object involved in a cycle, but lingering references it has will never be detected. Though, if I can't guarantee 100% reclamation, I'll have to be anticipating a buffer dump eventually anyway, which makes, does it not, 90% about the same as 99%. furthermore, you're mixing responsibilities for two logically separate ideas just because a particular implementation happens to associate them, which is not a good idea from a design pov. I think a silent omission of finalization is the only alternative. If so they're mixed, one way or the other. I argue it is closer to associating your class with a hash table: they are logically separate ideas. Perhaps implementation is logically separate from design altogether. i can remember, way back in the mists of time I understand they were having a fog problem there yesterday... not to mention a sale on sand. Today: Scattered showers and thunderstorms before 1pm, then a slight chance of showers. using java finalizers for doing this kind of thing. and then learning that it was a bad idea. once i got over the initial frustration, it really hasn't been a problem. i haven't met a situation I don't suppose I imagine one. So, you could argue that it's a low priority. Washing your hands of the rare, though, disqualifies you from the associate's in philosophy. I bet you want to meet my customers, too. where i needed to tie resource management and memory management together (except for interfacing with c code that does not use the host language's gc - and i can imagine that for python this is a very strong (perhaps *the*) argument for reference counting). I'm using a specialized mapping type to implement the back end of user- defined classes. Since I know the implementation, which in particular maps strings to objects, I can always just break cycles by hand; that is, until someone wants a C extension. Then they will want tp_clear and tp_traverse methods. as an bonus example, consider object caching - a very common technique that immediately breaks anything that associates other resources with memory use. I assume your other processes are notified of the cache state. From what I understand, Windows supports /named/ caching. Collaborators can check the named cache, in the process creating it if it doesn't exist, and read and write at will there. just because, in one limited case, you can do something, doesn't mean it's a good idea. andrew But you're right. I haven't talked this over much on the outside, so I might be missing something huge, and serialized two-step finalization (tm) is the secret least of my worries. -- http://mail.python.org/mailman/listinfo/python-list
Re: garbage collection / cyclic references
andrew cooke and...@acooke.org writes: the two dominant virtual machines - .net and the jvm both handle circular references with no problem whatever. AFAIK, they also don't guarantee that finalizers ever run, much less run in deterministic order. -- http://mail.python.org/mailman/listinfo/python-list
Re: garbage collection / cyclic references
On Mar 20, 8:12 pm, andrew cooke and...@acooke.org wrote: Aaron Brady wrote: [...] caveats and fragilities? If free software can do it, why isn't it all over the industry? What disqualifies it from solved-problem status? the two dominant virtual machines - .net and the jvm both handle circular references with no problem whatever. this is standard in modern garbage collection - go read a book on the subject (personally i like grune et al's modern compiler design). it *is* a solved problem. if anything, python is behind the curve, not ahead of it, but this may change with the next generation of python implementations (pypy targets a variety of vms, i think). as for the extra methods you suggest - why do you want to expose implementation details in an api? that is not the normal aim of good design. andrew Circular references ...can only be cleaned up if there are no Python- level __del__() methods involved. __del__ doc. Python doesn’t collect ... cycles automatically because, in general, it isn’t possible for Python to guess a safe order in which to run the __del__() methods. gc.garbage doc. Errors should never pass silently. -The Zen of Python I advance that cyclic objects should be notified when their external references go to zero, but their usual '__del__' is inappropriate. If objects implement a __del__ method, they can choose to implement a '__gc_cycle__' method, and then just drop the specified attribute. It needn't be called on every object in the cycle, either; once it's called on one object, another object's normal __del__ may be safely called. Output, unproduced: del x In X.__gc_cycle__, 'other' attribute. Deleting... In Y.__del__. In X.__del__. The actual backend of CPython requires garbage-collected container types to implement tp_inquiry and tp_clear methods, but user-defined types apparently aren't required to conform. Supporting Cyclic Garbage Collection http://docs.python.org/3.0/c-api/gcsupport.html -- http://mail.python.org/mailman/listinfo/python-list
Re: garbage collection / cyclic references
The actual backend of CPython requires garbage-collected container types to implement tp_inquiry and tp_clear methods, but user-defined types apparently aren't required to conform. tp_inquiry doesn't exist, you probably mean tp_traverse. tp_traverse is completely irrelevant for python-defined types; the VM can traverse a user-defined type just fine even without the help of tp_traverse. If a C-defined type fails to implement tp_traverse when it should, then garbage collection breaks entirely. tp_clear isn't invoked for an object at all if the object is in a cycle with finalizers, so it's not something that you can use to detect that you are in a cycle with finalizers. Cycles with finalizers are considered a bug; application programmers should check gc.garbage at the end of the program to determine whether they have this bug. There is an easy design pattern around it, so I'm -1 on complicating the GC protocol. Regards, Martin -- http://mail.python.org/mailman/listinfo/python-list
Re: garbage collection / cyclic references
Paul Rubin wrote: andrew cooke and...@acooke.org writes: the two dominant virtual machines - .net and the jvm both handle circular references with no problem whatever. AFAIK, they also don't guarantee that finalizers ever run, much less run in deterministic order. i think you're right, but i'm missing your point - perhaps there was some sub-context to the original post that i didn't understand? finalizers should not be considered part of a public resource management api - they should not be used to do things like flushing and closing files, for example. i think this was a minor issue early in java's adoption (i guess because of incorrect assumptions made by c++ programmers) (in python the with context is a much better mechanism for this kind of thing - the best java has is the finally statement). but it's one of those things that (afaik) isn't an issue once you fully embrace the language (rather like, say, semantically meaningful indentation). but i'm sure you know all that, so i'm still wondering what i've missed. andrew -- http://mail.python.org/mailman/listinfo/python-list
Re: garbage collection / cyclic references
On Mar 21, 7:54 am, andrew cooke and...@acooke.org wrote: Paul Rubin wrote: andrew cooke and...@acooke.org writes: the two dominant virtual machines - .net and the jvm both handle circular references with no problem whatever. AFAIK, they also don't guarantee that finalizers ever run, much less run in deterministic order. i think you're right, but i'm missing your point - perhaps there was some sub-context to the original post that i didn't understand? finalizers should not be considered part of a public resource management api - they should not be used to do things like flushing and closing files, for example. i think this was a minor issue early in java's adoption (i guess because of incorrect assumptions made by c++ programmers) (in python the with context is a much better mechanism for this kind of thing - the best java has is the finally statement). but it's one of those things that (afaik) isn't an issue once you fully embrace the language (rather like, say, semantically meaningful indentation). but i'm sure you know all that, so i'm still wondering what i've missed. andrew Theoretically, my object should be able to maintain an open resource for its lifetime; and its clients shouldn't need to know what its lifetime is. Therefore, it needs a callback when that is over. If finalization methods could be called in a structurally sound manner, they could be relied on to handle flushing and closing files. they should not be used to do things like flushing and closing files, for example. What is your basis for this claim, if it's not the mere unreliability of finalization? IOW, are you not merely begging the question? -- http://mail.python.org/mailman/listinfo/python-list
Re: garbage collection / cyclic references
Aaron Brady wrote: On Mar 21, 7:54 am, andrew cooke and...@acooke.org wrote: they should not be used to do things like flushing and closing files, for example. What is your basis for this claim, if it's not the mere unreliability of finalization? IOW, are you not merely begging the question? I'm not sure it's clear, but I was talking about Java. As Paul implied, a consequence of completely automated garbage management is that it is (from a programmer's POV) deterministic. So it's a programming error to rely on the finalizer to free resources that don't follow that model (ie any resource that's anything other that reasonable amounts of memory). That's pretty much an unavoidable consequence of fully automated garbage collection. You can pretend it's not, and try using finalizers for other work if you want. That's fine - it's your code, not mine. I'm just explaining how life is. Andrew -- http://mail.python.org/mailman/listinfo/python-list
Re: garbage collection / cyclic references
andrew cooke wrote: Aaron Brady wrote: On Mar 21, 7:54 am, andrew cooke and...@acooke.org wrote: they should not be used to do things like flushing and closing files, for example. What is your basis for this claim, if it's not the mere unreliability of finalization? IOW, are you not merely begging the question? I'm not sure it's clear, but I was talking about Java. crap. i meant to say INdeterministic. sorry, i am in a foul mood (for completely unrelated reasons) and probably shouldn't be making posts to a public newsgroup. andrew As Paul implied, a consequence of completely automated garbage management is that it is (from a programmer's POV) deterministic. So it's a programming error to rely on the finalizer to free resources that don't follow that model (ie any resource that's anything other that reasonable amounts of memory). That's pretty much an unavoidable consequence of fully automated garbage collection. You can pretend it's not, and try using finalizers for other work if you want. That's fine - it's your code, not mine. I'm just explaining how life is. Andrew -- http://mail.python.org/mailman/listinfo/python-list
Re: garbage collection / cyclic references
On Mar 21, 9:50 am, andrew cooke and...@acooke.org wrote: Aaron Brady wrote: On Mar 21, 7:54 am, andrew cooke and...@acooke.org wrote: they should not be used to do things like flushing and closing files, for example. What is your basis for this claim, if it's not the mere unreliability of finalization? IOW, are you not merely begging the question? I'm not sure it's clear, but I was talking about Java. As Paul implied, a consequence of completely automated garbage management is that it is (from a programmer's POV) deterministic. So it's a [indeterministic] programming error to rely on the finalizer to free resources that don't follow that model (ie any resource that's anything other that [than] reasonable amounts of memory). That's pretty much an unavoidable consequence of fully automated garbage collection. You can pretend it's not, and try using finalizers for other work if you want. That's fine - it's your code, not mine. I'm just explaining how life is. Andrew Hi, nice to talk to you this early. Sorry you're in a bad mood. You've sure come to the right place to find friends though. /tongue in cheek My point is, that garbage collection is able to detect when there are no program-reachable references to an object. Why not notify the programmer (the programmer's objects) when that happens? If the object does still have other unreachable references, s/he should be informed of that too. I advanced an additional method to this end. Do you argue that there aren't any cases in which the class could make use of the information; or there aren't /enough/ cases so in which? Perhaps it would help to handle a contrary case by hand. Two objects need to make write operations each to the other when they are closed. Would it be sufficient in general, knowing nothing further about them, to queue some information, and close? Do they always know at design- time their references will be cyclic? Would a mere '__leave_reachability__' method be more generally informative or robust? Would it constitute a two-step destruction, to notify objects when they're unreachable, and then finalize? The two objects' write operations could execute in such a method, without risking prior destruction. -- http://mail.python.org/mailman/listinfo/python-list
Re: garbage collection / cyclic references
On Mar 21, 10:28 am, Aaron Brady castiro...@gmail.com wrote: On Mar 21, 9:50 am, andrew cooke and...@acooke.org wrote: Aaron Brady wrote: On Mar 21, 7:54 am, andrew cooke and...@acooke.org wrote: they should not be used to do things like flushing and closing files, for example. What is your basis for this claim, if it's not the mere unreliability of finalization? IOW, are you not merely begging the question? I'm not sure it's clear, but I was talking about Java. As Paul implied, a consequence of completely automated garbage management is that it is (from a programmer's POV) deterministic. So it's a [indeterministic] programming error to rely on the finalizer to free resources that don't follow that model (ie any resource that's anything other that [than] reasonable amounts of memory). That's pretty much an unavoidable consequence of fully automated garbage collection. You can pretend it's not, and try using finalizers for other work if you want. That's fine - it's your code, not mine. I'm just explaining how life is. Andrew My point is, that garbage collection is able to detect when there are no program-reachable references to an object. Why not notify the programmer (the programmer's objects) when that happens? If the object does still have other unreachable references, s/he should be informed of that too. snip I took the liberty of composing a sample cyclic reference detector. I will post the class definition later on in the discussion (when and if). The 'run' method resets the globals to a sample graph, as illustrated. 'p' and 's' start out with one simulated program-visible reference each. As you see, the details are already long and boring (yum). I added comments post-facto. run() #only decref 'p' p: (q), q: (pr), r: (q), s: (p) p.decref() #not safe to delete {p,2: 1, q,2: 0, r,1: 0} run() #decref 'p' then 's' p: (q), q: (pr), r: (q), s: (p) p.decref() {p,2: 1, q,2: 0, r,1: 0} s.decref() {s,0: 0, p,2: 0, r,1: 0, q,2: 0} s,0 ALL zero #'s' safe to delete {p,1: 0, q,2: 0, r,1: 0} p,1 ALL zero #also deletes 'p', also safe finalizing s,0 run() p: (q), q: (pr), r: (q), s: (p) s.decref() {s,0: 0, p,3: 1, r,1: 0, q,2: 0} {p,2: 1, q,2: 0, r,1: 0} finalizing s,0 #deletion p.decref() {p,1: 0, q,2: 0, r,1: 0} p,1 ALL zero #'p' safe to delete run() p: (q), q: (pr), r: (q), s: (p) s.decref() {s,0: 0, p,3: 1, q,2: 0, r,1: 0} {p,2: 1, q,2: 0, r,1: 0} finalizing s,0 #'p' not safe, reference still visible We notice the duplicate 'all zero' indicator on run #2. The cycle detector ran on 's.decref', then 's' called 'p.decref', then the cycle detector ran on that. 'q' and 'r' are safe to delete on runs 2 and 3. Here is the implementation of 'final': def final( self ): for _, v in self.__dict__.items( ): if not isinstance( v, G ): continue v.decref( ) print( 'finalizing', self ) The object should be asked to finish its references (cyclic only?), but remain alive. The programmer should see that the state is consistent. Later, its __del__ will be called. We can decide that '__leave_reachability__' will be called without nesting; and/or that '__del__' will be called without nesting, by breaking finalization in to two steps. FTR, this makes __leave_reachability__ about the equivalent of tp_clear, since tp_traverse is prior defined for user-defined types. -- http://mail.python.org/mailman/listinfo/python-list
Re: garbage collection / cyclic references
Aaron Brady wrote: My point is, that garbage collection is able to detect when there are no program-reachable references to an object. Why not notify the programmer (the programmer's objects) when that happens? If the object does still have other unreachable references, s/he should be informed of that too. i think we're mixing python-specific and more general / java details, but, as far as i understand things, state of the art (and particularly generational) garbage collectors don't guarantee that objects will ever be reclaimed. this is a trade for efficiency, and it's a trade that seems to be worthwhile and popular. furthermore, you're mixing responsibilities for two logically separate ideas just because a particular implementation happens to associate them, which is not a good idea from a design pov. i can remember, way back in the mists of time, using java finalizers for doing this kind of thing. and then learning that it was a bad idea. once i got over the initial frustration, it really hasn't been a problem. i haven't met a situation where i needed to tie resource management and memory management together (except for interfacing with c code that does not use the host language's gc - and i can imagine that for python this is a very strong (perhaps *the*) argument for reference counting). as an bonus example, consider object caching - a very common technique that immediately breaks anything that associates other resources with memory use. just because, in one limited case, you can do something, doesn't mean it's a good idea. andrew -- http://mail.python.org/mailman/listinfo/python-list
Re: garbage collection / cyclic references
Aaron Brady wrote: Hello, I was reading and Googling about garbage collection, reference counting, and the problem of cyclic references. Python's garbage collection module claims to be able to detect and break cyclic garbage. Some other languages merely prohibit it. Is this the place to ask about its technique? I understand that the disadvantage is a non-deterministic order of deletion/finalization. Garbage collection and destructors or finalizers don't play well together. It's a fundamental problem. Calling finalizers from the garbage collector is painful. It introduces concurrency where the user may not have expected it. Consider what happens if a finalizer tries to lock something. What if GC runs while that lock is locked? This can create a deadlock situation. Calling finalizers from the garbage collector can result in intermittent, very hard to find bugs. C++ takes destructors seriously; objects are supposed to be destructed exactly once, and if they're of auto scope (a local object in the Python sense) they will reliably be cleaned up at block exit. Microsoft's Managed C++ broke those rules; in Managed C++, destructors can be called more than once. (Look up re-animation in Microsoft Managed C++ literature. It's not pretty.) Python actually has reference counting backed up by garbage collection. Most objects are destroyed as soon as all references to them disappear. Garbage collection is only needed to deal with cycles. Python has weak references, which won't keep an object around once all the regular references are deleted. These are useful in some situations. In a tree, for example, pointers towards the leaves should be strong pointers, while back-pointers towards the root should be weak pointers. I once modified BeautifulSoup, the HTML parser, to use weak pointers that way. BeautifulSoup trees are big and don't go away immediately when no longer needed, because they have backpointers. They hang around until the next GC cycle. With the version that uses weak pointers, they go away as soon as they're no longer needed. We've found this useful in a web crawler; the data space used drops and actual GC runs are no longer necessary. Personally, I'd argue that the right answer is to prohibit cycles of strong pointers. That should be considered a programming error, and detected at run time, at least by debugging tools. With weak pointers, you don't really need cycles of strong pointers. The advantage of this is a clean order of destruction. This is useful in window widget systems, where you have objects with pointers going in many directions, yet object destruction has substantial side effects. Python originally had only reference counting, and didn't have weak pointers. If weak pointers had gone in before the garbage collector, Python might have gone in this direction. John Nagle -- http://mail.python.org/mailman/listinfo/python-list
Re: garbage collection / cyclic references
On Mar 21, 1:04 pm, John Nagle na...@animats.com wrote: Aaron Brady wrote: Hello, I was reading and Googling about garbage collection, reference counting, and the problem of cyclic references. Python's garbage collection module claims to be able to detect and break cyclic garbage. Some other languages merely prohibit it. Is this the place to ask about its technique? I understand that the disadvantage is a non-deterministic order of deletion/finalization. Garbage collection and destructors or finalizers don't play well together. It's a fundamental problem. Calling finalizers from the garbage collector is painful. It introduces concurrency where the user may not have expected it. Consider what happens if a finalizer tries to lock something. What if GC runs while that lock is locked? This can create a deadlock situation. Calling finalizers from the garbage collector can result in intermittent, very hard to find bugs. As I understand it, 'obj.decref()' can call 'other.decref()', which can try to access its reference to 'obj', which has already begun cleanup. At that point, 'obj' is in an inconsistent state. Its own methods are secretly called during its '__del__'. One step would be to serialize this process, so that 'other.decref()' gets deferred until 'obj.decref()' has completed. Then attributes are in an all-or-nothing state, which is at least consistent. However, that means every external reference in a '__del__' method has to be wrapped in an exception handler, one at a time, because the object / might/ already be in a reference cycle. (Non-container types are excepted.) The remaining reference merely needs to change its class to a ReclaimedObject type. That's acceptable if documented. I also believe it solves the potential for deadlock. (Look up re-animation in Microsoft Managed C++ literature. It's not pretty.) Pass! Python actually has reference counting backed up by garbage collection. Most objects are destroyed as soon as all references to them disappear. Garbage collection is only needed to deal with cycles. Python has weak references, which won't keep an object around once all the regular references are deleted. These are useful in some situations. In a tree, for example, pointers towards the leaves should be strong pointers, while back-pointers towards the root should be weak pointers. snip Personally, I'd argue that the right answer is to prohibit cycles of strong pointers. That should be considered a programming error, and detected at run time, at least by debugging tools. With weak pointers, you don't really need cycles of strong pointers. Reference cycles can be detected anyway with debug tools, even prior to destruction. My objection is that would complicate control flow severely: for x in y: z.append( x ) becomes: for x in y: if cyclic_ref( x ): z.append( weakref.ref( x ) ) else: z.append( x ) And worse, every attribute access has to be wrapped. for x in z: if isinstance( x, __builtins__.weakref ): if x() is not None: print( x() ) else: print( x ) In other words, it interferes with uniform access to attributes and container members. However, in the case where you know a structure a priori, it's a good technique, as your example showed. I observe that my proposal has the same weakness! If you make the case that you usually do know the structure your data have, I won't be able to disprove it. The example would come from a peer-to-peer representation of something, or storage of relational data. Regardless, the group has responded to most of my original post. I don't think I emphasized however that I'm designing an allocation system that can contain reference cycles; and I was asking if such special methods, '__gc_cycle__( self, attr )' or '__gc_clear__ ( self )' would be right for me. I'm also interested in feedback about the serialization method of ref. counting earlier in this post. The advantage of this is a clean order of destruction. This is useful in window widget systems, where you have objects with pointers going in many directions, yet object destruction has substantial side effects. Python originally had only reference counting, and didn't have weak pointers. If weak pointers had gone in before the garbage collector, Python might have gone in this direction. John Nagle -- http://mail.python.org/mailman/listinfo/python-list
Re: garbage collection / cyclic references
Aaron Brady wrote: [...] caveats and fragilities? If free software can do it, why isn't it all over the industry? What disqualifies it from solved-problem status? the two dominant virtual machines - .net and the jvm both handle circular references with no problem whatever. this is standard in modern garbage collection - go read a book on the subject (personally i like grune et al's modern compiler design). it *is* a solved problem. if anything, python is behind the curve, not ahead of it, but this may change with the next generation of python implementations (pypy targets a variety of vms, i think). as for the extra methods you suggest - why do you want to expose implementation details in an api? that is not the normal aim of good design. andrew -- http://mail.python.org/mailman/listinfo/python-list