Re: Self healthcheck
On Wednesday, January 22, 2014 5:08:25 AM UTC+2, Chris Angelico wrote:
> I assume you're talking about pure Python code, running under CPython. (If you're writing an extension module, say in C, there are completely different ways to detect reference leaks; and other Pythons will behave slightly differently.) There's no way to detect truly unreferenced objects, because they simply won't exist - not after a garbage collection run, and usually sooner than that. But if you want to find objects that you're somehow not using and yet still have live references to, you'll need to define "using" in a way that makes sense. Generally there aren't many ways that that can happen, so those few places are candidates for a weak reference system (maybe you map a name to the master object representing that thing, and you can recreate the master object from the disk, so when nothing else is referring to it, you can happily flush it out - that mapping is a good candidate for weak references).
>
> But for most programs, don't bother. CPython is pretty good at keeping track of its own references, so chances are you don't need to - and if you're seeing the process's memory usage going up, it's entirely possible you can neither detect nor correct the problem in Python code (eg heap fragmentation).
>
> ChrisA

Hi Chris,

Yes, the question was about CPython. But I am not after leaks in CPython itself (though detecting those would be good); I am after my own mistakes that lead to data accumulating in mutable structures. There will be a few processes running standalone Python code and communicating across servers. Every activity is spread over time, so I have to keep a persistent record of each activity and remove it later, when the activity has finished.

In addition to checking objects directly, I would also like to assess the app's health indirectly, by checking the amount of data it holds. Let's say there are constantly 100 activities per second and the typical object count is 1000 (in abstract units, averaged over a long enough time window). I would then check throughput against memory to see whether my program is healthy in terms of leaking resources, and write a log entry if it is not. The input to such a module would be traffic events (whatever events are significant for object creation).

So I am looking for a proper way to measure the memory held by a CPython app. It would also be good if the memory could be broken down by object/class name, so that the culprit can be identified and reported.

Thanks
Asaf
-- https://mail.python.org/mailman/listinfo/python-list
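One way to break the object count down by class name, as asked above, is to tally what the garbage collector is tracking. This is only a minimal sketch under some assumptions: `gc.get_objects()` reports container objects tracked by the collector (not every int or str), and the names `count_objects_by_type`, `growth_since` and `Activity` are made up for the illustration.

```python
import gc
from collections import Counter


def count_objects_by_type():
    """Return a Counter mapping type name -> number of gc-tracked objects."""
    gc.collect()  # drop collectable garbage first, so only live objects are counted
    return Counter(type(obj).__name__ for obj in gc.get_objects())


def growth_since(baseline, limit=5):
    """Return up to `limit` (type name, increase) pairs, largest increase first."""
    current = count_objects_by_type()
    diff = Counter({name: n - baseline[name]
                    for name, n in current.items() if n > baseline[name]})
    return diff.most_common(limit)


class Activity:  # hypothetical record type standing in for the application's data
    def __init__(self, ident):
        self.ident = ident


baseline = count_objects_by_type()
leaked = [Activity(i) for i in range(500)]  # simulate records that were never removed
report = growth_since(baseline)
print(report)
```

Comparing such reports taken at quiet moments should show which class keeps growing, at the cost of walking every tracked object on each call.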
Re: Self healthcheck
On Wednesday, 22 January 2014, Asaf Las roeg...@gmail.com wrote:
> So I am looking for a proper way to measure the memory held by a CPython app. It would also be good if the memory could be broken down by object/class name, so that the culprit can be identified and reported.

There are some good tools recommended here:
http://stackoverflow.com/questions/110259/which-python-memory-profiler-is-recommended

But in general, my advice would be to use weak references wherever possible. They not only prevent cycles but will also highlight the kinds of bug in your code that are likely to cause the sort of problem you are worried about.
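One standard-library option that is not named in this thread is `tracemalloc` (available from Python 3.4), which can compare two snapshots and attribute growth to source lines. The sketch below only illustrates the idea; the `held` list is a stand-in for data that accumulates in a real application.

```python
import tracemalloc

tracemalloc.start()

before = tracemalloc.take_snapshot()
held = [bytes(1000) for _ in range(1000)]  # simulate roughly 1 MB of accumulated data
after = tracemalloc.take_snapshot()

# Differences between the snapshots, grouped by source line, largest change first.
top = after.compare_to(before, "lineno")
for stat in top[:3]:
    print(stat)

tracemalloc.stop()
```

This attributes memory to the line that allocated it rather than to a class name, so it complements per-class counting rather than replacing it.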
Re: Self healthcheck
Asaf Las roeg...@gmail.com wrote in message news:58c541ab-c6e1-45a8-b03a-8597ed7ec...@googlegroups.com...
> Yes, the question was about CPython. But I am not after leaks in CPython itself (though detecting those would be good); I am after my own mistakes that lead to data accumulating in mutable structures. There will be a few processes running standalone Python code and communicating across servers. Every activity is spread over time, so I have to keep a persistent record of each activity and remove it later, when the activity has finished.

I had a similar concern. My main worry, which turned out to be well-founded, was that I would create an object as a result of some user input, but when the user had finished with it, and in theory it could be garbage-collected, in practice it would not be, due to some obscure circular reference somewhere. For short-running tasks this is not a cause for concern, but for a long-running server these can build up over time and end up causing a problem.

My solution was to log every time an object was created, with some self-identifying piece of information, and then log when it was deleted, with the same identifier. After running the program for a while I could then analyse the log and ensure that each creation had a corresponding deletion.

The tricky bit was logging the deletion. It is a known gotcha in Python that you cannot rely on the __del__ method, and indeed it can cause a circular reference in itself which prevents the object from being garbage-collected. I found a solution somewhere which explained the use of a 'delwatcher' class. This is how it works -

    class MainObject:
        def __init__(self, identifier):
            # The watcher is owned by the main object, so it goes away with it.
            self._del = delwatcher('MainObject', identifier)

    class delwatcher:
        def __init__(self, obj_type, identifier):
            self.obj_type = obj_type
            self.identifier = identifier
            # log() is the application's own logging function (not shown)
            log('{}: id={} created'.format(self.obj_type, self.identifier))

        def __del__(self):
            log('{}: id={} deleted'.format(self.obj_type, self.identifier))

In this case calling __del__() is safe, as no reference to the main object is held.

If you do find that an object is not being deleted, it is then trial-and-error to find the problem and fix it. It is probably a circular reference.

HTH

Frank Millman
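The circular-reference case mentioned above can be reproduced in a few lines. This is a sketch for CPython only, with a hypothetical `Node` class; the automatic collector is switched off so that the effect of the cycle is visible, and a weak reference is used to observe the object without keeping it alive.

```python
import gc
import weakref


class Node:  # hypothetical class used only for this demonstration
    def __init__(self, name):
        self.name = name
        self.partner = None


gc.disable()                       # keep the automatic collector out of the way
a, b = Node("a"), Node("b")
a.partner, b.partner = b, a        # a <-> b forms a reference cycle
probe = weakref.ref(a)             # observes `a` without adding a strong reference

del a, b
alive_after_del = probe() is not None      # the cycle still holds both objects

gc.collect()                               # the cycle detector reclaims them
alive_after_collect = probe() is not None
gc.enable()

print(alive_after_del, alive_after_collect)
```

Under CPython the first flag should be true and the second false: reference counting alone does not free the pair, and only a collector run does.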
Re: Self healthcheck
On Wednesday, January 22, 2014 10:43:39 AM UTC+2, Nicholas wrote:
> There are some good tools recommended here:
> http://stackoverflow.com/questions/110259/which-python-memory-profiler-is-recommended
>
> But in general, my advice would be to use weak references wherever possible. They not only prevent cycles but will also highlight the kinds of bug in your code that are likely to cause the sort of problem you are worried about.

Thanks! I will look into these!
Re: Self healthcheck
On Wednesday, January 22, 2014 10:56:30 AM UTC+2, Frank Millman wrote:
>     class MainObject:
>         def __init__(self, identifier):
>             self._del = delwatcher('MainObject', identifier)
>
>     class delwatcher:
>         def __init__(self, obj_type, identifier):
>             self.obj_type = obj_type
>             self.identifier = identifier
>             log('{}: id={} created'.format(self.obj_type, self.identifier))
>
>         def __del__(self):
>             log('{}: id={} deleted'.format(self.obj_type, self.identifier))
>
> If you do find that an object is not being deleted, it is then trial-and-error to find the problem and fix it. It is probably a circular reference.
>
> Frank Millman

Thanks Frank. Good approach!

One question - you could do:

    class MainObject:
        def __init__(self, identifier):
            self._del = delwatcher(self)

and then later:

    class delwatcher:
        def __init__(self, tobject):
            self.obj_type = type(tobject)
            self.identifier = id(tobject)
            ...

when creating the delwatcher. Was there a special reason not to use type() and id()? Is it because memory is reused when objects are deleted and created again, so the same id could belong to objects created at different times?

Thanks
Asaf
Re: Self healthcheck
Asaf Las roeg...@gmail.com wrote in message:
> One question - you could do:
>
>     class MainObject:
>         def __init__(self, identifier):
>             self._del = delwatcher(self)
>
> and then later:
>
>     class delwatcher:
>         def __init__(self, tobject):
>             self.obj_type = type(tobject)
>             self.identifier = id(tobject)
>             ...
>
> when creating the delwatcher. Was there a special reason not to use type() and id()? Is it because memory is reused when objects are deleted and created again, so the same id could belong to objects created at different times?

I couldn't make sense of most of that. But an ID only uniquely corresponds to an object while that object still exists. The system may, and will, reuse IDs constantly.

-- DaveA
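If identifiers must stay unique for the whole life of the process, one alternative to id() is a serial number drawn from a counter. This is a sketch of a swapped-in technique, not something proposed in the thread; `_next_serial` and `Tracked` are hypothetical names.

```python
import itertools

_next_serial = itertools.count(1)  # hypothetical module-level counter


class Tracked:  # hypothetical class, for illustration only
    def __init__(self):
        # Unlike id(), this value is never handed out twice within one process.
        self.serial = next(_next_serial)


first = Tracked()
first_serial = first.serial
del first                  # the memory (and therefore the id) may now be reused
second = Tracked()
print(first_serial, second.serial)  # prints "1 2"
```

The counter is not synchronised across processes, so a multi-process setup would still need a process identifier alongside it.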
Re: Self healthcheck
Asaf Las roeg...@gmail.com wrote in message news:9729ddaa-5976-4e53-8584-6198b47b6...@googlegroups.com...
> Was there a special reason not to use type() and id()? Is it because memory is reused when objects are deleted and created again, so the same id could belong to objects created at different times?

I read Dave's reply, and he is correct in saying that ids are frequently re-used in Python.

However, in this particular case, I think you are right: it is safe to use the id to identify the object. An id can only be re-used if the original object is deleted, and that is the whole point of this exercise. We expect to see the id come up in a 'created' message, and then the same id appear in a 'deleted' message. If this happens, we are not concerned if the same id reappears in a subsequent 'created' message.

Frank
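The pairing of 'created' and 'deleted' messages described above could be checked with a small script along these lines. It is a sketch that assumes each log line has exactly the shape produced by the delwatcher format strings; the function name `unmatched_creations` and the sample lines are made up for the illustration.

```python
def unmatched_creations(lines):
    """Return the set of (obj_type, identifier) pairs created but not yet deleted."""
    live = set()
    for line in lines:
        # Assumed shape: "<obj_type>: id=<identifier> created" (or "... deleted")
        obj_type, rest = line.split(": id=", 1)
        identifier, event = rest.rsplit(" ", 1)
        key = (obj_type, identifier)
        if event == "created":
            live.add(key)
        elif event == "deleted":
            live.discard(key)
    return live


# Made-up sample lines; note that id 140001 is reused by a later object.
log_lines = [
    "MainObject: id=140001 created",
    "MainObject: id=140002 created",
    "MainObject: id=140001 deleted",
    "MainObject: id=140001 created",
    "MainObject: id=140001 deleted",
]
print(unmatched_creations(log_lines))  # prints {('MainObject', '140002')}
```

Because every deletion clears its entry before the id can show up again, reuse of an id does not confuse the pairing, which matches the reasoning above.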
Self healthcheck
Hi

When designing a long-running background process, is it feasible to monitor object/memory leakage caused by improper programming? If it were possible to write a module that monitors and records trends in the number of live objects, then an event could be generated and logged if the number of zombie objects increases over the long run.

Would gc.count() serve such a purpose?

Thanks
Asaf
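For reference, the gc module does provide `gc.get_count()`, which returns the collector's per-generation counters rather than a number of live objects, and `gc.get_objects()`, which lists the objects the collector tracks. A trend monitor of the kind described above might be sketched as follows; the class `ObjectTrend` and its growth rule (every sample in the window larger than the one before) are assumptions made for the illustration.

```python
import gc
from collections import deque


class ObjectTrend:
    """Hypothetical helper: flags sustained growth in the tracked-object count."""

    def __init__(self, window=5):
        if window < 2:
            raise ValueError("window must be at least 2")
        self.samples = deque(maxlen=window)

    def sample(self, count=None):
        # A count can be passed in for testing; by default use the live figure.
        if count is None:
            count = len(gc.get_objects())
        self.samples.append(count)

    def is_growing(self):
        # True only when the window is full and each sample exceeds the previous one.
        s = list(self.samples)
        return len(s) == self.samples.maxlen and all(b > a for a, b in zip(s, s[1:]))


trend = ObjectTrend(window=3)
for n in (1000, 1010, 1025):
    trend.sample(n)
print(trend.is_growing())  # prints True
```

In a real process `sample()` would be called on a timer or every so many traffic events, and a log entry written when `is_growing()` returns true.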
Re: Self healthcheck
On Wed, Jan 22, 2014 at 1:51 PM, Asaf Las roeg...@gmail.com wrote:
> When designing long running background process is it feasible to monitor object/memory leakage due to improper programming?

I assume you're talking about pure Python code, running under CPython. (If you're writing an extension module, say in C, there are completely different ways to detect reference leaks; and other Pythons will behave slightly differently.) There's no way to detect truly unreferenced objects, because they simply won't exist - not after a garbage collection run, and usually sooner than that. But if you want to find objects that you're somehow not using and yet still have live references to, you'll need to define "using" in a way that makes sense. Generally there aren't many ways that that can happen, so those few places are candidates for a weak reference system (maybe you map a name to the master object representing that thing, and you can recreate the master object from the disk, so when nothing else is referring to it, you can happily flush it out - that mapping is a good candidate for weak references).

But for most programs, don't bother. CPython is pretty good at keeping track of its own references, so chances are you don't need to - and if you're seeing the process's memory usage going up, it's entirely possible you can neither detect nor correct the problem in Python code (eg heap fragmentation).

ChrisA
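The name-to-master-object mapping described above might look roughly like this with `weakref.WeakValueDictionary`. It is a sketch only: `Master`, `registry` and `get_master` are hypothetical names, and constructing the object stands in for recreating it from disk.

```python
import gc
import weakref


class Master:  # hypothetical "master object" that could be rebuilt from disk
    def __init__(self, name):
        self.name = name


registry = weakref.WeakValueDictionary()  # name -> Master, without keeping it alive


def get_master(name):
    obj = registry.get(name)
    if obj is None:
        obj = Master(name)      # stand-in for reloading the object from disk
        registry[name] = obj
    return obj


m = get_master("config")
same = get_master("config") is m   # still referenced elsewhere, so it is reused
del m                              # the last strong reference goes away
gc.collect()                       # CPython frees it at `del`; other Pythons may need this
gone = "config" not in registry
print(same, gone)
```

The mapping never has to be cleaned up by hand: an entry disappears once nothing else refers to its object.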