Re: Self healthcheck

2014-01-22 Thread Asaf Las
On Wednesday, January 22, 2014 5:08:25 AM UTC+2, Chris Angelico wrote:
 I assume you're talking about pure Python code, running under CPython.
 (If you're writing an extension module, say in C, there are completely
 different ways to detect reference leaks; and other Pythons will
 behave slightly differently.) There's no way to detect truly
 unreferenced objects, because they simply won't exist - not after a
 garbage collection run, and usually sooner than that. But if you want
 to find objects that you're somehow not using and yet still have live
 references to, you'll need to define "using" in a way that makes
 sense. Generally there aren't many ways that that can happen, so those
 few places are candidates for a weak reference system (maybe you map a
 name to the master object representing that thing, and you can
 recreate the master object from the disk, so when nothing else is
 referring to it, you can happily flush it out - that mapping is a good
 candidate for weak references).
 
 But for most programs, don't bother. CPython is pretty good at keeping
 track of its own references, so chances are you don't need to - and if
 you're seeing the process's memory usage going up, it's entirely
 possible you can neither detect nor correct the problem in Python code
 (e.g. heap fragmentation).
 ChrisA

Hi Chris

Yes, the question was about CPython. I am not after leaks in CPython itself,
though detecting those would be good, but after my own mistakes that lead to
data accumulating in mutable structures.
There will be a few standalone processes running Python code and
communicating across servers, and every activity will be spread over time,
so I have to keep a persistent record of each activity and remove it later
when the activity is finished. In addition to checking objects directly, I
would also like to analyze app health indirectly by checking the amount of
data it holds. Let's say there are permanently 100 activities per second and
the typical object count is 1000 (in abstract units, averaged over a long
enough time window); I would then check throughput and memory to see whether
my program is healthy in terms of leaking resources, and generate a log entry
if it is not.
The input to such a module will be traffic events (whatever events are
significant to object creation).
So I am looking for a proper way to measure the memory held by a CPython app.
It would also be good if the memory could be broken down by object/class
name, so that the culprit can be identified and reported.
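
One possible way to break the live data down by class name is sketched below. It counts objects rather than bytes, it only sees objects tracked by the garbage collector, and the Activity class and the helper names are made up for the illustration.

```python
import gc
from collections import Counter


def count_live_objects():
    """Count gc-tracked objects by class name (a rough census, not bytes)."""
    return Counter(type(obj).__name__ for obj in gc.get_objects())


def growth(before, after, top=5):
    """Return the classes whose instance count grew the most between two censuses."""
    delta = Counter(after)
    delta.subtract(before)
    return [(name, n) for name, n in delta.most_common(top) if n > 0]


class Activity:  # hypothetical record kept per activity
    def __init__(self, ident):
        self.ident = ident


before = count_live_objects()
pending = [Activity(i) for i in range(1000)]  # simulated records nobody removed
after = count_live_objects()
print(growth(before, after))
```

Taking a census at regular intervals and logging the classes that keep growing while throughput stays flat would give the kind of indirect health check described above.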

Thanks 

Asaf
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Self healthcheck

2014-01-22 Thread Nicholas Cole
On Wednesday, 22 January 2014, Asaf Las roeg...@gmail.com wrote:

 Hi Chris

 Yes, the question was about CPython. I am not after leaks in CPython itself,
 though detecting those would be good, but after my own mistakes that lead to
 data accumulating in mutable structures.
 There will be a few standalone processes running Python code and
 communicating across servers, and every activity will be spread over time,
 so I have to keep a persistent record of each activity and remove it later
 when the activity is finished. In addition to checking objects directly, I
 would also like to analyze app health indirectly by checking the amount of
 data it holds. Let's say there are permanently 100 activities per second and
 the typical object count is 1000 (in abstract units, averaged over a long
 enough time window); I would then check throughput and memory to see whether
 my program is healthy in terms of leaking resources, and generate a log entry
 if it is not.
 The input to such a module will be traffic events (whatever events are
 significant to object creation).
 So I am looking for a proper way to measure the memory held by a CPython app.
 It would also be good if the memory could be broken down by object/class
 name, so that the culprit can be identified and reported.


There are some good tools recommended here:

http://stackoverflow.com/questions/110259/which-python-memory-profiler-is-recommended

But in general, my advice would be to use weak references wherever possible.
They not only prevent cycles but will also highlight the kinds of bug in your
code that are likely to cause the sort of problem you are worried about.
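
A minimal sketch of that advice, with hypothetical Parent and Child classes: the child holds its parent through weakref.ref, so there is no strong cycle, and using a parent that has already gone away fails loudly instead of silently keeping it alive.

```python
import gc
import weakref


class Child:
    def __init__(self, parent):
        # Hold the parent weakly so parent <-> child is not a strong cycle.
        self._parent = weakref.ref(parent)

    @property
    def parent(self):
        parent = self._parent()
        if parent is None:
            # The "highlighting" effect: a stale reference fails loudly.
            raise ReferenceError('parent has already been released')
        return parent


class Parent:
    def __init__(self):
        self.children = [Child(self)]


gc.disable()                 # show that the cycle collector is not needed here
parent = Parent()
child = parent.children[0]
probe = weakref.ref(parent)
del parent                   # last strong reference to the Parent
parent_released = probe() is None  # True under CPython reference counting
gc.enable()
```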


Re: Self healthcheck

2014-01-22 Thread Frank Millman

Asaf Las roeg...@gmail.com wrote in message 
news:58c541ab-c6e1-45a8-b03a-8597ed7ec...@googlegroups.com...

 Yes, the question was about CPython. I am not after leaks in CPython itself,
 though detecting those would be good, but after my own mistakes that lead to
 data accumulating in mutable structures.
 There will be a few standalone processes running Python code and
 communicating across servers, and every activity will be spread over time,
 so I have to keep a persistent record of each activity and remove it later
 when the activity is finished.

I had a similar concern. My main worry, which turned out to be well-founded, 
was that I would create an object as a result of some user input, but when 
the user had finished with it, and in theory it could be garbage-collected, 
in practice it would not be due to some obscure circular reference 
somewhere.

For short-running tasks this is not a cause for concern, but for a 
long-running server these can build up over time and end up causing a 
problem.

My solution was to log every time an object was created, with some 
self-identifying piece of information, and then log when it was deleted, 
with the same identifier. After running the program for a while I could then 
analyse the log and ensure that each creation had a corresponding deletion.

The tricky bit was logging the deletion. It is a known gotcha in Python that 
you cannot rely on the __del__ method, and indeed an object that defines 
__del__ and is part of a circular reference may never be garbage-collected. 
I found a solution somewhere which explained the use of a 'delwatcher' 
class. This is how it works -

class MainObject:
    def __init__(self, identifier):
        self._del = delwatcher('MainObject', identifier)

class delwatcher:
    def __init__(self, obj_type, identifier):
        self.obj_type = obj_type
        self.identifier = identifier
        log('{}: id={} created'.format(self.obj_type, self.identifier))

    def __del__(self):
        log('{}: id={} deleted'.format(self.obj_type, self.identifier))

In this case using __del__() is safe, as the delwatcher holds no reference 
to the main object.

If you do find that an object is not being deleted, it is then a matter of 
trial and error to find the problem and fix it. It is probably a circular 
reference.
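
One way to narrow the search is to ask the collector what still refers to the object. The sketch below uses gc.get_referrers(); the registry dict is a made-up example of a forgotten container, and the output will also list unrelated referrers such as the module namespace.

```python
import gc


class MainObject:  # stand-in for the class being tracked
    def __init__(self, identifier):
        self.identifier = identifier


def describe_referrers(obj):
    """Return a short description of each gc-tracked object that refers to obj."""
    descriptions = []
    for referrer in gc.get_referrers(obj):
        if isinstance(referrer, dict):
            keys = sorted(map(str, referrer))[:5]  # first few keys as a hint
            descriptions.append('dict with keys {}'.format(keys))
        else:
            descriptions.append(type(referrer).__name__)
    return descriptions


registry = {}  # hypothetical forgotten cache that keeps objects alive
obj = MainObject(1)
registry['session-1'] = obj

for line in describe_referrers(obj):
    print(line)
```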

HTH

Frank Millman





Re: Self healthcheck

2014-01-22 Thread Asaf Las
On Wednesday, January 22, 2014 10:43:39 AM UTC+2, Nicholas wrote:
 There are some good tools recommended here: 
 http://stackoverflow.com/questions/110259/which-python-memory-profiler-is-recommended
 But in general, my advice would be to use weak references wherever
 possible. They not only prevent cycles but will also highlight the
 kinds of bug in your code that are likely to cause the sort of
 problem you are worried about.

Thanks! I will look into these!



Re: Self healthcheck

2014-01-22 Thread Asaf Las
On Wednesday, January 22, 2014 10:56:30 AM UTC+2, Frank Millman wrote:
 
 class MainObject:
     def __init__(self, identifier):
         self._del = delwatcher('MainObject', identifier)

 class delwatcher:
     def __init__(self, obj_type, identifier):
         self.obj_type = obj_type
         self.identifier = identifier
         log('{}: id={} created'.format(self.obj_type, self.identifier))

     def __del__(self):
         log('{}: id={} deleted'.format(self.obj_type, self.identifier))

 If you do find that an object is not being deleted, it is then 
 trial-and-error to find the problem and fix it. It is probably a circular 
 reference
 
 Frank Millman

Thanks Frank. Good approach! 

One question - you could do:

class MainObject:
    def __init__(self, identifier):
        self._del = delwatcher(self)

and then later

class delwatcher:
    def __init__(self, tobject):
        self.obj_type = type(tobject)
        self.identifier = id(tobject)
        ...

when creating the delwatcher. Was there a special reason not to use type()
and id()? Is it because memory is reused when objects are deleted and
created again, so the same id could refer to objects created at different
times?

Thanks 

Asaf



Re: Self healthcheck

2014-01-22 Thread Dave Angel
 Asaf Las roeg...@gmail.com Wrote in message:
 On Wednesday, January 22, 2014 10:56:30 AM UTC+2, Frank Millman wrote:
 
 class MainObject:
     def __init__(self, identifier):
         self._del = delwatcher('MainObject', identifier)

 class delwatcher:
     def __init__(self, obj_type, identifier):
         self.obj_type = obj_type
         self.identifier = identifier
         log('{}: id={} created'.format(self.obj_type, self.identifier))

     def __del__(self):
         log('{}: id={} deleted'.format(self.obj_type, self.identifier))

 If you do find that an object is not being deleted, it is then 
 trial-and-error to find the problem and fix it. It is probably a circular 
 reference
 
 Frank Millman
 
 Thanks Frank. Good approach! 
 
 One question - You could do:
 class MainObject:
     def __init__(self, identifier):
         self._del = delwatcher(self)

 then later 
 
 class delwatcher:
     def __init__(self, tobject):
         self.obj_type = type(tobject)
         self.identifier = id(tobject)
         ...
 
 when creating delwatcher. Was there special reason to not to use them?
 is this because of memory is reused when objects are deleted 
 and created again so same reference could be for objects created 
 in different time slots?
 

I couldn't make sense of most of that. But an ID only uniquely
corresponds to an object while that object still exists. The
system may, and will, reuse IDs constantly.
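
A small sketch of that behaviour, using a throwaway Thing class. Whether the id really is re-used depends on the allocator, so the second result is only typical for CPython and is not guaranteed.

```python
class Thing:
    pass


a = Thing()
b = Thing()
# While both objects are alive, their ids are guaranteed to differ.
ids_differ_while_alive = id(a) != id(b)

old_id = id(a)
del a        # under CPython the object is freed here
c = Thing()  # may be placed where the old object used to live
reused = id(c) == old_id  # often True under CPython, but never guaranteed
print(ids_differ_while_alive, reused)
```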

-- 
DaveA



Re: Self healthcheck

2014-01-22 Thread Frank Millman

Asaf Las roeg...@gmail.com wrote in message 
news:9729ddaa-5976-4e53-8584-6198b47b6...@googlegroups.com...
 On Wednesday, January 22, 2014 10:56:30 AM UTC+2, Frank Millman wrote:

 class MainObject:
     def __init__(self, identifier):
         self._del = delwatcher('MainObject', identifier)

 class delwatcher:
     def __init__(self, obj_type, identifier):
         self.obj_type = obj_type
         self.identifier = identifier
         log('{}: id={} created'.format(self.obj_type, self.identifier))

     def __del__(self):
         log('{}: id={} deleted'.format(self.obj_type, self.identifier))

 If you do find that an object is not being deleted, it is then
 trial-and-error to find the problem and fix it. It is probably a circular
 reference

 Frank Millman

 Thanks Frank. Good approach!

 One question - You could do:
 class MainObject:
     def __init__(self, identifier):
         self._del = delwatcher(self)

 then later

 class delwatcher:
     def __init__(self, tobject):
         self.obj_type = type(tobject)
         self.identifier = id(tobject)
         ...

 when creating delwatcher. Was there special reason to not to use them?
 is this because of memory is reused when objects are deleted
 and created again so same reference could be for objects created
 in different time slots?


I read Dave's reply, and he is correct in saying that ids are frequently 
re-used in Python.

However, in this particular case, I think you are right, it is safe to use 
the id to identify the object. An id can only be re-used if the original 
object is deleted, and that is the whole point of this exercise. We expect 
to see the id come up in a 'created' message, and then the same id appear in 
a 'deleted' message. If this happens, we are not concerned if the same id 
reappears in a subsequent 'created' message.
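
The log check can be sketched as follows, assuming log lines in the 'MainObject: id=... created' form used by the delwatcher. The find_leaks name and the sample lines are invented for the illustration. Because the lines are processed in order, a re-used id causes no confusion.

```python
def find_leaks(log_lines):
    """Return keys that were logged as created but never deleted.

    Lines are assumed to look like 'MainObject: id=42 created'.
    """
    live = set()
    for line in log_lines:
        key, _, event = line.rpartition(' ')
        if event == 'created':
            live.add(key)
        elif event == 'deleted':
            live.discard(key)
    return live


sample_log = [
    'MainObject: id=140001 created',
    'MainObject: id=140002 created',
    'MainObject: id=140001 deleted',
    'MainObject: id=140001 created',  # id re-used by a new object
    'MainObject: id=140001 deleted',
]
print(find_leaks(sample_log))  # only id=140002 was never deleted
```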

Frank





Self healthcheck

2014-01-21 Thread Asaf Las
Hi 

When designing a long-running background process, 
is it feasible to monitor object/memory leakage due 
to improper programming?
If it were possible to make a module which monitors and 
records trends in the number of live objects, then an event could be 
generated and logged if the number of zombie objects 
increases in the long run.

Would gc.get_count() serve such a purpose?
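
For reference, the function in the gc module is gc.get_count(), and it returns the collector's per-generation counters rather than a total of live objects. A rough sketch of trend monitoring is given below, using len(gc.get_objects()) as the figure to watch; the sample() and is_growing() helpers and the threshold are assumptions made for the illustration.

```python
import gc


def sample():
    """Take one reading: collector counters and the number of tracked objects."""
    return {
        'gen_counts': gc.get_count(),      # (gen0, gen1, gen2) collector counters
        'tracked': len(gc.get_objects()),  # objects the collector currently tracks
    }


def is_growing(readings, min_increase):
    """Flag growth when tracked objects rose by min_increase over the window."""
    return readings[-1]['tracked'] - readings[0]['tracked'] >= min_increase


readings = [sample()]
hoard = []
for step in range(3):
    hoard.extend([] for _ in range(1000))  # simulated leak: lists nobody removes
    readings.append(sample())

print(is_growing(readings, min_increase=2500))
```

In a real process the readings would be taken on a timer and compared over a much longer window before anything is logged.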

Thanks

Asaf


Re: Self healthcheck

2014-01-21 Thread Chris Angelico
On Wed, Jan 22, 2014 at 1:51 PM, Asaf Las roeg...@gmail.com wrote:
 When designing a long-running background process,
 is it feasible to monitor object/memory leakage due
 to improper programming?

I assume you're talking about pure Python code, running under CPython.
(If you're writing an extension module, say in C, there are completely
different ways to detect reference leaks; and other Pythons will
behave slightly differently.) There's no way to detect truly
unreferenced objects, because they simply won't exist - not after a
garbage collection run, and usually sooner than that. But if you want
to find objects that you're somehow not using and yet still have live
references to, you'll need to define "using" in a way that makes
sense. Generally there aren't many ways that that can happen, so those
few places are candidates for a weak reference system (maybe you map a
name to the master object representing that thing, and you can
recreate the master object from the disk, so when nothing else is
referring to it, you can happily flush it out - that mapping is a good
candidate for weak references).
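
A minimal sketch of such a mapping, using weakref.WeakValueDictionary. The Master class, load_from_disk() and get_master() are hypothetical names standing in for the real objects and loading code.

```python
import weakref


class Master:
    """Hypothetical 'master object' that can be rebuilt from disk."""

    def __init__(self, name):
        self.name = name


def load_from_disk(name):
    # Placeholder for the real (assumed) loading code.
    return Master(name)


_cache = weakref.WeakValueDictionary()  # name -> Master, held weakly


def get_master(name):
    """Return the live Master for name, recreating it if it was flushed."""
    obj = _cache.get(name)
    if obj is None:
        obj = load_from_disk(name)
        _cache[name] = obj
    return obj


first = get_master('config')
same = get_master('config')       # still referenced, so the cache hits
hit = same is first
del first, same                   # the last strong references go away
flushed = 'config' not in _cache  # True under CPython reference counting
```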

But for most programs, don't bother. CPython is pretty good at keeping
track of its own references, so chances are you don't need to - and if
you're seeing the process's memory usage going up, it's entirely
possible you can neither detect nor correct the problem in Python code
(e.g. heap fragmentation).

ChrisA