Venelin Petkov wrote:
> Hello all,
>
>   At the Helics II cluster (IWR, Heidelberg, Germany) we are using the
> OpenIPMI python library to monitor the compute nodes. The master host,
> on which our python daemon runs, uses Debian Etch with OpenIPMI
> v. 2.0.14. The daemon runs fine for about a week on average (from 3 up
> to 10 days) and then it suddenly crashes. This has happened repeatedly
> for the past half-year or so. Since this is a purely Python daemon, we
> suspect that there might be a problem with OpenIPMI.
>
>   There are never tracebacks and the daemon either crashes, freezes,
> or segfaults. We are using screen heavily, but this happens even when
> it is run directly from bash. Most interestingly, this problem
> persisted for a couple of months, then after an upgrade of the
> operating system it was running well for a couple of months. However,
> it started crashing again since the beginning of April. An upgrade to
> 2.0.16 didn't help. We haven't modified the code of the daemon at
> least for a year.
>
>   Does anybody have any idea why this might happen? Thanks in advance.
I haven't seen this, but I don't run anything for that long.  There may
very well be a problem in OpenIPMI, though.  A couple of things to
try...

Can you check the memory utilization of python/OpenIPMI?  Maybe there's 
a leak someplace.  If so, it may be possible to use the memory debugger 
in OpenIPMI to track it down.  Python relies mainly on reference 
counting, and although it also has a cycle collector, a circular 
reference can still go unreclaimed (for example, in Python 2, when an 
object in the cycle defines __del__, or when the reference is held by 
C extension code the collector cannot see); this is a common problem in 
long-running python code.
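
As a rough way to check for such a leak from inside the daemon, something 
like the sketch below could be logged periodically.  It uses only the 
standard library; the function names rss_kb and memory_snapshot are made up 
for illustration and are not part of OpenIPMI, and the /proc lookup assumes 
Linux.

```python
import gc
import resource


def rss_kb():
    """Return this process's memory use in kB (current RSS on Linux)."""
    try:
        # Linux only: VmRSS in /proc/self/status is the current resident size.
        with open("/proc/self/status") as status:
            for line in status:
                if line.startswith("VmRSS:"):
                    return int(line.split()[1])
    except (IOError, OSError):
        pass
    # Fallback: peak RSS as reported by getrusage (kB on Linux).
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss


def memory_snapshot():
    """Run a collection and return (rss_kb, tracked_objects, uncollectable)."""
    gc.collect()
    return rss_kb(), len(gc.get_objects()), len(gc.garbage)
```

If the first number keeps growing over days while the object count stays 
flat, the leak is more likely on the C side than in Python objects; a 
growing object count or a non-empty gc.garbage points back at the Python 
code.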

Is it possible to get a coredump and a traceback?  You can do this by 
setting "ulimit -c unlimited" before running the process, then you can 
do "gdb /usr/bin/python <corefile>" and then "backtrace" to find out 
where the crash occurred.  You may have to install debugging libraries 
for this to work, but that can be done after the fact.
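
If the daemon is started from somewhere the shell ulimit is awkward to set 
(an init script, say), the same limit can be raised from within Python at 
startup.  This is a minimal sketch using the standard resource module; the 
function name enable_core_dumps is hypothetical, and it can only raise the 
soft limit as far as the hard limit already allows.

```python
import resource


def enable_core_dumps():
    """Raise the soft core-file size limit to the hard limit.

    Returns the resulting (soft, hard) pair. If the hard limit is 0,
    core dumps stay disabled and the limit has to be raised by the parent.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_CORE)
    resource.setrlimit(resource.RLIMIT_CORE, (hard, hard))
    return resource.getrlimit(resource.RLIMIT_CORE)
```

Calling enable_core_dumps() once before the main loop starts should be 
enough.  On Python 3.3 or later, the standard faulthandler module 
(faulthandler.enable()) can additionally print the Python-level traceback 
when the process segfaults.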

Are there any logs that come out of OpenIPMI before this happens?

-corey
