Venelin Petkov wrote:
> Hello all,
>
> At the Helics II cluster (IWR, Heidelberg, Germany) we are using the
> OpenIPMI python library to monitor the compute nodes. The master host,
> on which our python daemon runs, uses Debian Etch with OpenIPMI v. 2.0.14.
> The daemon runs fine for about a week on average (from 3 up to 10 days)
> and then it suddenly crashes. This has happened repeatedly for the past
> half-year or so. Since this is a purely Python daemon, we suspect that
> there might be a problem with OpenIPMI.
>
> There are never tracebacks and the daemon either crashes, freezes,
> or segfaults. We are using screen heavily, but this happens even when
> it is run directly from bash. Most interestingly, this problem persisted
> for a couple of months, then after an upgrade of the operating system it
> was running well for a couple of months. However, it started crashing
> again since the beginning of April. An upgrade to 2.0.16 didn't help. We
> haven't modified the code of the daemon at least for a year.
>
> Does anybody has any idea why this might happen? Thanks in advance.

I haven't seen this, but I don't run anything for that period of time.
But there may very well be a problem in OpenIPMI. A couple of things to
try...
Can you check the memory utilization of python/OpenIPMI? Maybe there's a
leak somewhere. If so, it may be possible to use the memory debugger in
OpenIPMI to track it down. Since Python's garbage collection is primarily
reference counting, it's possible to leave a circular memory reference
around so that it is never reclaimed; this is a common problem in
long-running Python code.

Is it possible to get a coredump and a traceback? You can do this by
running "ulimit -c unlimited" in the shell before starting the process.
After a crash, run "gdb /usr/bin/python <corefile>" and then "backtrace"
to find out where the crash occurred. You may have to install debugging
libraries for this to work, but that can be done after the fact.

Are there any logs that come out of OpenIPMI before this happens?

-corey

_______________________________________________
Openipmi-developer mailing list
https://lists.sourceforge.net/lists/listinfo/openipmi-developer
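
The memory check and the core-dump setup suggested above could be done from inside the daemon itself. What follows is only a rough sketch using the standard `resource` and `gc` modules, not anything OpenIPMI provides. The function names `enable_core_dumps` and `memory_snapshot` are made up for illustration, and raising the soft core limit to the hard limit only matches "ulimit -c unlimited" if the hard limit is itself unlimited.

```python
import gc
import resource


def enable_core_dumps():
    """Raise the soft core-file size limit to the hard limit.

    Similar in spirit to "ulimit -c unlimited", but it can only go as
    high as the hard limit the process already has.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_CORE)
    resource.setrlimit(resource.RLIMIT_CORE, (hard, hard))
    return resource.getrlimit(resource.RLIMIT_CORE)


def memory_snapshot():
    """Return a few numbers worth logging periodically."""
    # Force a full collection; the return value is the number of
    # unreachable objects the cycle collector found.
    unreachable = gc.collect()
    usage = resource.getrusage(resource.RUSAGE_SELF)
    return {
        "max_rss": usage.ru_maxrss,                 # peak RSS (kilobytes on Linux)
        "tracked_objects": len(gc.get_objects()),   # objects tracked by the collector
        "unreachable": unreachable,
        "uncollectable": len(gc.garbage),           # cycles the collector could not free
    }
```

Calling `enable_core_dumps()` once at startup and logging `memory_snapshot()` every few minutes from the main loop should show whether anything grows over the days before a crash. As a rough heuristic, a steadily rising `tracked_objects` count would point at the Python side, while a rising `max_rss` with a flat object count would be more consistent with a leak in native code.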
