Hello all,

At the Helics II cluster (IWR, Heidelberg, Germany) we use the OpenIPMI Python library to monitor the compute nodes. The master host, on which our Python daemon runs, uses Debian Etch with OpenIPMI v. 2.0.14. The daemon runs fine for about a week on average (anywhere from 3 to 10 days) and then suddenly crashes. This has happened repeatedly for the past half-year or so. Since the daemon itself is pure Python, we suspect that the problem might be in OpenIPMI.
There are never any tracebacks; the daemon either crashes, freezes, or segfaults. We use screen heavily, but this also happens when the daemon is run directly from bash. Most interestingly, the problem persisted for a couple of months, then after an upgrade of the operating system the daemon ran well for a couple of months. However, it has been crashing again since the beginning of April. An upgrade to 2.0.16 didn't help. We haven't modified the daemon's code for at least a year.

Does anybody have any idea why this might happen?

Thanks in advance.

Greetings,
Venelin Petkov
Student Computer Administrator at IWR, Heidelberg, Germany
_______________________________________________
Openipmi-developer mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/openipmi-developer
