Hi,
I am developing several Python scripts for monitoring our computing nodes
via IPMI, using the openipmigui scripts as a starting point and example.
Everything else now works, but one persistent problem remains: frequent
connection timeouts. One of the scripts is a monitoring daemon that
periodically checks two test nodes (every 60 s by default). Sometimes the
connections to both nodes (domains in IPMI jargon) are stable for hours,
sometimes both are disconnected immediately, and most frequently one drops
out while the other remains online. After some time the disconnected
node(s) usually reconnect automatically (my script does not trigger the
reconnect). The behaviour is so irregular that I am at a loss to explain
it :).
This has been bugging me from the beginning; moreover, the openipmigui.py
script suffers from exactly the same problem. Here is an excerpt from my
program's log:
2007-08-15 10:17:11,872 INFO IPMI Monitor started
2007-08-15 10:17:42,130 WARNING (Domain.conn_change_cb) domain: Node01, err: OS: Connection timed out, connum: 0, portnum: 0, anything_connected: 0
2007-08-15 10:17:58,878 WARNING (Domain.conn_change_cb) domain: Node01, err: OS: Invalid argument, connum: 0, portnum: 0, anything_connected: 0
2007-08-15 10:18:08,914 WARNING (Domain.conn_change_cb) domain: Node01, err: IPMI: Timeout:c3, connum: 0, portnum: 0, anything_connected: 0
2007-08-15 10:18:18,922 WARNING (Domain.conn_change_cb) domain: Node01, err: IPMI: Timeout:c3, connum: 0, portnum: 0, anything_connected: 0
2007-08-15 10:18:28,922 WARNING (Domain.conn_change_cb) domain: Node01, err: IPMI: Timeout:c3, connum: 0, portnum: 0, anything_connected: 0
2007-08-15 10:18:38,938 WARNING (Domain.conn_change_cb) domain: Node01, err: IPMI: Timeout:c3, connum: 0, portnum: 0, anything_connected: 0
2007-08-15 10:18:48,939 WARNING (Domain.conn_change_cb) domain: Node01, err: IPMI: Timeout:c3, connum: 0, portnum: 0, anything_connected: 0
2007-08-15 10:18:58,919 WARNING (Domain.conn_change_cb) domain: Node01, err: IPMI: Timeout:c3, connum: 0, portnum: 0, anything_connected: 0
2007-08-15 10:19:08,923 WARNING (Domain.conn_change_cb) domain: Node01, err: IPMI: Timeout:c3, connum: 0, portnum: 0, anything_connected: 0
2007-08-15 10:19:18,927 WARNING (Domain.conn_change_cb) domain: Node01, err: IPMI: Timeout:c3, connum: 0, portnum: 0, anything_connected: 0
...
2007-08-15 10:19:28,923 WARNING (Domain.conn_change_cb) domain: Node01, err: IPMI: Timeout:c3, connum: 0, portnum: 0, anything_connected: 0
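To make the timing pattern in such a log easier to see, the intervals between
successive conn_change_cb warnings can be computed from the timestamps. The
following is only a rough sketch using the Python standard library; the names
STAMP, event_times and gaps_seconds are made up for this illustration, and the
pattern assumes the timestamp layout shown in the excerpt above.

```python
import re
from datetime import datetime

# Matches the start of a warning line as it appears in the excerpt above.
# The layout (date, time, comma, milliseconds) is assumed from that excerpt.
STAMP = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),(\d{3}) WARNING "
    r"\(Domain\.conn_change_cb\)"
)


def event_times(lines):
    """Return the timestamps of all conn_change_cb warning lines."""
    times = []
    for line in lines:
        m = STAMP.match(line)
        if m:
            base = datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S")
            # Milliseconds follow the comma in the log timestamp.
            times.append(base.replace(microsecond=int(m.group(2)) * 1000))
    return times


def gaps_seconds(times):
    """Return the gaps between consecutive events, in seconds."""
    return [(b - a).total_seconds() for a, b in zip(times, times[1:])]


if __name__ == "__main__":
    # First lines of the excerpt; continuation lines are simply skipped.
    sample = [
        "2007-08-15 10:17:11,872 INFO IPMI Monitor started",
        "2007-08-15 10:17:42,130 WARNING (Domain.conn_change_cb) domain: Node01, err:",
        "OS: Connection timed out, connum: 0, portnum: 0, anything_connected: 0",
        "2007-08-15 10:17:58,878 WARNING (Domain.conn_change_cb) domain: Node01, err:",
        "2007-08-15 10:18:08,914 WARNING (Domain.conn_change_cb) domain: Node01, err:",
        "2007-08-15 10:18:18,922 WARNING (Domain.conn_change_cb) domain: Node01, err:",
    ]
    for gap in gaps_seconds(event_times(sample)):
        print("%.3f s" % gap)
```

Applied to the excerpt, this shows that once the Timeout:c3 errors start, the
callback fires roughly every 10 seconds.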
The messages are logged by the conn_change_cb(self, domain, err, connum,
portnum, anything_connected) event handler, which I have defined in my
custom Domain class. Can somebody explain what could be going wrong,
especially if you have had the same problem before?
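For readers who want to reproduce the log format, here is a minimal sketch of a
handler with the signature above. It is an illustration only, not the actual
code of the script: the logger name "ipmi_monitor", the format_err hook (which
stands in for whatever turns the numeric error code into text such as
"IPMI: Timeout:c3") and the connected attribute are all assumptions, and the
sketch deliberately makes no OpenIPMI calls.

```python
import logging

# Hypothetical logger name; the real script may use a different one.
log = logging.getLogger("ipmi_monitor")


class Domain:
    """Tracks the connection state of one IPMI domain (node)."""

    def __init__(self, name, format_err=str):
        self.name = name
        # format_err converts the error code into readable text.
        # It is a hypothetical hook for this sketch; str() is the fallback.
        self.format_err = format_err
        self.connected = False

    def conn_change_cb(self, domain, err, connum, portnum, anything_connected):
        """Record the new connection state and log failures."""
        self.connected = bool(anything_connected)
        if err:
            # Same field order as the log lines shown above.
            log.warning(
                "(Domain.conn_change_cb) domain: %s, err: %s, connum: %d, "
                "portnum: %d, anything_connected: %d",
                self.name, self.format_err(err), connum, portnum,
                anything_connected,
            )
        else:
            log.info("(Domain.conn_change_cb) domain: %s connected", self.name)
```

Keeping the last known state in the object lets the periodic check skip a node
that is currently down instead of issuing commands that will only time out.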
Any help will be greatly appreciated,
Greetings,
Venelin Petkov
----------
Physics Student
Helics II hpc linux cluster, Institute for Scientific Computing,
University of Heidelberg,
Germany