What is the target system/board you are using? I have one system based
on an mBMC that does this. All the other systems I have are rock-solid;
I've left openipmigui up for days, and I have an automated system that
has been running for months without problems using the Python interface.
I can think of three possibilities:
1. The program stops polling the BMC for some period of time, and the
BMC drops the connections. This doesn't make sense, though,
because OpenIPMI should reconnect the next time it tries.
2. You lose network connectivity.
3. The BMC goes "out to lunch" for a while. I've traced my system
that has a similar problem, and it simply stops working for a
while. I find this really only happens when a driver is also
running on the local interface or when the BIOS is using the
interface.
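For case 1 you can also make the script itself notice the difference between a brief hiccup and a dead connection. A minimal sketch in plain Python — ConnTracker is a hypothetical helper of mine, not part of OpenIPMI; it just counts the events your conn_change_cb handler already receives:

```python
# Hypothetical helper (not an OpenIPMI API): count consecutive connection
# failures per domain, as reported to conn_change_cb, so a monitoring
# script can tell a brief BMC hiccup from a node that is really down.
class ConnTracker:
    def __init__(self, max_failures=5):
        self.max_failures = max_failures
        self.failures = {}          # domain name -> consecutive failure count

    def record(self, name, err, anything_connected):
        """Feed each conn_change_cb event in; returns a coarse state."""
        if anything_connected:
            self.failures[name] = 0  # any working connection resets the count
            return "up"
        n = self.failures.get(name, 0) + 1
        self.failures[name] = n
        # After enough consecutive errors, flag the node instead of
        # silently waiting on OpenIPMI's automatic retries.
        return "down" if n >= self.max_failures else "retrying"


tracker = ConnTracker(max_failures=3)
states = [tracker.record("Node01", "IPMI: Timeout:c3", 0) for _ in range(3)]
print(states)  # ['retrying', 'retrying', 'down']
```

You would call tracker.record() from inside your conn_change_cb and alert (or force a reconnect) only on the "down" transition, which keeps a momentary BMC stall from paging anyone.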
You could try enabling a raw message trace in OpenIPMI, or use tcpdump
(filtering on UDP port 623, the IPMI LAN/RMCP port) to watch the
messages. My guess is that when this happens you will see messages going
to the BMC but nothing coming back.
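Incidentally, the "Timeout:c3" in your log is the IPMI completion code 0xC3 ("timeout while processing command, response unavailable" in the spec), which fits that picture. A quick sketch of pulling the code back out of such a log line — the parsing is mine, not anything OpenIPMI provides:

```python
# Sketch (my own parsing, not an OpenIPMI API): extract the IPMI
# completion code from a log line like "err: IPMI: Timeout:c3".
import re

# A few completion codes from the IPMI spec's generic table; not exhaustive.
COMPLETION_CODES = {
    0xC3: "Timeout while processing command, response unavailable",
    0xC7: "Request data length invalid",
    0xFF: "Unspecified error",
}

def decode_err(line):
    """Return (code, description) for an 'IPMI: Timeout:xx' line, else None."""
    m = re.search(r"IPMI: Timeout:([0-9a-fA-F]{2})", line)
    if not m:
        return None
    code = int(m.group(1), 16)
    return code, COMPLETION_CODES.get(code, "unknown completion code")

print(decode_err("err: IPMI: Timeout:c3, connum: 0"))
# (195, 'Timeout while processing command, response unavailable')
```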
-corey
Venelin Petkov wrote:
> Hi,
>
> I am developing several python scripts for monitoring our computing
> nodes via ipmi. I have used the openipmigui scripts as a starting
> point and example. Now that everything is done, a single persistent
> problem remains: there are frequent connection timeouts. One of
> the scripts is a monitoring daemon that checks on two test nodes
> periodically (every 60s by default). Sometimes the connections to both
> nodes (domains in IPMI jargon) are stable for hours, sometimes both get
> disconnected immediately, and most frequently one goes down while the
> other stays online. After some time the disconnected node(s)
> usually reconnect automatically (I don't do that in the script). The
> behaviour strikes me as particularly irregular, so I am at a loss
> to explain it :).
>
> This has been bugging me from the beginning; moreover, the
> openipmigui.py script suffers from exactly the same problem. Here is a
> logged message from my program:
>
> 2007-08-15 10:17:11,872 INFO IPMI Monitor started
> 2007-08-15 10:17:42,130 WARNING (Domain.conn_change_cb) domain:
> Node01, err: OS: Connection timed out, connum: 0, portnum: 0,
> anything_connected: 0
> 2007-08-15 10:17:58,878 WARNING (Domain.conn_change_cb) domain:
> Node01, err: OS: Invalid argument, connum: 0, portnum: 0,
> anything_connected: 0
> 2007-08-15 10:18:08,914 WARNING (Domain.conn_change_cb) domain:
> Node01, err: IPMI: Timeout:c3, connum: 0, portnum: 0,
> anything_connected: 0
> 2007-08-15 10:18:18,922 WARNING (Domain.conn_change_cb) domain:
> Node01, err: IPMI: Timeout:c3, connum: 0, portnum: 0,
> anything_connected: 0
> 2007-08-15 10:18:28,922 WARNING (Domain.conn_change_cb) domain:
> Node01, err: IPMI: Timeout:c3, connum: 0, portnum: 0,
> anything_connected: 0
> 2007-08-15 10:18:38,938 WARNING (Domain.conn_change_cb) domain:
> Node01, err: IPMI: Timeout:c3, connum: 0, portnum: 0,
> anything_connected: 0
> 2007-08-15 10:18:48,939 WARNING (Domain.conn_change_cb) domain:
> Node01, err: IPMI: Timeout:c3, connum: 0, portnum: 0,
> anything_connected: 0
> 2007-08-15 10:18:58,919 WARNING (Domain.conn_change_cb) domain:
> Node01, err: IPMI: Timeout:c3, connum: 0, portnum: 0,
> anything_connected: 0
> 2007-08-15 10:19:08,923 WARNING (Domain.conn_change_cb) domain:
> Node01, err: IPMI: Timeout:c3, connum: 0, portnum: 0,
> anything_connected: 0
> 2007-08-15 10:19:18,927 WARNING (Domain.conn_change_cb) domain:
> Node01, err: IPMI: Timeout:c3, connum: 0, portnum: 0,
> anything_connected: 0
> ...
> 2007-08-15 10:19:28,923 WARNING (Domain.conn_change_cb) domain:
> Node01, err: IPMI: Timeout:c3, connum: 0, portnum: 0,
> anything_connected: 0
>
> The messages are logged by the conn_change_cb(self, domain, err,
> connum, portnum, anything_connected) event handler, which I have
> defined in my custom Domain class. Can somebody explain what could go
> wrong, especially if you have had the same problem before?
>
> Any help will be greatly appreciated,
>
> Greetings,
> Venelin Petkov
>
> ----------
> Physics Student
> Helics II hpc linux cluster, Institute for Scientific Computing,
> University of Heidelberg,
> Germany
_______________________________________________
Openipmi-developer mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/openipmi-developer