On Sep 14, 2005, at 4:36 PM, Hal Rosenstock wrote:

Hi Brett,

On Wed, 2005-09-14 at 15:13, Brett Bode wrote:
     I have found out a bit more information. I think you are correct
that the switch was getting messed up. I had tried resetting the switch
with the old opensm code we had been running and found that fixed
things up until the bad node was plugged in. We had not reset the
switch since upgrading the opensm code. Upon doing that all seems to
work again. Opensm throws some errors (below) due to the bad node, but it
appears to continue to correctly configure the remaining network.

and the switch continues to work ? (That's with the new (1.1.0) OpenSM,
right ?)

Yes

 So I
am currently thinking the latest opensm more or less correctly deals
with the failed node. I also suspect the older opensm not only handled
the error badly but somehow caused the switch to get into a confused
state that the new opensm couldn't fix without a reset.

Here are the repeated errors thrown:

Right, that looks similar to yesterday's log except that the DR is a
little different. Did the misbehaving HCA node get plugged into a
different switch port perhaps ?

That is possible.

______________________________________________________________________
Here is the output of the other commands you suggested with everything
working:

I'm not sure which HCA port the SM ran on but...

The multicast tree appears only set up on the one switch. Were the other
nodes off the other switch not involved ?

Also, port 8 off the switch appears not in the multicast tree although I
see it in the topology file. Not sure why that would be.

I think we only have the IPOIB modules loaded on the systems on the one switch. The system connected to port 8 also does not have the IP module loaded. Originally we did not have the two switches linked together, but we had a system on the second switch that had more up-to-date software, so we loaded the new opensm onto it and connected the switches together. We are just getting the stuff on the second switch installed and are still waiting on some parts as well...

Thanks,
Brett

_______________________________________________
openib-general mailing list
[email protected]
http://openib.org/mailman/listinfo/openib-general

