On Fri, Jun 03, 2005 at 08:37:04AM -0400, Hal Rosenstock wrote:
> On Thu, 2005-06-02 at 19:23, Troy Benjegerdes wrote:
> > I'm having intermittent problems with opensm.. It seems after a while
> > IPoIB stops working
>
> Wonder if there is some relation to the two: intermittent IPoIB and lack
> of response to SM query.
>
> > and if I restart opensm,
>
> How did you get around the ABI version mismatch issue ?
>
> > it starts spitting out
> > errors. Do I have a misbehaving switch somewhere?
>
> It appears that a node is not responding to a discovery packet (SM Get
> NodeInfo (attrID 0x11)). It's direct route initial path (an array of
> port numbers at the start of the next hop) is:
> Initial path = [1][81][1] which means that starting at the node running
> OpenSM, port 1 then port 129 then port 1. Is there a large switch in the
> middle ? Can you send the output of ibnetdiscover ? If that is valid,
> which HCA (port) is not responding (what is the GUID) ?
>
> Unfortunately on such an error osm does not appear to give up (it
> retries forever and is locked on such a node). This is obviously not
> good.
>
> > ibnetdiscover seems to work fine.
>
> Are you sure it displays all HCA and switches and their ports ? I
> wouldn't think it would respond to ibnetdiscover if it didn't respond to
> osm.

I'm running a Subversion checkout as of yesterday, so that's how I got
around the ABI version stuff.

The [81] port indicator is definitely bogus: all I have are 8-port
switches. I've also seen [0][0][0] path indicators. Are those allowed
as well?
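
For anyone else reading these logs, here is a rough sketch of the sanity check described above. It assumes the bracketed entries are hexadecimal port numbers (which matches the reading of [81] as port 129) and that the largest switch on the fabric has a known port count. The function names are made up for illustration and are not part of OpenSM. It only reports entries outside 1..max_port; it does not settle whether [0] entries are legal.

```python
import re


def parse_initial_path(text):
    """Return the hop port numbers from a string such as '[1][81][1]'.

    Each bracketed entry is read as a hexadecimal port number.
    """
    return [int(h, 16) for h in re.findall(r"\[([0-9A-Fa-f]+)\]", text)]


def implausible_hops(ports, max_port):
    """Return (hop index, port) pairs whose port lies outside 1..max_port."""
    return [(i, p) for i, p in enumerate(ports) if not 1 <= p <= max_port]


if __name__ == "__main__":
    path = parse_initial_path("[1][81][1]")
    print(path)                       # [1, 129, 1]
    print(implausible_hops(path, 8))  # [(1, 129)]
```

With 8-port switches as the largest devices, the second hop (port 129) is the one that stands out.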
