On Thu, 2005-06-02 at 19:23, Troy Benjegerdes wrote:
> I'm having intermittent problems with opensm.. It seems after a while
> IPoIB stops working

I wonder if there is some relation between the two: intermittent IPoIB and lack of response to an SM query.

> and if I restart opensm,

How did you get around the ABI version mismatch issue?

> it starts spitting out
> errors. Do I have a misbehaving switch somewhere?

It appears that a node is not responding to a discovery packet (SM Get NodeInfo (attrID 0x11)). Its directed route initial path (an array of egress port numbers, one per hop) is:

Initial path = [1][81][1]

The entries are printed in hex, so this means that, starting at the node running OpenSM, the packet goes out port 1, then port 129 (0x81), then port 1. Is there a large switch in the middle? Can you send the output of ibnetdiscover?

If that path is valid, which HCA (port) is not responding (what is its GUID)?

Unfortunately, on such an error osm does not appear to give up (it retries forever and stays locked on such a node). This is obviously not good.

> ibnetdiscover seems to work fine.

Are you sure it displays all HCAs and switches and their ports? I wouldn't expect a node to respond to ibnetdiscover if it didn't respond to osm.

-- Hal

> (this is from running 'opensm -v -o -r')

_______________________________________________
openib-general mailing list
http://openib.org/mailman/listinfo/openib-general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
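The directed route "Initial path" in the OpenSM error above prints each hop's egress port in hex, which is why [81] reads as port 129. Below is a minimal sketch that decodes such a string into decimal port numbers. It assumes the bracketed-hex format seen in the log line quoted in this thread. The helper names `decode_dr_path` and `describe_dr_path` are hypothetical and are not part of OpenSM.

```python
import re


def decode_dr_path(text):
    """Decode a DR path string like "[1][81][1]" into decimal port numbers.

    Assumption: each bracketed entry is an 8-bit port number printed in hex,
    matching the OpenSM log line quoted in this thread.
    """
    entries = re.findall(r"\[([0-9A-Fa-f]+)\]", text)
    ports = [int(e, 16) for e in entries]
    for p in ports:
        # Directed route path entries are single bytes, so values must fit in 0..255.
        if not 0 <= p <= 0xFF:
            raise ValueError(f"port {p} does not fit in one byte")
    return ports


def describe_dr_path(text):
    """Render the decoded path as a human-readable hop sequence."""
    hops = ", then ".join(f"port {p}" for p in decode_dr_path(text))
    return f"starting at the SM node: {hops}"


if __name__ == "__main__":
    print(decode_dr_path("Initial path = [1][81][1]"))  # [1, 129, 1]
    print(describe_dr_path("[1][81][1]"))
```

A port number of 129 at the second hop only makes sense on a switch with more than 128 ports. That is why this decoding points toward a large switch in the middle of the fabric.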
