On Fri, 2005-06-03 at 12:47, Eitan Zahavi wrote:
> Hi,
> Sorry for catching up with this late in the thread. (Thanks Hal for
> waking me up...)
>
> > It appears that a node is not responding to a discovery packet (SM Get
> > NodeInfo (attrID 0x11)). Its direct route initial path (an array of
> > port numbers at the start of the next hop) is:
> > Initial path = [1][81][1] which means that starting at the node running
> > OpenSM, port 1 then port 129 then port 1. Is there a large switch in the
> > middle ? Can you send the output of ibnetdiscover ? If that is valid,
> > which HCA (port) is not responding (what is the GUID) ?
> [EZ] Normally all directed route dumps should start with:
> Initial path = [0][....
> The first hop is reserved to 0 - so I wonder if the above text is a
> direct quote from the osm.log ?
> The fact you got there a [81] means that the packet should leave from
> port 81 ??
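
As a quick sanity check on that dump, here is a minimal Python sketch that decodes the bracketed entries. It assumes the entries are hex port numbers (as the "port 129" reading of [81] quoted above implies) and that a well-formed dump starts with hop 0, per Eitan's note. The function names and the 24-port ceiling are illustrative assumptions taken from this thread, not anything from OpenSM itself.

```python
import re

# Assumption: 24 is simply the largest switch port count mentioned in this thread.
MAX_SWITCH_PORTS = 24


def parse_initial_path(text):
    """Return the hop port numbers from a dump such as 'Initial path = [1][81][1]'.

    Each bracketed entry is read as hexadecimal.
    """
    return [int(entry, 16) for entry in re.findall(r"\[([0-9A-Fa-f]+)\]", text)]


def check_path(hops, max_ports=MAX_SWITCH_PORTS):
    """Return a list of human-readable problems found in a decoded path."""
    problems = []
    # A directed route dump is expected to start with hop 0.
    if not hops or hops[0] != 0:
        problems.append("first hop is not 0")
    # Every later hop is an exit port and should fit on a real switch.
    for index, port in enumerate(hops[1:], start=1):
        if port > max_ports:
            problems.append(
                f"hop {index}: port {port} (0x{port:x}) exceeds {max_ports}"
            )
    return problems


hops = parse_initial_path("Initial path = [1][81][1]")
print(hops)  # [1, 129, 1]
for problem in check_path(hops):
    print(problem)
```

For the path in question this flags both oddities raised in the thread: the missing leading 0 and the out-of-range port 0x81.
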
81 is hex, not decimal (0x81 = 129), but either way it is still > 24.

> I have never seen a switch with more than 24 ports...

I thought that looked suspect. I didn't think there were any switch
chassis that were hiding their multiple internal switch chips.

> > Unfortunately on such an error osm does not appear to give up (it
> > retries forever and is locked on such a node). This is obviously not
> > good.
> Also Troy if you are able to capture the entire log it might put some
> light on the issue of "OpenSM never give up" on such cases - which we
> want to resolve.

OpenIB has retries built into the MAD layer, and the OpenIB vendor layer
also does some retries when a send that is supposed to be matched with a
response times out. [There is a potential issue here relative to the VL15
counting on error which came up on the list a short while ago, so I am
looking at possibly a change to this area of the vendor layer, but I have
not concluded my analysis of this yet.]

-- Hal
