Title: RE: [openib-general] SM Bad Port Handling

> -- Hal
> In looking at the unhealthy code, it appears to me that the unhealthy
> bit is only set if the SM receives traps 129-131 and not if the SMA does
> not respond to SM MADs so these ports will not be detected and hence not
> bypassed.
>
[EZ] This is true. Currently only one thing causes the un-healthy bit to be set, and it is exactly what you point out: these traps. The point I was trying to make was that this bit is the mechanism for flagging a port as bad.

What I did recommend was to write a "statistical" analysis of Directed Route packet drops, so that we can find the ports with a high drop rate and mark them as un-healthy. If you mark every port that does not respond to a MAD as un-healthy, you can end up blaming it for a flaky link somewhere on the route to that port. Only an analysis of the number of good packets vs. dropped packets per port can lead you to the right bad port.
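
A minimal sketch of what such an analysis could look like, assuming the SM can log, for each Directed Route MAD, the ports on its path and whether a response came back. The function names, the port labels, the 0.5 threshold and the minimum sample count are all hypothetical; none of this is existing OpenSM code. A failed MAD is charged to the first port on its path with a high drop rate, not to the destination that stayed silent.

```python
from collections import defaultdict

def drop_rates(samples):
    """Return (rates, sent) per port.

    samples: list of (path, responded), where path is a tuple of port
    labels traversed by one directed-route MAD (nearest to the SM first)
    and responded is True if a reply arrived.
    """
    sent = defaultdict(int)
    dropped = defaultdict(int)
    for path, responded in samples:
        for port in path:
            sent[port] += 1
            if not responded:
                dropped[port] += 1
    rates = {port: dropped[port] / sent[port] for port in sent}
    return rates, dict(sent)

def suspect_ports(samples, threshold=0.5, min_samples=4):
    """Blame, for each dropped MAD, the first port on its path whose
    drop rate is at or above threshold (hypothetical policy)."""
    rates, sent = drop_rates(samples)
    suspects = set()
    for path, responded in samples:
        if responded:
            continue
        for port in path:
            if sent[port] >= min_samples and rates[port] >= threshold:
                suspects.add(port)
                break  # ports further down the route are not blamed
    return suspects

# Hypothetical topology: SM -> A -> B -> C and SM -> A -> D, with B flaky.
samples = (
    [(("A",), True)] * 4
    + [(("A", "D"), True)] * 4
    + [(("A", "B"), True)] + [(("A", "B"), False)] * 3
    + [(("A", "B", "C"), True)] + [(("A", "B", "C"), False)] * 3
)
suspects = suspect_ports(samples)
```

In this made-up data, C fails to respond as often as B does, but only B is flagged, because every drop to C also passes through B. Marking each non-responding port directly would have flagged C as well.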

_______________________________________________
openib-general mailing list
[email protected]
http://openib.org/mailman/listinfo/openib-general