Title: RE: [openib-general] SM Bad Port Handling

>
> When the SM sends a direct route MAD it saves the port guid (and port
> num) in the madw context, so that when there is a reply or timeout you
> can easily find the port. That means you don't have to walk the entire DR
> path to find the unhealthy port. That means that the peer port (from
> which we arrived to the bad port) is unhealthy. Does this address your
> concern ?
>
[EZ] Not at all. Although the target port is known, the flaky link that fails the MAD might be anywhere along the path to that port. So if you mark the target port as bad, you might be marking the wrong port!

[EZ] Let me clarify with an example:
SM=HCA1/P1 -> SW1/P1....SW1/P2->SW2/P1..SW2/P2->SW3/P1....SW3/P3->HCA2/P1
                                            \..SW4/P4->SW3/P4..SW3/P5->SW3/P2../
                          
If the flaky link is between SW2/P2 and SW3/P1, then a packet sent to HCA2 using DR path [0][1][1][2][3] might fail. If you mark HCA2/P1 as bad, you will actually lose that HCA for no good reason, since another path from the SM to HCA2 exists.
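
To make the point concrete, here is a minimal sketch in Python (not OpenSM code). The LINKS table and links_on_route() are hypothetical names that model only the primary path drawn above. The sketch lists every link a direct-routed packet crosses, so that a timeout can be seen to implicate all of them rather than just the target port.

```python
# Hypothetical model of the primary path in the example above; not OpenSM code.
# Each entry maps (node, exit port) to the (node, port) at the far end of the link.
LINKS = {
    ("HCA1", 1): ("SW1", 1),
    ("SW1", 2): ("SW2", 1),
    ("SW2", 2): ("SW3", 1),
    ("SW3", 3): ("HCA2", 1),
}


def links_on_route(start_node, exit_ports):
    """Return every link crossed when leaving each node by the given exit port."""
    node = start_node
    crossed = []
    for port in exit_ports:
        peer = LINKS[(node, port)]
        crossed.append(((node, port), peer))
        node = peer[0]  # continue the walk from the node we just reached
    return crossed


# Exit ports read off the diagram hop by hop, without the leading zero entry
# of the DR path notation (an assumption of this sketch).
suspects = links_on_route("HCA1", [1, 2, 2, 3])

# The port the MAD was addressed to is only the far end of the last link.
target_port = suspects[-1][1]

# The flaky link from the example is one of several equally plausible suspects.
flaky_link = (("SW2", 2), ("SW3", 1))
```

A timeout on this route leaves every entry in suspects as a candidate. Nothing in the madw context alone says that HCA2/P1, rather than the SW2/P2 to SW3/P1 link, is the unhealthy one.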

EZ
