Hi Hal.
I noticed the following bug in Bugzilla:
Bugzilla Bug 329: HCA_FATAL_EVENT cause to opensm to stop functioning
https://bugs.openfabrics.org/show_bug.cgi?id=329
When an HCA fatal event occurs on the host that opensm is running on,
opensm stops functioning (after the event, the driver restarts the
device, and the port does not return to the active state).
If opensm is running in sweep mode, you can see that it stops sweeping
after the event.
I remember that a couple of months ago I sent a patch that took care of this
problem:
- in case of IBV_EVENT_DEVICE_FATAL, osm was forced to exit
- in case of IBV_EVENT_PORT_ERROR, osm initiated heavy sweep
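
For illustration only, the two rules above can be sketched as a small
dispatch function. This is not the original patch: the sketch_* names are
hypothetical stand-ins, and a real handler would presumably take the event
type from the libibverbs async event queue (ibv_get_async_event()) rather
than from a local enum.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical event codes standing in for the libibverbs async events
 * IBV_EVENT_DEVICE_FATAL and IBV_EVENT_PORT_ERROR named above. */
enum sketch_event {
    SKETCH_EVENT_DEVICE_FATAL,
    SKETCH_EVENT_PORT_ERROR,
    SKETCH_EVENT_OTHER
};

/* Hypothetical actions, ordered from least to most severe so that
 * they can be compared with '>'. */
enum sketch_action {
    SKETCH_ACTION_NONE,
    SKETCH_ACTION_HEAVY_SWEEP,
    SKETCH_ACTION_EXIT
};

/* Map one event to the action described in the list above. */
enum sketch_action sketch_action_for_event(enum sketch_event ev)
{
    switch (ev) {
    case SKETCH_EVENT_DEVICE_FATAL:
        return SKETCH_ACTION_EXIT;        /* osm is forced to exit */
    case SKETCH_EVENT_PORT_ERROR:
        return SKETCH_ACTION_HEAVY_SWEEP; /* osm initiates a heavy sweep */
    default:
        return SKETCH_ACTION_NONE;        /* all other events are ignored */
    }
}

/* Return the most severe action required by a batch of events. */
enum sketch_action sketch_action_for_batch(const enum sketch_event *evs,
                                           size_t n)
{
    enum sketch_action worst = SKETCH_ACTION_NONE;
    size_t i;

    assert(evs != NULL || n == 0);
    for (i = 0; i < n; i++) {
        enum sketch_action a = sketch_action_for_event(evs[i]);
        if (a > worst)
            worst = a;
    }
    return worst;
}
```

Keeping the decision separate from the event source means the same mapping
could be reused if the events were later delivered through umad instead of
uverbs.
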
The problem with my patch was that it made osm depend on the uverbs module.
To resolve this, support for these events should be added to umad, so that
osm can use it instead.
Do you know whether any work in this area has been done in umad?
-- Yevgeny