> Quoting Hal Rosenstock <[EMAIL PROTECTED]>:
> Subject: Re: Bugzilla Bug 329: HCA_FATAL_EVENT cause to OpenSM to stop functioning
>
> Hi Yevgeny,
>
> On Wed, 2007-01-31 at 05:16, Yevgeny Kliteynik wrote:
> > Hi Hal.
> >
> > I noticed the following bug in Bugzilla:
> >
> > Bugzilla Bug 329: HCA_FATAL_EVENT cause to opensm to stop functioning
> > https://bugs.openfabrics.org/show_bug.cgi?id=329
> >
> > When there is a HCA fatal event on the host that opensm is running on it,
> > the opensm stop to function (After the event, the driver restart the device,
> > and the port does not return to active state).
> >
> > If the opensm run in sweep mode , after the event you can see that the
> > opensm stop sweeping.
> >
> > I remember that a couple of months ago I sent a patch that takes care of
> > this problem:
> > - in case of IBV_EVENT_DEVICE_FATAL, osm was forced to exit
> > - in case of IBV_EVENT_PORT_ERROR, osm initiated heavy sweep
> >
> > The problem with my patch was that it made osm to depend on uverbs module.
> > To resolve this problem, support should be added in umad, and then osm could
> > use this support.
> >
> > Do you know if some work in this area was done in umad?
>
> This has been on the list but unfortunately there has been no time yet
> to work on the local events support in libibumad.

I do not think making libibmad depend on the ib_uverbs module is a good idea
either. More properly, the problem is in ib_umad, which does not report
hotplug events. If we just make ib_umad return an error code to the user on
hotplug, the problem will go away without userspace changes.

-- 
MST
