We added it temporarily and then removed it because of these problems. Sorry for the misleading information regarding umad_close_port.

Eitan Zahavi
Design Technology Director
Mellanox Technologies LTD
Tel:+972-4-9097208
Fax:+972-4-9593245
P.O. Box 586 Yokneam 20692 ISRAEL

> -----Original Message-----
> From: Hal Rosenstock [mailto:[EMAIL PROTECTED]
> Sent: Monday, November 07, 2005 4:55 PM
> To: Eitan Zahavi
> Cc: Yael Kalka; [email protected]
> Subject: RE: [PATCH] Opensm - exiting issues
>
> On Mon, 2005-11-07 at 09:42, Eitan Zahavi wrote:
> > Hi Hal,
> >
> > I will answer for Yael as she already left the office.
> >
> > The way to reproduce the "stuck" case is to run in bash:
> > % while test $? = 0; do opensm -V -o; done
> >
> > The symptom we see is that OpenSM sort of exists but the process stay
> > active (not even defunct). No way to kill it. It seems like one of the
> > threads gets caught in the middle of ioctl or something. To be able to
> > run OpenSM after this we need to reboot the machine.
> >
> > We avoid it by not issuing umad_unregister and umad_close_port
>
> I saw the change to not call umad_unregister in the patch. Where is the
> change for umad_close_port ?
>
> -- Hal
>
> > Eitan Zahavi
> > Design Technology Director
> > Mellanox Technologies LTD
> > Tel:+972-4-9097208
> > Fax:+972-4-9593245
> > P.O. Box 586 Yokneam 20692 ISRAEL
> >
> >
> > > -----Original Message-----
> > > From: Hal Rosenstock [mailto:[EMAIL PROTECTED]
> > > Sent: Monday, November 07, 2005 4:21 PM
> > > To: [EMAIL PROTECTED]
> > > Cc: [email protected]; [EMAIL PROTECTED]
> > > Subject: Re: [PATCH] Opensm - exiting issues
> > >
> > > Hi Yael,
> > >
> > > On Mon, 2005-11-07 at 08:25, Yael Kalka wrote:
> > > > Hi Hal,
> > > >
> > > > There was a problem when running opensm with -o option, that caused
> > > > the opensm to always exit with segfault, due to object destruction
> > > > ordering. Also - there is the known issue of exiting opensm. We've
> > > > done some clearing to the exiting code. The following patch fixes most
> > > > of it.
> > >
> > > I applied this part of the patch with some cosmetic changes in
> > > osm_vendor_ibumad.c.
> > >
> > > > In the current code we saw that sometimes opensm gets "stuck" on exit,
> > > > and causes the machine to get stuck too - resulting in need for
> > > > rebooting. In the following patch fixes most of it.
> > > > We did run (in the patch) into rare cases where opensm exits with an
> > > > error, but at least it exits without stucking the machine...
> > >
> > > Is there a reliable way to recreate machine "stuck" ? What exactly do
> > > you mean by this ?
> > >
> > > All umad_unregister does is some validation, a table lookup, and issue
> > > the ioctl to unregister the MAD agent. Not explictly unregistering the
> > > agent(s) does not cause any harm as when the fd is closed, this will
> > > occur as part of the cleanup.
> > >
> > > -- Hal

_______________________________________________
openib-general mailing list
[email protected]
http://openib.org/mailman/listinfo/openib-general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
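
A bounded variant of the reproduction loop quoted above may make it easier to note how many runs it takes before the hang shows up. The following is only a sketch and is not part of the patch: repro_loop and its arguments are hypothetical names, and the function simply wraps whatever command it is given. It prints the run number before each run, so if a run never returns, the last line on the terminal shows which one it was. It does not try to kill a stuck process, since the thread reports that the stuck process cannot be killed.

```shell
#!/bin/sh
# Hypothetical helper (not part of OpenSM or the patch): run a command
# repeatedly, stopping after MAX runs or on the first non-zero exit status.
# Usage: repro_loop MAX command [args...]
repro_loop() {
    max=$1
    shift
    i=1
    while [ "$i" -le "$max" ]; do
        # Print before running, so a hang leaves the run number visible.
        echo "run $i: $*"
        "$@" || {
            status=$?
            echo "run $i exited with status $status"
            return "$status"
        }
        i=$((i + 1))
    done
    echo "completed $max runs without a failure"
    return 0
}

# Example from the thread (needs OpenSM and an InfiniBand port):
#   repro_loop 100 opensm -V -o
```

Like the original one-liner, the loop stops at the first non-zero exit status; the only additions are the upper bound and the per-run message.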
