On Tuesday 03 October 2006 12:46, Hal Rosenstock wrote:
> On Tue, 2006-10-03 at 03:46, Jack Morgenstein wrote:
> > On Sunday 01 October 2006 13:14, Michael S. Tsirkin wrote:
> > > Quoting r. Jack Morgenstein <[EMAIL PROTECTED]>:
> > > > Subject: Kernel Oops in user-mad, mad
> > > >
> > > > We received the following kernel Oops while running regression
> > > > (see console picture attached).
> > > >
> > > > This looks like a possible race condition between handling umad send
> > > > completions and ib_unregister_mad_agent.
> > > >
> > > > The Oops is at the list_del line of dequeue_send (user_mad.c: 186)
> > > > Note that ib_unregister_mad_agent invokes
> > > > unregister_mad_agent->cancel_mads -> agent send handler.
> > > >
> > > > Is there a possibility that there is a double deletion from a list
> > > > somewhere?
> > > >
> > > > Jack
> > >
> > > Was this during module unload?
> > No.
>
> What caused the ib_unregister_mad_agent routine to be invoked ? Was
> OpenSM shutting down when this occurred ? Can you provide any more
> details on the scenario which caused this ?
>
> -- Hal

This was during the testing of MPI. OpenSM is invoked once (and also shut down)
before running an MPI test; evidently, this occurred between MPI tests.
We don't have any info beyond this.

- Jack
