Hello Sean,
We have the following code commented out at umad_receiver_stop:
        /* XXX hangs current thread - suspect umad_recv() ignoring wakeup.
        cl_thread_destroy(&p_ur->tid);
        */
How can one ensure that the umad_receiver thread will not run after 
osm_vendor_delete has been called?
Please see the e-mails below for more details.

XaleX

From: Hal Rosenstock
Sent: Wednesday, April 27, 2011 4:10 PM
To: Alex Naslednikov; OpenSM; Gilad Margalit; Uri Habusha
Subject: RE: Opensm bug

PSB [HNR]

From: Alex Naslednikov
Sent: Wednesday, April 27, 2011 9:08 AM
To: Hal Rosenstock; OpenSM; Gilad Margalit; Uri Habusha
Subject: RE: Opensm bug

Hal,
Thank you for the fast response
I see that the code of umad_receiver_stop is the same for 2.3.0 (trunk), so 
cl_thread_destroy will be called.
Can you please explain why umad_receiver_thread can continue running in this 
case?
[HNR] Huh ? Isn't that code commented out ?

From: Hal Rosenstock
Sent: Wednesday, April 27, 2011 3:56 PM
To: Alex Naslednikov; OpenSM; Gilad Margalit; Uri Habusha
Subject: RE: Opensm bug

PSB [HNR]

From: Alex Naslednikov
Sent: Wednesday, April 27, 2011 8:10 AM
To: OpenSM; Gilad Margalit; Uri Habusha
Subject: Opensm bug

Hi all,
Recently we got the following assert:

ASSERT happened:  &p_log->lock is not initialized
complibd!cl_spinlock_acquire+0x39 [s:\builds\7789\trunk\inc\user\complib\cl_spinlock_osd.h @ 107]
opensm!osm_log+0x1b8 [s:\builds\7789\trunk\ulp\opensm\user\opensm\osm_log.c @ 171]
opensm!osm_vendor_get+0x174 [s:\builds\7789\trunk\ulp\opensm\user\libvendor\osm_vendor_ibumad.c @ 1007]
opensm!osm_mad_pool_get+0xbb [s:\builds\7789\trunk\ulp\opensm\user\opensm\osm_mad_pool.c @ 95]
opensm!umad_receiver+0x3b4 [s:\builds\7789\trunk\ulp\opensm\user\libvendor\osm_vendor_ibumad.c @ 314]
complibd!cl_thread_callback+0x1a [s:\builds\7789\trunk\core\complib\user\cl_thread.c @ 49]

Has anybody seen this problem before?
[HNR] FWIW I haven't.

Can it happen that umad_receiver tries to access opensm after the destroy 
process has already started?
[HNR] osm_vendor_delete is called prior to destroying the log (in 
osm_opensm_destroy) so I wouldn't think that should be the case. However, 
looking at some perhaps older version of osm_vendor_ibumad.c for Windows (MLNX 
OFED 2.1.3), I see:

static void umad_receiver_stop(umad_receiver_t * p_ur)
{
#ifdef HAVE_LIBPTHREADS
        pthread_cancel(p_ur->tid);
        pthread_join(p_ur->tid, NULL);
        p_ur->tid = 0;
#else
        /* XXX hangs current thread - suspect umad_recv() ignoring wakeup.
        cl_thread_destroy(&p_ur->tid);
        */
#endif

I don't know if that's still the case but that looks to me like it could result 
in umad_receiver thread still running after the log is destroyed :(
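
For illustration only, here is a minimal sketch of one common way to make sure 
a receiver thread has exited before the log is torn down: the thread waits 
with a bounded timeout, checks a stop flag on every pass, and the stop routine 
joins it. This is not the actual osm_vendor_ibumad.c code. All demo_* names 
are hypothetical, the receive call is simulated with a sleep, and it uses 
pthreads and C11 atomics, so the Windows complib path would need an equivalent 
mechanism.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical stand-in for the receiver state; not the real umad_receiver_t. */
typedef struct demo_receiver {
	pthread_t tid;
	atomic_bool stop;       /* set by the thread that requests the stop */
	atomic_bool log_valid;  /* stands in for the log that is destroyed later */
	atomic_int bad_access;  /* counts log uses seen after the log was destroyed */
} demo_receiver_t;

/* Hypothetical stand-in for a receive call that gives up after timeout_ms. */
static int demo_recv_timeout(int timeout_ms)
{
	struct timespec ts = {
		.tv_sec = timeout_ms / 1000,
		.tv_nsec = (timeout_ms % 1000) * 1000000L
	};
	nanosleep(&ts, NULL);
	return -1;              /* pretend the wait timed out with no data */
}

static void *demo_receiver_thread(void *arg)
{
	demo_receiver_t *r = arg;

	/* Because every wait is bounded, the stop flag is re-checked regularly. */
	while (!atomic_load(&r->stop)) {
		(void)demo_recv_timeout(10);
		/* This is where the real thread would log; record any late access. */
		if (!atomic_load(&r->log_valid))
			atomic_fetch_add(&r->bad_access, 1);
	}
	return NULL;
}

static int demo_receiver_start(demo_receiver_t *r)
{
	atomic_init(&r->stop, false);
	atomic_init(&r->log_valid, true);
	atomic_init(&r->bad_access, 0);
	return pthread_create(&r->tid, NULL, demo_receiver_thread, r);
}

/* Request the stop, then block until the thread has really exited. */
static void demo_receiver_stop(demo_receiver_t *r)
{
	atomic_store(&r->stop, true);
	pthread_join(r->tid, NULL);
}

/* Start, stop, then "destroy the log"; returns the number of late accesses. */
static int demo_run(void)
{
	demo_receiver_t r;

	if (demo_receiver_start(&r) != 0)
		return -1;
	(void)demo_recv_timeout(50);        /* let the receiver loop a few times */
	demo_receiver_stop(&r);             /* thread is gone after this returns */
	atomic_store(&r.log_valid, false);  /* only now tear down the log */
	return atomic_load(&r.bad_access);
}
```

The point of the sketch is the ordering: the log stand-in is invalidated only 
after the join has returned, so the receiver cannot touch it afterwards. 
Whether the real umad_recv() wait can be bounded or woken this way on Windows 
is exactly the open question in the XXX comment above.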

There are other unrelated problems in the Windows implementation of that file 
too (e.g. osm_vendor_set_sm is unimplemented which is problematic to multi SM 
operation).


-- Hal

Alexander (XaleX) Naslednikov
SW Networking Team
Mellanox Technologies

_______________________________________________
ofw mailing list
[email protected]
http://lists.openfabrics.org/cgi-bin/mailman/listinfo/ofw
