Uri,
  I was unable to reproduce the DEBUG ASSERT() firing with ConnectX hardware 
[enable/disable, shutdown cycles] or Mthca hardware.
The offending ASSERT() was removed to ensure forward progress for all; the PNP 
port_remove() race condition is still not fully understood.

Stan.

Revision: 3267
Author: stansmith
Date: 3:00:15 PM, Friday, August 19, 2011
Message:
[BUS] removed DBG ASSERT on Uri's request as it fires during DBG version WHQL 
testing; I was unable to get the ASSERT to fire on ConnectX disable or 
shutdown. There is a port destruction race condition here.
Call to port_remove() skipped if !p_ctx as port_remove itself checks for null 
p_ctx.
----
Modified : /gen1/trunk/core/bus/kernel/bus_port_mgr.c


From: Uri Habusha [mailto:[email protected]]
Sent: Thursday, July 21, 2011 4:49 AM
To: Smith, Stan
Cc: [email protected]; Leonid Keller; Tzachi Dar; Gilad Margalit; 
Benyahu Mizrahi
Subject: issue with checkin# 3122

Hi Stan,

I adopted your check-in #3122 (IOC poll on demand).

When disabling the driver, an ASSERT pops up. The ASSERT is in the following
code in the port_mgr_pnp_cb() function:

        CL_ASSERT( p_ctx );    /* <== the problematic assert */
        if (p_ctx)
        {
            p_bfi = p_ctx->p_bus_filter;
            CL_ASSERT( p_bfi );
            if (p_bfi->p_port_mgr->active_ports > 0)
                cl_atomic_dec( &p_bfi->p_port_mgr->active_ports );
        }
        port_mgr_port_remove( (ib_pnp_port_rec_t*)p_pnp_rec );
        break;

I noticed that port_mgr_port_remove() checks whether the ctx is valid, so I
guess this is a known case that can happen. For now, I have removed the assert
in our code.

Please take a look at the code and see whether this is a valid fix (if so,
please change the OFW code accordingly), or debug the issue. It happens when
disabling/enabling the low-level driver.

Uri


Uri Habusha
Windows SW Development Lead

Mellanox Technologies
P.OBox 586, Yokneam 20692
Israel



_______________________________________________
ofw mailing list
[email protected]
http://lists.openfabrics.org/cgi-bin/mailman/listinfo/ofw
