Uri, I was unable to reproduce the DEBUG ASSERT() firing with ConnectX hardware [enable/disable, shutdown cycles] or Mthca hardware. The offending ASSERT() was removed to ensure forward progress for all; the PNP port_remove() race condition is still not fully understood.
Stan.

Revision: 3267
Author: stansmith
Date: 3:00:15 PM, Friday, August 19, 2011
Message:
[BUS] removed DBG ASSERT on Uri's request as it fires during DBG version WHQL testing; I was unable to get the ASSERT to fire on ConnectX disable or shutdown. There is a port destruction race condition here. Call to port_remove() skipped if !p_ctx as port_remove itself checks for null p_ctx.
----
Modified : /gen1/trunk/core/bus/kernel/bus_port_mgr.c

From: Uri Habusha [mailto:[email protected]]
Sent: Thursday, July 21, 2011 4:49 AM
To: Smith, Stan
Cc: [email protected]; Leonid Keller; Tzachi Dar; Gilad Margalit; Benyahu Mizrahi
Subject: issue with checkin# 3122

Hi Stan,

I adopted your checkin# 3122 - IOC poll on demand. When disabling the driver, an ASSERT pops up. The ASSERT is in the following code in the port_mgr_pnp_cb function:

    CL_ASSERT( p_ctx );    <== The problematic assert
    if (p_ctx)
    {
        p_bfi = p_ctx->p_bus_filter;
        CL_ASSERT( p_bfi );
        if (p_bfi->p_port_mgr->active_ports > 0)
            cl_atomic_dec( &p_bfi->p_port_mgr->active_ports );
    }
    port_mgr_port_remove( (ib_pnp_port_rec_t*)p_pnp_rec );
    break;

I noticed that port_mgr_port_remove checks whether the ctx is valid, so I guess this is a known issue that can happen. For now I removed the assert in our code. Please take a look at the code and see whether this is a valid fix (if so, please change the ofw code accordingly) or debug the issue. It happens when disabling/enabling the low-level driver.

Uri

Uri Habusha
Windows SW Development Lead
Mellanox Technologies
P.O. Box 586, Yokneam 20692 Israel
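For reference, the guarded port-remove path discussed above can be sketched in user-space C. This is only an illustration under stated assumptions, not the code in bus_port_mgr.c: the struct definitions and the helpers dec_if_positive() and on_port_remove() are hypothetical, C11 atomics stand in for cl_atomic_dec(), and only the field names p_bus_filter, p_port_mgr and active_ports are taken from the quoted snippet. The compare-exchange loop is a swapped-in technique that closes the window between the "> 0" test and the decrement; the original code does not do this.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the driver structures; only the field names
 * p_bus_filter, p_port_mgr and active_ports come from the quoted code. */
typedef struct port_mgr   { atomic_int active_ports; }   port_mgr_t;
typedef struct bus_filter { port_mgr_t *p_port_mgr; }    bus_filter_t;
typedef struct port_ctx   { bus_filter_t *p_bus_filter; } port_ctx_t;

/* Decrement *p_count only while it is positive.
 * Returns true if a decrement happened, false if the count was already 0.
 * (Assumed helper: the test and the decrement are one atomic step.) */
static bool dec_if_positive(atomic_int *p_count)
{
    int cur = atomic_load(p_count);
    while (cur > 0) {
        /* On failure, cur is refreshed with the current value and we retry. */
        if (atomic_compare_exchange_weak(p_count, &cur, cur - 1))
            return true;
    }
    return false;
}

/* Port-remove handling without the assert: a NULL context (or a context
 * that is already partly torn down) is tolerated instead of trapping.
 * Returns true if active_ports was decremented. */
static bool on_port_remove(port_ctx_t *p_ctx)
{
    if (p_ctx == NULL || p_ctx->p_bus_filter == NULL ||
        p_ctx->p_bus_filter->p_port_mgr == NULL)
        return false;

    return dec_if_positive(&p_ctx->p_bus_filter->p_port_mgr->active_ports);
}
```

Calling on_port_remove(NULL) simply returns false, which mirrors the behavior after the ASSERT was removed: a missing context during disable or shutdown is skipped rather than treated as fatal. It does not explain why the context is missing, so the underlying race still needs to be tracked down.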
_______________________________________________
ofw mailing list
[email protected]
http://lists.openfabrics.org/cgi-bin/mailman/listinfo/ofw
