Hi All,

I have hit a deadlock. The function __free_mads takes h_al->mad_lock and, while still holding it, calls ib_put_mad, which calls al_remove_mad, which tries to take the same lock again.
Please see the call stack:

Child-SP          RetAddr           Call Site
fffff880`021c9fd0 fffff800`01687a3f nt!KxWaitForSpinLockAndAcquire+0x20
fffff880`021ca000 fffff880`07cc775e nt!KeAcquireSpinLockAtDpcLevel+0x6f
fffff880`021ca050 fffff880`07d0ef5c ibbus!cl_spinlock_acquire+0x5e [b:\users\tzachid\projinf5\trunk\inc\kernel\complib\cl_spinlock_osd.h @ 96]
fffff880`021ca090 fffff880`07d30c5e ibbus!al_remove_mad+0x2c [b:\users\tzachid\projinf5\trunk\core\al\al.c @ 245]
fffff880`021ca0d0 fffff880`07d0e5ed ibbus!ib_put_mad+0x23e [b:\users\tzachid\projinf5\trunk\core\al\kernel\al_mad_pool.c @ 923]
fffff880`021ca110 fffff880`07d0e814 ibbus!__free_mads+0x8d [b:\users\tzachid\projinf5\trunk\core\al\al.c @ 143]
fffff880`021ca160 fffff880`07d63131 ibbus!free_al+0x54 [b:\users\tzachid\projinf5\trunk\core\al\al.c @ 168]
fffff880`021ca1a0 fffff880`07d61f67 ibbus!async_destroy_cb+0x7f1 [b:\users\tzachid\projinf5\trunk\core\al\al_common.c @ 842]
fffff880`021ca210 fffff880`07d62820 ibbus!sync_destroy_obj+0x6a7 [b:\users\tzachid\projinf5\trunk\core\al\al_common.c @ 704]
fffff880`021ca280 fffff880`07d61976 ibbus!destroy_obj+0x820 [b:\users\tzachid\projinf5\trunk\core\al\al_common.c @ 774]
fffff880`021ca2f0 fffff880`07cdbf3e ibbus!sync_destroy_obj+0xb6 [b:\users\tzachid\projinf5\trunk\core\al\al_common.c @ 633]
fffff880`021ca360 fffff880`07ca93ef ibbus!al_cleanup+0x3fe [b:\users\tzachid\projinf5\trunk\core\al\al_init.c @ 146]
fffff880`021ca3c0 fffff880`07cce977 ibbus!fdo_release_resources+0x7af [b:\users\tzachid\projinf5\trunk\core\bus\kernel\bus_pnp.c @ 715]
fffff880`021ca440 fffff880`07cce7ad ibbus!cl_do_remove+0x127 [b:\users\tzachid\projinf5\trunk\core\complib\kernel\cl_pnp_po.c @ 680]
fffff880`021ca480 fffff880`07cc9e20 ibbus!__remove+0x15d [b:\users\tzachid\projinf5\trunk\core\complib\kernel\cl_pnp_po.c @ 648]
fffff880`021ca4d0 fffff880`00c7093c ibbus!cl_pnp+0x1410 [b:\users\tzachid\projinf5\trunk\core\complib\kernel\cl_pnp_po.c @ 243]
fffff880`021ca5c0 fffff880`00c692ce Wdf01000!FxPkgFdo::ProcessRemoveDeviceOverload+0x74
fffff880`021ca5f0 fffff880`00c67dd6 Wdf01000!FxPkgPnp::_PnpRemoveDevice+0x126
fffff880`021ca660 fffff880`00c37245 Wdf01000!FxPkgPnp::Dispatch+0x1b2
fffff880`021ca6d0 fffff880`00c3714b Wdf01000!FxDevice::Dispatch+0xa9

It seems to me that the best way to solve this issue (without restructuring the code) is to add a new variant of ib_put_mad, called ib_put_mad_locked, for callers that already hold h_al->mad_lock. It would call al_remove_mad_locked (also a new function), which does not take the lock again.

Does anyone have objections, or a better way to fix the issue?

Thanks,
Tzachi
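P.S. To make the proposal concrete, here is a rough user-mode sketch of the locked/unlocked split. It is not the actual ibbus code: toy_lock_t, al_t, mad_t, al_insert_mad and free_mads_sketch are hypothetical stand-ins, the function signatures are simplified, and the lock is a toy that asserts on re-acquire where the real spinlock would spin forever.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a non-recursive spinlock. A second acquire
 * by the holder would spin forever in the kernel; here it trips an assert. */
typedef struct { int held; } toy_lock_t;

static void toy_lock_acquire(toy_lock_t *l) { assert(!l->held); l->held = 1; }
static void toy_lock_release(toy_lock_t *l) { assert(l->held);  l->held = 0; }

/* Simplified, hypothetical versions of the MAD element and AL instance. */
typedef struct mad { struct mad *next; } mad_t;

typedef struct al {
    toy_lock_t mad_lock;
    mad_t     *mad_list;   /* MADs currently tracked by this instance */
    int        mad_count;
} al_t;

/* Track a MAD; takes the lock itself. */
static void al_insert_mad(al_t *h_al, mad_t *mad)
{
    toy_lock_acquire(&h_al->mad_lock);
    mad->next = h_al->mad_list;
    h_al->mad_list = mad;
    h_al->mad_count++;
    toy_lock_release(&h_al->mad_lock);
}

/* Proposed new function: caller must already hold h_al->mad_lock. */
static void al_remove_mad_locked(al_t *h_al, mad_t *mad)
{
    mad_t **pp;

    assert(h_al->mad_lock.held);
    for (pp = &h_al->mad_list; *pp != NULL; pp = &(*pp)->next) {
        if (*pp == mad) {
            *pp = mad->next;      /* unlink from the tracking list */
            mad->next = NULL;
            h_al->mad_count--;
            return;
        }
    }
}

/* Existing entry point keeps its behavior: it takes the lock and
 * delegates to the _locked variant. */
static void al_remove_mad(al_t *h_al, mad_t *mad)
{
    toy_lock_acquire(&h_al->mad_lock);
    al_remove_mad_locked(h_al, mad);
    toy_lock_release(&h_al->mad_lock);
}

/* Existing path for callers that do NOT hold the lock. */
static void ib_put_mad(al_t *h_al, mad_t *mad)
{
    al_remove_mad(h_al, mad);
    /* the real function would also return the MAD to its pool */
}

/* Proposed new path for callers that DO hold the lock. */
static void ib_put_mad_locked(al_t *h_al, mad_t *mad)
{
    al_remove_mad_locked(h_al, mad);
    /* the real function would also return the MAD to its pool */
}

/* Teardown modeled on __free_mads: holds mad_lock across the loop,
 * so it has to use the _locked variant. */
static void free_mads_sketch(al_t *h_al)
{
    toy_lock_acquire(&h_al->mad_lock);
    while (h_al->mad_list != NULL)
        ib_put_mad_locked(h_al, h_al->mad_list);
    toy_lock_release(&h_al->mad_lock);
}
```

With this split, replacing ib_put_mad_locked with ib_put_mad inside free_mads_sketch trips the assert in toy_lock_acquire, which is the user-mode analogue of the hang in the stack above. Existing callers of ib_put_mad and al_remove_mad are unaffected.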
_______________________________________________ ofw mailing list [email protected] http://lists.openfabrics.org/cgi-bin/mailman/listinfo/ofw
