As Itamar observed, the first message is printed by
_raw_spin_lock() when the kernel is compiled for UP with spin lock
debugging enabled.
dapl_ep_disconnect() is trying to obtain a lock that is already
locked. The message indicates that the lock was taken in
dapl_evd_connection_callback(). There is no control flow path from
dapl_evd_connection_callback() that reaches dapl_ep_disconnect().
I'm also unsure of how execution could have reached
dapl_ep_disconnect() with the spin lock locked. We are using
spin_lock_irqsave(). My understanding is that interrupts will be
masked until spin_unlock_irqrestore() is called. That would imply that
it is not possible for the control flow to change to another context
that calls dapl_ep_disconnect().
The second message is a by-product of the first problem.
dapl_ep_disconnect() unlocks the spin lock, so when control returns to
dapl_evd_connection_callback(), the lock is already unlocked.
So we just need to fix the first problem. Are we using
spin_lock_irqsave() incorrectly?
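
To make the by-product argument concrete, here is a minimal user-space sketch. It is not the kernel's code: dbg_lock, sim_ep_disconnect() and sim_connection_callback() are hypothetical stand-ins, and the sketch assumes that the debug lock only reports misuse and then carries on, which is what the two messages suggest. The reenter flag stands for the still-unidentified path from the callback into the disconnect code.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-in for a UP debug spin lock: it never spins,
 * it only records who holds it and complains on misuse. */
struct dbg_lock {
    int locked;         /* 1 while held */
    const char *owner;  /* name of the function that took the lock */
    int warnings;       /* number of misuse reports so far */
};

static void dbg_lock_acquire(struct dbg_lock *l, const char *who)
{
    assert(l != NULL);
    if (l->locked) {
        printf("%s: lock already locked by %s\n", who, l->owner);
        l->warnings++;
    }
    /* Assumption: the debug lock proceeds anyway after reporting. */
    l->locked = 1;
    l->owner = who;
}

static void dbg_lock_release(struct dbg_lock *l, const char *who)
{
    assert(l != NULL);
    if (!l->locked) {
        printf("%s: lock not locked\n", who);
        l->warnings++;
    }
    l->locked = 0;
}

/* Stand-in for dapl_ep_disconnect(): takes and releases the EP lock. */
static void sim_ep_disconnect(struct dbg_lock *l)
{
    dbg_lock_acquire(l, "sim_ep_disconnect");
    dbg_lock_release(l, "sim_ep_disconnect");
}

/* Stand-in for dapl_evd_connection_callback(). A nonzero reenter
 * models some path inside the critical section that ends up in the
 * disconnect code. Returns the total number of misuse reports. */
static int sim_connection_callback(struct dbg_lock *l, int reenter)
{
    dbg_lock_acquire(l, "sim_connection_callback");
    if (reenter)
        sim_ep_disconnect(l);
    dbg_lock_release(l, "sim_connection_callback");
    return l->warnings;
}
```

Starting from a zero-initialized lock, sim_connection_callback(&lock, 1) produces one "already locked" report from the inner acquire and one "not locked" report from the outer release; with reenter set to 0 it produces none. So removing the re-entry removes both messages.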
james
On Thu, 16 Jun 2005, Itamar Rabenstein wrote:
Hi Hal,
I am trying to understand what is going on here, and I still don't see
how this happens.
These messages are only printed in UP mode. (Is your system UP?)
The code (in function dapl_evd_connection_callback) is:
    spin_lock_irqsave(&ep->common.lock, ep->common.flags);
    switch on the event type:
        disconnect: dapl_ib_disconnect_clean(ep, TRUE);
    spin_unlock_irqrestore(&ep->common.lock, ep->common.flags);
For some reason, between the lock and the unlock there is a call to the
consumer function (dat_ep_disconnect) that tries to disconnect the same
ep, and the lock fails.
Either the evd_cb function is an interrupt from the CM, in which case I
don't see how the consumer can call dat_ib_disconnect in the middle,
or the user called dat_ib_disconnect twice on the same ep and your
kernel allows preemption.
I don't understand either case (;-)
Can you try to run it with some debug?
At least to find out who called dapl_evd_connection_callback?
Itamar
-----Original Message-----
From: Hal Rosenstock [mailto:[EMAIL PROTECTED]
Sent: Tuesday, June 14, 2005 8:37 PM
To: James Lentini
Cc: [email protected]
Subject: [openib-general] kdapl locking problem
Hi,
When running in loopback mode (client and server on same
machine (x86)):
kdapltest -T T -s <IP addr> -D mthca0a -d -t 2 -w 8 -i 20 client SR server SR
I see the following locking problem:
Jun 14 13:30:08 localhost kernel: drivers/infiniband/ulp/dat-provider/dapl_ep.c:1111: spin_lock(drivers/infiniband/ulp/dat-provider/dapl_ep.c:c44b1c18) already locked by drivers/infiniband/ulp/dat-provider/dapl_evd.c/756
Jun 14 13:30:08 localhost kernel: drivers/infiniband/ulp/dat-provider/dapl_evd.c:797: spin_unlock(drivers/infiniband/ulp/dat-provider/dapl_ep.c:c44b1c18) not locked
-- Hal
_______________________________________________
openib-general mailing list
[email protected]
http://openib.org/mailman/listinfo/openib-general
To unsubscribe, please visit
http://openib.org/mailman/listinfo/openib-general