As Itamar observed, the first message is printed by _raw_spin_lock() when the kernel is compiled for UP with spin lock debugging enabled.

dapl_ep_disconnect() is trying to obtain a lock that is already locked. The message indicates that the lock was taken in dapl_evd_connection_callback(). There is no control flow path from dapl_evd_connection_callback() that reaches dapl_ep_disconnect().

I'm also unsure of how execution could have reached dapl_ep_disconnect() with the spin lock locked. We are using spin_lock_irqsave(). My understanding is that interrupts will be masked until spin_unlock_irqrestore() is called. That would imply that it is not possible for the control flow to change to another context that calls dapl_ep_disconnect().
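To make that understanding concrete, here is a toy user-space model of a single CPU's interrupt-enable state. It is a sketch only, not kernel code: model_irq_save(), model_irq_restore() and the irqs_enabled variable are names invented for this illustration. It shows that nothing can interrupt the section between save and restore. As a separate point, it shows why the saved flags are conventionally kept in a local variable rather than in shared storage such as the ep->common.flags field used in the snippet quoted below. That second point is only an observation about convention and may not be related to the messages at all.

```c
#include <stdbool.h>

/* Toy model of one CPU's interrupt-enable bit (hypothetical, user space only). */
static bool irqs_enabled = true;

/* Model of "save the current interrupt state, then mask interrupts". */
static void model_irq_save(unsigned long *flags)
{
    *flags = irqs_enabled ? 1UL : 0UL;
    irqs_enabled = false;
}

/* Model of "restore exactly the state that was saved". */
static void model_irq_restore(unsigned long flags)
{
    irqs_enabled = (flags != 0);
}

/* Could an interrupt handler run between save and restore? */
static bool can_fire_inside_section(void)
{
    unsigned long flags;
    bool could_fire;

    irqs_enabled = true;
    model_irq_save(&flags);
    could_fire = irqs_enabled;      /* false: interrupts are masked here */
    model_irq_restore(flags);
    return could_fire;
}

/* Nested sections, each with its own local flags: state is restored correctly. */
static bool nested_with_local_flags(void)
{
    unsigned long outer, inner;

    irqs_enabled = true;
    model_irq_save(&outer);         /* outer remembers "enabled" */
    model_irq_save(&inner);         /* inner remembers "disabled" */
    model_irq_restore(inner);       /* still masked */
    model_irq_restore(outer);       /* enabled again */
    return irqs_enabled;
}

/* Nested sections sharing one flags word: the outer saved state is lost. */
static bool nested_with_shared_flags(void)
{
    unsigned long shared;

    irqs_enabled = true;
    model_irq_save(&shared);        /* shared remembers "enabled" */
    model_irq_save(&shared);        /* overwritten with "disabled" */
    model_irq_restore(shared);
    model_irq_restore(shared);      /* restores "disabled" */
    return irqs_enabled;            /* interrupts stay masked */
}
```

In this model the shared-flags variant leaves interrupts masked after both sections have ended, which is why a local "unsigned long flags" per call site is the usual pattern.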

The second message is a by-product of the first problem. dapl_ep_disconnect() unlocks the spin lock, so when control returns to dapl_evd_connection_callback(), the lock is already unlocked.
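The sequence I have in mind can be sketched in user space as follows. This is a simplified model under stated assumptions, not the kernel's debug lock: struct dbg_lock, dbg_lock_acquire(), dbg_lock_release(), connection_callback() and ep_disconnect() are hypothetical names, and the "reenter" argument stands in for whatever path leads to dapl_ep_disconnect() while the lock is held, which is exactly the path we have not found.

```c
#include <stdio.h>

/* Hypothetical stand-in for a UP debug spin lock: it only tracks the holder. */
struct dbg_lock {
    int locked;             /* 1 while held */
    const char *owner;      /* function that last took the lock */
};

enum dbg_event { DBG_ALREADY_LOCKED = 1, DBG_NOT_LOCKED = 2 };

#define DBG_MAX_EVENTS 8
static enum dbg_event dbg_events[DBG_MAX_EVENTS];   /* diagnostics in order */
static int dbg_event_count;

static void dbg_record(enum dbg_event ev)
{
    if (dbg_event_count < DBG_MAX_EVENTS)
        dbg_events[dbg_event_count] = ev;
    dbg_event_count++;
}

/* Report if the lock is already held, then take it anyway. */
static void dbg_lock_acquire(struct dbg_lock *l, const char *who)
{
    if (l->locked) {
        fprintf(stderr, "%s: lock already locked by %s\n", who, l->owner);
        dbg_record(DBG_ALREADY_LOCKED);
    }
    l->locked = 1;
    l->owner = who;
}

/* Report if the lock is not held, then clear it. */
static void dbg_lock_release(struct dbg_lock *l, const char *who)
{
    if (!l->locked) {
        fprintf(stderr, "%s: lock not locked\n", who);
        dbg_record(DBG_NOT_LOCKED);
    }
    l->locked = 0;
    l->owner = NULL;
}

/* Models the disconnect path: takes and drops the same lock. */
static void ep_disconnect(struct dbg_lock *l)
{
    dbg_lock_acquire(l, "ep_disconnect");
    dbg_lock_release(l, "ep_disconnect");
}

/* Models the callback; reenter != 0 simulates the unexplained nested call. */
static void connection_callback(struct dbg_lock *l, int reenter)
{
    dbg_lock_acquire(l, "connection_callback");
    if (reenter)
        ep_disconnect(l);
    dbg_lock_release(l, "connection_callback");
}
```

With reenter set, the model raises "already locked" from the nested acquire and then "not locked" from the outer release, in that order. Without the nested call it raises nothing, which is why only the first problem needs an explanation.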

So we just need to fix the first problem. Are we using spin_lock_irqsave() incorrectly?

james

On Thu, 16 Jun 2005, Itamar Rabenstein wrote:

Hi Hal,
I am trying to understand what is going on here, and I still don't see how this
happens.

These messages are only printed in UP mode. (Is your system UP?)
The code is (function: dapl_evd_connection_callback):
spin_lock_irqsave(&ep->common.lock, ep->common.flags);
case on the event type
disconnect:  dapl_ib_disconnect_clean(ep, TRUE);
spin_unlock_irqrestore(&ep->common.lock, ep->common.flags);

For some reason, between the lock and the unlock there is a call to the
consumer function (dat_ep_disconnect) that tries to disconnect the same ep,
and the lock fails.

Either the evd_cb function is invoked from a CM interrupt, in which case I
don't see how the consumer can call dat_ib_disconnect in the middle, or the
user called dat_ib_disconnect twice on the same ep and your kernel allows
preemption.

I don't understand either one (;-)

Can you try to run it with some debug output?
At least to find out who called dapl_evd_connection_callback?

Itamar


-----Original Message-----
From: Hal Rosenstock [mailto:[EMAIL PROTECTED]
Sent: Tuesday, June 14, 2005 8:37 PM
To: James Lentini
Cc: [email protected]
Subject: [openib-general] kdapl locking problem


Hi,

When running in loopback mode (client and server on the same
machine (x86)):
kdapltest -T T -s <IP addr> -D mthca0a -d -t 2 -w 8 -i 20 client SR server SR
I see the following locking problem:

Jun 14 13:30:08 localhost kernel: drivers/infiniband/ulp/dat-provider/dapl_ep.c:1111: spin_lock(drivers/infiniband/ulp/dat-provider/dapl_ep.c:c44b1c18) already locked by drivers/infiniband/ulp/dat-provider/dapl_evd.c/756
Jun 14 13:30:08 localhost kernel: drivers/infiniband/ulp/dat-provider/dapl_evd.c:797: spin_unlock(drivers/infiniband/ulp/dat-provider/dapl_ep.c:c44b1c18) not locked

-- Hal

_______________________________________________
openib-general mailing list
[email protected]
http://openib.org/mailman/listinfo/openib-general

To unsubscribe, please visit
http://openib.org/mailman/listinfo/openib-general
