Arnaud Simon wrote:
On Mon, 2008-03-03 at 10:22 +0000, Aidan Skinner wrote:
On Fri, Feb 29, 2008 at 8:23 PM, Rafael Schloming <[EMAIL PROTECTED]> wrote:

The deadlock is between the session's _messageDeliveryLock and the
 Dispatcher's _lock. The dispatcher thread's main loop attempts to
 acquire _lock first and then _messageDeliveryLock. The main thread
 acquires _messageDeliveryLock first thing on close(...) and then
 subsequently many levels down the stack attempts to acquire _lock in
 order to call Dispatcher.rejectPending(...).
We found (and fixed) several similar problems on M2, it's probably
worth having a look at that code to make sure there aren't any cases
being missed.

- Aidan


Rafael is absolutely right, Session.close is holding a lock on
_messageDeliveryLock only to prevent messages being delivered to
consumers when the session is closing (note that this lock should also
be held when consumer are closing, see Qpid-812). There is however no
reason for the dispatcher thread to keep a lock on _lock when
dispatching a message. _lock is used for stopping/starting the
dispatcher thread and _messageDeliveryLock for preventing message
delivery. So, we should change the run method of dispatcher for
releasing _lock and rollback and rejectPending for synchronizing on
_messageDeliveryLock.

This was my first thought too, however I think there actually is a reason. Connection.stop() is supposed to block until all message listeners are finished with whatever they're currently work on. Holding _lock while executing the message listener does accomplish this. If we were to release the lock earlier then stop() could return while the message listener is still in progress.

Note that this deadlock situation would not happen with 0.8 as an
external thread is calling rejectPending but (if the dispatcher is
trying to dispatch a message) this thread will need to wait until close
finishes and will create a new dispatcher thread that will not be
closed.

I thought this too, however on closer examination the closing thread does a syncWrite, which means that thread will block until the rejectPending is complete, so it should be the same as executing in the same thread.

I suspect the potential for the deadlock to occur is actually still there on the 0-8 codepath, but it is not occuring for some reason. I don't know if this is timing related or some other reason. The stack trace suggests that the dispatcher thread is running as we attempt to cancel the consumer. Stopping the thread should prevent the deadlock, I don't know if the 0-8 code does that or not.

The way the dispatcher thread is closed should also be changed. The
close operation currently interrupts the thread this means that the
dispatcher does not have a chance to acquire a lock on
_messageDeliveryLock for correctly releasing the message it may be
waiting to dispatch.

Agreed, the way it checks a private _closed variable rather than checking the session's _closed variable looks a bit dodgy also.

--Rafael

Reply via email to