Re: MessageListenerMultiConsumerTest deadlock when run in 0-10 code path using ant

Rafael Schloming Mon, 03 Mar 2008 04:34:13 -0800

Arnaud Simon wrote:

On Mon, 2008-03-03 at 10:22 +0000, Aidan Skinner wrote:

On Fri, Feb 29, 2008 at 8:23 PM, Rafael Schloming <[EMAIL PROTECTED]> wrote:

The deadlock is between the session's _messageDeliveryLock and the
 Dispatcher's _lock. The dispatcher thread's main loop attempts to
 acquire _lock first and then _messageDeliveryLock. The main thread
 acquires _messageDeliveryLock first thing on close(...) and then
 subsequently many levels down the stack attempts to acquire _lock in
 order to call Dispatcher.rejectPending(...).

We found (and fixed) several similar problems on M2, it's probably
worth having a look at that code to make sure there aren't any cases
being missed.

- Aidan


Rafael is absolutely right, Session.close is holding a lock on
_messageDeliveryLock only to prevent messages being delivered to
consumers when the session is closing (note that this lock should also
be held when consumer are closing, see Qpid-812). There is however no
reason for the dispatcher thread to keep a lock on _lock when
dispatching a message. _lock is used for stopping/starting the
dispatcher thread and _messageDeliveryLock for preventing message
delivery. So, we should change the run method of dispatcher for
releasing _lock and rollback and rejectPending for synchronizing on

_messageDeliveryLock.

This was my first thought too, however I think there actually is areason. Connection.stop() is supposed to block until all messagelisteners are finished with whatever they're currently work on. Holding_lock while executing the message listener does accomplish this. If wewere to release the lock earlier then stop() could return while themessage listener is still in progress.

Note that this deadlock situation would not happen with 0.8 as an
external thread is calling rejectPending but (if the dispatcher is
trying to dispatch a message) this thread will need to wait until close
finishes and will create a new dispatcher thread that will not be
closed.

I thought this too, however on closer examination the closing threaddoes a syncWrite, which means that thread will block until therejectPending is complete, so it should be the same as executing in thesame thread.

I suspect the potential for the deadlock to occur is actually stillthere on the 0-8 codepath, but it is not occuring for some reason. Idon't know if this is timing related or some other reason. The stacktrace suggests that the dispatcher thread is running as we attempt tocancel the consumer. Stopping the thread should prevent the deadlock, Idon't know if the 0-8 code does that or not.

The way the dispatcher thread is closed should also be changed. The
close operation currently interrupts the thread this means that the
dispatcher does not have a chance to acquire a lock on
_messageDeliveryLock for correctly releasing the message it may be

waiting to dispatch.

Agreed, the way it checks a private _closed variable rather thanchecking the session's _closed variable looks a bit dodgy also.


--Rafael

Re: MessageListenerMultiConsumerTest deadlock when run in 0-10 code path using ant

Reply via email to