Hi Robbie,

yes - my thinking is currently that the acquiring of the MDL in
AMQSession.closed() is incorrect.  The MDL seems to be there to provide JMS
semantics around the interaction between message delivery and other methods
on the (JMS) session object.  As such it should only really be acquired
from threads acting on behalf of the "application".  Application initiated
closes already hold the MDL by the time they get into AMQSession.closed()
so I think this can safely be removed.  In the other cases of the MDL being
acquired after the failover mutex (such as BasicMessageConsumer.close()) I
can see no reason why the lock acquisition order cannot be reversed (the
locks are pretty much taken out one after another) and by reversing them we
can prevent deadlock with the Connection close().

Making the above change *seems* safe to me and (for what little it is
worth) all the system tests continue to pass without issue on both the
0-9-1 and 0-10 paths.

Can anyone see any issues with the above?

-- Rob

On 9 February 2015 at 12:15, Robbie Gemmell <[email protected]>
wrote:

> Hi Rob,
>
> Its a couple of years since I last did any real work on the 0-10
> client, but around that time I did review some proposed changes
> attempting removal of the MDL enough to know they likely introduced
> new issues. The main thing I recall from back then is it being hard to
> reason about the impact of such changes and the interaction of the
> various locks due to the way the client threads can enter the
> application space (e.g onMessage, exception listeners) and then
> re-enter the client code while holding an unclear set of locks.
>
> What you wrote certainly makes logical sense in that if you always
> acquire the MDL before the FM then even that behaviour shouldnt then
> be an issue in terms of deadlocking between those two. The main
> weirdness I can think comes of in terms of the stacktrace given in
> that if the dispatcher is inside onMessage holding the MDL it could
> then block the connection IOReceiver thread from completing whatever
> it is attempting, leaving it unable to service new bytes arriving from
> the broker (e.g should anything onMessage proceeds to do actually
> generate new requests, or there being other sessions) as well as doing
> anything else it would normally do, until that sessions onMessage
> exists, which may itself cause different problems. Looking again at
> the the given stacktrace, it does sound reasonable for the lock order
> in connection close to be reversed, and I also wonder if the MDL is
> truely needed when indicating the session closed due to an execution
> exception. My hands are currently full on other things though so these
> aren't areas I am going to be able to look at personally.
>
> Robbie
>
> On 7 February 2015 at 14:23, Rob Godfrey <[email protected]> wrote:
> > Hi Rajith,
> >
> > so - I'm no expert on the locking model in the client... Logically I
> would
> > think that with the message delivery lock (MDL) being a session level
> lock,
> > and the failover mutex (FM) being connection wide, we should always
> acquire
> > the MDL before the FM.  Where we are currently acquiring the FM without
> > acquiring the MDL I would (probably naively) think that we should first
> be
> > acquiring the MDL before the FM where the operation in question is
> specific
> > to a session (e.g. send a message, declare a queue, etc).  In the case
> > where the operation is connection wide (which I think is pretty much
> > limitted to connection closure) I would think that we would want to get
> the
> > MDLs for all of the sessions on the connection before getting the FM.
> >
> > Looking at the code I couldn't immediately see why such a change would
> > cause an issue - if something is performing an operation against a
> session
> > in onMessage mode then it would currently hold the MDL... and if we are
> > using receive() then holding the MDL is seemingly compatible with the
> > definition of given by the javadoc.
> >
> > I expect I'm missing some subtlety here - can someone enlighten me?
> >
> > Thanks,
> > Rob
> >
> > On 5 February 2015 at 15:16, Rajith Muditha Attapattu <
> [email protected]>
> > wrote:
> >
> >> The consensus is, that there is no easy fix, other than rewriting some
> key
> >> pieces of the client.
> >> Given that a new JMS client is being actively developed, we have decided
> >> not to devote any significant time/resources on the older client.
> >>
> >> The new JMS client is AMQP 1.0 and the source is here
> >> https://git-wip-us.apache.org/repos/asf?p=qpid-jms.git
> >> Robbie is the best person to give an accurate picture of the status for
> >> this project.
> >>
> >> Rajith
> >>
> >> On Thu, Feb 5, 2015 at 2:42 AM, Rob Godfrey <[email protected]>
> >> wrote:
> >>
> >> > To answer the easy question first, 0.32 clients should be totally
> >> > compatible with an 0.16 broker.
> >> >
> >> > In terms of the deadlock issue, I know there are a number of open
> >> deadlock
> >> > JIRAs that Robbie and Rajith have looked at in the past... Hopefully
> one
> >> of
> >> > them will be able to chip in here and discuss the feasibility of
> fixing
> >> > your particular issue for 0.32.
> >> >
> >> > -- Rob
> >> >
> >> > On 4 February 2015 at 23:36, Helen Kwong <[email protected]>
> wrote:
> >> >
> >> > > Hi Qpid gurus,
> >> > >
> >> > > We are using 0.16 Java broker and client on 0-10, and we are running
> >> into
> >> > > deadlock issues on the client that involve AMQSession's
> >> > > _messageDeliveryLock and AMQConnection's _failoverMutex, where
> >> different
> >> > > threads acquire them in different orders. This is leading to major
> >> > > production headaches for us and has us very worried.
> >> > >
> >> > > (I've been looking at 0.16 code mostly but also skimmed the relevant
> >> > parts
> >> > > in 0.32, which seem largely the same in those places.)
> >> > >
> >> > > Deadlock Variety 1
> >> > >
> >> > > This is an example of a deadlock we see, where the IOReceiver thread
> >> > > deadlocks with Session dispatcher thread (we have listeners that
> call
> >> > > session rollback or commit in onMessage()):
> >> > >
> >> > > IOReceiver
> >> > > org.apache.qpid.client.AMQSession.closed(AMQSession.java:818)
> >> > ---------->
> >> > > waiting for session's messageDeliveryLock
> >> > >
> >> > >
> >> >
> >>
> org.apache.qpid.client.AMQConnection.closeAllSessions(AMQConnection.java:938)
> >> > >
> >> > >
> >> >
> >>
> org.apache.qpid.client.AMQConnection.exceptionReceived(AMQConnection.java:1282)
> >> > >  ----------> acquires connection's failoverMutex
> >> > >
> >> > >
> >> >
> >>
> org.apache.qpid.client.AMQSession_0_10.setCurrentException(AMQSession_0_10.java:1057)
> >> > >
> >> >
> >>
> org.apache.qpid.client.AMQSession_0_10.exception(AMQSession_0_10.java:907)
> >> > >
> >> > >
> >> >
> >>
> org.apache.qpid.transport.SessionDelegate.executionException(SessionDelegate.java:182)
> >> > >
> >> > >
> >> >
> >>
> org.apache.qpid.transport.SessionDelegate.executionException(SessionDelegate.java:32)
> >> > >
> >> > >
> >> >
> >>
> org.apache.qpid.transport.ExecutionException.dispatch(ExecutionException.java:103)
> >> > >
> >> >
> >>
> org.apache.qpid.transport.SessionDelegate.command(SessionDelegate.java:55)
> >> > >
> >> >
> >>
> org.apache.qpid.transport.SessionDelegate.command(SessionDelegate.java:50)
> >> > >
> >> >
> >>
> org.apache.qpid.transport.SessionDelegate.command(SessionDelegate.java:32)
> >> > > org.apache.qpid.transport.Method.delegate(Method.java:159)
> >> > > org.apache.qpid.transport.Session.received(Session.java:585)
> >> > > org.apache.qpid.transport.Connection.dispatch(Connection.java:412)
> >> > >
> >> > >
> >> >
> >>
> org.apache.qpid.transport.ConnectionDelegate.handle(ConnectionDelegate.java:64)
> >> > >
> >> > >
> >> >
> >>
> org.apache.qpid.transport.ConnectionDelegate.handle(ConnectionDelegate.java:40)
> >> > >
> >> > >
> >> >
> >>
> org.apache.qpid.transport.MethodDelegate.executionException(MethodDelegate.java:110)
> >> > >
> >> > >
> >> >
> >>
> org.apache.qpid.transport.ExecutionException.dispatch(ExecutionException.java:103)
> >> > >
> >> > >
> >> >
> >>
> org.apache.qpid.transport.ConnectionDelegate.command(ConnectionDelegate.java:54)
> >> > >
> >> > >
> >> >
> >>
> org.apache.qpid.transport.ConnectionDelegate.command(ConnectionDelegate.java:40)
> >> > > org.apache.qpid.transport.Method.delegate(Method.java:159)
> >> > > org.apache.qpid.transport.Connection.received(Connection.java:367)
> >> > > org.apache.qpid.transport.Connection.received(Connection.java:65)
> >> > > org.apache.qpid.transport.network.Assembler.emit(Assembler.java:97)
> >> > >
> >> org.apache.qpid.transport.network.Assembler.assemble(Assembler.java:198)
> >> > >
> org.apache.qpid.transport.network.Assembler.frame(Assembler.java:131)
> >> > > org.apache.qpid.transport.network.Frame.delegate(Frame.java:128)
> >> > >
> >> org.apache.qpid.transport.network.Assembler.received(Assembler.java:102)
> >> > >
> org.apache.qpid.transport.network.Assembler.received(Assembler.java:44)
> >> > >
> >> >
> >>
> org.apache.qpid.transport.network.InputHandler.next(InputHandler.java:189)
> >> > >
> >> > >
> >> >
> >>
> org.apache.qpid.transport.network.InputHandler.received(InputHandler.java:105)
> >> > >
> >> > >
> >> >
> >>
> org.apache.qpid.transport.network.InputHandler.received(InputHandler.java:44)
> >> > >
> >> org.apache.qpid.transport.network.io.IoReceiver.run(IoReceiver.java:152)
> >> > > java.lang.Thread.run(Thread.java:745)
> >> > >
> >> > > Session dispatcher thread
> >> > >
> >> > >
> >> >
> >>
> org.apache.qpid.client.AMQConnection.exceptionReceived(AMQConnection.java:1255)
> >> > >  ---------> waiting for connection's failoverMutex
> >> > >
> >> > >
> >> >
> >>
> org.apache.qpid.client.AMQSession_0_10.setCurrentException(AMQSession_0_10.java:1057)
> >> > >
> org.apache.qpid.client.AMQSession_0_10.sync(AMQSession_0_10.java:1034)
> >> > >
> >> > >
> >> >
> >>
> org.apache.qpid.client.AMQSession_0_10.sendSuspendChannel(AMQSession_0_10.java:812)
> >> > >
> org.apache.qpid.client.AMQSession.suspendChannel(AMQSession.java:3075)
> >> > > org.apache.qpid.client.AMQSession.rollback(AMQSession.java:1837)
> >> > > common.messaging.QpidSession.rollback(QpidSession.java:211)
> >> > >
> >> > >
> >> >
> >>
> common.messaging.QpidMessageHandler.rollbackSession(QpidMessageHandler.java:284)
> >> > >
> >> >
> >>
> common.messaging.QpidMessageHandler.onMessage(QpidMessageHandler.java:113)
> >> > >
> >> > >
> >> >
> >>
> org.apache.qpid.client.BasicMessageConsumer.notifyMessage(BasicMessageConsumer.java:748)
> >> > >
> >> > >
> >> >
> >>
> org.apache.qpid.client.BasicMessageConsumer_0_10.notifyMessage(BasicMessageConsumer_0_10.java:141)
> >> > >
> >> > >
> >> >
> >>
> org.apache.qpid.client.BasicMessageConsumer.notifyMessage(BasicMessageConsumer.java:722)
> >> > >
> >> > >
> >> >
> >>
> org.apache.qpid.client.BasicMessageConsumer_0_10.notifyMessage(BasicMessageConsumer_0_10.java:186)
> >> > >
> >> > >
> >> >
> >>
> org.apache.qpid.client.BasicMessageConsumer_0_10.notifyMessage(BasicMessageConsumer_0_10.java:54)
> >> > >
> >> > >
> >> >
> >>
> org.apache.qpid.client.AMQSession$Dispatcher.notifyConsumer(AMQSession.java:3454)
> >> > >
> >> > >
> >> >
> >>
> org.apache.qpid.client.AMQSession$Dispatcher.dispatchMessage(AMQSession.java:3393)
> >> > >   -----------> acquires session's messageDeliverylock
> >> > >
> >> > >
> >> >
> >>
> org.apache.qpid.client.AMQSession$Dispatcher.access$1000(AMQSession.java:3180)
> >> > > org.apache.qpid.client.AMQSession.dispatch(AMQSession.java:3173)
> >> > >
> >> > >
> >> >
> >>
> org.apache.qpid.client.message.UnprocessedMessage.dispatch(UnprocessedMessage.java:54)
> >> > >
> org.apache.qpid.client.AMQSession$Dispatcher.run(AMQSession.java:3316)
> >> > > java.lang.Thread.run(Thread.java:745)
> >> > >
> >> > > The problem is that the IOReceiver thread acquires failoverMutex
> before
> >> > > messageDeliveryLock (for each session), whereas the dispatcher
> thread
> >> > > acquires it in the other order. We also see potential problems where
> >> > other
> >> > > threads (instead of IOReceiver) can deadlock with the dispatcher
> >> thread,
> >> > as
> >> > > long as it acquires failoverMutex before messageDeliveryLock.
> Examples
> >> we
> >> > > can think of:
> >> > >
> >> > > A) Another thread calling AMQSession.close()
> >> > > B) Another thread calling BasicMessageConsumer.close()
> >> > > C) Same connection, different session's dispatcher thread, calling
> >> > > rollback() or commit() -> sync() -> setCurrentException() ->
> >> > > AMQConnection.exceptionReceived() ->
> AMQConnection.closeAllSessions(),
> >> > > which can try to acquire the messageDeliveryLock of another session
> and
> >> > > deadlock with the other session's dispatcher thread
> >> > >
> >> > >
> >> > > Deadlock Variety 2:
> >> > > From code inspection, it also appears that AMQConnection.close() can
> >> > > deadlock with either AMQSession.close() or
> BasicMessageConsumer.close()
> >> > > (where the session / consumer is on the same connection). This is
> >> because
> >> > > AMQConnection.close() first acquires the messageDeliveryLock of all
> its
> >> > > sessions in the recursive doClose(), before trying to acquire the
> >> > > connection's failoverMutex. But the Session / consumer's close()
> >> acquires
> >> > > the failoverMutex before messageDeliveryLock. We haven't seen this
> >> happen
> >> > > but would like to know if this is possible.
> >> > >
> >> > >
> >> > > We'd really appreciate your help on this. Assuming these can be
> fixed
> >> in
> >> > > 0.32, we are also wondering if clients are backward compatible --
> i.e.,
> >> > can
> >> > > we upgrade only our client to 0.32 while continuing to use the 0.16
> >> > broker?
> >> > >
> >> > > Thanks,
> >> > > Helen
> >> > >
> >> >
> >>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [email protected]
> For additional commands, e-mail: [email protected]
>
>

Reply via email to