Taking a quick peek it as you said *seems* safe, since the 'closed'
call looks like the only place a non-app non-dispatcher thread could
get the MDL. I would probably need a long time to really convince
myself though.

Robbie

On 9 February 2015 at 11:47, Rob Godfrey <[email protected]> wrote:
> Hi Robbie,
>
> yes - my thinking is currently that the acquiring of the MDL in
> AMQSession.closed() is incorrect.  The MDL seems to be there to provide JMS
> semantics around the interaction between message delivery and other methods
> on the (JMS) session object.  As such it should only really be acquired
> from threads acting on behalf of the "application".  Application initiated
> closes already hold the MDL by the time they get into AMQSession.closed()
> so I think this can safely be removed.  In the other cases of the MDL being
> acquired after the failover mutex (such as BasicMessageConsumer.close()) I
> can see no reason why the lock acquisition order cannot be reversed (the
> locks are pretty much taken out one after another) and by reversing them we
> can prevent deadlock with the Connection close().
>
> Making the above change *seems* safe to me and (for what little it is
> worth) all the system tests continue to pass without issue on both the
> 0-9-1 and 0-10 paths.
>
> Can anyone see any issues with the above?
>
> -- Rob
>
> On 9 February 2015 at 12:15, Robbie Gemmell <[email protected]>
> wrote:
>
>> Hi Rob,
>>
>> Its a couple of years since I last did any real work on the 0-10
>> client, but around that time I did review some proposed changes
>> attempting removal of the MDL enough to know they likely introduced
>> new issues. The main thing I recall from back then is it being hard to
>> reason about the impact of such changes and the interaction of the
>> various locks due to the way the client threads can enter the
>> application space (e.g onMessage, exception listeners) and then
>> re-enter the client code while holding an unclear set of locks.
>>
>> What you wrote certainly makes logical sense in that if you always
>> acquire the MDL before the FM then even that behaviour shouldnt then
>> be an issue in terms of deadlocking between those two. The main
>> weirdness I can think comes of in terms of the stacktrace given in
>> that if the dispatcher is inside onMessage holding the MDL it could
>> then block the connection IOReceiver thread from completing whatever
>> it is attempting, leaving it unable to service new bytes arriving from
>> the broker (e.g should anything onMessage proceeds to do actually
>> generate new requests, or there being other sessions) as well as doing
>> anything else it would normally do, until that sessions onMessage
>> exists, which may itself cause different problems. Looking again at
>> the the given stacktrace, it does sound reasonable for the lock order
>> in connection close to be reversed, and I also wonder if the MDL is
>> truely needed when indicating the session closed due to an execution
>> exception. My hands are currently full on other things though so these
>> aren't areas I am going to be able to look at personally.
>>
>> Robbie
>>
>> On 7 February 2015 at 14:23, Rob Godfrey <[email protected]> wrote:
>> > Hi Rajith,
>> >
>> > so - I'm no expert on the locking model in the client... Logically I
>> would
>> > think that with the message delivery lock (MDL) being a session level
>> lock,
>> > and the failover mutex (FM) being connection wide, we should always
>> acquire
>> > the MDL before the FM.  Where we are currently acquiring the FM without
>> > acquiring the MDL I would (probably naively) think that we should first
>> be
>> > acquiring the MDL before the FM where the operation in question is
>> specific
>> > to a session (e.g. send a message, declare a queue, etc).  In the case
>> > where the operation is connection wide (which I think is pretty much
>> > limitted to connection closure) I would think that we would want to get
>> the
>> > MDLs for all of the sessions on the connection before getting the FM.
>> >
>> > Looking at the code I couldn't immediately see why such a change would
>> > cause an issue - if something is performing an operation against a
>> session
>> > in onMessage mode then it would currently hold the MDL... and if we are
>> > using receive() then holding the MDL is seemingly compatible with the
>> > definition of given by the javadoc.
>> >
>> > I expect I'm missing some subtlety here - can someone enlighten me?
>> >
>> > Thanks,
>> > Rob
>> >
>> > On 5 February 2015 at 15:16, Rajith Muditha Attapattu <
>> [email protected]>
>> > wrote:
>> >
>> >> The consensus is, that there is no easy fix, other than rewriting some
>> key
>> >> pieces of the client.
>> >> Given that a new JMS client is being actively developed, we have decided
>> >> not to devote any significant time/resources on the older client.
>> >>
>> >> The new JMS client is AMQP 1.0 and the source is here
>> >> https://git-wip-us.apache.org/repos/asf?p=qpid-jms.git
>> >> Robbie is the best person to give an accurate picture of the status for
>> >> this project.
>> >>
>> >> Rajith
>> >>
>> >> On Thu, Feb 5, 2015 at 2:42 AM, Rob Godfrey <[email protected]>
>> >> wrote:
>> >>
>> >> > To answer the easy question first, 0.32 clients should be totally
>> >> > compatible with an 0.16 broker.
>> >> >
>> >> > In terms of the deadlock issue, I know there are a number of open
>> >> deadlock
>> >> > JIRAs that Robbie and Rajith have looked at in the past... Hopefully
>> one
>> >> of
>> >> > them will be able to chip in here and discuss the feasibility of
>> fixing
>> >> > your particular issue for 0.32.
>> >> >
>> >> > -- Rob
>> >> >
>> >> > On 4 February 2015 at 23:36, Helen Kwong <[email protected]>
>> wrote:
>> >> >
>> >> > > Hi Qpid gurus,
>> >> > >
>> >> > > We are using 0.16 Java broker and client on 0-10, and we are running
>> >> into
>> >> > > deadlock issues on the client that involve AMQSession's
>> >> > > _messageDeliveryLock and AMQConnection's _failoverMutex, where
>> >> different
>> >> > > threads acquire them in different orders. This is leading to major
>> >> > > production headaches for us and has us very worried.
>> >> > >
>> >> > > (I've been looking at 0.16 code mostly but also skimmed the relevant
>> >> > parts
>> >> > > in 0.32, which seem largely the same in those places.)
>> >> > >
>> >> > > Deadlock Variety 1
>> >> > >
>> >> > > This is an example of a deadlock we see, where the IOReceiver thread
>> >> > > deadlocks with Session dispatcher thread (we have listeners that
>> call
>> >> > > session rollback or commit in onMessage()):
>> >> > >
>> >> > > IOReceiver
>> >> > > org.apache.qpid.client.AMQSession.closed(AMQSession.java:818)
>> >> > ---------->
>> >> > > waiting for session's messageDeliveryLock
>> >> > >
>> >> > >
>> >> >
>> >>
>> org.apache.qpid.client.AMQConnection.closeAllSessions(AMQConnection.java:938)
>> >> > >
>> >> > >
>> >> >
>> >>
>> org.apache.qpid.client.AMQConnection.exceptionReceived(AMQConnection.java:1282)
>> >> > >  ----------> acquires connection's failoverMutex
>> >> > >
>> >> > >
>> >> >
>> >>
>> org.apache.qpid.client.AMQSession_0_10.setCurrentException(AMQSession_0_10.java:1057)
>> >> > >
>> >> >
>> >>
>> org.apache.qpid.client.AMQSession_0_10.exception(AMQSession_0_10.java:907)
>> >> > >
>> >> > >
>> >> >
>> >>
>> org.apache.qpid.transport.SessionDelegate.executionException(SessionDelegate.java:182)
>> >> > >
>> >> > >
>> >> >
>> >>
>> org.apache.qpid.transport.SessionDelegate.executionException(SessionDelegate.java:32)
>> >> > >
>> >> > >
>> >> >
>> >>
>> org.apache.qpid.transport.ExecutionException.dispatch(ExecutionException.java:103)
>> >> > >
>> >> >
>> >>
>> org.apache.qpid.transport.SessionDelegate.command(SessionDelegate.java:55)
>> >> > >
>> >> >
>> >>
>> org.apache.qpid.transport.SessionDelegate.command(SessionDelegate.java:50)
>> >> > >
>> >> >
>> >>
>> org.apache.qpid.transport.SessionDelegate.command(SessionDelegate.java:32)
>> >> > > org.apache.qpid.transport.Method.delegate(Method.java:159)
>> >> > > org.apache.qpid.transport.Session.received(Session.java:585)
>> >> > > org.apache.qpid.transport.Connection.dispatch(Connection.java:412)
>> >> > >
>> >> > >
>> >> >
>> >>
>> org.apache.qpid.transport.ConnectionDelegate.handle(ConnectionDelegate.java:64)
>> >> > >
>> >> > >
>> >> >
>> >>
>> org.apache.qpid.transport.ConnectionDelegate.handle(ConnectionDelegate.java:40)
>> >> > >
>> >> > >
>> >> >
>> >>
>> org.apache.qpid.transport.MethodDelegate.executionException(MethodDelegate.java:110)
>> >> > >
>> >> > >
>> >> >
>> >>
>> org.apache.qpid.transport.ExecutionException.dispatch(ExecutionException.java:103)
>> >> > >
>> >> > >
>> >> >
>> >>
>> org.apache.qpid.transport.ConnectionDelegate.command(ConnectionDelegate.java:54)
>> >> > >
>> >> > >
>> >> >
>> >>
>> org.apache.qpid.transport.ConnectionDelegate.command(ConnectionDelegate.java:40)
>> >> > > org.apache.qpid.transport.Method.delegate(Method.java:159)
>> >> > > org.apache.qpid.transport.Connection.received(Connection.java:367)
>> >> > > org.apache.qpid.transport.Connection.received(Connection.java:65)
>> >> > > org.apache.qpid.transport.network.Assembler.emit(Assembler.java:97)
>> >> > >
>> >> org.apache.qpid.transport.network.Assembler.assemble(Assembler.java:198)
>> >> > >
>> org.apache.qpid.transport.network.Assembler.frame(Assembler.java:131)
>> >> > > org.apache.qpid.transport.network.Frame.delegate(Frame.java:128)
>> >> > >
>> >> org.apache.qpid.transport.network.Assembler.received(Assembler.java:102)
>> >> > >
>> org.apache.qpid.transport.network.Assembler.received(Assembler.java:44)
>> >> > >
>> >> >
>> >>
>> org.apache.qpid.transport.network.InputHandler.next(InputHandler.java:189)
>> >> > >
>> >> > >
>> >> >
>> >>
>> org.apache.qpid.transport.network.InputHandler.received(InputHandler.java:105)
>> >> > >
>> >> > >
>> >> >
>> >>
>> org.apache.qpid.transport.network.InputHandler.received(InputHandler.java:44)
>> >> > >
>> >> org.apache.qpid.transport.network.io.IoReceiver.run(IoReceiver.java:152)
>> >> > > java.lang.Thread.run(Thread.java:745)
>> >> > >
>> >> > > Session dispatcher thread
>> >> > >
>> >> > >
>> >> >
>> >>
>> org.apache.qpid.client.AMQConnection.exceptionReceived(AMQConnection.java:1255)
>> >> > >  ---------> waiting for connection's failoverMutex
>> >> > >
>> >> > >
>> >> >
>> >>
>> org.apache.qpid.client.AMQSession_0_10.setCurrentException(AMQSession_0_10.java:1057)
>> >> > >
>> org.apache.qpid.client.AMQSession_0_10.sync(AMQSession_0_10.java:1034)
>> >> > >
>> >> > >
>> >> >
>> >>
>> org.apache.qpid.client.AMQSession_0_10.sendSuspendChannel(AMQSession_0_10.java:812)
>> >> > >
>> org.apache.qpid.client.AMQSession.suspendChannel(AMQSession.java:3075)
>> >> > > org.apache.qpid.client.AMQSession.rollback(AMQSession.java:1837)
>> >> > > common.messaging.QpidSession.rollback(QpidSession.java:211)
>> >> > >
>> >> > >
>> >> >
>> >>
>> common.messaging.QpidMessageHandler.rollbackSession(QpidMessageHandler.java:284)
>> >> > >
>> >> >
>> >>
>> common.messaging.QpidMessageHandler.onMessage(QpidMessageHandler.java:113)
>> >> > >
>> >> > >
>> >> >
>> >>
>> org.apache.qpid.client.BasicMessageConsumer.notifyMessage(BasicMessageConsumer.java:748)
>> >> > >
>> >> > >
>> >> >
>> >>
>> org.apache.qpid.client.BasicMessageConsumer_0_10.notifyMessage(BasicMessageConsumer_0_10.java:141)
>> >> > >
>> >> > >
>> >> >
>> >>
>> org.apache.qpid.client.BasicMessageConsumer.notifyMessage(BasicMessageConsumer.java:722)
>> >> > >
>> >> > >
>> >> >
>> >>
>> org.apache.qpid.client.BasicMessageConsumer_0_10.notifyMessage(BasicMessageConsumer_0_10.java:186)
>> >> > >
>> >> > >
>> >> >
>> >>
>> org.apache.qpid.client.BasicMessageConsumer_0_10.notifyMessage(BasicMessageConsumer_0_10.java:54)
>> >> > >
>> >> > >
>> >> >
>> >>
>> org.apache.qpid.client.AMQSession$Dispatcher.notifyConsumer(AMQSession.java:3454)
>> >> > >
>> >> > >
>> >> >
>> >>
>> org.apache.qpid.client.AMQSession$Dispatcher.dispatchMessage(AMQSession.java:3393)
>> >> > >   -----------> acquires session's messageDeliverylock
>> >> > >
>> >> > >
>> >> >
>> >>
>> org.apache.qpid.client.AMQSession$Dispatcher.access$1000(AMQSession.java:3180)
>> >> > > org.apache.qpid.client.AMQSession.dispatch(AMQSession.java:3173)
>> >> > >
>> >> > >
>> >> >
>> >>
>> org.apache.qpid.client.message.UnprocessedMessage.dispatch(UnprocessedMessage.java:54)
>> >> > >
>> org.apache.qpid.client.AMQSession$Dispatcher.run(AMQSession.java:3316)
>> >> > > java.lang.Thread.run(Thread.java:745)
>> >> > >
>> >> > > The problem is that the IOReceiver thread acquires failoverMutex
>> before
>> >> > > messageDeliveryLock (for each session), whereas the dispatcher
>> thread
>> >> > > acquires it in the other order. We also see potential problems where
>> >> > other
>> >> > > threads (instead of IOReceiver) can deadlock with the dispatcher
>> >> thread,
>> >> > as
>> >> > > long as it acquires failoverMutex before messageDeliveryLock.
>> Examples
>> >> we
>> >> > > can think of:
>> >> > >
>> >> > > A) Another thread calling AMQSession.close()
>> >> > > B) Another thread calling BasicMessageConsumer.close()
>> >> > > C) Same connection, different session's dispatcher thread, calling
>> >> > > rollback() or commit() -> sync() -> setCurrentException() ->
>> >> > > AMQConnection.exceptionReceived() ->
>> AMQConnection.closeAllSessions(),
>> >> > > which can try to acquire the messageDeliveryLock of another session
>> and
>> >> > > deadlock with the other session's dispatcher thread
>> >> > >
>> >> > >
>> >> > > Deadlock Variety 2:
>> >> > > From code inspection, it also appears that AMQConnection.close() can
>> >> > > deadlock with either AMQSession.close() or
>> BasicMessageConsumer.close()
>> >> > > (where the session / consumer is on the same connection). This is
>> >> because
>> >> > > AMQConnection.close() first acquires the messageDeliveryLock of all
>> its
>> >> > > sessions in the recursive doClose(), before trying to acquire the
>> >> > > connection's failoverMutex. But the Session / consumer's close()
>> >> acquires
>> >> > > the failoverMutex before messageDeliveryLock. We haven't seen this
>> >> happen
>> >> > > but would like to know if this is possible.
>> >> > >
>> >> > >
>> >> > > We'd really appreciate your help on this. Assuming these can be
>> fixed
>> >> in
>> >> > > 0.32, we are also wondering if clients are backward compatible --
>> i.e.,
>> >> > can
>> >> > > we upgrade only our client to 0.32 while continuing to use the 0.16
>> >> > broker?
>> >> > >
>> >> > > Thanks,
>> >> > > Helen
>> >> > >
>> >> >
>> >>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: [email protected]
>> For additional commands, e-mail: [email protected]
>>
>>

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to