Sounds reasonable.

Robbie

On 9 February 2015 at 13:40, Rob Godfrey <[email protected]> wrote:
> So looking back to when the MDL acquisition in closed() was first
> introduced... it seems to have been for
> https://issues.apache.org/jira/browse/QPID-547 which I think really should
> only have covered the close() case (i.e. application initiated closure)
> rather than the server initiated closure.
>
> As such I'm happy removing the MDL acquisition there.
>
> -- Rob
>
> On 9 February 2015 at 13:26, Robbie Gemmell <[email protected]>
> wrote:
>
>> Taking a quick peek it as you said *seems* safe, since the 'closed'
>> call looks like the only place a non-app non-dispatcher thread could
>> get the MDL. I would probably need a long time to really convince
>> myself though.
>>
>> Robbie
>>
>> On 9 February 2015 at 11:47, Rob Godfrey <[email protected]> wrote:
>> > Hi Robbie,
>> >
>> > yes - my thinking is currently that the acquiring of the MDL in
>> > AMQSession.closed() is incorrect.  The MDL seems to be there to provide
>> JMS
>> > semantics around the interaction between message delivery and other
>> methods
>> > on the (JMS) session object.  As such it should only really be acquired
>> > from threads acting on behalf of the "application".  Application
>> initiated
>> > closes already hold the MDL by the time they get into AMQSession.closed()
>> > so I think this can safely be removed.  In the other cases of the MDL
>> being
>> > acquired after the failover mutex (such as BasicMessageConsumer.close())
>> I
>> > can see no reason why the lock acquisition order cannot be reversed (the
>> > locks are pretty much taken out one after another) and by reversing them
>> we
>> > can prevent deadlock with the Connection close().
>> >
>> > Making the above change *seems* safe to me and (for what little it is
>> > worth) all the system tests continue to pass without issue on both the
>> > 0-9-1 and 0-10 paths.
>> >
>> > Can anyone see any issues with the above?
>> >
>> > -- Rob
>> >
>> > On 9 February 2015 at 12:15, Robbie Gemmell <[email protected]>
>> > wrote:
>> >
>> >> Hi Rob,
>> >>
>> >> Its a couple of years since I last did any real work on the 0-10
>> >> client, but around that time I did review some proposed changes
>> >> attempting removal of the MDL enough to know they likely introduced
>> >> new issues. The main thing I recall from back then is it being hard to
>> >> reason about the impact of such changes and the interaction of the
>> >> various locks due to the way the client threads can enter the
>> >> application space (e.g onMessage, exception listeners) and then
>> >> re-enter the client code while holding an unclear set of locks.
>> >>
>> >> What you wrote certainly makes logical sense in that if you always
>> >> acquire the MDL before the FM then even that behaviour shouldnt then
>> >> be an issue in terms of deadlocking between those two. The main
>> >> weirdness I can think comes of in terms of the stacktrace given in
>> >> that if the dispatcher is inside onMessage holding the MDL it could
>> >> then block the connection IOReceiver thread from completing whatever
>> >> it is attempting, leaving it unable to service new bytes arriving from
>> >> the broker (e.g should anything onMessage proceeds to do actually
>> >> generate new requests, or there being other sessions) as well as doing
>> >> anything else it would normally do, until that sessions onMessage
>> >> exists, which may itself cause different problems. Looking again at
>> >> the the given stacktrace, it does sound reasonable for the lock order
>> >> in connection close to be reversed, and I also wonder if the MDL is
>> >> truely needed when indicating the session closed due to an execution
>> >> exception. My hands are currently full on other things though so these
>> >> aren't areas I am going to be able to look at personally.
>> >>
>> >> Robbie
>> >>
>> >> On 7 February 2015 at 14:23, Rob Godfrey <[email protected]>
>> wrote:
>> >> > Hi Rajith,
>> >> >
>> >> > so - I'm no expert on the locking model in the client... Logically I
>> >> would
>> >> > think that with the message delivery lock (MDL) being a session level
>> >> lock,
>> >> > and the failover mutex (FM) being connection wide, we should always
>> >> acquire
>> >> > the MDL before the FM.  Where we are currently acquiring the FM
>> without
>> >> > acquiring the MDL I would (probably naively) think that we should
>> first
>> >> be
>> >> > acquiring the MDL before the FM where the operation in question is
>> >> specific
>> >> > to a session (e.g. send a message, declare a queue, etc).  In the case
>> >> > where the operation is connection wide (which I think is pretty much
>> >> > limitted to connection closure) I would think that we would want to
>> get
>> >> the
>> >> > MDLs for all of the sessions on the connection before getting the FM.
>> >> >
>> >> > Looking at the code I couldn't immediately see why such a change would
>> >> > cause an issue - if something is performing an operation against a
>> >> session
>> >> > in onMessage mode then it would currently hold the MDL... and if we
>> are
>> >> > using receive() then holding the MDL is seemingly compatible with the
>> >> > definition of given by the javadoc.
>> >> >
>> >> > I expect I'm missing some subtlety here - can someone enlighten me?
>> >> >
>> >> > Thanks,
>> >> > Rob
>> >> >
>> >> > On 5 February 2015 at 15:16, Rajith Muditha Attapattu <
>> >> [email protected]>
>> >> > wrote:
>> >> >
>> >> >> The consensus is, that there is no easy fix, other than rewriting
>> some
>> >> key
>> >> >> pieces of the client.
>> >> >> Given that a new JMS client is being actively developed, we have
>> decided
>> >> >> not to devote any significant time/resources on the older client.
>> >> >>
>> >> >> The new JMS client is AMQP 1.0 and the source is here
>> >> >> https://git-wip-us.apache.org/repos/asf?p=qpid-jms.git
>> >> >> Robbie is the best person to give an accurate picture of the status
>> for
>> >> >> this project.
>> >> >>
>> >> >> Rajith
>> >> >>
>> >> >> On Thu, Feb 5, 2015 at 2:42 AM, Rob Godfrey <[email protected]
>> >
>> >> >> wrote:
>> >> >>
>> >> >> > To answer the easy question first, 0.32 clients should be totally
>> >> >> > compatible with an 0.16 broker.
>> >> >> >
>> >> >> > In terms of the deadlock issue, I know there are a number of open
>> >> >> deadlock
>> >> >> > JIRAs that Robbie and Rajith have looked at in the past...
>> Hopefully
>> >> one
>> >> >> of
>> >> >> > them will be able to chip in here and discuss the feasibility of
>> >> fixing
>> >> >> > your particular issue for 0.32.
>> >> >> >
>> >> >> > -- Rob
>> >> >> >
>> >> >> > On 4 February 2015 at 23:36, Helen Kwong <[email protected]>
>> >> wrote:
>> >> >> >
>> >> >> > > Hi Qpid gurus,
>> >> >> > >
>> >> >> > > We are using 0.16 Java broker and client on 0-10, and we are
>> running
>> >> >> into
>> >> >> > > deadlock issues on the client that involve AMQSession's
>> >> >> > > _messageDeliveryLock and AMQConnection's _failoverMutex, where
>> >> >> different
>> >> >> > > threads acquire them in different orders. This is leading to
>> major
>> >> >> > > production headaches for us and has us very worried.
>> >> >> > >
>> >> >> > > (I've been looking at 0.16 code mostly but also skimmed the
>> relevant
>> >> >> > parts
>> >> >> > > in 0.32, which seem largely the same in those places.)
>> >> >> > >
>> >> >> > > Deadlock Variety 1
>> >> >> > >
>> >> >> > > This is an example of a deadlock we see, where the IOReceiver
>> thread
>> >> >> > > deadlocks with Session dispatcher thread (we have listeners that
>> >> call
>> >> >> > > session rollback or commit in onMessage()):
>> >> >> > >
>> >> >> > > IOReceiver
>> >> >> > > org.apache.qpid.client.AMQSession.closed(AMQSession.java:818)
>> >> >> > ---------->
>> >> >> > > waiting for session's messageDeliveryLock
>> >> >> > >
>> >> >> > >
>> >> >> >
>> >> >>
>> >>
>> org.apache.qpid.client.AMQConnection.closeAllSessions(AMQConnection.java:938)
>> >> >> > >
>> >> >> > >
>> >> >> >
>> >> >>
>> >>
>> org.apache.qpid.client.AMQConnection.exceptionReceived(AMQConnection.java:1282)
>> >> >> > >  ----------> acquires connection's failoverMutex
>> >> >> > >
>> >> >> > >
>> >> >> >
>> >> >>
>> >>
>> org.apache.qpid.client.AMQSession_0_10.setCurrentException(AMQSession_0_10.java:1057)
>> >> >> > >
>> >> >> >
>> >> >>
>> >>
>> org.apache.qpid.client.AMQSession_0_10.exception(AMQSession_0_10.java:907)
>> >> >> > >
>> >> >> > >
>> >> >> >
>> >> >>
>> >>
>> org.apache.qpid.transport.SessionDelegate.executionException(SessionDelegate.java:182)
>> >> >> > >
>> >> >> > >
>> >> >> >
>> >> >>
>> >>
>> org.apache.qpid.transport.SessionDelegate.executionException(SessionDelegate.java:32)
>> >> >> > >
>> >> >> > >
>> >> >> >
>> >> >>
>> >>
>> org.apache.qpid.transport.ExecutionException.dispatch(ExecutionException.java:103)
>> >> >> > >
>> >> >> >
>> >> >>
>> >>
>> org.apache.qpid.transport.SessionDelegate.command(SessionDelegate.java:55)
>> >> >> > >
>> >> >> >
>> >> >>
>> >>
>> org.apache.qpid.transport.SessionDelegate.command(SessionDelegate.java:50)
>> >> >> > >
>> >> >> >
>> >> >>
>> >>
>> org.apache.qpid.transport.SessionDelegate.command(SessionDelegate.java:32)
>> >> >> > > org.apache.qpid.transport.Method.delegate(Method.java:159)
>> >> >> > > org.apache.qpid.transport.Session.received(Session.java:585)
>> >> >> > >
>> org.apache.qpid.transport.Connection.dispatch(Connection.java:412)
>> >> >> > >
>> >> >> > >
>> >> >> >
>> >> >>
>> >>
>> org.apache.qpid.transport.ConnectionDelegate.handle(ConnectionDelegate.java:64)
>> >> >> > >
>> >> >> > >
>> >> >> >
>> >> >>
>> >>
>> org.apache.qpid.transport.ConnectionDelegate.handle(ConnectionDelegate.java:40)
>> >> >> > >
>> >> >> > >
>> >> >> >
>> >> >>
>> >>
>> org.apache.qpid.transport.MethodDelegate.executionException(MethodDelegate.java:110)
>> >> >> > >
>> >> >> > >
>> >> >> >
>> >> >>
>> >>
>> org.apache.qpid.transport.ExecutionException.dispatch(ExecutionException.java:103)
>> >> >> > >
>> >> >> > >
>> >> >> >
>> >> >>
>> >>
>> org.apache.qpid.transport.ConnectionDelegate.command(ConnectionDelegate.java:54)
>> >> >> > >
>> >> >> > >
>> >> >> >
>> >> >>
>> >>
>> org.apache.qpid.transport.ConnectionDelegate.command(ConnectionDelegate.java:40)
>> >> >> > > org.apache.qpid.transport.Method.delegate(Method.java:159)
>> >> >> > >
>> org.apache.qpid.transport.Connection.received(Connection.java:367)
>> >> >> > > org.apache.qpid.transport.Connection.received(Connection.java:65)
>> >> >> > >
>> org.apache.qpid.transport.network.Assembler.emit(Assembler.java:97)
>> >> >> > >
>> >> >>
>> org.apache.qpid.transport.network.Assembler.assemble(Assembler.java:198)
>> >> >> > >
>> >> org.apache.qpid.transport.network.Assembler.frame(Assembler.java:131)
>> >> >> > > org.apache.qpid.transport.network.Frame.delegate(Frame.java:128)
>> >> >> > >
>> >> >>
>> org.apache.qpid.transport.network.Assembler.received(Assembler.java:102)
>> >> >> > >
>> >> org.apache.qpid.transport.network.Assembler.received(Assembler.java:44)
>> >> >> > >
>> >> >> >
>> >> >>
>> >>
>> org.apache.qpid.transport.network.InputHandler.next(InputHandler.java:189)
>> >> >> > >
>> >> >> > >
>> >> >> >
>> >> >>
>> >>
>> org.apache.qpid.transport.network.InputHandler.received(InputHandler.java:105)
>> >> >> > >
>> >> >> > >
>> >> >> >
>> >> >>
>> >>
>> org.apache.qpid.transport.network.InputHandler.received(InputHandler.java:44)
>> >> >> > >
>> >> >>
>> org.apache.qpid.transport.network.io.IoReceiver.run(IoReceiver.java:152)
>> >> >> > > java.lang.Thread.run(Thread.java:745)
>> >> >> > >
>> >> >> > > Session dispatcher thread
>> >> >> > >
>> >> >> > >
>> >> >> >
>> >> >>
>> >>
>> org.apache.qpid.client.AMQConnection.exceptionReceived(AMQConnection.java:1255)
>> >> >> > >  ---------> waiting for connection's failoverMutex
>> >> >> > >
>> >> >> > >
>> >> >> >
>> >> >>
>> >>
>> org.apache.qpid.client.AMQSession_0_10.setCurrentException(AMQSession_0_10.java:1057)
>> >> >> > >
>> >> org.apache.qpid.client.AMQSession_0_10.sync(AMQSession_0_10.java:1034)
>> >> >> > >
>> >> >> > >
>> >> >> >
>> >> >>
>> >>
>> org.apache.qpid.client.AMQSession_0_10.sendSuspendChannel(AMQSession_0_10.java:812)
>> >> >> > >
>> >> org.apache.qpid.client.AMQSession.suspendChannel(AMQSession.java:3075)
>> >> >> > > org.apache.qpid.client.AMQSession.rollback(AMQSession.java:1837)
>> >> >> > > common.messaging.QpidSession.rollback(QpidSession.java:211)
>> >> >> > >
>> >> >> > >
>> >> >> >
>> >> >>
>> >>
>> common.messaging.QpidMessageHandler.rollbackSession(QpidMessageHandler.java:284)
>> >> >> > >
>> >> >> >
>> >> >>
>> >>
>> common.messaging.QpidMessageHandler.onMessage(QpidMessageHandler.java:113)
>> >> >> > >
>> >> >> > >
>> >> >> >
>> >> >>
>> >>
>> org.apache.qpid.client.BasicMessageConsumer.notifyMessage(BasicMessageConsumer.java:748)
>> >> >> > >
>> >> >> > >
>> >> >> >
>> >> >>
>> >>
>> org.apache.qpid.client.BasicMessageConsumer_0_10.notifyMessage(BasicMessageConsumer_0_10.java:141)
>> >> >> > >
>> >> >> > >
>> >> >> >
>> >> >>
>> >>
>> org.apache.qpid.client.BasicMessageConsumer.notifyMessage(BasicMessageConsumer.java:722)
>> >> >> > >
>> >> >> > >
>> >> >> >
>> >> >>
>> >>
>> org.apache.qpid.client.BasicMessageConsumer_0_10.notifyMessage(BasicMessageConsumer_0_10.java:186)
>> >> >> > >
>> >> >> > >
>> >> >> >
>> >> >>
>> >>
>> org.apache.qpid.client.BasicMessageConsumer_0_10.notifyMessage(BasicMessageConsumer_0_10.java:54)
>> >> >> > >
>> >> >> > >
>> >> >> >
>> >> >>
>> >>
>> org.apache.qpid.client.AMQSession$Dispatcher.notifyConsumer(AMQSession.java:3454)
>> >> >> > >
>> >> >> > >
>> >> >> >
>> >> >>
>> >>
>> org.apache.qpid.client.AMQSession$Dispatcher.dispatchMessage(AMQSession.java:3393)
>> >> >> > >   -----------> acquires session's messageDeliverylock
>> >> >> > >
>> >> >> > >
>> >> >> >
>> >> >>
>> >>
>> org.apache.qpid.client.AMQSession$Dispatcher.access$1000(AMQSession.java:3180)
>> >> >> > > org.apache.qpid.client.AMQSession.dispatch(AMQSession.java:3173)
>> >> >> > >
>> >> >> > >
>> >> >> >
>> >> >>
>> >>
>> org.apache.qpid.client.message.UnprocessedMessage.dispatch(UnprocessedMessage.java:54)
>> >> >> > >
>> >> org.apache.qpid.client.AMQSession$Dispatcher.run(AMQSession.java:3316)
>> >> >> > > java.lang.Thread.run(Thread.java:745)
>> >> >> > >
>> >> >> > > The problem is that the IOReceiver thread acquires failoverMutex
>> >> before
>> >> >> > > messageDeliveryLock (for each session), whereas the dispatcher
>> >> thread
>> >> >> > > acquires it in the other order. We also see potential problems
>> where
>> >> >> > other
>> >> >> > > threads (instead of IOReceiver) can deadlock with the dispatcher
>> >> >> thread,
>> >> >> > as
>> >> >> > > long as it acquires failoverMutex before messageDeliveryLock.
>> >> Examples
>> >> >> we
>> >> >> > > can think of:
>> >> >> > >
>> >> >> > > A) Another thread calling AMQSession.close()
>> >> >> > > B) Another thread calling BasicMessageConsumer.close()
>> >> >> > > C) Same connection, different session's dispatcher thread,
>> calling
>> >> >> > > rollback() or commit() -> sync() -> setCurrentException() ->
>> >> >> > > AMQConnection.exceptionReceived() ->
>> >> AMQConnection.closeAllSessions(),
>> >> >> > > which can try to acquire the messageDeliveryLock of another
>> session
>> >> and
>> >> >> > > deadlock with the other session's dispatcher thread
>> >> >> > >
>> >> >> > >
>> >> >> > > Deadlock Variety 2:
>> >> >> > > From code inspection, it also appears that AMQConnection.close()
>> can
>> >> >> > > deadlock with either AMQSession.close() or
>> >> BasicMessageConsumer.close()
>> >> >> > > (where the session / consumer is on the same connection). This is
>> >> >> because
>> >> >> > > AMQConnection.close() first acquires the messageDeliveryLock of
>> all
>> >> its
>> >> >> > > sessions in the recursive doClose(), before trying to acquire the
>> >> >> > > connection's failoverMutex. But the Session / consumer's close()
>> >> >> acquires
>> >> >> > > the failoverMutex before messageDeliveryLock. We haven't seen
>> this
>> >> >> happen
>> >> >> > > but would like to know if this is possible.
>> >> >> > >
>> >> >> > >
>> >> >> > > We'd really appreciate your help on this. Assuming these can be
>> >> fixed
>> >> >> in
>> >> >> > > 0.32, we are also wondering if clients are backward compatible --
>> >> i.e.,
>> >> >> > can
>> >> >> > > we upgrade only our client to 0.32 while continuing to use the
>> 0.16
>> >> >> > broker?
>> >> >> > >
>> >> >> > > Thanks,
>> >> >> > > Helen
>> >> >> > >
>> >> >> >
>> >> >>
>> >>
>> >> ---------------------------------------------------------------------
>> >> To unsubscribe, e-mail: [email protected]
>> >> For additional commands, e-mail: [email protected]
>> >>
>> >>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: [email protected]
>> For additional commands, e-mail: [email protected]
>>
>>

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to