So looking back to when the MDL acquisition in closed() was first introduced... it seems to have been for https://issues.apache.org/jira/browse/QPID-547 which I think really should only have covered the close() case (i.e. application initiated closure) rather than the server initiated closure.
As such I'm happy removing the MDL acquisition there. -- Rob On 9 February 2015 at 13:26, Robbie Gemmell <[email protected]> wrote: > Taking a quick peek it as you said *seems* safe, since the 'closed' > call looks like the only place a non-app non-dispatcher thread could > get the MDL. I would probably need a long time to really convince > myself though. > > Robbie > > On 9 February 2015 at 11:47, Rob Godfrey <[email protected]> wrote: > > Hi Robbie, > > > > yes - my thinking is currently that the acquiring of the MDL in > > AMQSession.closed() is incorrect. The MDL seems to be there to provide > JMS > > semantics around the interaction between message delivery and other > methods > > on the (JMS) session object. As such it should only really be acquired > > from threads acting on behalf of the "application". Application > initiated > > closes already hold the MDL by the time they get into AMQSession.closed() > > so I think this can safely be removed. In the other cases of the MDL > being > > acquired after the failover mutex (such as BasicMessageConsumer.close()) > I > > can see no reason why the lock acquisition order cannot be reversed (the > > locks are pretty much taken out one after another) and by reversing them > we > > can prevent deadlock with the Connection close(). > > > > Making the above change *seems* safe to me and (for what little it is > > worth) all the system tests continue to pass without issue on both the > > 0-9-1 and 0-10 paths. > > > > Can anyone see any issues with the above? > > > > -- Rob > > > > On 9 February 2015 at 12:15, Robbie Gemmell <[email protected]> > > wrote: > > > >> Hi Rob, > >> > >> Its a couple of years since I last did any real work on the 0-10 > >> client, but around that time I did review some proposed changes > >> attempting removal of the MDL enough to know they likely introduced > >> new issues. The main thing I recall from back then is it being hard to > >> reason about the impact of such changes and the interaction of the > >> various locks due to the way the client threads can enter the > >> application space (e.g onMessage, exception listeners) and then > >> re-enter the client code while holding an unclear set of locks. > >> > >> What you wrote certainly makes logical sense in that if you always > >> acquire the MDL before the FM then even that behaviour shouldnt then > >> be an issue in terms of deadlocking between those two. The main > >> weirdness I can think comes of in terms of the stacktrace given in > >> that if the dispatcher is inside onMessage holding the MDL it could > >> then block the connection IOReceiver thread from completing whatever > >> it is attempting, leaving it unable to service new bytes arriving from > >> the broker (e.g should anything onMessage proceeds to do actually > >> generate new requests, or there being other sessions) as well as doing > >> anything else it would normally do, until that sessions onMessage > >> exists, which may itself cause different problems. Looking again at > >> the the given stacktrace, it does sound reasonable for the lock order > >> in connection close to be reversed, and I also wonder if the MDL is > >> truely needed when indicating the session closed due to an execution > >> exception. My hands are currently full on other things though so these > >> aren't areas I am going to be able to look at personally. > >> > >> Robbie > >> > >> On 7 February 2015 at 14:23, Rob Godfrey <[email protected]> > wrote: > >> > Hi Rajith, > >> > > >> > so - I'm no expert on the locking model in the client... Logically I > >> would > >> > think that with the message delivery lock (MDL) being a session level > >> lock, > >> > and the failover mutex (FM) being connection wide, we should always > >> acquire > >> > the MDL before the FM. Where we are currently acquiring the FM > without > >> > acquiring the MDL I would (probably naively) think that we should > first > >> be > >> > acquiring the MDL before the FM where the operation in question is > >> specific > >> > to a session (e.g. send a message, declare a queue, etc). In the case > >> > where the operation is connection wide (which I think is pretty much > >> > limitted to connection closure) I would think that we would want to > get > >> the > >> > MDLs for all of the sessions on the connection before getting the FM. > >> > > >> > Looking at the code I couldn't immediately see why such a change would > >> > cause an issue - if something is performing an operation against a > >> session > >> > in onMessage mode then it would currently hold the MDL... and if we > are > >> > using receive() then holding the MDL is seemingly compatible with the > >> > definition of given by the javadoc. > >> > > >> > I expect I'm missing some subtlety here - can someone enlighten me? > >> > > >> > Thanks, > >> > Rob > >> > > >> > On 5 February 2015 at 15:16, Rajith Muditha Attapattu < > >> [email protected]> > >> > wrote: > >> > > >> >> The consensus is, that there is no easy fix, other than rewriting > some > >> key > >> >> pieces of the client. > >> >> Given that a new JMS client is being actively developed, we have > decided > >> >> not to devote any significant time/resources on the older client. > >> >> > >> >> The new JMS client is AMQP 1.0 and the source is here > >> >> https://git-wip-us.apache.org/repos/asf?p=qpid-jms.git > >> >> Robbie is the best person to give an accurate picture of the status > for > >> >> this project. > >> >> > >> >> Rajith > >> >> > >> >> On Thu, Feb 5, 2015 at 2:42 AM, Rob Godfrey <[email protected] > > > >> >> wrote: > >> >> > >> >> > To answer the easy question first, 0.32 clients should be totally > >> >> > compatible with an 0.16 broker. > >> >> > > >> >> > In terms of the deadlock issue, I know there are a number of open > >> >> deadlock > >> >> > JIRAs that Robbie and Rajith have looked at in the past... > Hopefully > >> one > >> >> of > >> >> > them will be able to chip in here and discuss the feasibility of > >> fixing > >> >> > your particular issue for 0.32. > >> >> > > >> >> > -- Rob > >> >> > > >> >> > On 4 February 2015 at 23:36, Helen Kwong <[email protected]> > >> wrote: > >> >> > > >> >> > > Hi Qpid gurus, > >> >> > > > >> >> > > We are using 0.16 Java broker and client on 0-10, and we are > running > >> >> into > >> >> > > deadlock issues on the client that involve AMQSession's > >> >> > > _messageDeliveryLock and AMQConnection's _failoverMutex, where > >> >> different > >> >> > > threads acquire them in different orders. This is leading to > major > >> >> > > production headaches for us and has us very worried. > >> >> > > > >> >> > > (I've been looking at 0.16 code mostly but also skimmed the > relevant > >> >> > parts > >> >> > > in 0.32, which seem largely the same in those places.) > >> >> > > > >> >> > > Deadlock Variety 1 > >> >> > > > >> >> > > This is an example of a deadlock we see, where the IOReceiver > thread > >> >> > > deadlocks with Session dispatcher thread (we have listeners that > >> call > >> >> > > session rollback or commit in onMessage()): > >> >> > > > >> >> > > IOReceiver > >> >> > > org.apache.qpid.client.AMQSession.closed(AMQSession.java:818) > >> >> > ----------> > >> >> > > waiting for session's messageDeliveryLock > >> >> > > > >> >> > > > >> >> > > >> >> > >> > org.apache.qpid.client.AMQConnection.closeAllSessions(AMQConnection.java:938) > >> >> > > > >> >> > > > >> >> > > >> >> > >> > org.apache.qpid.client.AMQConnection.exceptionReceived(AMQConnection.java:1282) > >> >> > > ----------> acquires connection's failoverMutex > >> >> > > > >> >> > > > >> >> > > >> >> > >> > org.apache.qpid.client.AMQSession_0_10.setCurrentException(AMQSession_0_10.java:1057) > >> >> > > > >> >> > > >> >> > >> > org.apache.qpid.client.AMQSession_0_10.exception(AMQSession_0_10.java:907) > >> >> > > > >> >> > > > >> >> > > >> >> > >> > org.apache.qpid.transport.SessionDelegate.executionException(SessionDelegate.java:182) > >> >> > > > >> >> > > > >> >> > > >> >> > >> > org.apache.qpid.transport.SessionDelegate.executionException(SessionDelegate.java:32) > >> >> > > > >> >> > > > >> >> > > >> >> > >> > org.apache.qpid.transport.ExecutionException.dispatch(ExecutionException.java:103) > >> >> > > > >> >> > > >> >> > >> > org.apache.qpid.transport.SessionDelegate.command(SessionDelegate.java:55) > >> >> > > > >> >> > > >> >> > >> > org.apache.qpid.transport.SessionDelegate.command(SessionDelegate.java:50) > >> >> > > > >> >> > > >> >> > >> > org.apache.qpid.transport.SessionDelegate.command(SessionDelegate.java:32) > >> >> > > org.apache.qpid.transport.Method.delegate(Method.java:159) > >> >> > > org.apache.qpid.transport.Session.received(Session.java:585) > >> >> > > > org.apache.qpid.transport.Connection.dispatch(Connection.java:412) > >> >> > > > >> >> > > > >> >> > > >> >> > >> > org.apache.qpid.transport.ConnectionDelegate.handle(ConnectionDelegate.java:64) > >> >> > > > >> >> > > > >> >> > > >> >> > >> > org.apache.qpid.transport.ConnectionDelegate.handle(ConnectionDelegate.java:40) > >> >> > > > >> >> > > > >> >> > > >> >> > >> > org.apache.qpid.transport.MethodDelegate.executionException(MethodDelegate.java:110) > >> >> > > > >> >> > > > >> >> > > >> >> > >> > org.apache.qpid.transport.ExecutionException.dispatch(ExecutionException.java:103) > >> >> > > > >> >> > > > >> >> > > >> >> > >> > org.apache.qpid.transport.ConnectionDelegate.command(ConnectionDelegate.java:54) > >> >> > > > >> >> > > > >> >> > > >> >> > >> > org.apache.qpid.transport.ConnectionDelegate.command(ConnectionDelegate.java:40) > >> >> > > org.apache.qpid.transport.Method.delegate(Method.java:159) > >> >> > > > org.apache.qpid.transport.Connection.received(Connection.java:367) > >> >> > > org.apache.qpid.transport.Connection.received(Connection.java:65) > >> >> > > > org.apache.qpid.transport.network.Assembler.emit(Assembler.java:97) > >> >> > > > >> >> > org.apache.qpid.transport.network.Assembler.assemble(Assembler.java:198) > >> >> > > > >> org.apache.qpid.transport.network.Assembler.frame(Assembler.java:131) > >> >> > > org.apache.qpid.transport.network.Frame.delegate(Frame.java:128) > >> >> > > > >> >> > org.apache.qpid.transport.network.Assembler.received(Assembler.java:102) > >> >> > > > >> org.apache.qpid.transport.network.Assembler.received(Assembler.java:44) > >> >> > > > >> >> > > >> >> > >> > org.apache.qpid.transport.network.InputHandler.next(InputHandler.java:189) > >> >> > > > >> >> > > > >> >> > > >> >> > >> > org.apache.qpid.transport.network.InputHandler.received(InputHandler.java:105) > >> >> > > > >> >> > > > >> >> > > >> >> > >> > org.apache.qpid.transport.network.InputHandler.received(InputHandler.java:44) > >> >> > > > >> >> > org.apache.qpid.transport.network.io.IoReceiver.run(IoReceiver.java:152) > >> >> > > java.lang.Thread.run(Thread.java:745) > >> >> > > > >> >> > > Session dispatcher thread > >> >> > > > >> >> > > > >> >> > > >> >> > >> > org.apache.qpid.client.AMQConnection.exceptionReceived(AMQConnection.java:1255) > >> >> > > ---------> waiting for connection's failoverMutex > >> >> > > > >> >> > > > >> >> > > >> >> > >> > org.apache.qpid.client.AMQSession_0_10.setCurrentException(AMQSession_0_10.java:1057) > >> >> > > > >> org.apache.qpid.client.AMQSession_0_10.sync(AMQSession_0_10.java:1034) > >> >> > > > >> >> > > > >> >> > > >> >> > >> > org.apache.qpid.client.AMQSession_0_10.sendSuspendChannel(AMQSession_0_10.java:812) > >> >> > > > >> org.apache.qpid.client.AMQSession.suspendChannel(AMQSession.java:3075) > >> >> > > org.apache.qpid.client.AMQSession.rollback(AMQSession.java:1837) > >> >> > > common.messaging.QpidSession.rollback(QpidSession.java:211) > >> >> > > > >> >> > > > >> >> > > >> >> > >> > common.messaging.QpidMessageHandler.rollbackSession(QpidMessageHandler.java:284) > >> >> > > > >> >> > > >> >> > >> > common.messaging.QpidMessageHandler.onMessage(QpidMessageHandler.java:113) > >> >> > > > >> >> > > > >> >> > > >> >> > >> > org.apache.qpid.client.BasicMessageConsumer.notifyMessage(BasicMessageConsumer.java:748) > >> >> > > > >> >> > > > >> >> > > >> >> > >> > org.apache.qpid.client.BasicMessageConsumer_0_10.notifyMessage(BasicMessageConsumer_0_10.java:141) > >> >> > > > >> >> > > > >> >> > > >> >> > >> > org.apache.qpid.client.BasicMessageConsumer.notifyMessage(BasicMessageConsumer.java:722) > >> >> > > > >> >> > > > >> >> > > >> >> > >> > org.apache.qpid.client.BasicMessageConsumer_0_10.notifyMessage(BasicMessageConsumer_0_10.java:186) > >> >> > > > >> >> > > > >> >> > > >> >> > >> > org.apache.qpid.client.BasicMessageConsumer_0_10.notifyMessage(BasicMessageConsumer_0_10.java:54) > >> >> > > > >> >> > > > >> >> > > >> >> > >> > org.apache.qpid.client.AMQSession$Dispatcher.notifyConsumer(AMQSession.java:3454) > >> >> > > > >> >> > > > >> >> > > >> >> > >> > org.apache.qpid.client.AMQSession$Dispatcher.dispatchMessage(AMQSession.java:3393) > >> >> > > -----------> acquires session's messageDeliverylock > >> >> > > > >> >> > > > >> >> > > >> >> > >> > org.apache.qpid.client.AMQSession$Dispatcher.access$1000(AMQSession.java:3180) > >> >> > > org.apache.qpid.client.AMQSession.dispatch(AMQSession.java:3173) > >> >> > > > >> >> > > > >> >> > > >> >> > >> > org.apache.qpid.client.message.UnprocessedMessage.dispatch(UnprocessedMessage.java:54) > >> >> > > > >> org.apache.qpid.client.AMQSession$Dispatcher.run(AMQSession.java:3316) > >> >> > > java.lang.Thread.run(Thread.java:745) > >> >> > > > >> >> > > The problem is that the IOReceiver thread acquires failoverMutex > >> before > >> >> > > messageDeliveryLock (for each session), whereas the dispatcher > >> thread > >> >> > > acquires it in the other order. We also see potential problems > where > >> >> > other > >> >> > > threads (instead of IOReceiver) can deadlock with the dispatcher > >> >> thread, > >> >> > as > >> >> > > long as it acquires failoverMutex before messageDeliveryLock. > >> Examples > >> >> we > >> >> > > can think of: > >> >> > > > >> >> > > A) Another thread calling AMQSession.close() > >> >> > > B) Another thread calling BasicMessageConsumer.close() > >> >> > > C) Same connection, different session's dispatcher thread, > calling > >> >> > > rollback() or commit() -> sync() -> setCurrentException() -> > >> >> > > AMQConnection.exceptionReceived() -> > >> AMQConnection.closeAllSessions(), > >> >> > > which can try to acquire the messageDeliveryLock of another > session > >> and > >> >> > > deadlock with the other session's dispatcher thread > >> >> > > > >> >> > > > >> >> > > Deadlock Variety 2: > >> >> > > From code inspection, it also appears that AMQConnection.close() > can > >> >> > > deadlock with either AMQSession.close() or > >> BasicMessageConsumer.close() > >> >> > > (where the session / consumer is on the same connection). This is > >> >> because > >> >> > > AMQConnection.close() first acquires the messageDeliveryLock of > all > >> its > >> >> > > sessions in the recursive doClose(), before trying to acquire the > >> >> > > connection's failoverMutex. But the Session / consumer's close() > >> >> acquires > >> >> > > the failoverMutex before messageDeliveryLock. We haven't seen > this > >> >> happen > >> >> > > but would like to know if this is possible. > >> >> > > > >> >> > > > >> >> > > We'd really appreciate your help on this. Assuming these can be > >> fixed > >> >> in > >> >> > > 0.32, we are also wondering if clients are backward compatible -- > >> i.e., > >> >> > can > >> >> > > we upgrade only our client to 0.32 while continuing to use the > 0.16 > >> >> > broker? > >> >> > > > >> >> > > Thanks, > >> >> > > Helen > >> >> > > > >> >> > > >> >> > >> > >> --------------------------------------------------------------------- > >> To unsubscribe, e-mail: [email protected] > >> For additional commands, e-mail: [email protected] > >> > >> > > --------------------------------------------------------------------- > To unsubscribe, e-mail: [email protected] > For additional commands, e-mail: [email protected] > >
