Taking a quick peek it as you said *seems* safe, since the 'closed' call looks like the only place a non-app non-dispatcher thread could get the MDL. I would probably need a long time to really convince myself though.
Robbie On 9 February 2015 at 11:47, Rob Godfrey <[email protected]> wrote: > Hi Robbie, > > yes - my thinking is currently that the acquiring of the MDL in > AMQSession.closed() is incorrect. The MDL seems to be there to provide JMS > semantics around the interaction between message delivery and other methods > on the (JMS) session object. As such it should only really be acquired > from threads acting on behalf of the "application". Application initiated > closes already hold the MDL by the time they get into AMQSession.closed() > so I think this can safely be removed. In the other cases of the MDL being > acquired after the failover mutex (such as BasicMessageConsumer.close()) I > can see no reason why the lock acquisition order cannot be reversed (the > locks are pretty much taken out one after another) and by reversing them we > can prevent deadlock with the Connection close(). > > Making the above change *seems* safe to me and (for what little it is > worth) all the system tests continue to pass without issue on both the > 0-9-1 and 0-10 paths. > > Can anyone see any issues with the above? > > -- Rob > > On 9 February 2015 at 12:15, Robbie Gemmell <[email protected]> > wrote: > >> Hi Rob, >> >> Its a couple of years since I last did any real work on the 0-10 >> client, but around that time I did review some proposed changes >> attempting removal of the MDL enough to know they likely introduced >> new issues. The main thing I recall from back then is it being hard to >> reason about the impact of such changes and the interaction of the >> various locks due to the way the client threads can enter the >> application space (e.g onMessage, exception listeners) and then >> re-enter the client code while holding an unclear set of locks. >> >> What you wrote certainly makes logical sense in that if you always >> acquire the MDL before the FM then even that behaviour shouldnt then >> be an issue in terms of deadlocking between those two. The main >> weirdness I can think comes of in terms of the stacktrace given in >> that if the dispatcher is inside onMessage holding the MDL it could >> then block the connection IOReceiver thread from completing whatever >> it is attempting, leaving it unable to service new bytes arriving from >> the broker (e.g should anything onMessage proceeds to do actually >> generate new requests, or there being other sessions) as well as doing >> anything else it would normally do, until that sessions onMessage >> exists, which may itself cause different problems. Looking again at >> the the given stacktrace, it does sound reasonable for the lock order >> in connection close to be reversed, and I also wonder if the MDL is >> truely needed when indicating the session closed due to an execution >> exception. My hands are currently full on other things though so these >> aren't areas I am going to be able to look at personally. >> >> Robbie >> >> On 7 February 2015 at 14:23, Rob Godfrey <[email protected]> wrote: >> > Hi Rajith, >> > >> > so - I'm no expert on the locking model in the client... Logically I >> would >> > think that with the message delivery lock (MDL) being a session level >> lock, >> > and the failover mutex (FM) being connection wide, we should always >> acquire >> > the MDL before the FM. Where we are currently acquiring the FM without >> > acquiring the MDL I would (probably naively) think that we should first >> be >> > acquiring the MDL before the FM where the operation in question is >> specific >> > to a session (e.g. send a message, declare a queue, etc). In the case >> > where the operation is connection wide (which I think is pretty much >> > limitted to connection closure) I would think that we would want to get >> the >> > MDLs for all of the sessions on the connection before getting the FM. >> > >> > Looking at the code I couldn't immediately see why such a change would >> > cause an issue - if something is performing an operation against a >> session >> > in onMessage mode then it would currently hold the MDL... and if we are >> > using receive() then holding the MDL is seemingly compatible with the >> > definition of given by the javadoc. >> > >> > I expect I'm missing some subtlety here - can someone enlighten me? >> > >> > Thanks, >> > Rob >> > >> > On 5 February 2015 at 15:16, Rajith Muditha Attapattu < >> [email protected]> >> > wrote: >> > >> >> The consensus is, that there is no easy fix, other than rewriting some >> key >> >> pieces of the client. >> >> Given that a new JMS client is being actively developed, we have decided >> >> not to devote any significant time/resources on the older client. >> >> >> >> The new JMS client is AMQP 1.0 and the source is here >> >> https://git-wip-us.apache.org/repos/asf?p=qpid-jms.git >> >> Robbie is the best person to give an accurate picture of the status for >> >> this project. >> >> >> >> Rajith >> >> >> >> On Thu, Feb 5, 2015 at 2:42 AM, Rob Godfrey <[email protected]> >> >> wrote: >> >> >> >> > To answer the easy question first, 0.32 clients should be totally >> >> > compatible with an 0.16 broker. >> >> > >> >> > In terms of the deadlock issue, I know there are a number of open >> >> deadlock >> >> > JIRAs that Robbie and Rajith have looked at in the past... Hopefully >> one >> >> of >> >> > them will be able to chip in here and discuss the feasibility of >> fixing >> >> > your particular issue for 0.32. >> >> > >> >> > -- Rob >> >> > >> >> > On 4 February 2015 at 23:36, Helen Kwong <[email protected]> >> wrote: >> >> > >> >> > > Hi Qpid gurus, >> >> > > >> >> > > We are using 0.16 Java broker and client on 0-10, and we are running >> >> into >> >> > > deadlock issues on the client that involve AMQSession's >> >> > > _messageDeliveryLock and AMQConnection's _failoverMutex, where >> >> different >> >> > > threads acquire them in different orders. This is leading to major >> >> > > production headaches for us and has us very worried. >> >> > > >> >> > > (I've been looking at 0.16 code mostly but also skimmed the relevant >> >> > parts >> >> > > in 0.32, which seem largely the same in those places.) >> >> > > >> >> > > Deadlock Variety 1 >> >> > > >> >> > > This is an example of a deadlock we see, where the IOReceiver thread >> >> > > deadlocks with Session dispatcher thread (we have listeners that >> call >> >> > > session rollback or commit in onMessage()): >> >> > > >> >> > > IOReceiver >> >> > > org.apache.qpid.client.AMQSession.closed(AMQSession.java:818) >> >> > ----------> >> >> > > waiting for session's messageDeliveryLock >> >> > > >> >> > > >> >> > >> >> >> org.apache.qpid.client.AMQConnection.closeAllSessions(AMQConnection.java:938) >> >> > > >> >> > > >> >> > >> >> >> org.apache.qpid.client.AMQConnection.exceptionReceived(AMQConnection.java:1282) >> >> > > ----------> acquires connection's failoverMutex >> >> > > >> >> > > >> >> > >> >> >> org.apache.qpid.client.AMQSession_0_10.setCurrentException(AMQSession_0_10.java:1057) >> >> > > >> >> > >> >> >> org.apache.qpid.client.AMQSession_0_10.exception(AMQSession_0_10.java:907) >> >> > > >> >> > > >> >> > >> >> >> org.apache.qpid.transport.SessionDelegate.executionException(SessionDelegate.java:182) >> >> > > >> >> > > >> >> > >> >> >> org.apache.qpid.transport.SessionDelegate.executionException(SessionDelegate.java:32) >> >> > > >> >> > > >> >> > >> >> >> org.apache.qpid.transport.ExecutionException.dispatch(ExecutionException.java:103) >> >> > > >> >> > >> >> >> org.apache.qpid.transport.SessionDelegate.command(SessionDelegate.java:55) >> >> > > >> >> > >> >> >> org.apache.qpid.transport.SessionDelegate.command(SessionDelegate.java:50) >> >> > > >> >> > >> >> >> org.apache.qpid.transport.SessionDelegate.command(SessionDelegate.java:32) >> >> > > org.apache.qpid.transport.Method.delegate(Method.java:159) >> >> > > org.apache.qpid.transport.Session.received(Session.java:585) >> >> > > org.apache.qpid.transport.Connection.dispatch(Connection.java:412) >> >> > > >> >> > > >> >> > >> >> >> org.apache.qpid.transport.ConnectionDelegate.handle(ConnectionDelegate.java:64) >> >> > > >> >> > > >> >> > >> >> >> org.apache.qpid.transport.ConnectionDelegate.handle(ConnectionDelegate.java:40) >> >> > > >> >> > > >> >> > >> >> >> org.apache.qpid.transport.MethodDelegate.executionException(MethodDelegate.java:110) >> >> > > >> >> > > >> >> > >> >> >> org.apache.qpid.transport.ExecutionException.dispatch(ExecutionException.java:103) >> >> > > >> >> > > >> >> > >> >> >> org.apache.qpid.transport.ConnectionDelegate.command(ConnectionDelegate.java:54) >> >> > > >> >> > > >> >> > >> >> >> org.apache.qpid.transport.ConnectionDelegate.command(ConnectionDelegate.java:40) >> >> > > org.apache.qpid.transport.Method.delegate(Method.java:159) >> >> > > org.apache.qpid.transport.Connection.received(Connection.java:367) >> >> > > org.apache.qpid.transport.Connection.received(Connection.java:65) >> >> > > org.apache.qpid.transport.network.Assembler.emit(Assembler.java:97) >> >> > > >> >> org.apache.qpid.transport.network.Assembler.assemble(Assembler.java:198) >> >> > > >> org.apache.qpid.transport.network.Assembler.frame(Assembler.java:131) >> >> > > org.apache.qpid.transport.network.Frame.delegate(Frame.java:128) >> >> > > >> >> org.apache.qpid.transport.network.Assembler.received(Assembler.java:102) >> >> > > >> org.apache.qpid.transport.network.Assembler.received(Assembler.java:44) >> >> > > >> >> > >> >> >> org.apache.qpid.transport.network.InputHandler.next(InputHandler.java:189) >> >> > > >> >> > > >> >> > >> >> >> org.apache.qpid.transport.network.InputHandler.received(InputHandler.java:105) >> >> > > >> >> > > >> >> > >> >> >> org.apache.qpid.transport.network.InputHandler.received(InputHandler.java:44) >> >> > > >> >> org.apache.qpid.transport.network.io.IoReceiver.run(IoReceiver.java:152) >> >> > > java.lang.Thread.run(Thread.java:745) >> >> > > >> >> > > Session dispatcher thread >> >> > > >> >> > > >> >> > >> >> >> org.apache.qpid.client.AMQConnection.exceptionReceived(AMQConnection.java:1255) >> >> > > ---------> waiting for connection's failoverMutex >> >> > > >> >> > > >> >> > >> >> >> org.apache.qpid.client.AMQSession_0_10.setCurrentException(AMQSession_0_10.java:1057) >> >> > > >> org.apache.qpid.client.AMQSession_0_10.sync(AMQSession_0_10.java:1034) >> >> > > >> >> > > >> >> > >> >> >> org.apache.qpid.client.AMQSession_0_10.sendSuspendChannel(AMQSession_0_10.java:812) >> >> > > >> org.apache.qpid.client.AMQSession.suspendChannel(AMQSession.java:3075) >> >> > > org.apache.qpid.client.AMQSession.rollback(AMQSession.java:1837) >> >> > > common.messaging.QpidSession.rollback(QpidSession.java:211) >> >> > > >> >> > > >> >> > >> >> >> common.messaging.QpidMessageHandler.rollbackSession(QpidMessageHandler.java:284) >> >> > > >> >> > >> >> >> common.messaging.QpidMessageHandler.onMessage(QpidMessageHandler.java:113) >> >> > > >> >> > > >> >> > >> >> >> org.apache.qpid.client.BasicMessageConsumer.notifyMessage(BasicMessageConsumer.java:748) >> >> > > >> >> > > >> >> > >> >> >> org.apache.qpid.client.BasicMessageConsumer_0_10.notifyMessage(BasicMessageConsumer_0_10.java:141) >> >> > > >> >> > > >> >> > >> >> >> org.apache.qpid.client.BasicMessageConsumer.notifyMessage(BasicMessageConsumer.java:722) >> >> > > >> >> > > >> >> > >> >> >> org.apache.qpid.client.BasicMessageConsumer_0_10.notifyMessage(BasicMessageConsumer_0_10.java:186) >> >> > > >> >> > > >> >> > >> >> >> org.apache.qpid.client.BasicMessageConsumer_0_10.notifyMessage(BasicMessageConsumer_0_10.java:54) >> >> > > >> >> > > >> >> > >> >> >> org.apache.qpid.client.AMQSession$Dispatcher.notifyConsumer(AMQSession.java:3454) >> >> > > >> >> > > >> >> > >> >> >> org.apache.qpid.client.AMQSession$Dispatcher.dispatchMessage(AMQSession.java:3393) >> >> > > -----------> acquires session's messageDeliverylock >> >> > > >> >> > > >> >> > >> >> >> org.apache.qpid.client.AMQSession$Dispatcher.access$1000(AMQSession.java:3180) >> >> > > org.apache.qpid.client.AMQSession.dispatch(AMQSession.java:3173) >> >> > > >> >> > > >> >> > >> >> >> org.apache.qpid.client.message.UnprocessedMessage.dispatch(UnprocessedMessage.java:54) >> >> > > >> org.apache.qpid.client.AMQSession$Dispatcher.run(AMQSession.java:3316) >> >> > > java.lang.Thread.run(Thread.java:745) >> >> > > >> >> > > The problem is that the IOReceiver thread acquires failoverMutex >> before >> >> > > messageDeliveryLock (for each session), whereas the dispatcher >> thread >> >> > > acquires it in the other order. We also see potential problems where >> >> > other >> >> > > threads (instead of IOReceiver) can deadlock with the dispatcher >> >> thread, >> >> > as >> >> > > long as it acquires failoverMutex before messageDeliveryLock. >> Examples >> >> we >> >> > > can think of: >> >> > > >> >> > > A) Another thread calling AMQSession.close() >> >> > > B) Another thread calling BasicMessageConsumer.close() >> >> > > C) Same connection, different session's dispatcher thread, calling >> >> > > rollback() or commit() -> sync() -> setCurrentException() -> >> >> > > AMQConnection.exceptionReceived() -> >> AMQConnection.closeAllSessions(), >> >> > > which can try to acquire the messageDeliveryLock of another session >> and >> >> > > deadlock with the other session's dispatcher thread >> >> > > >> >> > > >> >> > > Deadlock Variety 2: >> >> > > From code inspection, it also appears that AMQConnection.close() can >> >> > > deadlock with either AMQSession.close() or >> BasicMessageConsumer.close() >> >> > > (where the session / consumer is on the same connection). This is >> >> because >> >> > > AMQConnection.close() first acquires the messageDeliveryLock of all >> its >> >> > > sessions in the recursive doClose(), before trying to acquire the >> >> > > connection's failoverMutex. But the Session / consumer's close() >> >> acquires >> >> > > the failoverMutex before messageDeliveryLock. We haven't seen this >> >> happen >> >> > > but would like to know if this is possible. >> >> > > >> >> > > >> >> > > We'd really appreciate your help on this. Assuming these can be >> fixed >> >> in >> >> > > 0.32, we are also wondering if clients are backward compatible -- >> i.e., >> >> > can >> >> > > we upgrade only our client to 0.32 while continuing to use the 0.16 >> >> > broker? >> >> > > >> >> > > Thanks, >> >> > > Helen >> >> > > >> >> > >> >> >> >> --------------------------------------------------------------------- >> To unsubscribe, e-mail: [email protected] >> For additional commands, e-mail: [email protected] >> >> --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
