Sounds reasonable. Robbie
On 9 February 2015 at 13:40, Rob Godfrey <[email protected]> wrote: > So looking back to when the MDL acquisition in closed() was first > introduced... it seems to have been for > https://issues.apache.org/jira/browse/QPID-547 which I think really should > only have covered the close() case (i.e. application initiated closure) > rather than the server initiated closure. > > As such I'm happy removing the MDL acquisition there. > > -- Rob > > On 9 February 2015 at 13:26, Robbie Gemmell <[email protected]> > wrote: > >> Taking a quick peek it as you said *seems* safe, since the 'closed' >> call looks like the only place a non-app non-dispatcher thread could >> get the MDL. I would probably need a long time to really convince >> myself though. >> >> Robbie >> >> On 9 February 2015 at 11:47, Rob Godfrey <[email protected]> wrote: >> > Hi Robbie, >> > >> > yes - my thinking is currently that the acquiring of the MDL in >> > AMQSession.closed() is incorrect. The MDL seems to be there to provide >> JMS >> > semantics around the interaction between message delivery and other >> methods >> > on the (JMS) session object. As such it should only really be acquired >> > from threads acting on behalf of the "application". Application >> initiated >> > closes already hold the MDL by the time they get into AMQSession.closed() >> > so I think this can safely be removed. In the other cases of the MDL >> being >> > acquired after the failover mutex (such as BasicMessageConsumer.close()) >> I >> > can see no reason why the lock acquisition order cannot be reversed (the >> > locks are pretty much taken out one after another) and by reversing them >> we >> > can prevent deadlock with the Connection close(). >> > >> > Making the above change *seems* safe to me and (for what little it is >> > worth) all the system tests continue to pass without issue on both the >> > 0-9-1 and 0-10 paths. >> > >> > Can anyone see any issues with the above? >> > >> > -- Rob >> > >> > On 9 February 2015 at 12:15, Robbie Gemmell <[email protected]> >> > wrote: >> > >> >> Hi Rob, >> >> >> >> Its a couple of years since I last did any real work on the 0-10 >> >> client, but around that time I did review some proposed changes >> >> attempting removal of the MDL enough to know they likely introduced >> >> new issues. The main thing I recall from back then is it being hard to >> >> reason about the impact of such changes and the interaction of the >> >> various locks due to the way the client threads can enter the >> >> application space (e.g onMessage, exception listeners) and then >> >> re-enter the client code while holding an unclear set of locks. >> >> >> >> What you wrote certainly makes logical sense in that if you always >> >> acquire the MDL before the FM then even that behaviour shouldnt then >> >> be an issue in terms of deadlocking between those two. The main >> >> weirdness I can think comes of in terms of the stacktrace given in >> >> that if the dispatcher is inside onMessage holding the MDL it could >> >> then block the connection IOReceiver thread from completing whatever >> >> it is attempting, leaving it unable to service new bytes arriving from >> >> the broker (e.g should anything onMessage proceeds to do actually >> >> generate new requests, or there being other sessions) as well as doing >> >> anything else it would normally do, until that sessions onMessage >> >> exists, which may itself cause different problems. Looking again at >> >> the the given stacktrace, it does sound reasonable for the lock order >> >> in connection close to be reversed, and I also wonder if the MDL is >> >> truely needed when indicating the session closed due to an execution >> >> exception. My hands are currently full on other things though so these >> >> aren't areas I am going to be able to look at personally. >> >> >> >> Robbie >> >> >> >> On 7 February 2015 at 14:23, Rob Godfrey <[email protected]> >> wrote: >> >> > Hi Rajith, >> >> > >> >> > so - I'm no expert on the locking model in the client... Logically I >> >> would >> >> > think that with the message delivery lock (MDL) being a session level >> >> lock, >> >> > and the failover mutex (FM) being connection wide, we should always >> >> acquire >> >> > the MDL before the FM. Where we are currently acquiring the FM >> without >> >> > acquiring the MDL I would (probably naively) think that we should >> first >> >> be >> >> > acquiring the MDL before the FM where the operation in question is >> >> specific >> >> > to a session (e.g. send a message, declare a queue, etc). In the case >> >> > where the operation is connection wide (which I think is pretty much >> >> > limitted to connection closure) I would think that we would want to >> get >> >> the >> >> > MDLs for all of the sessions on the connection before getting the FM. >> >> > >> >> > Looking at the code I couldn't immediately see why such a change would >> >> > cause an issue - if something is performing an operation against a >> >> session >> >> > in onMessage mode then it would currently hold the MDL... and if we >> are >> >> > using receive() then holding the MDL is seemingly compatible with the >> >> > definition of given by the javadoc. >> >> > >> >> > I expect I'm missing some subtlety here - can someone enlighten me? >> >> > >> >> > Thanks, >> >> > Rob >> >> > >> >> > On 5 February 2015 at 15:16, Rajith Muditha Attapattu < >> >> [email protected]> >> >> > wrote: >> >> > >> >> >> The consensus is, that there is no easy fix, other than rewriting >> some >> >> key >> >> >> pieces of the client. >> >> >> Given that a new JMS client is being actively developed, we have >> decided >> >> >> not to devote any significant time/resources on the older client. >> >> >> >> >> >> The new JMS client is AMQP 1.0 and the source is here >> >> >> https://git-wip-us.apache.org/repos/asf?p=qpid-jms.git >> >> >> Robbie is the best person to give an accurate picture of the status >> for >> >> >> this project. >> >> >> >> >> >> Rajith >> >> >> >> >> >> On Thu, Feb 5, 2015 at 2:42 AM, Rob Godfrey <[email protected] >> > >> >> >> wrote: >> >> >> >> >> >> > To answer the easy question first, 0.32 clients should be totally >> >> >> > compatible with an 0.16 broker. >> >> >> > >> >> >> > In terms of the deadlock issue, I know there are a number of open >> >> >> deadlock >> >> >> > JIRAs that Robbie and Rajith have looked at in the past... >> Hopefully >> >> one >> >> >> of >> >> >> > them will be able to chip in here and discuss the feasibility of >> >> fixing >> >> >> > your particular issue for 0.32. >> >> >> > >> >> >> > -- Rob >> >> >> > >> >> >> > On 4 February 2015 at 23:36, Helen Kwong <[email protected]> >> >> wrote: >> >> >> > >> >> >> > > Hi Qpid gurus, >> >> >> > > >> >> >> > > We are using 0.16 Java broker and client on 0-10, and we are >> running >> >> >> into >> >> >> > > deadlock issues on the client that involve AMQSession's >> >> >> > > _messageDeliveryLock and AMQConnection's _failoverMutex, where >> >> >> different >> >> >> > > threads acquire them in different orders. This is leading to >> major >> >> >> > > production headaches for us and has us very worried. >> >> >> > > >> >> >> > > (I've been looking at 0.16 code mostly but also skimmed the >> relevant >> >> >> > parts >> >> >> > > in 0.32, which seem largely the same in those places.) >> >> >> > > >> >> >> > > Deadlock Variety 1 >> >> >> > > >> >> >> > > This is an example of a deadlock we see, where the IOReceiver >> thread >> >> >> > > deadlocks with Session dispatcher thread (we have listeners that >> >> call >> >> >> > > session rollback or commit in onMessage()): >> >> >> > > >> >> >> > > IOReceiver >> >> >> > > org.apache.qpid.client.AMQSession.closed(AMQSession.java:818) >> >> >> > ----------> >> >> >> > > waiting for session's messageDeliveryLock >> >> >> > > >> >> >> > > >> >> >> > >> >> >> >> >> >> org.apache.qpid.client.AMQConnection.closeAllSessions(AMQConnection.java:938) >> >> >> > > >> >> >> > > >> >> >> > >> >> >> >> >> >> org.apache.qpid.client.AMQConnection.exceptionReceived(AMQConnection.java:1282) >> >> >> > > ----------> acquires connection's failoverMutex >> >> >> > > >> >> >> > > >> >> >> > >> >> >> >> >> >> org.apache.qpid.client.AMQSession_0_10.setCurrentException(AMQSession_0_10.java:1057) >> >> >> > > >> >> >> > >> >> >> >> >> >> org.apache.qpid.client.AMQSession_0_10.exception(AMQSession_0_10.java:907) >> >> >> > > >> >> >> > > >> >> >> > >> >> >> >> >> >> org.apache.qpid.transport.SessionDelegate.executionException(SessionDelegate.java:182) >> >> >> > > >> >> >> > > >> >> >> > >> >> >> >> >> >> org.apache.qpid.transport.SessionDelegate.executionException(SessionDelegate.java:32) >> >> >> > > >> >> >> > > >> >> >> > >> >> >> >> >> >> org.apache.qpid.transport.ExecutionException.dispatch(ExecutionException.java:103) >> >> >> > > >> >> >> > >> >> >> >> >> >> org.apache.qpid.transport.SessionDelegate.command(SessionDelegate.java:55) >> >> >> > > >> >> >> > >> >> >> >> >> >> org.apache.qpid.transport.SessionDelegate.command(SessionDelegate.java:50) >> >> >> > > >> >> >> > >> >> >> >> >> >> org.apache.qpid.transport.SessionDelegate.command(SessionDelegate.java:32) >> >> >> > > org.apache.qpid.transport.Method.delegate(Method.java:159) >> >> >> > > org.apache.qpid.transport.Session.received(Session.java:585) >> >> >> > > >> org.apache.qpid.transport.Connection.dispatch(Connection.java:412) >> >> >> > > >> >> >> > > >> >> >> > >> >> >> >> >> >> org.apache.qpid.transport.ConnectionDelegate.handle(ConnectionDelegate.java:64) >> >> >> > > >> >> >> > > >> >> >> > >> >> >> >> >> >> org.apache.qpid.transport.ConnectionDelegate.handle(ConnectionDelegate.java:40) >> >> >> > > >> >> >> > > >> >> >> > >> >> >> >> >> >> org.apache.qpid.transport.MethodDelegate.executionException(MethodDelegate.java:110) >> >> >> > > >> >> >> > > >> >> >> > >> >> >> >> >> >> org.apache.qpid.transport.ExecutionException.dispatch(ExecutionException.java:103) >> >> >> > > >> >> >> > > >> >> >> > >> >> >> >> >> >> org.apache.qpid.transport.ConnectionDelegate.command(ConnectionDelegate.java:54) >> >> >> > > >> >> >> > > >> >> >> > >> >> >> >> >> >> org.apache.qpid.transport.ConnectionDelegate.command(ConnectionDelegate.java:40) >> >> >> > > org.apache.qpid.transport.Method.delegate(Method.java:159) >> >> >> > > >> org.apache.qpid.transport.Connection.received(Connection.java:367) >> >> >> > > org.apache.qpid.transport.Connection.received(Connection.java:65) >> >> >> > > >> org.apache.qpid.transport.network.Assembler.emit(Assembler.java:97) >> >> >> > > >> >> >> >> org.apache.qpid.transport.network.Assembler.assemble(Assembler.java:198) >> >> >> > > >> >> org.apache.qpid.transport.network.Assembler.frame(Assembler.java:131) >> >> >> > > org.apache.qpid.transport.network.Frame.delegate(Frame.java:128) >> >> >> > > >> >> >> >> org.apache.qpid.transport.network.Assembler.received(Assembler.java:102) >> >> >> > > >> >> org.apache.qpid.transport.network.Assembler.received(Assembler.java:44) >> >> >> > > >> >> >> > >> >> >> >> >> >> org.apache.qpid.transport.network.InputHandler.next(InputHandler.java:189) >> >> >> > > >> >> >> > > >> >> >> > >> >> >> >> >> >> org.apache.qpid.transport.network.InputHandler.received(InputHandler.java:105) >> >> >> > > >> >> >> > > >> >> >> > >> >> >> >> >> >> org.apache.qpid.transport.network.InputHandler.received(InputHandler.java:44) >> >> >> > > >> >> >> >> org.apache.qpid.transport.network.io.IoReceiver.run(IoReceiver.java:152) >> >> >> > > java.lang.Thread.run(Thread.java:745) >> >> >> > > >> >> >> > > Session dispatcher thread >> >> >> > > >> >> >> > > >> >> >> > >> >> >> >> >> >> org.apache.qpid.client.AMQConnection.exceptionReceived(AMQConnection.java:1255) >> >> >> > > ---------> waiting for connection's failoverMutex >> >> >> > > >> >> >> > > >> >> >> > >> >> >> >> >> >> org.apache.qpid.client.AMQSession_0_10.setCurrentException(AMQSession_0_10.java:1057) >> >> >> > > >> >> org.apache.qpid.client.AMQSession_0_10.sync(AMQSession_0_10.java:1034) >> >> >> > > >> >> >> > > >> >> >> > >> >> >> >> >> >> org.apache.qpid.client.AMQSession_0_10.sendSuspendChannel(AMQSession_0_10.java:812) >> >> >> > > >> >> org.apache.qpid.client.AMQSession.suspendChannel(AMQSession.java:3075) >> >> >> > > org.apache.qpid.client.AMQSession.rollback(AMQSession.java:1837) >> >> >> > > common.messaging.QpidSession.rollback(QpidSession.java:211) >> >> >> > > >> >> >> > > >> >> >> > >> >> >> >> >> >> common.messaging.QpidMessageHandler.rollbackSession(QpidMessageHandler.java:284) >> >> >> > > >> >> >> > >> >> >> >> >> >> common.messaging.QpidMessageHandler.onMessage(QpidMessageHandler.java:113) >> >> >> > > >> >> >> > > >> >> >> > >> >> >> >> >> >> org.apache.qpid.client.BasicMessageConsumer.notifyMessage(BasicMessageConsumer.java:748) >> >> >> > > >> >> >> > > >> >> >> > >> >> >> >> >> >> org.apache.qpid.client.BasicMessageConsumer_0_10.notifyMessage(BasicMessageConsumer_0_10.java:141) >> >> >> > > >> >> >> > > >> >> >> > >> >> >> >> >> >> org.apache.qpid.client.BasicMessageConsumer.notifyMessage(BasicMessageConsumer.java:722) >> >> >> > > >> >> >> > > >> >> >> > >> >> >> >> >> >> org.apache.qpid.client.BasicMessageConsumer_0_10.notifyMessage(BasicMessageConsumer_0_10.java:186) >> >> >> > > >> >> >> > > >> >> >> > >> >> >> >> >> >> org.apache.qpid.client.BasicMessageConsumer_0_10.notifyMessage(BasicMessageConsumer_0_10.java:54) >> >> >> > > >> >> >> > > >> >> >> > >> >> >> >> >> >> org.apache.qpid.client.AMQSession$Dispatcher.notifyConsumer(AMQSession.java:3454) >> >> >> > > >> >> >> > > >> >> >> > >> >> >> >> >> >> org.apache.qpid.client.AMQSession$Dispatcher.dispatchMessage(AMQSession.java:3393) >> >> >> > > -----------> acquires session's messageDeliverylock >> >> >> > > >> >> >> > > >> >> >> > >> >> >> >> >> >> org.apache.qpid.client.AMQSession$Dispatcher.access$1000(AMQSession.java:3180) >> >> >> > > org.apache.qpid.client.AMQSession.dispatch(AMQSession.java:3173) >> >> >> > > >> >> >> > > >> >> >> > >> >> >> >> >> >> org.apache.qpid.client.message.UnprocessedMessage.dispatch(UnprocessedMessage.java:54) >> >> >> > > >> >> org.apache.qpid.client.AMQSession$Dispatcher.run(AMQSession.java:3316) >> >> >> > > java.lang.Thread.run(Thread.java:745) >> >> >> > > >> >> >> > > The problem is that the IOReceiver thread acquires failoverMutex >> >> before >> >> >> > > messageDeliveryLock (for each session), whereas the dispatcher >> >> thread >> >> >> > > acquires it in the other order. We also see potential problems >> where >> >> >> > other >> >> >> > > threads (instead of IOReceiver) can deadlock with the dispatcher >> >> >> thread, >> >> >> > as >> >> >> > > long as it acquires failoverMutex before messageDeliveryLock. >> >> Examples >> >> >> we >> >> >> > > can think of: >> >> >> > > >> >> >> > > A) Another thread calling AMQSession.close() >> >> >> > > B) Another thread calling BasicMessageConsumer.close() >> >> >> > > C) Same connection, different session's dispatcher thread, >> calling >> >> >> > > rollback() or commit() -> sync() -> setCurrentException() -> >> >> >> > > AMQConnection.exceptionReceived() -> >> >> AMQConnection.closeAllSessions(), >> >> >> > > which can try to acquire the messageDeliveryLock of another >> session >> >> and >> >> >> > > deadlock with the other session's dispatcher thread >> >> >> > > >> >> >> > > >> >> >> > > Deadlock Variety 2: >> >> >> > > From code inspection, it also appears that AMQConnection.close() >> can >> >> >> > > deadlock with either AMQSession.close() or >> >> BasicMessageConsumer.close() >> >> >> > > (where the session / consumer is on the same connection). This is >> >> >> because >> >> >> > > AMQConnection.close() first acquires the messageDeliveryLock of >> all >> >> its >> >> >> > > sessions in the recursive doClose(), before trying to acquire the >> >> >> > > connection's failoverMutex. But the Session / consumer's close() >> >> >> acquires >> >> >> > > the failoverMutex before messageDeliveryLock. We haven't seen >> this >> >> >> happen >> >> >> > > but would like to know if this is possible. >> >> >> > > >> >> >> > > >> >> >> > > We'd really appreciate your help on this. Assuming these can be >> >> fixed >> >> >> in >> >> >> > > 0.32, we are also wondering if clients are backward compatible -- >> >> i.e., >> >> >> > can >> >> >> > > we upgrade only our client to 0.32 while continuing to use the >> 0.16 >> >> >> > broker? >> >> >> > > >> >> >> > > Thanks, >> >> >> > > Helen >> >> >> > > >> >> >> > >> >> >> >> >> >> >> --------------------------------------------------------------------- >> >> To unsubscribe, e-mail: [email protected] >> >> For additional commands, e-mail: [email protected] >> >> >> >> >> >> --------------------------------------------------------------------- >> To unsubscribe, e-mail: [email protected] >> For additional commands, e-mail: [email protected] >> >> --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
