Rupert - which branch of the code are you looking at? -- Rob
On 17/09/2007, Rupert Smith <[EMAIL PROTECTED]> wrote: > > Sorry, I'm talking rubbish. The ordering of the locks in AMQSession.closeis > fine, and as it should be. Message delivery, followed by failover mutex. > > The problematic interaction is because the call to AMQConnection.close, > obtains the failover mutex first. I'd be tempted to try and eliminate that > one, so that as each session is closed delivery then failover is > obtained/released for each. > > You could remove the failover mutex from createTextMessage, but other > methods that do need to obtain it can be called from within a dispathcher > thread, so the problem can still occur. > > Rupert > > On 17/09/2007, Rupert Smith <[EMAIL PROTECTED]> wrote: > > > > Also messageDeliveryLock seems like a poor choice of name. > > 'sessionCloseMutex' might be better. > > > > On 17/09/2007, Rupert Smith < [EMAIL PROTECTED]> wrote: > > > > > > Actually, I see it now, the message delivery lock is obtained in the > > > dispatcher run method. > > > > > > So createTextMessage is doing: > > > > > > 1. Obtain message delivery lock. > > > 2. Obtain failover mutex. > > > > > > And the main thread calling close is doing: > > > > > > 1. Obtain failover mutex. > > > 2. Obtain message delivery lock. > > > > > > Just swap the lock ordering as I suggested. Unless this conflicts with > > > other orderings... > > > > > > On 17/09/2007, Rupert Smith <[EMAIL PROTECTED]> wrote: > > > > > > > > It looks to me that the message delivery lock is the more > fundamental > > > > cause of the dead lock. I can't figure out from the stack trace why > the > > > > dispatcher thread is holding that one (I suppose I should look at > what the > > > > test is doing to figure that out). > > > > > > > > Can we swap the order of obtaining the locks in AMQSession.close > (long > > > > timeout) from > > > > > > > > synchronized(_messageDeliveryLock) > > > > synchronized (_connection.getFailoverMutex()) > > > > > > > > to > > > > > > > > synchronized (_connection.getFailoverMutex()) > > > > synchronized(_messageDeliveryLock) > > > > > > > > ? > > > > > > > > Unless that causes a different dead lock, it looks like it will fix > > > > the problem. > > > > > > > > Also, it looks to me that removing the failover mutex acquisition > from > > > > createTextMessage will be harmless enough. > > > > > > > > Rupert > > > > > > > > > > > > On 15/09/2007, Rajith Attapattu < [EMAIL PROTECTED]> wrote: > > > > > > > > > > Hi All, > > > > > > > > > > When I run the latest version of trunk I get a hang on > > > > > org.apache.qpid.test.unit.client.forwardall.CombinedTest > > > > > kill -3 (thread dump) provides information to say there is a > > > > > deadlock. > > > > > > > > > > I also noted that all AMQSession.createXXXMessage contains > > > > > synchronization > > > > > code. > > > > > (in this case the deadlock happens inn > AMQSession.createTextMessagetrying > > > > > to obtain the failover mutex) > > > > > > > > > > Do we really need this synchronization code when creating > messages? > > > > > I am not > > > > > really convinced it is needed. > > > > > Can someody explain the reason for that code block? > > > > > > > > > > Regard, > > > > > > > > > > Rajith > > > > > > > > > > I have attached the trace below. > > > > > > > > > > Found one Java-level deadlock: > > > > > ============================= > > > > > "Dispatcher-Channel-1": > > > > > waiting to lock monitor 0x09ff950c (object 0x8b85a1a0, a > > > > > java.lang.Object > > > > > ), > > > > > which is held by "main" > > > > > "main": > > > > > waiting to lock monitor 0x09ff964c (object 0x88ff0240, a > > > > > java.lang.Object > > > > > ), > > > > > which is held by "Dispatcher-Channel-1" > > > > > > > > > > Java stack information for the threads listed above: > > > > > =================================================== > > > > > "Dispatcher-Channel-1": > > > > > at org.apache.qpid.client.AMQSession.createTextMessage( > > > > > AMQSession.java:1057) > > > > > - waiting to lock <0x8b85a1a0> (a java.lang.Object) > > > > > at org.apache.qpid.client.AMQSession.createTextMessage ( > > > > > AMQSession.java:1068) > > > > > at > > > > > org.apache.qpid.test.unit.client.forwardall.Service.onMessage( > > > > > Service.java:59) > > > > > at > org.apache.qpid.client.BasicMessageConsumer.notifyMessage > > > > > ( > > > > > BasicMessageConsumer.java :628) > > > > > at > org.apache.qpid.client.BasicMessageConsumer.notifyMessage > > > > > ( > > > > > BasicMessageConsumer.java:583) > > > > > at > > > > > org.apache.qpid.client.AMQSession$Dispatcher.dispatchMessage( > > > > > AMQSession.java:2579) > > > > > at org.apache.qpid.client.AMQSession$Dispatcher.run( > > > > > AMQSession.java > > > > > :2502) > > > > > - locked <0x88ff0240> (a java.lang.Object) > > > > > - locked <0x88ff0238> (a java.lang.Object) > > > > > "main": > > > > > at org.apache.qpid.client.AMQSession.close(AMQSession.java > > > > > :462) > > > > > - waiting to lock <0x88ff0240> (a java.lang.Object) > > > > > at org.apache.qpid.client.AMQConnection.closeAllSessions( > > > > > AMQConnection.java :752) > > > > > at org.apache.qpid.client.AMQConnection.close( > > > > > AMQConnection.java > > > > > :673) > > > > > - locked <0x8b85a1a0> (a java.lang.Object) > > > > > at org.apache.qpid.client.AMQConnection.close( > > > > > AMQConnection.java > > > > > :659) > > > > > at > org.apache.qpid.test.unit.client.forwardall.Service.close > > > > > ( > > > > > Service.java:71) > > > > > at > > > > > org.apache.qpid.test.unit.client.forwardall.ServiceCreator.closeSC > ( > > > > > ServiceCreator.java:57) > > > > > at > > > > > > org.apache.qpid.test.unit.client.forwardall.ServiceCreator.closeAll( > > > > > ServiceCreator.java:66) > > > > > at > > > > > org.apache.qpid.test.unit.client.forwardall.CombinedTest.tearDown > > > > > (CombinedTest.java:44) > > > > > at junit.framework.TestCase.runBare (TestCase.java:130) > > > > > at junit.framework.TestResult$1.protect(TestResult.java > :106) > > > > > at junit.framework.TestResult.runProtected(TestResult.java > > > > > :124) > > > > > at junit.framework.TestResult.run(TestResult.java :109) > > > > > at junit.framework.TestCase.run(TestCase.java:118) > > > > > at junit.framework.TestSuite.runTest(TestSuite.java:208) > > > > > at junit.framework.TestSuite.run(TestSuite.java:203) > > > > > at junit.extensions.TestDecorator.basicRun ( > > > > > TestDecorator.java:22) > > > > > > > > > > > > > > > > > > >
