Couldn't you just have dropped the acquire failover mutex that wraps the
whole process, and then acquire delivery lock followed by failover mutex
followed by session close, iterated over all sessions?

Rupert

On 17/09/2007, Robert Godfrey <[EMAIL PROTECTED]> wrote:
>
> On 17/09/2007, Rupert Smith <[EMAIL PROTECTED]> wrote:
> >
> > Merging that to trunk will fix Rajith's observed deadlock. As I just
> > volunteered to do some merging, the fix will appear on trunk soon.
> >
> > Got to love that recursion till empty, followed by the more conventional
> > iteration of the 'closeAllSessions' method. God help anyone who tries to
> > read this code... but it works.
>
>
>
> Yeah - unfortunately there isn't an easy way to take out the locks by
> iterating... so recursion it had to be...  it's not the code I'm most
> proud
> of writing, that's for sure :-)
>
> -- Rob
>
>
> Rupert
> >
> > On 17/09/2007, Robert Godfrey <[EMAIL PROTECTED]> wrote:
> > >
> > > M2 should have a fix in that gets all the messageDeliveryLocks when
> you
> > do
> > > a
> > > Connection.close before it gets the mutex...
> > >
> > > Specifically, this nasty bit of recursion -
> > >
> > > public void close(List<AMQSession> sessions, long timeout) throws
> > > JMSException
> > >     {
> > >         synchronized(_sessionCreationLock)
> > >         {
> > >             if(!sessions.isEmpty())
> > >             {
> > >                 AMQSession session = sessions.remove(0);
> > >                 synchronized(session.getMessageDeliveryLock())
> > >                 {
> > >                     close(sessions, timeout);
> > >                 }
> > >             }
> > >             else
> > >
> > >
> > >
> > > ...
> > >
> > > -- Rob
> > >
> > > On 17/09/2007, Rupert Smith <[EMAIL PROTECTED]> wrote:
> > > >
> > > > M2, Is it significantly different on trunk?
> > > >
> > > > On 17/09/2007, Robert Godfrey <[EMAIL PROTECTED]> wrote:
> > > > >
> > > > > Rupert - which branch of the code are you looking at?
> > > > >
> > > > > -- Rob
> > > > >
> > > > > On 17/09/2007, Rupert Smith <[EMAIL PROTECTED]> wrote:
> > > > > >
> > > > > > Sorry, I'm talking rubbish. The ordering of the locks in
> > > > > AMQSession.closeis
> > > > > > fine, and as it should be. Message delivery, followed by
> failover
> > > > mutex.
> > > > > >
> > > > > > The problematic interaction is because the call to
> > > AMQConnection.close
> > > > ,
> > > > > > obtains the failover mutex first. I'd be tempted to try and
> > > eliminate
> > > > > that
> > > > > > one, so that as each session is closed delivery then failover is
> > > > > > obtained/released for each.
> > > > > >
> > > > > > You could remove the failover mutex from createTextMessage, but
> > > other
> > > > > > methods that do need to obtain it can be called from within a
> > > > > dispathcher
> > > > > > thread, so the problem can still occur.
> > > > > >
> > > > > > Rupert
> > > > > >
> > > > > > On 17/09/2007, Rupert Smith <[EMAIL PROTECTED]>
> wrote:
> > > > > > >
> > > > > > > Also messageDeliveryLock seems like a poor choice of name.
> > > > > > > 'sessionCloseMutex' might be better.
> > > > > > >
> > > > > > > On 17/09/2007, Rupert Smith < [EMAIL PROTECTED]>
> > wrote:
> > > > > > > >
> > > > > > > > Actually, I see it now, the message delivery lock is
> obtained
> > in
> > > > the
> > > > > > > > dispatcher run method.
> > > > > > > >
> > > > > > > > So createTextMessage is doing:
> > > > > > > >
> > > > > > > > 1. Obtain message delivery lock.
> > > > > > > > 2. Obtain failover mutex.
> > > > > > > >
> > > > > > > > And the main thread calling close is doing:
> > > > > > > >
> > > > > > > > 1. Obtain failover mutex.
> > > > > > > > 2. Obtain message delivery lock.
> > > > > > > >
> > > > > > > > Just swap the lock ordering as I suggested. Unless this
> > > conflicts
> > > > > with
> > > > > > > > other orderings...
> > > > > > > >
> > > > > > > > On 17/09/2007, Rupert Smith <[EMAIL PROTECTED]>
> > > wrote:
> > > > > > > > >
> > > > > > > > > It looks to me that the message delivery lock is the more
> > > > > > fundamental
> > > > > > > > > cause of the dead lock. I can't figure out from the stack
> > > trace
> > > > > why
> > > > > > the
> > > > > > > > > dispatcher thread is holding that one (I suppose I should
> > look
> > > > at
> > > > > > what the
> > > > > > > > > test is doing to figure that out).
> > > > > > > > >
> > > > > > > > > Can we swap the order of obtaining the locks in
> > > AMQSession.close
> > > > > > (long
> > > > > > > > > timeout) from
> > > > > > > > >
> > > > > > > > > synchronized(_messageDeliveryLock)
> > > > > > > > > synchronized (_connection.getFailoverMutex())
> > > > > > > > >
> > > > > > > > > to
> > > > > > > > >
> > > > > > > > > synchronized (_connection.getFailoverMutex())
> > > > > > > > > synchronized(_messageDeliveryLock)
> > > > > > > > >
> > > > > > > > > ?
> > > > > > > > >
> > > > > > > > > Unless that causes a different dead lock, it looks like it
> > > will
> > > > > fix
> > > > > > > > > the problem.
> > > > > > > > >
> > > > > > > > > Also, it looks to me that removing the failover mutex
> > > > acquisition
> > > > > > from
> > > > > > > > > createTextMessage will be harmless enough.
> > > > > > > > >
> > > > > > > > > Rupert
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > On 15/09/2007, Rajith Attapattu < [EMAIL PROTECTED]>
> wrote:
> > > > > > > > > >
> > > > > > > > > > Hi All,
> > > > > > > > > >
> > > > > > > > > > When I run the latest version of trunk I get a hang on
> > > > > > > > > > org.apache.qpid.test.unit.client.forwardall.CombinedTest
> > > > > > > > > > kill -3 (thread dump) provides information to say there
> is
> > a
> > > > > > > > > > deadlock.
> > > > > > > > > >
> > > > > > > > > > I also noted that all AMQSession.createXXXMessagecontains
> > > > > > > > > > synchronization
> > > > > > > > > > code.
> > > > > > > > > > (in this case the deadlock happens inn
> > > > > > AMQSession.createTextMessagetrying
> > > > > > > > > > to obtain the failover mutex)
> > > > > > > > > >
> > > > > > > > > > Do we really need this synchronization code when
> creating
> > > > > > messages?
> > > > > > > > > > I am not
> > > > > > > > > > really convinced it is needed.
> > > > > > > > > > Can someody explain the reason for that code block?
> > > > > > > > > >
> > > > > > > > > > Regard,
> > > > > > > > > >
> > > > > > > > > > Rajith
> > > > > > > > > >
> > > > > > > > > > I have attached the trace below.
> > > > > > > > > >
> > > > > > > > > > Found one Java-level deadlock:
> > > > > > > > > > =============================
> > > > > > > > > > "Dispatcher-Channel-1":
> > > > > > > > > >   waiting to lock monitor 0x09ff950c (object 0x8b85a1a0,
> a
> > > > > > > > > > java.lang.Object
> > > > > > > > > > ),
> > > > > > > > > >   which is held by "main"
> > > > > > > > > > "main":
> > > > > > > > > >   waiting to lock monitor 0x09ff964c (object 0x88ff0240,
> a
> > > > > > > > > > java.lang.Object
> > > > > > > > > > ),
> > > > > > > > > >   which is held by "Dispatcher-Channel-1"
> > > > > > > > > >
> > > > > > > > > > Java stack information for the threads listed above:
> > > > > > > > > > ===================================================
> > > > > > > > > > "Dispatcher-Channel-1":
> > > > > > > > > >         at
> > > org.apache.qpid.client.AMQSession.createTextMessage
> > > > (
> > > > > > > > > > AMQSession.java:1057)
> > > > > > > > > >         - waiting to lock <0x8b85a1a0> (a
> java.lang.Object
> > )
> > > > > > > > > >         at
> > > org.apache.qpid.client.AMQSession.createTextMessage(
> > > > > > > > > > AMQSession.java:1068)
> > > > > > > > > >         at
> > > > > > > > > >
> > > org.apache.qpid.test.unit.client.forwardall.Service.onMessage(
> > > > > > > > > > Service.java:59)
> > > > > > > > > >         at
> > > > > > org.apache.qpid.client.BasicMessageConsumer.notifyMessage
> > > > > > > > > > (
> > > > > > > > > > BasicMessageConsumer.java :628)
> > > > > > > > > >         at
> > > > > > org.apache.qpid.client.BasicMessageConsumer.notifyMessage
> > > > > > > > > > (
> > > > > > > > > > BasicMessageConsumer.java:583)
> > > > > > > > > >         at
> > > > > > > > > >
> > org.apache.qpid.client.AMQSession$Dispatcher.dispatchMessage
> > > (
> > > > > > > > > > AMQSession.java:2579)
> > > > > > > > > >         at
> > org.apache.qpid.client.AMQSession$Dispatcher.run(
> > > > > > > > > > AMQSession.java
> > > > > > > > > > :2502)
> > > > > > > > > >         - locked <0x88ff0240> (a java.lang.Object)
> > > > > > > > > >         - locked <0x88ff0238> (a java.lang.Object)
> > > > > > > > > > "main":
> > > > > > > > > >         at org.apache.qpid.client.AMQSession.close(
> > > > > AMQSession.java
> > > > > > > > > > :462)
> > > > > > > > > >         - waiting to lock <0x88ff0240> (a
> java.lang.Object
> > )
> > > > > > > > > >         at
> > > > org.apache.qpid.client.AMQConnection.closeAllSessions
> > > > > (
> > > > > > > > > > AMQConnection.java :752)
> > > > > > > > > >         at org.apache.qpid.client.AMQConnection.close(
> > > > > > > > > > AMQConnection.java
> > > > > > > > > > :673)
> > > > > > > > > >         - locked <0x8b85a1a0> (a java.lang.Object)
> > > > > > > > > >         at org.apache.qpid.client.AMQConnection.close(
> > > > > > > > > > AMQConnection.java
> > > > > > > > > > :659)
> > > > > > > > > >         at
> > > > > > org.apache.qpid.test.unit.client.forwardall.Service.close
> > > > > > > > > > (
> > > > > > > > > > Service.java:71)
> > > > > > > > > >         at
> > > > > > > > > >
> > > > > org.apache.qpid.test.unit.client.forwardall.ServiceCreator.closeSC
> > > > > > (
> > > > > > > > > > ServiceCreator.java:57)
> > > > > > > > > >         at
> > > > > > > > > >
> > > > > >
> > org.apache.qpid.test.unit.client.forwardall.ServiceCreator.closeAll(
> > > > > > > > > > ServiceCreator.java:66)
> > > > > > > > > >         at
> > > > > > > > > >
> > > > > org.apache.qpid.test.unit.client.forwardall.CombinedTest.tearDown
> > > > > > > > > > (CombinedTest.java:44)
> > > > > > > > > >         at junit.framework.TestCase.runBare (
> TestCase.java
> > > > :130)
> > > > > > > > > >         at junit.framework.TestResult$1.protect(
> > > > TestResult.java
> > > > > > :106)
> > > > > > > > > >         at junit.framework.TestResult.runProtected(
> > > > > TestResult.java
> > > > > > > > > > :124)
> > > > > > > > > >         at junit.framework.TestResult.run(
> TestResult.java
> > > :109)
> > > > > > > > > >         at junit.framework.TestCase.run(TestCase.java
> :118)
> > > > > > > > > >         at junit.framework.TestSuite.runTest(
> > TestSuite.java
> > > > :208)
> > > > > > > > > >         at junit.framework.TestSuite.run(TestSuite.java
> > :203)
> > > > > > > > > >         at junit.extensions.TestDecorator.basicRun (
> > > > > > > > > > TestDecorator.java:22)
> > > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>

Reply via email to