[ 
https://issues.apache.org/jira/browse/GEODE-5186?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Shu resolved GEODE-5186.
-----------------------------
       Resolution: Fixed
    Fix Version/s: 1.8.0

> set operation in a client transaction could cause the transaction to hang
> -------------------------------------------------------------------------
>
>                 Key: GEODE-5186
>                 URL: https://issues.apache.org/jira/browse/GEODE-5186
>             Project: Geode
>          Issue Type: Bug
>          Components: transactions
>    Affects Versions: 1.1.0, 1.1.1, 1.2.0, 1.3.0, 1.2.1, 1.4.0, 1.5.0, 1.6.0, 
> 1.7.0
>            Reporter: Eric Shu
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 1.8.0
>
>          Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> During an entry operation in a client transaction, server connection could be 
> lost. In this case, client will failover to another server and try to resume 
> the transaction and retry the operation if the original transaction host node 
> is found. 
> If this operation happens to be a keySet operation (or other set operations) 
> on a partitioned region, the transaction could hang due to a deadlock.
> The scenario is the original tx host node holds its transactional lock when 
> sending fetchKey request to other nodes hosting the partitioned region data. 
> The node on which the client transaction failed over, will hold its 
> transactional lock while sending the FetchKey message to transaction hosting 
> node.
> These two FetchKeyMessage will not be able to be processed as processing 
> these tx message requires to hold the lock. But the locks are already been 
> held by the nodes handing the client message of the transaction.
> {noformat}
> vm_6_bridge7_latvia_25133:PartitionedRegion Message Processor10 ID=0xe2(226) 
> state=WAITING
>         waiting to lock 
> <java.util.concurrent.locks.ReentrantLock$NonfairSync@453d49bb>
>         at sun.misc.Unsafe.park(Native Method)
>         at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
>         at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
>         at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:870)
>         at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1199)
>         at 
> java.util.concurrent.locks.ReentrantLock$NonfairSync.lock(ReentrantLock.java:209)
>         at 
> java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:285)
>         at 
> org.apache.geode.internal.cache.TXManagerImpl.getLock(TXManagerImpl.java:921)
>         at 
> org.apache.geode.internal.cache.TXManagerImpl.masqueradeAs(TXManagerImpl.java:881)
>         at 
> org.apache.geode.internal.cache.partitioned.PartitionMessage.process(PartitionMessage.java:332)
>         at 
> org.apache.geode.distributed.internal.DistributionMessage.scheduleAction(DistributionMessage.java:378)
>         at 
> org.apache.geode.distributed.internal.DistributionMessage$1.run(DistributionMessage.java:444)
>         at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>         at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>         at 
> org.apache.geode.distributed.internal.ClusterDistributionManager.runUntilShutdown(ClusterDistributionManager.java:1121)
>         at 
> org.apache.geode.distributed.internal.ClusterDistributionManager.access$000(ClusterDistributionManager.java:109)
>         at 
> org.apache.geode.distributed.internal.ClusterDistributionManager$8$1.run(ClusterDistributionManager.java:945)
>         at java.lang.Thread.run(Thread.java:745)
> Locked synchronizers:
> java.util.concurrent.ThreadPoolExecutor$Worker@c84d7d4
> vm_6_bridge7_latvia_25133:ServerConnection on port 23931 Thread 10 
> ID=0x128(296) state=TIMED_WAITING
>         waiting to lock <java.util.concurrent.CountDownLatch$Sync@226dbb4>
>         at sun.misc.Unsafe.park(Native Method)
>         at 
> java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
>         at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedNanos(AbstractQueuedSynchronizer.java:1037)
>         at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(AbstractQueuedSynchronizer.java:1328)
>         at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:277)
>         at 
> org.apache.geode.internal.util.concurrent.StoppableCountDownLatch.await(StoppableCountDownLatch.java:61)
>         at 
> org.apache.geode.distributed.internal.ReplyProcessor21.basicWait(ReplyProcessor21.java:715)
>         at 
> org.apache.geode.distributed.internal.ReplyProcessor21.waitForRepliesUninterruptibly(ReplyProcessor21.java:790)
>         at 
> org.apache.geode.distributed.internal.ReplyProcessor21.waitForRepliesUninterruptibly(ReplyProcessor21.java:766)
>         at 
> org.apache.geode.distributed.internal.ReplyProcessor21.waitForRepliesUninterruptibly(ReplyProcessor21.java:853)
>         at 
> org.apache.geode.internal.cache.partitioned.FetchKeysMessage$FetchKeysResponse.waitForKeys(FetchKeysMessage.java:541)
>         at 
> org.apache.geode.internal.cache.PartitionedRegion.getBucketKeys(PartitionedRegion.java:4342)
>         at 
> org.apache.geode.internal.cache.TXStateStub.getBucketKeys(TXStateStub.java:644)
>         at 
> org.apache.geode.internal.cache.TXStateProxyImpl.getBucketKeys(TXStateProxyImpl.java:730)
>         at 
> org.apache.geode.internal.cache.PartitionedRegion$KeysSet$KeysSetIterator.getNextBucketIter(PartitionedRegion.java:6066)
>         at 
> org.apache.geode.internal.cache.PartitionedRegion$KeysSet$KeysSetIterator.hasNext(PartitionedRegion.java:6024)
>         at 
> java.util.Collections$UnmodifiableCollection$1.hasNext(Collections.java:1041)
>         at 
> org.apache.geode.internal.cache.tier.sockets.command.KeySet.fillAndSendKeySetResponseChunks(KeySet.java:168)
>         at 
> org.apache.geode.internal.cache.tier.sockets.command.KeySet.cmdExecute(KeySet.java:126)
>         at 
> org.apache.geode.internal.cache.tier.sockets.BaseCommand.execute(BaseCommand.java:157)
>         at 
> org.apache.geode.internal.cache.tier.sockets.ServerConnection.doNormalMsg(ServerConnection.java:869)
>         at 
> org.apache.geode.internal.cache.tier.sockets.OriginalServerConnection.doOneMessage(OriginalServerConnection.java:77)
>         at 
> org.apache.geode.internal.cache.tier.sockets.ServerConnection.run(ServerConnection.java:1248)
>         at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>         at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>         at 
> org.apache.geode.internal.cache.tier.sockets.AcceptorImpl$4$1.run(AcceptorImpl.java:644)
>         at java.lang.Thread.run(Thread.java:745)
> Locked synchronizers:
> java.util.concurrent.ThreadPoolExecutor$Worker@3ca60534
> java.util.concurrent.locks.ReentrantLock$NonfairSync@453d49bb
> vm_0_bridge1_latvia_25064:PartitionedRegion Message Processor4 ID=0x2b8(696) 
> state=WAITING
>         waiting to lock 
> <java.util.concurrent.locks.ReentrantLock$NonfairSync@33b1b785>
>         at sun.misc.Unsafe.park(Native Method)
>         at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
>         at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
>         at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:870)
>         at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1199)
>         at 
> java.util.concurrent.locks.ReentrantLock$NonfairSync.lock(ReentrantLock.java:209)
>         at 
> java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:285)
>         at 
> org.apache.geode.internal.cache.TXManagerImpl.getLock(TXManagerImpl.java:921)
>         at 
> org.apache.geode.internal.cache.TXManagerImpl.masqueradeAs(TXManagerImpl.java:881)
>         at 
> org.apache.geode.internal.cache.partitioned.PartitionMessage.process(PartitionMessage.java:332)
>         at 
> org.apache.geode.distributed.internal.DistributionMessage.scheduleAction(DistributionMessage.java:378)
>         at 
> org.apache.geode.distributed.internal.DistributionMessage$1.run(DistributionMessage.java:444)
>         at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>         at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>         at 
> org.apache.geode.distributed.internal.ClusterDistributionManager.runUntilShutdown(ClusterDistributionManager.java:1121)
>         at 
> org.apache.geode.distributed.internal.ClusterDistributionManager.access$000(ClusterDistributionManager.java:109)
>         at 
> org.apache.geode.distributed.internal.ClusterDistributionManager$8$1.run(ClusterDistributionManager.java:945)
>         at java.lang.Thread.run(Thread.java:745)
> Locked synchronizers:
> java.util.concurrent.ThreadPoolExecutor$Worker@71b1b4c5
> vm_0_bridge1_latvia_25064:ServerConnection on port 24946 Thread 0 
> ID=0x29b(667) state=TIMED_WAITING
>         waiting to lock <java.util.concurrent.CountDownLatch$Sync@41e6d28f>
>         at sun.misc.Unsafe.park(Native Method)
>         at 
> java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
>         at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedNanos(AbstractQueuedSynchronizer.java:1037)
>         at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(AbstractQueuedSynchronizer.java:1328)
>         at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:277)
>         at 
> org.apache.geode.internal.util.concurrent.StoppableCountDownLatch.await(StoppableCountDownLatch.java:61)
>         at 
> org.apache.geode.distributed.internal.ReplyProcessor21.basicWait(ReplyProcessor21.java:715)
>         at 
> org.apache.geode.distributed.internal.ReplyProcessor21.waitForRepliesUninterruptibly(ReplyProcessor21.java:790)
>         at 
> org.apache.geode.distributed.internal.ReplyProcessor21.waitForRepliesUninterruptibly(ReplyProcessor21.java:766)
>         at 
> org.apache.geode.distributed.internal.ReplyProcessor21.waitForRepliesUninterruptibly(ReplyProcessor21.java:853)
>         at 
> org.apache.geode.internal.cache.partitioned.FetchKeysMessage$FetchKeysResponse.waitForKeys(FetchKeysMessage.java:541)
>         at 
> org.apache.geode.internal.cache.PartitionedRegion.getBucketKeys(PartitionedRegion.java:4342)
>         at 
> org.apache.geode.internal.cache.TXState.getBucketKeys(TXState.java:1852)
>         at 
> org.apache.geode.internal.cache.TXStateProxyImpl.getBucketKeys(TXStateProxyImpl.java:730)
>         at 
> org.apache.geode.internal.cache.PartitionedRegion$KeysSet$KeysSetIterator.getNextBucketIter(PartitionedRegion.java:6066)
>         at 
> org.apache.geode.internal.cache.PartitionedRegion$KeysSet$KeysSetIterator.hasNext(PartitionedRegion.java:6024)
>         at 
> java.util.Collections$UnmodifiableCollection$1.hasNext(Collections.java:1041)
>         at 
> org.apache.geode.internal.cache.tier.sockets.command.KeySet.fillAndSendKeySetResponseChunks(KeySet.java:168)
>         at 
> org.apache.geode.internal.cache.tier.sockets.command.KeySet.cmdExecute(KeySet.java:126)
>         at 
> org.apache.geode.internal.cache.tier.sockets.BaseCommand.execute(BaseCommand.java:157)
>         at 
> org.apache.geode.internal.cache.tier.sockets.ServerConnection.doNormalMsg(ServerConnection.java:869)
>         at 
> org.apache.geode.internal.cache.tier.sockets.OriginalServerConnection.doOneMessage(OriginalServerConnection.java:77)
>         at 
> org.apache.geode.internal.cache.tier.sockets.ServerConnection.run(ServerConnection.java:1248)
>         at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>         at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>         at 
> org.apache.geode.internal.cache.tier.sockets.AcceptorImpl$4$1.run(AcceptorImpl.java:644)
>         at java.lang.Thread.run(Thread.java:745)
> Locked synchronizers:
> java.util.concurrent.locks.ReentrantLock$NonfairSync@33b1b785
> java.util.concurrent.ThreadPoolExecutor$Worker@51e84752
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Reply via email to