[jira] [Updated] (GEODE-5186) set operation in a client transaction could cause the transaction to hang
[ https://issues.apache.org/jira/browse/GEODE-5186?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

nabarun updated GEODE-5186:
---------------------------
    Fix Version/s:     (was: 1.8.0)
                   1.7.0

> set operation in a client transaction could cause the transaction to hang
> -------------------------------------------------------------------------
>
>                 Key: GEODE-5186
>                 URL: https://issues.apache.org/jira/browse/GEODE-5186
>             Project: Geode
>          Issue Type: Bug
>          Components: transactions
>    Affects Versions: 1.1.0, 1.1.1, 1.2.0, 1.3.0, 1.2.1, 1.4.0, 1.5.0, 1.6.0, 1.7.0
>            Reporter: Eric Shu
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 1.7.0
>
>          Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> During an entry operation in a client transaction, the server connection can be
> lost. In that case, the client fails over to another server and tries to resume
> the transaction and retry the operation, provided the original transaction host
> node can still be found.
> If the operation happens to be a keySet operation (or another set operation) on
> a partitioned region, the transaction can hang due to a deadlock.
> In this scenario, the original transaction host node holds its transactional
> lock while sending a FetchKey request to the other nodes hosting the
> partitioned region data. The node the client transaction failed over to holds
> its transactional lock while sending a FetchKey message to the
> transaction-hosting node.
> Neither FetchKeyMessage can then be processed, because processing these
> transaction messages requires acquiring that lock, and both locks are already
> held by the threads handling the transaction's client messages.
> {noformat}
> vm_6_bridge7_latvia_25133:PartitionedRegion Message Processor10 ID=0xe2(226)
> state=WAITING
> waiting to lock
>         at sun.misc.Unsafe.park(Native Method)
>         at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
>         at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
>         at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:870)
>         at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1199)
>         at java.util.concurrent.locks.ReentrantLock$NonfairSync.lock(ReentrantLock.java:209)
>         at java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:285)
>         at org.apache.geode.internal.cache.TXManagerImpl.getLock(TXManagerImpl.java:921)
>         at org.apache.geode.internal.cache.TXManagerImpl.masqueradeAs(TXManagerImpl.java:881)
>         at org.apache.geode.internal.cache.partitioned.PartitionMessage.process(PartitionMessage.java:332)
>         at org.apache.geode.distributed.internal.DistributionMessage.scheduleAction(DistributionMessage.java:378)
>         at org.apache.geode.distributed.internal.DistributionMessage$1.run(DistributionMessage.java:444)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>         at org.apache.geode.distributed.internal.ClusterDistributionManager.runUntilShutdown(ClusterDistributionManager.java:1121)
>         at org.apache.geode.distributed.internal.ClusterDistributionManager.access$000(ClusterDistributionManager.java:109)
>         at org.apache.geode.distributed.internal.ClusterDistributionManager$8$1.run(ClusterDistributionManager.java:945)
>         at java.lang.Thread.run(Thread.java:745)
> Locked synchronizers:
> java.util.concurrent.ThreadPoolExecutor$Worker@c84d7d4
>
> vm_6_bridge7_latvia_25133:ServerConnection on port 23931 Thread 10 ID=0x128(296)
> state=TIMED_WAITING
> waiting to lock
>         at sun.misc.Unsafe.park(Native Method)
>         at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
>         at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedNanos(AbstractQueuedSynchronizer.java:1037)
>         at java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(AbstractQueuedSynchronizer.java:1328)
>         at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:277)
>         at org.apache.geode.internal.util.concurrent.StoppableCountDownLatch.await(StoppableCountDownLatch.java:61)
>         at org.apache.geode.distributed.internal.ReplyProcessor21.basicWait(ReplyProcessor21.java:715)
>         at org.apache.geode.distributed.internal.ReplyProcessor21.waitForRepliesUninterruptibly(ReplyProcessor21.java:790)
>         at org.apache.geode.distributed.internal.ReplyProcessor21.waitForRepliesUninterruptibly(ReplyProcessor21.java:766)
>
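The lock cycle in the report can be sketched in plain Java. This is an illustrative model, not Geode code: two ReentrantLocks stand in for the per-transaction locks on the original tx host and the failover node (the names hostLock, failoverLock, and requestRemoteKeys are invented for the sketch). Each thread takes its own lock, as a member does while handling the client's operation, and then needs the other member's lock, as processing the incoming FetchKey message does. Using tryLock with a timeout makes the cycle observable without actually hanging the JVM the way the real deadlock hangs the servers:

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

public class TxDeadlockSketch {

    // Acquire our own tx lock (we are handling a client op), then try to take
    // the remote member's tx lock (needed to process our FetchKey message there).
    static boolean requestRemoteKeys(ReentrantLock ownTxLock, ReentrantLock remoteTxLock,
                                     CountDownLatch bothHolding, CountDownLatch bothTried)
            throws InterruptedException {
        ownTxLock.lock();
        try {
            bothHolding.countDown();
            bothHolding.await();               // wait until both tx locks are held
            boolean got = remoteTxLock.tryLock(200, TimeUnit.MILLISECONDS);
            if (got) {
                remoteTxLock.unlock();
            }
            bothTried.countDown();
            bothTried.await();  // keep our own lock held until both attempts finish
            return got;
        } finally {
            ownTxLock.unlock();
        }
    }

    // Returns whether each side managed to acquire the other's lock.
    static boolean[] runScenario() throws InterruptedException {
        ReentrantLock hostLock = new ReentrantLock();      // original tx host's lock
        ReentrantLock failoverLock = new ReentrantLock();  // failover node's lock
        CountDownLatch bothHolding = new CountDownLatch(2);
        CountDownLatch bothTried = new CountDownLatch(2);
        boolean[] acquired = new boolean[2];
        Thread host = new Thread(() -> {
            try {
                acquired[0] = requestRemoteKeys(hostLock, failoverLock, bothHolding, bothTried);
            } catch (InterruptedException ignored) {
            }
        });
        Thread failover = new Thread(() -> {
            try {
                acquired[1] = requestRemoteKeys(failoverLock, hostLock, bothHolding, bothTried);
            } catch (InterruptedException ignored) {
            }
        });
        host.start();
        failover.start();
        host.join();
        failover.join();
        return acquired;
    }

    public static void main(String[] args) throws InterruptedException {
        boolean[] acquired = runScenario();
        // Neither side can take the other's lock: the cycle in the thread dump,
        // except here tryLock's timeout lets the sketch terminate.
        System.out.println("host acquired failover's lock: " + acquired[0]);
        System.out.println("failover acquired host's lock: " + acquired[1]);
    }
}
```

In the real system there is no timeout: PartitionMessage.process blocks in TXManagerImpl.getLock while the lock owner blocks in ReplyProcessor21 waiting for a reply that can never come, which is exactly the WAITING/TIMED_WAITING pair in the dump above.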
[jira] [Updated] (GEODE-5186) set operation in a client transaction could cause the transaction to hang

[ https://issues.apache.org/jira/browse/GEODE-5186?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Eric Shu updated GEODE-5186:
----------------------------
    Affects Version/s: 1.7.0
[jira] [Updated] (GEODE-5186) set operation in a client transaction could cause the transaction to hang

[ https://issues.apache.org/jira/browse/GEODE-5186?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated GEODE-5186:
----------------------------------
    Labels: pull-request-available  (was: )
[jira] [Updated] (GEODE-5186) set operation in a client transaction could cause the transaction to hang

[ https://issues.apache.org/jira/browse/GEODE-5186?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Eric Shu updated GEODE-5186:
----------------------------
    Affects Version/s: 1.1.0
                       1.1.1
                       1.2.0
                       1.3.0
                       1.2.1
                       1.4.0
                       1.5.0
                       1.6.0