[jira] [Updated] (GEODE-5186) set operation in a client transaction could cause the transaction to hang

2018-08-27 Thread nabarun (JIRA)


 [ 
https://issues.apache.org/jira/browse/GEODE-5186?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

nabarun updated GEODE-5186:
---
Fix Version/s: (was: 1.8.0)
   1.7.0

> set operation in a client transaction could cause the transaction to hang
> -
>
> Key: GEODE-5186
> URL: https://issues.apache.org/jira/browse/GEODE-5186
> Project: Geode
> Issue Type: Bug
> Components: transactions
> Affects Versions: 1.1.0, 1.1.1, 1.2.0, 1.3.0, 1.2.1, 1.4.0, 1.5.0, 1.6.0, 1.7.0
> Reporter: Eric Shu
> Priority: Major
> Labels: pull-request-available
> Fix For: 1.7.0
>
> Time Spent: 0.5h
> Remaining Estimate: 0h
>
> During an entry operation in a client transaction, the server connection
> could be lost. In this case, the client fails over to another server and
> tries to resume the transaction and retry the operation if the original
> transaction host node is found.
> If this operation happens to be a keySet operation (or another set
> operation) on a partitioned region, the transaction could hang due to a
> deadlock.
> In this scenario, the original tx host node holds its transactional lock
> while sending a fetchKey request to the other nodes hosting the partitioned
> region data. The node that the client transaction failed over to holds its
> own transactional lock while sending a FetchKey message to the transaction
> host node.
> Neither of these two FetchKey messages can be processed, because processing
> a tx message requires acquiring the lock, and the locks are already held by
> the nodes handling the client messages of the transaction.
> {noformat}
> vm_6_bridge7_latvia_25133:PartitionedRegion Message Processor10 ID=0xe2(226) state=WAITING
> waiting to lock
>     at sun.misc.Unsafe.park(Native Method)
>     at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
>     at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
>     at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:870)
>     at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1199)
>     at java.util.concurrent.locks.ReentrantLock$NonfairSync.lock(ReentrantLock.java:209)
>     at java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:285)
>     at org.apache.geode.internal.cache.TXManagerImpl.getLock(TXManagerImpl.java:921)
>     at org.apache.geode.internal.cache.TXManagerImpl.masqueradeAs(TXManagerImpl.java:881)
>     at org.apache.geode.internal.cache.partitioned.PartitionMessage.process(PartitionMessage.java:332)
>     at org.apache.geode.distributed.internal.DistributionMessage.scheduleAction(DistributionMessage.java:378)
>     at org.apache.geode.distributed.internal.DistributionMessage$1.run(DistributionMessage.java:444)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>     at org.apache.geode.distributed.internal.ClusterDistributionManager.runUntilShutdown(ClusterDistributionManager.java:1121)
>     at org.apache.geode.distributed.internal.ClusterDistributionManager.access$000(ClusterDistributionManager.java:109)
>     at org.apache.geode.distributed.internal.ClusterDistributionManager$8$1.run(ClusterDistributionManager.java:945)
>     at java.lang.Thread.run(Thread.java:745)
> Locked synchronizers:
>     java.util.concurrent.ThreadPoolExecutor$Worker@c84d7d4
>
> vm_6_bridge7_latvia_25133:ServerConnection on port 23931 Thread 10 ID=0x128(296) state=TIMED_WAITING
> waiting to lock
>     at sun.misc.Unsafe.park(Native Method)
>     at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
>     at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedNanos(AbstractQueuedSynchronizer.java:1037)
>     at java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(AbstractQueuedSynchronizer.java:1328)
>     at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:277)
>     at org.apache.geode.internal.util.concurrent.StoppableCountDownLatch.await(StoppableCountDownLatch.java:61)
>     at org.apache.geode.distributed.internal.ReplyProcessor21.basicWait(ReplyProcessor21.java:715)
>     at org.apache.geode.distributed.internal.ReplyProcessor21.waitForRepliesUninterruptibly(ReplyProcessor21.java:790)
>     at org.apache.geode.distributed.internal.ReplyProcessor21.waitForRepliesUninterruptibly(ReplyProcessor21.java:766)
>

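The lock cycle described in the issue can be reproduced in miniature with plain java.util.concurrent classes. The sketch below is only an illustration under stated assumptions, not Geode code: the Node class, its single txLock, and the method names are hypothetical stand-ins, and timeouts replace the indefinite waits so the demo terminates. Each node's client-handling thread takes its own lock and then waits for the other node's message processor, which in turn needs the lock its local client-handling thread is holding.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

public class TxLockDeadlockSketch {

  /** Hypothetical stand-in for a server node; this is not a Geode class. */
  static final class Node {
    // Assumption: one transactional lock per node, owned by whichever thread took it.
    final ReentrantLock txLock = new ReentrantLock();
    final ExecutorService messageProcessor = Executors.newSingleThreadExecutor();

    /** Handles a remote fetch-keys request; it must take this node's tx lock first. */
    void processFetchKeys(CountDownLatch reply, long timeoutMs) {
      messageProcessor.execute(() -> {
        try {
          // A timed tryLock replaces the blocking lock() so the demo can end.
          if (txLock.tryLock(timeoutMs, TimeUnit.MILLISECONDS)) {
            try {
              reply.countDown(); // send the reply
            } finally {
              txLock.unlock();
            }
          }
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt();
        }
      });
    }
  }

  /** Client-handling thread: hold own tx lock, message the peer, wait for its reply. */
  static boolean handleClientOp(Node self, Node peer, CountDownLatch bothHoldLocks,
      long timeoutMs) throws InterruptedException {
    self.txLock.lock();
    try {
      bothHoldLocks.countDown();
      bothHoldLocks.await(); // force the interleaving in which both locks are held
      CountDownLatch reply = new CountDownLatch(1);
      peer.processFetchKeys(reply, timeoutMs);
      // Wait longer than the processor's tryLock so the lock is held throughout.
      return reply.await(2 * timeoutMs, TimeUnit.MILLISECONDS);
    } finally {
      self.txLock.unlock();
    }
  }

  /** Returns true only if both nodes received a reply to their request. */
  static boolean bothRepliesArrive(long timeoutMs) throws Exception {
    Node txHost = new Node();
    Node failoverNode = new Node();
    CountDownLatch bothHoldLocks = new CountDownLatch(2);
    ExecutorService clientHandlers = Executors.newFixedThreadPool(2);
    try {
      Future<Boolean> a = clientHandlers.submit(
          () -> handleClientOp(txHost, failoverNode, bothHoldLocks, timeoutMs));
      Future<Boolean> b = clientHandlers.submit(
          () -> handleClientOp(failoverNode, txHost, bothHoldLocks, timeoutMs));
      return a.get() && b.get();
    } finally {
      clientHandlers.shutdown();
      txHost.messageProcessor.shutdown();
      failoverNode.messageProcessor.shutdown();
    }
  }

  public static void main(String[] args) throws Exception {
    System.out.println("both replies arrived: " + bothRepliesArrive(200));
  }
}
```

With untimed lock() and await() calls in place of the timed ones, the same interleaving would leave all four threads parked indefinitely, which matches the shape of the WAITING message processor and the TIMED_WAITING ServerConnection thread in the dump above.
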
[jira] [Updated] (GEODE-5186) set operation in a client transaction could cause the transaction to hang

2018-05-23 Thread Eric Shu (JIRA)

 [ 
https://issues.apache.org/jira/browse/GEODE-5186?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Shu updated GEODE-5186:

Affects Version/s: 1.7.0

> set operation in a client transaction could cause the transaction to hang
> -
>
> Key: GEODE-5186
> URL: https://issues.apache.org/jira/browse/GEODE-5186
> Project: Geode
> Issue Type: Bug
> Components: transactions
> Affects Versions: 1.1.0, 1.1.1, 1.2.0, 1.3.0, 1.2.1, 1.4.0, 1.5.0, 1.6.0, 1.7.0
> Reporter: Eric Shu
> Priority: Major
> Labels: pull-request-available
> Time Spent: 0.5h
> Remaining Estimate: 0h
>

[jira] [Updated] (GEODE-5186) set operation in a client transaction could cause the transaction to hang

2018-05-17 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/GEODE-5186?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated GEODE-5186:
--
Labels: pull-request-available  (was: )

> set operation in a client transaction could cause the transaction to hang
> -
>
> Key: GEODE-5186
> URL: https://issues.apache.org/jira/browse/GEODE-5186
> Project: Geode
> Issue Type: Bug
> Components: transactions
> Affects Versions: 1.1.0, 1.1.1, 1.2.0, 1.3.0, 1.2.1, 1.4.0, 1.5.0, 1.6.0
> Reporter: Eric Shu
> Priority: Major
> Labels: pull-request-available
>

[jira] [Updated] (GEODE-5186) set operation in a client transaction could cause the transaction to hang

2018-05-07 Thread Eric Shu (JIRA)

 [ 
https://issues.apache.org/jira/browse/GEODE-5186?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Shu updated GEODE-5186:

Affects Version/s: 1.1.0
   1.1.1
   1.2.0
   1.3.0
   1.2.1
   1.4.0
   1.5.0
   1.6.0

> set operation in a client transaction could cause the transaction to hang
> -
>
> Key: GEODE-5186
> URL: https://issues.apache.org/jira/browse/GEODE-5186
> Project: Geode
> Issue Type: Bug
> Components: transactions
> Affects Versions: 1.1.0, 1.1.1, 1.2.0, 1.3.0, 1.2.1, 1.4.0, 1.5.0, 1.6.0
> Reporter: Eric Shu
> Priority: Major
>