[jira] [Updated] (GEODE-7082) PersistentColocatedPartitionedRegionDUnitTest.testReplaceOfflineMemberAndRestartCreateColocatedPRLate hung or took too long

2019-11-18 Thread Mark Hanson (Jira)


 [ 
https://issues.apache.org/jira/browse/GEODE-7082?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mark Hanson updated GEODE-7082:
---
Labels: flaky  (was: )

> PersistentColocatedPartitionedRegionDUnitTest.testReplaceOfflineMemberAndRestartCreateColocatedPRLate
>  hung or took too long 
> 
>
> Key: GEODE-7082
> URL: https://issues.apache.org/jira/browse/GEODE-7082
> Project: Geode
> Issue Type: Bug
> Components: tests
> Reporter: Mark Hanson
> Assignee: Kirk Lund
> Priority: Major
> Labels: flaky
> Attachments: callstacks-2019-08-12-19-22-55.txt, 
> callstacks-2019-08-12-19-23-06.txt, callstacks-2019-08-12-19-23-16.txt, 
> dunit-hangs.txt
>
> Time Spent: 40m
> Remaining Estimate: 0h
>
> [https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-develop-main/jobs/DistributedTestOpenJDK8/builds/981]
> The run of the distributed test for Java 8 failed because a timeout was 
> exceeded.
> I've attached the dunit-hangs.txt and callstack files from dunit #981. It 
> looks like 
> PersistentColocatedPartitionedRegionDUnitTest.testReplaceOfflineMemberAndRestartCreateColocatedPRLate
>  hung. Several threads are trying to create buckets while another is trying 
> to shut down the cluster. Thread stacks are below.
> There are several threads creating buckets and stuck waiting for replies to 
> PrepareNewPersistentMemberMessage:
> {noformat}
> "Pooled Waiting Message Processor 6" #2393 daemon prio=5 os_prio=0 tid=0x7f29a0005800 nid=0x1a84 waiting on condition [0x7f2bd28c5000]
>    java.lang.Thread.State: TIMED_WAITING (parking)
>     at sun.misc.Unsafe.park(Native Method)
>     - parking to wait for  <0xe1fb4410> (a java.util.concurrent.CountDownLatch$Sync)
>     at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
>     at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedNanos(AbstractQueuedSynchronizer.java:1037)
>     at java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(AbstractQueuedSynchronizer.java:1328)
>     at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:277)
>     at org.apache.geode.internal.util.concurrent.StoppableCountDownLatch.await(StoppableCountDownLatch.java:72)
>     at org.apache.geode.distributed.internal.ReplyProcessor21.basicWait(ReplyProcessor21.java:732)
>     at org.apache.geode.distributed.internal.ReplyProcessor21.waitForRepliesUninterruptibly(ReplyProcessor21.java:803)
>     at org.apache.geode.distributed.internal.ReplyProcessor21.waitForRepliesUninterruptibly(ReplyProcessor21.java:780)
>     at org.apache.geode.distributed.internal.ReplyProcessor21.waitForRepliesUninterruptibly(ReplyProcessor21.java:866)
>     at org.apache.geode.internal.cache.persistence.PrepareNewPersistentMemberMessage.send(PrepareNewPersistentMemberMessage.java:79)
>     at org.apache.geode.internal.cache.persistence.PersistenceAdvisorImpl.setInitializing(PersistenceAdvisorImpl.java:460)
>     at org.apache.geode.internal.cache.BucketPersistenceAdvisor.setInitializing(BucketPersistenceAdvisor.java:416)
>     at org.apache.geode.internal.cache.InitialImageOperation.getFromOne(InitialImageOperation.java:324)
>     at org.apache.geode.internal.cache.DistributedRegion.getInitialImageAndRecovery(DistributedRegion.java:1219)
>     at org.apache.geode.internal.cache.DistributedRegion.initialize(DistributedRegion.java:1079)
>     at org.apache.geode.internal.cache.BucketRegion.initialize(BucketRegion.java:259)
>     at org.apache.geode.internal.cache.LocalRegion.createSubregion(LocalRegion.java:980)
>     at org.apache.geode.internal.cache.PartitionedRegionDataStore.createBucketRegion(PartitionedRegionDataStore.java:783)
>     at org.apache.geode.internal.cache.PartitionedRegionDataStore.grabFreeBucket(PartitionedRegionDataStore.java:458)
>     - locked <0xe1fb4b28> (a org.apache.geode.internal.cache.ProxyBucketRegion)
>     at org.apache.geode.internal.cache.PartitionedRegionDataStore.grabFreeBucketRecursively(PartitionedRegionDataStore.java:317)
>     at org.apache.geode.internal.cache.PartitionedRegionDataStore.grabBucket(PartitionedRegionDataStore.java:2897)
>     at org.apache.geode.internal.cache.PRHARedundancyProvider.createBackupBucketOnMember(PRHARedundancyProvider.java:1086)
>     at org.apache.geode.internal.cache.partitioned.PartitionedRegionRebalanceOp.createRedundantBucketForRegion(PartitionedRegionRebalanceOp.java:513)
>     at
>

[jira] [Updated] (GEODE-7082) PersistentColocatedPartitionedRegionDUnitTest.testReplaceOfflineMemberAndRestartCreateColocatedPRLate hung or took too long

2019-08-13 Thread Kirk Lund (JIRA)


 [ 
https://issues.apache.org/jira/browse/GEODE-7082?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kirk Lund updated GEODE-7082:
-
Description: 
[https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-develop-main/jobs/DistributedTestOpenJDK8/builds/981]

The run of the distributed test for Java 8 failed because a timeout was 
exceeded.

I've attached the dunit-hangs.txt and callstack files from dunit #981. It looks 
like 
PersistentColocatedPartitionedRegionDUnitTest.testReplaceOfflineMemberAndRestartCreateColocatedPRLate
 hung. One thread is trying to create buckets while another is trying to 
shut down the cluster. Thread stacks are below.

There are several threads creating buckets and stuck waiting for replies to 
PrepareNewPersistentMemberMessage:

{noformat}
"Pooled Waiting Message Processor 6" #2393 daemon prio=5 os_prio=0 tid=0x7f29a0005800 nid=0x1a84 waiting on condition [0x7f2bd28c5000]
   java.lang.Thread.State: TIMED_WAITING (parking)
    at sun.misc.Unsafe.park(Native Method)
    - parking to wait for  <0xe1fb4410> (a java.util.concurrent.CountDownLatch$Sync)
    at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedNanos(AbstractQueuedSynchronizer.java:1037)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(AbstractQueuedSynchronizer.java:1328)
    at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:277)
    at org.apache.geode.internal.util.concurrent.StoppableCountDownLatch.await(StoppableCountDownLatch.java:72)
    at org.apache.geode.distributed.internal.ReplyProcessor21.basicWait(ReplyProcessor21.java:732)
    at org.apache.geode.distributed.internal.ReplyProcessor21.waitForRepliesUninterruptibly(ReplyProcessor21.java:803)
    at org.apache.geode.distributed.internal.ReplyProcessor21.waitForRepliesUninterruptibly(ReplyProcessor21.java:780)
    at org.apache.geode.distributed.internal.ReplyProcessor21.waitForRepliesUninterruptibly(ReplyProcessor21.java:866)
    at org.apache.geode.internal.cache.persistence.PrepareNewPersistentMemberMessage.send(PrepareNewPersistentMemberMessage.java:79)
    at org.apache.geode.internal.cache.persistence.PersistenceAdvisorImpl.setInitializing(PersistenceAdvisorImpl.java:460)
    at org.apache.geode.internal.cache.BucketPersistenceAdvisor.setInitializing(BucketPersistenceAdvisor.java:416)
    at org.apache.geode.internal.cache.InitialImageOperation.getFromOne(InitialImageOperation.java:324)
    at org.apache.geode.internal.cache.DistributedRegion.getInitialImageAndRecovery(DistributedRegion.java:1219)
    at org.apache.geode.internal.cache.DistributedRegion.initialize(DistributedRegion.java:1079)
    at org.apache.geode.internal.cache.BucketRegion.initialize(BucketRegion.java:259)
    at org.apache.geode.internal.cache.LocalRegion.createSubregion(LocalRegion.java:980)
    at org.apache.geode.internal.cache.PartitionedRegionDataStore.createBucketRegion(PartitionedRegionDataStore.java:783)
    at org.apache.geode.internal.cache.PartitionedRegionDataStore.grabFreeBucket(PartitionedRegionDataStore.java:458)
    - locked <0xe1fb4b28> (a org.apache.geode.internal.cache.ProxyBucketRegion)
    at org.apache.geode.internal.cache.PartitionedRegionDataStore.grabFreeBucketRecursively(PartitionedRegionDataStore.java:317)
    at org.apache.geode.internal.cache.PartitionedRegionDataStore.grabBucket(PartitionedRegionDataStore.java:2897)
    at org.apache.geode.internal.cache.PRHARedundancyProvider.createBackupBucketOnMember(PRHARedundancyProvider.java:1086)
    at org.apache.geode.internal.cache.partitioned.PartitionedRegionRebalanceOp.createRedundantBucketForRegion(PartitionedRegionRebalanceOp.java:513)
    at org.apache.geode.internal.cache.partitioned.rebalance.BucketOperatorImpl.createRedundantBucket(BucketOperatorImpl.java:54)
    at org.apache.geode.internal.cache.partitioned.rebalance.BucketOperatorWrapper.createRedundantBucket(BucketOperatorWrapper.java:100)
    at org.apache.geode.internal.cache.partitioned.rebalance.ParallelBucketOperator$1.run(ParallelBucketOperator.java:91)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at org.apache.geode.distributed.internal.ClusterDistributionManager.runUntilShutdown(ClusterDistributionManager.java:961)
    at org.apache.geode.distributed.internal.ClusterDistributionManager.doWaitingThread(ClusterDistributionManager.java:851)
    at org.apache.geode.distributed.internal.ClusterDistributionManager$$Lambda$55/1122354621.invoke(Unknown Source)
    at

[jira] [Updated] (GEODE-7082) PersistentColocatedPartitionedRegionDUnitTest.testReplaceOfflineMemberAndRestartCreateColocatedPRLate hung or took too long

2019-08-13 Thread Kirk Lund (JIRA)


 [ 
https://issues.apache.org/jira/browse/GEODE-7082?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kirk Lund updated GEODE-7082:
-
Description: 
[https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-develop-main/jobs/DistributedTestOpenJDK8/builds/981]

The run of the distributed test for Java 8 failed because a timeout was 
exceeded.

I've attached the dunit-hangs.txt and callstack files from dunit #981. It looks 
like 
PersistentColocatedPartitionedRegionDUnitTest.testReplaceOfflineMemberAndRestartCreateColocatedPRLate
 hung. Several threads are trying to create buckets while another is trying to 
shut down the cluster. Thread stacks are below.

There are several threads creating buckets and stuck waiting for replies to 
PrepareNewPersistentMemberMessage:

{noformat}
"Pooled Waiting Message Processor 6" #2393 daemon prio=5 os_prio=0 tid=0x7f29a0005800 nid=0x1a84 waiting on condition [0x7f2bd28c5000]
   java.lang.Thread.State: TIMED_WAITING (parking)
    at sun.misc.Unsafe.park(Native Method)
    - parking to wait for  <0xe1fb4410> (a java.util.concurrent.CountDownLatch$Sync)
    at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedNanos(AbstractQueuedSynchronizer.java:1037)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(AbstractQueuedSynchronizer.java:1328)
    at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:277)
    at org.apache.geode.internal.util.concurrent.StoppableCountDownLatch.await(StoppableCountDownLatch.java:72)
    at org.apache.geode.distributed.internal.ReplyProcessor21.basicWait(ReplyProcessor21.java:732)
    at org.apache.geode.distributed.internal.ReplyProcessor21.waitForRepliesUninterruptibly(ReplyProcessor21.java:803)
    at org.apache.geode.distributed.internal.ReplyProcessor21.waitForRepliesUninterruptibly(ReplyProcessor21.java:780)
    at org.apache.geode.distributed.internal.ReplyProcessor21.waitForRepliesUninterruptibly(ReplyProcessor21.java:866)
    at org.apache.geode.internal.cache.persistence.PrepareNewPersistentMemberMessage.send(PrepareNewPersistentMemberMessage.java:79)
    at org.apache.geode.internal.cache.persistence.PersistenceAdvisorImpl.setInitializing(PersistenceAdvisorImpl.java:460)
    at org.apache.geode.internal.cache.BucketPersistenceAdvisor.setInitializing(BucketPersistenceAdvisor.java:416)
    at org.apache.geode.internal.cache.InitialImageOperation.getFromOne(InitialImageOperation.java:324)
    at org.apache.geode.internal.cache.DistributedRegion.getInitialImageAndRecovery(DistributedRegion.java:1219)
    at org.apache.geode.internal.cache.DistributedRegion.initialize(DistributedRegion.java:1079)
    at org.apache.geode.internal.cache.BucketRegion.initialize(BucketRegion.java:259)
    at org.apache.geode.internal.cache.LocalRegion.createSubregion(LocalRegion.java:980)
    at org.apache.geode.internal.cache.PartitionedRegionDataStore.createBucketRegion(PartitionedRegionDataStore.java:783)
    at org.apache.geode.internal.cache.PartitionedRegionDataStore.grabFreeBucket(PartitionedRegionDataStore.java:458)
    - locked <0xe1fb4b28> (a org.apache.geode.internal.cache.ProxyBucketRegion)
    at org.apache.geode.internal.cache.PartitionedRegionDataStore.grabFreeBucketRecursively(PartitionedRegionDataStore.java:317)
    at org.apache.geode.internal.cache.PartitionedRegionDataStore.grabBucket(PartitionedRegionDataStore.java:2897)
    at org.apache.geode.internal.cache.PRHARedundancyProvider.createBackupBucketOnMember(PRHARedundancyProvider.java:1086)
    at org.apache.geode.internal.cache.partitioned.PartitionedRegionRebalanceOp.createRedundantBucketForRegion(PartitionedRegionRebalanceOp.java:513)
    at org.apache.geode.internal.cache.partitioned.rebalance.BucketOperatorImpl.createRedundantBucket(BucketOperatorImpl.java:54)
    at org.apache.geode.internal.cache.partitioned.rebalance.BucketOperatorWrapper.createRedundantBucket(BucketOperatorWrapper.java:100)
    at org.apache.geode.internal.cache.partitioned.rebalance.ParallelBucketOperator$1.run(ParallelBucketOperator.java:91)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at org.apache.geode.distributed.internal.ClusterDistributionManager.runUntilShutdown(ClusterDistributionManager.java:961)
    at org.apache.geode.distributed.internal.ClusterDistributionManager.doWaitingThread(ClusterDistributionManager.java:851)
    at org.apache.geode.distributed.internal.ClusterDistributionManager$$Lambda$55/1122354621.invoke(Unknown Source)
    at

[jira] [Updated] (GEODE-7082) PersistentColocatedPartitionedRegionDUnitTest.testReplaceOfflineMemberAndRestartCreateColocatedPRLate hung or took too long

2019-08-13 Thread Kirk Lund (JIRA)


 [ 
https://issues.apache.org/jira/browse/GEODE-7082?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kirk Lund updated GEODE-7082:
-
Description: 
[https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-develop-main/jobs/DistributedTestOpenJDK8/builds/981]

The run of the distributed test for Java 8 failed because a timeout was 
exceeded.

I've attached the dunit-hangs.txt and callstack files from dunit #981.

It looks like 
PersistentColocatedPartitionedRegionDUnitTest.testReplaceOfflineMemberAndRestartCreateColocatedPRLate
 hung. 

One thread is trying to create buckets while another is trying to shut down the 
cluster. Thread stacks are below.

There are several threads creating buckets and stuck waiting for replies to 
PrepareNewPersistentMemberMessage:

{noformat}
"Pooled Waiting Message Processor 6" #2393 daemon prio=5 os_prio=0 tid=0x7f29a0005800 nid=0x1a84 waiting on condition [0x7f2bd28c5000]
   java.lang.Thread.State: TIMED_WAITING (parking)
    at sun.misc.Unsafe.park(Native Method)
    - parking to wait for  <0xe1fb4410> (a java.util.concurrent.CountDownLatch$Sync)
    at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedNanos(AbstractQueuedSynchronizer.java:1037)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(AbstractQueuedSynchronizer.java:1328)
    at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:277)
    at org.apache.geode.internal.util.concurrent.StoppableCountDownLatch.await(StoppableCountDownLatch.java:72)
    at org.apache.geode.distributed.internal.ReplyProcessor21.basicWait(ReplyProcessor21.java:732)
    at org.apache.geode.distributed.internal.ReplyProcessor21.waitForRepliesUninterruptibly(ReplyProcessor21.java:803)
    at org.apache.geode.distributed.internal.ReplyProcessor21.waitForRepliesUninterruptibly(ReplyProcessor21.java:780)
    at org.apache.geode.distributed.internal.ReplyProcessor21.waitForRepliesUninterruptibly(ReplyProcessor21.java:866)
    at org.apache.geode.internal.cache.persistence.PrepareNewPersistentMemberMessage.send(PrepareNewPersistentMemberMessage.java:79)
    at org.apache.geode.internal.cache.persistence.PersistenceAdvisorImpl.setInitializing(PersistenceAdvisorImpl.java:460)
    at org.apache.geode.internal.cache.BucketPersistenceAdvisor.setInitializing(BucketPersistenceAdvisor.java:416)
    at org.apache.geode.internal.cache.InitialImageOperation.getFromOne(InitialImageOperation.java:324)
    at org.apache.geode.internal.cache.DistributedRegion.getInitialImageAndRecovery(DistributedRegion.java:1219)
    at org.apache.geode.internal.cache.DistributedRegion.initialize(DistributedRegion.java:1079)
    at org.apache.geode.internal.cache.BucketRegion.initialize(BucketRegion.java:259)
    at org.apache.geode.internal.cache.LocalRegion.createSubregion(LocalRegion.java:980)
    at org.apache.geode.internal.cache.PartitionedRegionDataStore.createBucketRegion(PartitionedRegionDataStore.java:783)
    at org.apache.geode.internal.cache.PartitionedRegionDataStore.grabFreeBucket(PartitionedRegionDataStore.java:458)
    - locked <0xe1fb4b28> (a org.apache.geode.internal.cache.ProxyBucketRegion)
    at org.apache.geode.internal.cache.PartitionedRegionDataStore.grabFreeBucketRecursively(PartitionedRegionDataStore.java:317)
    at org.apache.geode.internal.cache.PartitionedRegionDataStore.grabBucket(PartitionedRegionDataStore.java:2897)
    at org.apache.geode.internal.cache.PRHARedundancyProvider.createBackupBucketOnMember(PRHARedundancyProvider.java:1086)
    at org.apache.geode.internal.cache.partitioned.PartitionedRegionRebalanceOp.createRedundantBucketForRegion(PartitionedRegionRebalanceOp.java:513)
    at org.apache.geode.internal.cache.partitioned.rebalance.BucketOperatorImpl.createRedundantBucket(BucketOperatorImpl.java:54)
    at org.apache.geode.internal.cache.partitioned.rebalance.BucketOperatorWrapper.createRedundantBucket(BucketOperatorWrapper.java:100)
    at org.apache.geode.internal.cache.partitioned.rebalance.ParallelBucketOperator$1.run(ParallelBucketOperator.java:91)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at org.apache.geode.distributed.internal.ClusterDistributionManager.runUntilShutdown(ClusterDistributionManager.java:961)
    at org.apache.geode.distributed.internal.ClusterDistributionManager.doWaitingThread(ClusterDistributionManager.java:851)
    at org.apache.geode.distributed.internal.ClusterDistributionManager$$Lambda$55/1122354621.invoke(Unknown Source)
    at

[jira] [Updated] (GEODE-7082) PersistentColocatedPartitionedRegionDUnitTest.testReplaceOfflineMemberAndRestartCreateColocatedPRLate hung or took too long

2019-08-13 Thread Kirk Lund (JIRA)


 [ 
https://issues.apache.org/jira/browse/GEODE-7082?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kirk Lund updated GEODE-7082:
-
Attachment: dunit-hangs.txt
callstacks-2019-08-12-19-23-16.txt
callstacks-2019-08-12-19-23-06.txt
callstacks-2019-08-12-19-22-55.txt

> PersistentColocatedPartitionedRegionDUnitTest.testReplaceOfflineMemberAndRestartCreateColocatedPRLate
>  hung or took too long 
> 
>
> Key: GEODE-7082
> URL: https://issues.apache.org/jira/browse/GEODE-7082
> Project: Geode
> Issue Type: Bug
> Components: tests
> Reporter: Mark Hanson
> Assignee: Jason Huynh
> Priority: Major
> Attachments: callstacks-2019-08-12-19-22-55.txt, 
> callstacks-2019-08-12-19-23-06.txt, callstacks-2019-08-12-19-23-16.txt, 
> dunit-hangs.txt
>
> Time Spent: 20m
> Remaining Estimate: 0h
>
> [https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-develop-main/jobs/DistributedTestOpenJDK8/builds/981]
> The run of the distributed test for Java 8 failed because a timeout was 
> exceeded.
> It looks like 
> PersistentColocatedPartitionedRegionDUnitTest.testReplaceOfflineMemberAndRestartCreateColocatedPRLate
>  hung. 
> One thread is trying to create buckets while another is trying to shut down 
> the cluster. Thread stacks are below.
> The thread trying to shut down the cluster is using an old deprecated internal 
> method rather than going through a current API:
> {noformat}
> vm0.invoke(new SerializableCallable() {
>   @Override
>   public Object call() throws Exception {
>     InternalDistributedSystem ds =
>         (InternalDistributedSystem) getCache().getDistributedSystem();
>     AdminDistributedSystemImpl.shutDownAllMembers(ds.getDistributionManager(), 60);
>     return null;
>   }
> });
> {noformat}
> There are several threads creating buckets and stuck waiting for replies to 
> PrepareNewPersistentMemberMessage:
> {noformat}
> "Pooled Waiting Message Processor 6" #2393 daemon prio=5 os_prio=0 tid=0x7f29a0005800 nid=0x1a84 waiting on condition [0x7f2bd28c5000]
>    java.lang.Thread.State: TIMED_WAITING (parking)
>     at sun.misc.Unsafe.park(Native Method)
>     - parking to wait for  <0xe1fb4410> (a java.util.concurrent.CountDownLatch$Sync)
>     at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
>     at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedNanos(AbstractQueuedSynchronizer.java:1037)
>     at java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(AbstractQueuedSynchronizer.java:1328)
>     at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:277)
>     at org.apache.geode.internal.util.concurrent.StoppableCountDownLatch.await(StoppableCountDownLatch.java:72)
>     at org.apache.geode.distributed.internal.ReplyProcessor21.basicWait(ReplyProcessor21.java:732)
>     at org.apache.geode.distributed.internal.ReplyProcessor21.waitForRepliesUninterruptibly(ReplyProcessor21.java:803)
>     at org.apache.geode.distributed.internal.ReplyProcessor21.waitForRepliesUninterruptibly(ReplyProcessor21.java:780)
>     at org.apache.geode.distributed.internal.ReplyProcessor21.waitForRepliesUninterruptibly(ReplyProcessor21.java:866)
>     at org.apache.geode.internal.cache.persistence.PrepareNewPersistentMemberMessage.send(PrepareNewPersistentMemberMessage.java:79)
>     at org.apache.geode.internal.cache.persistence.PersistenceAdvisorImpl.setInitializing(PersistenceAdvisorImpl.java:460)
>     at org.apache.geode.internal.cache.BucketPersistenceAdvisor.setInitializing(BucketPersistenceAdvisor.java:416)
>     at org.apache.geode.internal.cache.InitialImageOperation.getFromOne(InitialImageOperation.java:324)
>     at org.apache.geode.internal.cache.DistributedRegion.getInitialImageAndRecovery(DistributedRegion.java:1219)
>     at org.apache.geode.internal.cache.DistributedRegion.initialize(DistributedRegion.java:1079)
>     at org.apache.geode.internal.cache.BucketRegion.initialize(BucketRegion.java:259)
>     at org.apache.geode.internal.cache.LocalRegion.createSubregion(LocalRegion.java:980)
>     at org.apache.geode.internal.cache.PartitionedRegionDataStore.createBucketRegion(PartitionedRegionDataStore.java:783)
>     at org.apache.geode.internal.cache.PartitionedRegionDataStore.grabFreeBucket(PartitionedRegionDataStore.java:458)
>     - locked <0xe1fb4b28> (a org.apache.geode.internal.cache.ProxyBucketRegion)
>     at
>

[jira] [Updated] (GEODE-7082) PersistentColocatedPartitionedRegionDUnitTest.testReplaceOfflineMemberAndRestartCreateColocatedPRLate hung or took too long

2019-08-13 Thread Kirk Lund (JIRA)


 [ 
https://issues.apache.org/jira/browse/GEODE-7082?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kirk Lund updated GEODE-7082:
-
Description: 
[https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-develop-main/jobs/DistributedTestOpenJDK8/builds/981]

The run of the distributed test for Java 8 failed because a timeout was 
exceeded.

I've attached the dunit-hangs.txt and callstack files from dunit #981.

It looks like 
PersistentColocatedPartitionedRegionDUnitTest.testReplaceOfflineMemberAndRestartCreateColocatedPRLate
 hung. 

One thread is trying to create buckets while another is trying to shut down the 
cluster. Thread stacks are below.

The thread trying to shut down the cluster is using an old deprecated internal 
method rather than going through a current API:
{noformat}
vm0.invoke(new SerializableCallable() {
  @Override
  public Object call() throws Exception {
    InternalDistributedSystem ds =
        (InternalDistributedSystem) getCache().getDistributedSystem();
    AdminDistributedSystemImpl.shutDownAllMembers(ds.getDistributionManager(), 60);
    return null;
  }
});
{noformat}
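
For readers less familiar with thread dumps: the dumps attached to this issue show a thread in state TIMED_WAITING (parking) inside CountDownLatch.await while it still holds a monitor lock. The following is a minimal sketch of how that state can arise, using only java.util.concurrent. It is not Geode code: the class name LatchWaitSketch, the thread name, the latch, and the lock object are all hypothetical stand-ins, and the timeouts are arbitrary.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

public class LatchWaitSketch {

  /** Starts a waiter, samples its state once it has parked, then releases it. */
  static Thread.State observeWaiterState() {
    // Hypothetical stand-in for a reply latch that nobody counts down yet.
    CountDownLatch replies = new CountDownLatch(1);
    // Hypothetical stand-in for the monitor held while waiting.
    Object bucketLock = new Object();

    Thread waiter = new Thread(() -> {
      synchronized (bucketLock) {
        try {
          // A timed await parks the thread; a dump taken now shows TIMED_WAITING.
          replies.await(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt();
        }
      }
    }, "hypothetical-bucket-creator");
    waiter.setDaemon(true);
    waiter.start();

    try {
      // Poll for up to two seconds until the waiter has actually parked.
      long deadline = System.nanoTime() + TimeUnit.SECONDS.toNanos(2);
      while (waiter.getState() != Thread.State.TIMED_WAITING
          && System.nanoTime() < deadline) {
        Thread.sleep(10);
      }
      Thread.State observed = waiter.getState();
      // Release the waiter so the sketch terminates promptly.
      replies.countDown();
      waiter.join();
      return observed;
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException("interrupted while observing waiter", e);
    }
  }

  public static void main(String[] args) {
    System.out.println("waiter state: " + observeWaiterState());
  }
}
```

In this sketch the waiter is released by counting the latch down. If nothing ever did that, the thread would stay parked until its timeout expired, still holding the monitor, so any other thread needing that monitor would block behind it.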

There are several threads creating buckets and stuck waiting for replies to 
PrepareNewPersistentMemberMessage:
{noformat}
"Pooled Waiting Message Processor 6" #2393 daemon prio=5 os_prio=0 tid=0x7f29a0005800 nid=0x1a84 waiting on condition [0x7f2bd28c5000]
   java.lang.Thread.State: TIMED_WAITING (parking)
    at sun.misc.Unsafe.park(Native Method)
    - parking to wait for  <0xe1fb4410> (a java.util.concurrent.CountDownLatch$Sync)
    at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedNanos(AbstractQueuedSynchronizer.java:1037)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(AbstractQueuedSynchronizer.java:1328)
    at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:277)
    at org.apache.geode.internal.util.concurrent.StoppableCountDownLatch.await(StoppableCountDownLatch.java:72)
    at org.apache.geode.distributed.internal.ReplyProcessor21.basicWait(ReplyProcessor21.java:732)
    at org.apache.geode.distributed.internal.ReplyProcessor21.waitForRepliesUninterruptibly(ReplyProcessor21.java:803)
    at org.apache.geode.distributed.internal.ReplyProcessor21.waitForRepliesUninterruptibly(ReplyProcessor21.java:780)
    at org.apache.geode.distributed.internal.ReplyProcessor21.waitForRepliesUninterruptibly(ReplyProcessor21.java:866)
    at org.apache.geode.internal.cache.persistence.PrepareNewPersistentMemberMessage.send(PrepareNewPersistentMemberMessage.java:79)
    at org.apache.geode.internal.cache.persistence.PersistenceAdvisorImpl.setInitializing(PersistenceAdvisorImpl.java:460)
    at org.apache.geode.internal.cache.BucketPersistenceAdvisor.setInitializing(BucketPersistenceAdvisor.java:416)
    at org.apache.geode.internal.cache.InitialImageOperation.getFromOne(InitialImageOperation.java:324)
    at org.apache.geode.internal.cache.DistributedRegion.getInitialImageAndRecovery(DistributedRegion.java:1219)
    at org.apache.geode.internal.cache.DistributedRegion.initialize(DistributedRegion.java:1079)
    at org.apache.geode.internal.cache.BucketRegion.initialize(BucketRegion.java:259)
    at org.apache.geode.internal.cache.LocalRegion.createSubregion(LocalRegion.java:980)
    at org.apache.geode.internal.cache.PartitionedRegionDataStore.createBucketRegion(PartitionedRegionDataStore.java:783)
    at org.apache.geode.internal.cache.PartitionedRegionDataStore.grabFreeBucket(PartitionedRegionDataStore.java:458)
    - locked <0xe1fb4b28> (a org.apache.geode.internal.cache.ProxyBucketRegion)
    at org.apache.geode.internal.cache.PartitionedRegionDataStore.grabFreeBucketRecursively(PartitionedRegionDataStore.java:317)
    at org.apache.geode.internal.cache.PartitionedRegionDataStore.grabBucket(PartitionedRegionDataStore.java:2897)
    at org.apache.geode.internal.cache.PRHARedundancyProvider.createBackupBucketOnMember(PRHARedundancyProvider.java:1086)
    at org.apache.geode.internal.cache.partitioned.PartitionedRegionRebalanceOp.createRedundantBucketForRegion(PartitionedRegionRebalanceOp.java:513)
    at org.apache.geode.internal.cache.partitioned.rebalance.BucketOperatorImpl.createRedundantBucket(BucketOperatorImpl.java:54)
    at org.apache.geode.internal.cache.partitioned.rebalance.BucketOperatorWrapper.createRedundantBucket(BucketOperatorWrapper.java:100)
    at org.apache.geode.internal.cache.partitioned.rebalance.ParallelBucketOperator$1.run(ParallelBucketOperator.java:91)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)

[jira] [Updated] (GEODE-7082) PersistentColocatedPartitionedRegionDUnitTest.testReplaceOfflineMemberAndRestartCreateColocatedPRLate hung or took too long

2019-08-13 Thread Kirk Lund (JIRA)


 [ 
https://issues.apache.org/jira/browse/GEODE-7082?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kirk Lund updated GEODE-7082:
-
Description: 
[https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-develop-main/jobs/DistributedTestOpenJDK8/builds/981]

The run of the distributed test for Java 8 failed because a timeout was 
exceeded.

It looks like 
PersistentColocatedPartitionedRegionDUnitTest.testReplaceOfflineMemberAndRestartCreateColocatedPRLate
 hung. 

One thread is trying to create buckets while another is trying to shut down the 
cluster. Thread stacks are below.

The thread trying to shut down the cluster is using an old deprecated internal 
method rather than going through a current API:
{noformat}
vm0.invoke(new SerializableCallable() {
  @Override
  public Object call() throws Exception {
    InternalDistributedSystem ds =
        (InternalDistributedSystem) getCache().getDistributedSystem();
    AdminDistributedSystemImpl.shutDownAllMembers(ds.getDistributionManager(), 60);
    return null;
  }
});
{noformat}

There are several threads creating buckets and stuck waiting for replies to 
PrepareNewPersistentMemberMessage:
{noformat}
"Pooled Waiting Message Processor 6" #2393 daemon prio=5 os_prio=0 tid=0x7f29a0005800 nid=0x1a84 waiting on condition [0x7f2bd28c5000]
   java.lang.Thread.State: TIMED_WAITING (parking)
    at sun.misc.Unsafe.park(Native Method)
    - parking to wait for  <0xe1fb4410> (a java.util.concurrent.CountDownLatch$Sync)
    at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedNanos(AbstractQueuedSynchronizer.java:1037)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(AbstractQueuedSynchronizer.java:1328)
    at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:277)
    at org.apache.geode.internal.util.concurrent.StoppableCountDownLatch.await(StoppableCountDownLatch.java:72)
    at org.apache.geode.distributed.internal.ReplyProcessor21.basicWait(ReplyProcessor21.java:732)
    at org.apache.geode.distributed.internal.ReplyProcessor21.waitForRepliesUninterruptibly(ReplyProcessor21.java:803)
    at org.apache.geode.distributed.internal.ReplyProcessor21.waitForRepliesUninterruptibly(ReplyProcessor21.java:780)
    at org.apache.geode.distributed.internal.ReplyProcessor21.waitForRepliesUninterruptibly(ReplyProcessor21.java:866)
    at org.apache.geode.internal.cache.persistence.PrepareNewPersistentMemberMessage.send(PrepareNewPersistentMemberMessage.java:79)
    at org.apache.geode.internal.cache.persistence.PersistenceAdvisorImpl.setInitializing(PersistenceAdvisorImpl.java:460)
    at org.apache.geode.internal.cache.BucketPersistenceAdvisor.setInitializing(BucketPersistenceAdvisor.java:416)
    at org.apache.geode.internal.cache.InitialImageOperation.getFromOne(InitialImageOperation.java:324)
    at org.apache.geode.internal.cache.DistributedRegion.getInitialImageAndRecovery(DistributedRegion.java:1219)
    at org.apache.geode.internal.cache.DistributedRegion.initialize(DistributedRegion.java:1079)
    at org.apache.geode.internal.cache.BucketRegion.initialize(BucketRegion.java:259)
    at org.apache.geode.internal.cache.LocalRegion.createSubregion(LocalRegion.java:980)
    at org.apache.geode.internal.cache.PartitionedRegionDataStore.createBucketRegion(PartitionedRegionDataStore.java:783)
    at org.apache.geode.internal.cache.PartitionedRegionDataStore.grabFreeBucket(PartitionedRegionDataStore.java:458)
    - locked <0xe1fb4b28> (a org.apache.geode.internal.cache.ProxyBucketRegion)
    at org.apache.geode.internal.cache.PartitionedRegionDataStore.grabFreeBucketRecursively(PartitionedRegionDataStore.java:317)
    at org.apache.geode.internal.cache.PartitionedRegionDataStore.grabBucket(PartitionedRegionDataStore.java:2897)
    at org.apache.geode.internal.cache.PRHARedundancyProvider.createBackupBucketOnMember(PRHARedundancyProvider.java:1086)
    at org.apache.geode.internal.cache.partitioned.PartitionedRegionRebalanceOp.createRedundantBucketForRegion(PartitionedRegionRebalanceOp.java:513)
    at org.apache.geode.internal.cache.partitioned.rebalance.BucketOperatorImpl.createRedundantBucket(BucketOperatorImpl.java:54)
    at org.apache.geode.internal.cache.partitioned.rebalance.BucketOperatorWrapper.createRedundantBucket(BucketOperatorWrapper.java:100)
    at org.apache.geode.internal.cache.partitioned.rebalance.ParallelBucketOperator$1.run(ParallelBucketOperator.java:91)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at

[jira] [Updated] (GEODE-7082) PersistentColocatedPartitionedRegionDUnitTest.testReplaceOfflineMemberAndRestartCreateColocatedPRLate hung or took too long

2019-08-13 Thread Kirk Lund (JIRA)


 [ 
https://issues.apache.org/jira/browse/GEODE-7082?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kirk Lund updated GEODE-7082:
-
Description: 
[https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-develop-main/jobs/DistributedTestOpenJDK8/builds/981]

The run of the distributed test for java 8 failed because a timeout was 
exceeded.

It looks like 
PersistentColocatedPartitionedRegionDUnitTest.testReplaceOfflineMemberAndRestartCreateColocatedPRLate
 hung. 

One thread is trying to create buckets while another is trying to shut down the 
cluster. Thread stacks are below.

The thread trying to shut down the cluster is using an old, deprecated internal 
method rather than going through a current API:
{noformat}
vm0.invoke(new SerializableCallable() {
  @Override
  public Object call() throws Exception {
    InternalDistributedSystem ds =
        (InternalDistributedSystem) getCache().getDistributedSystem();
    AdminDistributedSystemImpl.shutDownAllMembers(ds.getDistributionManager(), 60);
    return null;
  }
});
{noformat}
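
To make the interaction concrete, here is a minimal, self-contained sketch of a thread that waits for replies in timed slices while another thread begins a shutdown. It uses only java.util.concurrent. ReplyWaitSketch, awaitReplies, and the timing values are hypothetical names chosen for illustration; this is not Geode's ReplyProcessor21 or StoppableCountDownLatch code. If the waiter never observed the shutdown flag and no reply arrived, it would keep looping until the deadline, which is roughly the symptom reported here.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.BooleanSupplier;

/** Hypothetical illustration only; not Geode code. */
public class ReplyWaitSketch {

  /**
   * Waits for replies in short timed slices and checks a cancel flag between slices.
   * Returns true if all replies arrived, false if cancelled or the deadline passed.
   */
  public static boolean awaitReplies(CountDownLatch replies, BooleanSupplier cancelled,
      long sliceMillis, long deadlineMillis) {
    long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(deadlineMillis);
    boolean interrupted = false;
    try {
      while (true) {
        if (cancelled.getAsBoolean() || System.nanoTime() - deadline >= 0) {
          return false;
        }
        try {
          if (replies.await(sliceMillis, TimeUnit.MILLISECONDS)) {
            return true;
          }
        } catch (InterruptedException e) {
          interrupted = true; // remember the interrupt and keep waiting
        }
      }
    } finally {
      if (interrupted) {
        Thread.currentThread().interrupt(); // restore the interrupt status for the caller
      }
    }
  }

  public static void main(String[] args) {
    CountDownLatch replies = new CountDownLatch(1); // simulated reply that never arrives
    AtomicBoolean shuttingDown = new AtomicBoolean(false);

    // Simulated shutdown request issued while the waiter is still blocked.
    Thread shutdown = new Thread(() -> {
      try {
        Thread.sleep(100);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
      shuttingDown.set(true);
    });
    shutdown.start();

    boolean gotReplies = awaitReplies(replies, shuttingDown::get, 20, 5_000);
    System.out.println("all replies received: " + gotReplies);
  }
}
```

The waiter in this sketch gives up once the flag is set; a waiter that ignores such a flag would stay in TIMED_WAITING for as long as the reply is missing.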

There are several threads creating buckets and stuck waiting for replies to 
PrepareNewPersistentMemberMessage:
{noformat}
"Pooled Waiting Message Processor 6" #2393 daemon prio=5 os_prio=0 
tid=0x7f29a0005800 nid=0x1a84 waiting on condition [0x7f2bd28c5000]
   java.lang.Thread.State: TIMED_WAITING (parking)
    at sun.misc.Unsafe.park(Native Method)
    - parking to wait for  <0xe1fb4410> (a java.util.concurrent.CountDownLatch$Sync)
    at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedNanos(AbstractQueuedSynchronizer.java:1037)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(AbstractQueuedSynchronizer.java:1328)
    at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:277)
    at org.apache.geode.internal.util.concurrent.StoppableCountDownLatch.await(StoppableCountDownLatch.java:72)
    at org.apache.geode.distributed.internal.ReplyProcessor21.basicWait(ReplyProcessor21.java:732)
    at org.apache.geode.distributed.internal.ReplyProcessor21.waitForRepliesUninterruptibly(ReplyProcessor21.java:803)
    at org.apache.geode.distributed.internal.ReplyProcessor21.waitForRepliesUninterruptibly(ReplyProcessor21.java:780)
    at org.apache.geode.distributed.internal.ReplyProcessor21.waitForRepliesUninterruptibly(ReplyProcessor21.java:866)
    at org.apache.geode.internal.cache.persistence.PrepareNewPersistentMemberMessage.send(PrepareNewPersistentMemberMessage.java:79)
    at org.apache.geode.internal.cache.persistence.PersistenceAdvisorImpl.setInitializing(PersistenceAdvisorImpl.java:460)
    at org.apache.geode.internal.cache.BucketPersistenceAdvisor.setInitializing(BucketPersistenceAdvisor.java:416)
    at org.apache.geode.internal.cache.InitialImageOperation.getFromOne(InitialImageOperation.java:324)
    at org.apache.geode.internal.cache.DistributedRegion.getInitialImageAndRecovery(DistributedRegion.java:1219)
    at org.apache.geode.internal.cache.DistributedRegion.initialize(DistributedRegion.java:1079)
    at org.apache.geode.internal.cache.BucketRegion.initialize(BucketRegion.java:259)
    at org.apache.geode.internal.cache.LocalRegion.createSubregion(LocalRegion.java:980)
    at org.apache.geode.internal.cache.PartitionedRegionDataStore.createBucketRegion(PartitionedRegionDataStore.java:783)
    at org.apache.geode.internal.cache.PartitionedRegionDataStore.grabFreeBucket(PartitionedRegionDataStore.java:458)
    - locked <0xe1fb4b28> (a org.apache.geode.internal.cache.ProxyBucketRegion)
    at org.apache.geode.internal.cache.PartitionedRegionDataStore.grabFreeBucketRecursively(PartitionedRegionDataStore.java:317)
    at org.apache.geode.internal.cache.PartitionedRegionDataStore.grabBucket(PartitionedRegionDataStore.java:2897)
    at org.apache.geode.internal.cache.PRHARedundancyProvider.createBackupBucketOnMember(PRHARedundancyProvider.java:1086)
    at org.apache.geode.internal.cache.partitioned.PartitionedRegionRebalanceOp.createRedundantBucketForRegion(PartitionedRegionRebalanceOp.java:513)
    at org.apache.geode.internal.cache.partitioned.rebalance.BucketOperatorImpl.createRedundantBucket(BucketOperatorImpl.java:54)
    at org.apache.geode.internal.cache.partitioned.rebalance.BucketOperatorWrapper.createRedundantBucket(BucketOperatorWrapper.java:100)
    at org.apache.geode.internal.cache.partitioned.rebalance.ParallelBucketOperator$1.run(ParallelBucketOperator.java:91)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at 

[jira] [Updated] (GEODE-7082) PersistentColocatedPartitionedRegionDUnitTest.testReplaceOfflineMemberAndRestartCreateColocatedPRLate hung or took too long

2019-08-13 Thread Kirk Lund (JIRA)


 [ 
https://issues.apache.org/jira/browse/GEODE-7082?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kirk Lund updated GEODE-7082:
-
Description: 
[https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-develop-main/jobs/DistributedTestOpenJDK8/builds/981]

The run of the distributed test for java 8 failed because a timeout was 
exceeded.

It looks like 
PersistentColocatedPartitionedRegionDUnitTest.testReplaceOfflineMemberAndRestartCreateColocatedPRLate
 hung. 

One thread is trying to create buckets while another is trying to shut down the 
cluster. Thread stacks are below.

The thread trying to shut down the cluster is using an old, deprecated internal 
method rather than going through a current API:
{noformat}
vm0.invoke(new SerializableCallable() {
  @Override
  public Object call() throws Exception {
    InternalDistributedSystem ds =
        (InternalDistributedSystem) getCache().getDistributedSystem();
    AdminDistributedSystemImpl.shutDownAllMembers(ds.getDistributionManager(), 60);
    return null;
  }
});
{noformat}
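
The stuck threads in this report were identified from the attached call-stack files. As a rough illustration of the same check done in-process, the sketch below lists live threads that are waiting and have a given frame on their stack. StuckThreadFinder and findWaitingIn are hypothetical names, not part of Geode or the DUnit framework, and the result is only a best-effort snapshot because thread states can change while it runs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CountDownLatch;

/** Hypothetical helper for illustration; not part of Geode or DUnit. */
public class StuckThreadFinder {

  /**
   * Returns the names of live threads that are WAITING or TIMED_WAITING and whose
   * stack contains a frame for the given class and method.
   */
  public static List<String> findWaitingIn(String className, String methodName) {
    List<String> names = new ArrayList<>();
    for (Map.Entry<Thread, StackTraceElement[]> entry : Thread.getAllStackTraces().entrySet()) {
      Thread thread = entry.getKey();
      Thread.State state = thread.getState();
      if (state != Thread.State.WAITING && state != Thread.State.TIMED_WAITING) {
        continue; // only threads that are parked or sleeping are of interest
      }
      for (StackTraceElement frame : entry.getValue()) {
        if (frame.getClassName().equals(className)
            && frame.getMethodName().equals(methodName)) {
          names.add(thread.getName());
          break;
        }
      }
    }
    return names;
  }

  public static void main(String[] args) throws InterruptedException {
    CountDownLatch never = new CountDownLatch(1);
    Thread waiter = new Thread(() -> {
      try {
        never.await(); // blocks until the latch is released below
      } catch (InterruptedException ignored) {
        // demo thread; nothing to clean up
      }
    }, "demo-waiter");
    waiter.setDaemon(true);
    waiter.start();

    // Wait until the demo thread is actually parked before taking the snapshot.
    while (waiter.getState() != Thread.State.WAITING) {
      Thread.sleep(10);
    }
    System.out.println(findWaitingIn("java.util.concurrent.CountDownLatch", "await"));
    never.countDown();
  }
}
```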

Many threads are creating buckets and are stuck waiting for replies to 
PrepareNewPersistentMemberMessage:
{noformat}
"Pooled Waiting Message Processor 6" #2393 daemon prio=5 os_prio=0 
tid=0x7f29a0005800 nid=0x1a84 waiting on condition [0x7f2bd28c5000]
   java.lang.Thread.State: TIMED_WAITING (parking)
    at sun.misc.Unsafe.park(Native Method)
    - parking to wait for  <0xe1fb4410> (a java.util.concurrent.CountDownLatch$Sync)
    at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedNanos(AbstractQueuedSynchronizer.java:1037)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(AbstractQueuedSynchronizer.java:1328)
    at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:277)
    at org.apache.geode.internal.util.concurrent.StoppableCountDownLatch.await(StoppableCountDownLatch.java:72)
    at org.apache.geode.distributed.internal.ReplyProcessor21.basicWait(ReplyProcessor21.java:732)
    at org.apache.geode.distributed.internal.ReplyProcessor21.waitForRepliesUninterruptibly(ReplyProcessor21.java:803)
    at org.apache.geode.distributed.internal.ReplyProcessor21.waitForRepliesUninterruptibly(ReplyProcessor21.java:780)
    at org.apache.geode.distributed.internal.ReplyProcessor21.waitForRepliesUninterruptibly(ReplyProcessor21.java:866)
    at org.apache.geode.internal.cache.persistence.PrepareNewPersistentMemberMessage.send(PrepareNewPersistentMemberMessage.java:79)
    at org.apache.geode.internal.cache.persistence.PersistenceAdvisorImpl.setInitializing(PersistenceAdvisorImpl.java:460)
    at org.apache.geode.internal.cache.BucketPersistenceAdvisor.setInitializing(BucketPersistenceAdvisor.java:416)
    at org.apache.geode.internal.cache.InitialImageOperation.getFromOne(InitialImageOperation.java:324)
    at org.apache.geode.internal.cache.DistributedRegion.getInitialImageAndRecovery(DistributedRegion.java:1219)
    at org.apache.geode.internal.cache.DistributedRegion.initialize(DistributedRegion.java:1079)
    at org.apache.geode.internal.cache.BucketRegion.initialize(BucketRegion.java:259)
    at org.apache.geode.internal.cache.LocalRegion.createSubregion(LocalRegion.java:980)
    at org.apache.geode.internal.cache.PartitionedRegionDataStore.createBucketRegion(PartitionedRegionDataStore.java:783)
    at org.apache.geode.internal.cache.PartitionedRegionDataStore.grabFreeBucket(PartitionedRegionDataStore.java:458)
    - locked <0xe1fb4b28> (a org.apache.geode.internal.cache.ProxyBucketRegion)
    at org.apache.geode.internal.cache.PartitionedRegionDataStore.grabFreeBucketRecursively(PartitionedRegionDataStore.java:317)
    at org.apache.geode.internal.cache.PartitionedRegionDataStore.grabBucket(PartitionedRegionDataStore.java:2897)
    at org.apache.geode.internal.cache.PRHARedundancyProvider.createBackupBucketOnMember(PRHARedundancyProvider.java:1086)
    at org.apache.geode.internal.cache.partitioned.PartitionedRegionRebalanceOp.createRedundantBucketForRegion(PartitionedRegionRebalanceOp.java:513)
    at org.apache.geode.internal.cache.partitioned.rebalance.BucketOperatorImpl.createRedundantBucket(BucketOperatorImpl.java:54)
    at org.apache.geode.internal.cache.partitioned.rebalance.BucketOperatorWrapper.createRedundantBucket(BucketOperatorWrapper.java:100)
    at org.apache.geode.internal.cache.partitioned.rebalance.ParallelBucketOperator$1.run(ParallelBucketOperator.java:91)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at 

[jira] [Updated] (GEODE-7082) PersistentColocatedPartitionedRegionDUnitTest.testReplaceOfflineMemberAndRestartCreateColocatedPRLate hung or took too long

2019-08-13 Thread Kirk Lund (JIRA)


 [ 
https://issues.apache.org/jira/browse/GEODE-7082?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kirk Lund updated GEODE-7082:
-
Summary: 
PersistentColocatedPartitionedRegionDUnitTest.testReplaceOfflineMemberAndRestartCreateColocatedPRLate
 hung or took too long   (was: PersistentColocatedPartitionedRegionDUnitTest. 
PersistentColocatedPartitionedRegionDUnitTest )

> PersistentColocatedPartitionedRegionDUnitTest.testReplaceOfflineMemberAndRestartCreateColocatedPRLate
>  hung or took too long 
> 
>
> Key: GEODE-7082
> URL: https://issues.apache.org/jira/browse/GEODE-7082
> Project: Geode
>  Issue Type: Bug
>  Components: tests
>Reporter: Mark Hanson
>Assignee: Mark Hanson
>Priority: Major
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> [https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-develop-main/jobs/DistributedTestOpenJDK8/builds/981]
> The run of the distributed test for java 8 failed because a timeout was 
> exceeded. It appears they succeeded shortly after the timeout.
> Need to extend the timeout by a few minutes.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)