[jira] [Created] (GEODE-9847) Benchmark instability in PartitionedPutLongBenchmark with security manager on support/1.14

2021-11-22 Thread Kamilla Aslami (Jira)
Kamilla Aslami created GEODE-9847:
-

 Summary: Benchmark instability in PartitionedPutLongBenchmark with 
security manager on support/1.14
 Key: GEODE-9847
 URL: https://issues.apache.org/jira/browse/GEODE-9847
 Project: Geode
  Issue Type: Bug
  Components: benchmarks
Affects Versions: 1.14.1
Reporter: Kamilla Aslami


PartitionedPutLongBenchmark failed in 
apache-support-1-14-main/benchmark-with-security-manager. This issue could have 
the same root cause as GEODE-9340, but GEODE-9340 fails on 1.15 and in another 
CI job (apache-develop-main/benchmark-base).
{noformat}
org.apache.geode.benchmark.tests.PartitionedPutLongBenchmark
05:20:08  average ops/second  Baseline:381785.31  Test:
351135.20  Difference:   -8.0%
05:20:08   ops/second standard error  Baseline:  2163.90  Test:  
3115.81  Difference:  +44.0%
05:20:08   ops/second standard deviation  Baseline: 37417.40  Test: 
53877.38  Difference:  +44.0%
05:20:08  YS 99th percentile latency  Baseline:  1606.00  Test:  
1606.00  Difference:   +0.0%
05:20:08  median latency  Baseline:   1068031.00  Test:   
1065983.00  Difference:   -0.2%
05:20:08 90th percentile latency  Baseline:   1364991.00  Test:   
1356799.00  Difference:   -0.6%
05:20:08 99th percentile latency  Baseline:   7688191.00  Test:   
8138751.00  Difference:   +5.9%
05:20:08   99.9th percentile latency  Baseline: 209584127.00  Test: 
260964351.00  Difference:  +24.5%
05:20:08 average latency  Baseline:   1884576.09  Test:   
2050262.70  Difference:   +8.8%
05:20:08  latency standard deviation  Baseline:  11587055.57  Test:  
14728140.58  Difference:  +27.1%
05:20:08  latency standard error  Baseline:  1083.17  Test:  
1435.92  Difference:  +32.6%
05:20:08  average ops/second  Baseline:381621.08  Test:
350789.84  Difference:   -8.1%
05:20:08BENCHMARK FAILED: 
org.apache.geode.benchmark.tests.PartitionedPutLongBenchmark average latency is 
5% worse than baseline.
05:20:08{noformat}
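The failure criterion in the log ("average latency is 5% worse than baseline") is a relative-change check. A minimal sketch of that comparison, assuming the harness simply computes the percentage difference against the baseline and fails on a regression above 5% (class and method names here are hypothetical, not the actual geode-benchmarks code):

```java
// Hypothetical sketch of the benchmark pass/fail check; not the actual
// geode-benchmarks implementation.
public class BaselineCheck {

    // Percentage change of test relative to baseline, as printed in the log.
    public static double percentDiff(double baseline, double test) {
        return (test - baseline) / baseline * 100.0;
    }

    public static void main(String[] args) {
        double baselineAvgLatency = 1884576.09;  // values from the log above
        double testAvgLatency = 2050262.70;
        double diff = percentDiff(baselineAvgLatency, testAvgLatency); // ~ +8.8%
        // Latency: higher is worse, so a difference above +5% fails the run.
        System.out.println(diff > 5.0 ? "BENCHMARK FAILED" : "ok");
    }
}
```

With the logged values this prints "BENCHMARK FAILED", matching the +8.8% average-latency regression reported above.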



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (GEODE-9847) Benchmark instability in PartitionedPutLongBenchmark with security manager on support/1.14

2021-11-22 Thread Geode Integration (Jira)


[ 
https://issues.apache.org/jira/browse/GEODE-9847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17447731#comment-17447731
 ] 

Geode Integration commented on GEODE-9847:
--

Seen on support/1.14 in [benchmark-with-security-manager 
#3|https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-support-1-14-main/jobs/benchmark-with-security-manager/builds/3].

> Benchmark instability in PartitionedPutLongBenchmark with security manager on 
> support/1.14
> --
>
> Key: GEODE-9847
> URL: https://issues.apache.org/jira/browse/GEODE-9847
> Project: Geode
>  Issue Type: Bug
>  Components: benchmarks
>Affects Versions: 1.14.1
>Reporter: Kamilla Aslami
>Priority: Major
>
> PartitionedPutLongBenchmark failed in 
> apache-support-1-14-main/benchmark-with-security-manager. This issue could 
> have the same root cause as GEODE-9340, but GEODE-9340 fails on 1.15 and in 
> another CI job (apache-develop-main/benchmark-base).
> {noformat}
> org.apache.geode.benchmark.tests.PartitionedPutLongBenchmark
> 05:20:08  average ops/second  Baseline:381785.31  Test:
> 351135.20  Difference:   -8.0%
> 05:20:08   ops/second standard error  Baseline:  2163.90  Test:  
> 3115.81  Difference:  +44.0%
> 05:20:08   ops/second standard deviation  Baseline: 37417.40  Test: 
> 53877.38  Difference:  +44.0%
> 05:20:08  YS 99th percentile latency  Baseline:  1606.00  Test:  
> 1606.00  Difference:   +0.0%
> 05:20:08  median latency  Baseline:   1068031.00  Test:   
> 1065983.00  Difference:   -0.2%
> 05:20:08 90th percentile latency  Baseline:   1364991.00  Test:   
> 1356799.00  Difference:   -0.6%
> 05:20:08 99th percentile latency  Baseline:   7688191.00  Test:   
> 8138751.00  Difference:   +5.9%
> 05:20:08   99.9th percentile latency  Baseline: 209584127.00  Test: 
> 260964351.00  Difference:  +24.5%
> 05:20:08 average latency  Baseline:   1884576.09  Test:   
> 2050262.70  Difference:   +8.8%
> 05:20:08  latency standard deviation  Baseline:  11587055.57  Test:  
> 14728140.58  Difference:  +27.1%
> 05:20:08  latency standard error  Baseline:  1083.17  Test:  
> 1435.92  Difference:  +32.6%
> 05:20:08  average ops/second  Baseline:381621.08  Test:
> 350789.84  Difference:   -8.1%
> 05:20:08BENCHMARK FAILED: 
> org.apache.geode.benchmark.tests.PartitionedPutLongBenchmark average latency 
> is 5% worse than baseline.
> 05:20:08{noformat}



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (GEODE-9846) CI failure: Many tests in ClientDataAuthorizationUsingLegacySecurityWithFailoverDUnitTest failed with ConnectException

2021-11-22 Thread Geode Integration (Jira)


[ 
https://issues.apache.org/jira/browse/GEODE-9846?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17447724#comment-17447724
 ] 

Geode Integration commented on GEODE-9846:
--

Seen in [upgrade-test-openjdk8 
#20|https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-develop-main/jobs/upgrade-test-openjdk8/builds/20]
 ... see [test 
results|http://files.apachegeode-ci.info/builds/apache-develop-main/1.15.0-build.0685/test-results/upgradeTest/1637612592/]
 or download 
[artifacts|http://files.apachegeode-ci.info/builds/apache-develop-main/1.15.0-build.0685/test-artifacts/1637612592/upgradetestfiles-openjdk8-1.15.0-build.0685.tgz].

> CI failure: Many tests in 
> ClientDataAuthorizationUsingLegacySecurityWithFailoverDUnitTest failed with 
> ConnectException
> --
>
> Key: GEODE-9846
> URL: https://issues.apache.org/jira/browse/GEODE-9846
> Project: Geode
>  Issue Type: Bug
>  Components: client/server, security
>Affects Versions: 1.15.0
>Reporter: Kamilla Aslami
>Priority: Major
>
> There were 96 failures in 
> ClientDataAuthorizationUsingLegacySecurityWithFailoverDUnitTest, all of which 
> failed with `java.net.ConnectException: Connection refused`.
> This could be a transient network issue, but it seems suspicious that all 
> failures occurred in the same DUnit test file. I'm only adding one stack trace 
> to the ticket description; others can be found 
> [here|http://files.apachegeode-ci.info/builds/apache-develop-main/1.15.0-build.0685/test-results/upgradeTest/1637612592/].
>  
> {noformat}
> ClientDataAuthorizationUsingLegacySecurityWithFailoverDUnitTest > 
> dataReaderCanRegisterAndUnregisterAcrossFailover[clientVersion=1.2.0] FAILED
> org.apache.geode.test.dunit.RMIException: While invoking 
> org.apache.geode.test.dunit.rules.DistributedRestoreSystemProperties$$Lambda$302/1055293868.run
>  in VM 3 running on Host 
> heavy-lifter-617ee9be-7fe1-5cc4-bf4f-307cc3e33a7b.c.apachegeode-ci.internal 
> with 4 VMs with version 1.2.0
> at org.apache.geode.test.dunit.VM.executeMethodOnObject(VM.java:635)
> at org.apache.geode.test.dunit.VM.invoke(VM.java:448)
> at org.apache.geode.test.dunit.Invoke.invokeInEveryVM(Invoke.java:59)
> at org.apache.geode.test.dunit.Invoke.invokeInEveryVM(Invoke.java:48)
> at 
> org.apache.geode.test.dunit.rules.RemoteInvoker.invokeInEveryVMAndController(RemoteInvoker.java:49)
> at 
> org.apache.geode.test.dunit.rules.DistributedRestoreSystemProperties.after(DistributedRestoreSystemProperties.java:44)
> at 
> org.apache.geode.test.dunit.rules.AbstractDistributedRule.afterDistributedTest(AbstractDistributedRule.java:81)
> at 
> org.apache.geode.test.dunit.rules.ClusterStartupRule.after(ClusterStartupRule.java:176)
> at 
> org.apache.geode.test.dunit.rules.ClusterStartupRule.access$100(ClusterStartupRule.java:69)
> at 
> org.apache.geode.test.dunit.rules.ClusterStartupRule$1.evaluate(ClusterStartupRule.java:140)
> at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
> at 
> org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100)
> at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:366)
> at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:103)
> at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:63)
> at org.junit.runners.ParentRunner$4.run(ParentRunner.java:331)
> at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79)
> at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:329)
> at org.junit.runners.ParentRunner.access$100(ParentRunner.java:66)
> at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:293)
> at org.junit.runners.ParentRunner.run(ParentRunner.java:413)
> at org.junit.runners.Suite.runChild(Suite.java:128)
> at org.junit.runners.Suite.runChild(Suite.java:27)
> at org.junit.runners.ParentRunner$4.run(ParentRunner.java:331)
> at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79)
> at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:329)
> at org.junit.runners.ParentRunner.access$100(ParentRunner.java:66)
> at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:293)
> at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
> at org.junit.runners.ParentRunner.run(ParentRunner.java:413)
> at org.junit.runner.JUnitCore.run(JUnitCore.java:137)
> at org.junit.runner.JUnitCore.run(JUnitCore.java:115)
> at 
> 

[jira] [Created] (GEODE-9846) CI failure: Many tests in ClientDataAuthorizationUsingLegacySecurityWithFailoverDUnitTest failed with ConnectException

2021-11-22 Thread Kamilla Aslami (Jira)
Kamilla Aslami created GEODE-9846:
-

 Summary: CI failure: Many tests in 
ClientDataAuthorizationUsingLegacySecurityWithFailoverDUnitTest failed with 
ConnectException
 Key: GEODE-9846
 URL: https://issues.apache.org/jira/browse/GEODE-9846
 Project: Geode
  Issue Type: Bug
  Components: client/server, security
Affects Versions: 1.15.0
Reporter: Kamilla Aslami


There were 96 failures in 
ClientDataAuthorizationUsingLegacySecurityWithFailoverDUnitTest, all of which 
failed with `java.net.ConnectException: Connection refused`.

This could be a transient network issue, but it seems suspicious that all 
failures occurred in the same DUnit test file. I'm only adding one stack trace to 
the ticket description; others can be found 
[here|http://files.apachegeode-ci.info/builds/apache-develop-main/1.15.0-build.0685/test-results/upgradeTest/1637612592/].
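For context, `java.net.ConnectException: Connection refused` means the TCP connect reached the host but nothing was listening on the target port, e.g. a server or locator VM that never started or had already stopped. A small self-contained demo (hypothetical, not part of the test suite) that reproduces the exception by connecting to a port known to have no listener:

```java
import java.io.IOException;
import java.net.ConnectException;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;

// Hypothetical demo of how "Connection refused" arises; not Geode code.
public class RefusedConnectionDemo {

    // Bind an ephemeral port, then release it, so nothing listens on it.
    public static int freeClosedPort() {
        try (ServerSocket ss = new ServerSocket(0)) {
            return ss.getLocalPort();
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }

    // True if connecting to the port fails with ConnectException.
    public static boolean isRefused(int port) {
        try (Socket s = new Socket()) {
            s.connect(new InetSocketAddress("127.0.0.1", port), 1000);
            return false; // something answered
        } catch (ConnectException e) {
            return true;  // "Connection refused": no listener on the port
        } catch (IOException e) {
            return false; // some other failure (timeout, etc.)
        }
    }

    public static void main(String[] args) {
        System.out.println(isRefused(freeClosedPort())); // true
    }
}
```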

 
{noformat}
ClientDataAuthorizationUsingLegacySecurityWithFailoverDUnitTest > 
dataReaderCanRegisterAndUnregisterAcrossFailover[clientVersion=1.2.0] FAILED
org.apache.geode.test.dunit.RMIException: While invoking 
org.apache.geode.test.dunit.rules.DistributedRestoreSystemProperties$$Lambda$302/1055293868.run
 in VM 3 running on Host 
heavy-lifter-617ee9be-7fe1-5cc4-bf4f-307cc3e33a7b.c.apachegeode-ci.internal 
with 4 VMs with version 1.2.0
at org.apache.geode.test.dunit.VM.executeMethodOnObject(VM.java:635)
at org.apache.geode.test.dunit.VM.invoke(VM.java:448)
at org.apache.geode.test.dunit.Invoke.invokeInEveryVM(Invoke.java:59)
at org.apache.geode.test.dunit.Invoke.invokeInEveryVM(Invoke.java:48)
at 
org.apache.geode.test.dunit.rules.RemoteInvoker.invokeInEveryVMAndController(RemoteInvoker.java:49)
at 
org.apache.geode.test.dunit.rules.DistributedRestoreSystemProperties.after(DistributedRestoreSystemProperties.java:44)
at 
org.apache.geode.test.dunit.rules.AbstractDistributedRule.afterDistributedTest(AbstractDistributedRule.java:81)
at 
org.apache.geode.test.dunit.rules.ClusterStartupRule.after(ClusterStartupRule.java:176)
at 
org.apache.geode.test.dunit.rules.ClusterStartupRule.access$100(ClusterStartupRule.java:69)
at 
org.apache.geode.test.dunit.rules.ClusterStartupRule$1.evaluate(ClusterStartupRule.java:140)
at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
at 
org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100)
at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:366)
at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:103)
at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:63)
at org.junit.runners.ParentRunner$4.run(ParentRunner.java:331)
at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79)
at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:329)
at org.junit.runners.ParentRunner.access$100(ParentRunner.java:66)
at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:293)
at org.junit.runners.ParentRunner.run(ParentRunner.java:413)
at org.junit.runners.Suite.runChild(Suite.java:128)
at org.junit.runners.Suite.runChild(Suite.java:27)
at org.junit.runners.ParentRunner$4.run(ParentRunner.java:331)
at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79)
at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:329)
at org.junit.runners.ParentRunner.access$100(ParentRunner.java:66)
at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:293)
at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
at org.junit.runners.ParentRunner.run(ParentRunner.java:413)
at org.junit.runner.JUnitCore.run(JUnitCore.java:137)
at org.junit.runner.JUnitCore.run(JUnitCore.java:115)
at 
org.junit.vintage.engine.execution.RunnerExecutor.execute(RunnerExecutor.java:43)
at 
java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:183)
at 
java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
at java.util.Iterator.forEachRemaining(Iterator.java:116)
at 
java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1801)
at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:482)
at 
java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:472)
at 
java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:150)
at 
java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:173)
at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
at 
java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:485)
at 

[jira] [Updated] (GEODE-9845) CI failure: Multiple tests in OutOfMemoryDUnitTest failed with ConnectException

2021-11-22 Thread Kamilla Aslami (Jira)


 [ 
https://issues.apache.org/jira/browse/GEODE-9845?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kamilla Aslami updated GEODE-9845:
--
Summary: CI failure: Multiple tests in OutOfMemoryDUnitTest failed with 
ConnectException  (was: Multiple tests in OutOfMemoryDUnitTest failed with 
ConnectException)

> CI failure: Multiple tests in OutOfMemoryDUnitTest failed with 
> ConnectException
> ---
>
> Key: GEODE-9845
> URL: https://issues.apache.org/jira/browse/GEODE-9845
> Project: Geode
>  Issue Type: Bug
>  Components: redis
>Affects Versions: 1.15.0
>Reporter: Kamilla Aslami
>Priority: Major
>
> Four tests in OutOfMemoryDUnitTest failed with `java.net.ConnectException: 
> Connection refused`.
> {noformat}
> OutOfMemoryDUnitTest > shouldAllowDeleteOperations_afterThresholdReached 
> FAILED
> java.lang.AssertionError: 
> Expecting throwable message:
>   "No more cluster attempts left."
> to contain:
>   "OOM command not allowed"
> but did not.
> Throwable that failed the check:
> redis.clients.jedis.exceptions.JedisClusterMaxAttemptsException: No more 
> cluster attempts left.
>   at 
> redis.clients.jedis.JedisClusterCommand.runWithRetries(JedisClusterCommand.java:156)
>   at 
> redis.clients.jedis.JedisClusterCommand.run(JedisClusterCommand.java:45)
>   at redis.clients.jedis.JedisCluster.set(JedisCluster.java:293)
>   at 
> org.apache.geode.redis.OutOfMemoryDUnitTest.setRedisKeyAndValue(OutOfMemoryDUnitTest.java:228)
>   at 
> org.apache.geode.redis.OutOfMemoryDUnitTest.lambda$addMultipleKeys$5(OutOfMemoryDUnitTest.java:212)
>   at 
> org.assertj.core.api.ThrowableAssert.catchThrowable(ThrowableAssert.java:62)
>   at 
> org.assertj.core.api.AssertionsForClassTypes.catchThrowable(AssertionsForClassTypes.java:877)
>   at 
> org.apache.geode.redis.OutOfMemoryDUnitTest.addMultipleKeys(OutOfMemoryDUnitTest.java:210)
>   at 
> org.apache.geode.redis.OutOfMemoryDUnitTest.fillMemory(OutOfMemoryDUnitTest.java:201)
>   at 
> org.apache.geode.redis.OutOfMemoryDUnitTest.shouldAllowDeleteOperations_afterThresholdReached(OutOfMemoryDUnitTest.java:166)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>   at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
>   at 
> org.apache.geode.test.junit.rules.serializable.SerializableExternalResource$1.evaluate(SerializableExternalResource.java:38)
>   at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100)
>   at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:366)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:103)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:63)
>   at org.junit.runners.ParentRunner$4.run(ParentRunner.java:331)
>   at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79)
>   at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:329)
>   at org.junit.runners.ParentRunner.access$100(ParentRunner.java:66)
>   at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:293)
>   at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
>   at 
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
>   at 
> org.apache.geode.test.dunit.rules.ClusterStartupRule$1.evaluate(ClusterStartupRule.java:138)
>   at org.junit.rules.RunRules.evaluate(RunRules.java:20)
>   at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
>   at org.junit.runners.ParentRunner.run(ParentRunner.java:413)
>   at org.junit.runner.JUnitCore.run(JUnitCore.java:137)
>   at org.junit.runner.JUnitCore.run(JUnitCore.java:115)
>   at 
> org.junit.vintage.engine.execution.RunnerExecutor.execute(RunnerExecutor.java:43)
>   at 
> java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:183)
>   at 
> 

[jira] [Commented] (GEODE-9845) Multiple tests in OutOfMemoryDUnitTest failed with ConnectException

2021-11-22 Thread Geode Integration (Jira)


[ 
https://issues.apache.org/jira/browse/GEODE-9845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17447722#comment-17447722
 ] 

Geode Integration commented on GEODE-9845:
--

Seen in [distributed-test-openjdk8 
#174|https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-develop-mass-test-run/jobs/distributed-test-openjdk8/builds/174]
 ... see [test 
results|http://files.apachegeode-ci.info/builds/apache-develop-mass-test-run/1.15.0-build.0682/test-results/distributedTest/1637433540/]
 or download 
[artifacts|http://files.apachegeode-ci.info/builds/apache-develop-mass-test-run/1.15.0-build.0682/test-artifacts/1637433540/distributedtestfiles-openjdk8-1.15.0-build.0682.tgz].

> Multiple tests in OutOfMemoryDUnitTest failed with ConnectException
> ---
>
> Key: GEODE-9845
> URL: https://issues.apache.org/jira/browse/GEODE-9845
> Project: Geode
>  Issue Type: Bug
>  Components: redis
>Affects Versions: 1.15.0
>Reporter: Kamilla Aslami
>Priority: Major
>
> Four tests in OutOfMemoryDUnitTest failed with `java.net.ConnectException: 
> Connection refused`.
> {noformat}
> OutOfMemoryDUnitTest > shouldAllowDeleteOperations_afterThresholdReached 
> FAILED
> java.lang.AssertionError: 
> Expecting throwable message:
>   "No more cluster attempts left."
> to contain:
>   "OOM command not allowed"
> but did not.
> Throwable that failed the check:
> redis.clients.jedis.exceptions.JedisClusterMaxAttemptsException: No more 
> cluster attempts left.
>   at 
> redis.clients.jedis.JedisClusterCommand.runWithRetries(JedisClusterCommand.java:156)
>   at 
> redis.clients.jedis.JedisClusterCommand.run(JedisClusterCommand.java:45)
>   at redis.clients.jedis.JedisCluster.set(JedisCluster.java:293)
>   at 
> org.apache.geode.redis.OutOfMemoryDUnitTest.setRedisKeyAndValue(OutOfMemoryDUnitTest.java:228)
>   at 
> org.apache.geode.redis.OutOfMemoryDUnitTest.lambda$addMultipleKeys$5(OutOfMemoryDUnitTest.java:212)
>   at 
> org.assertj.core.api.ThrowableAssert.catchThrowable(ThrowableAssert.java:62)
>   at 
> org.assertj.core.api.AssertionsForClassTypes.catchThrowable(AssertionsForClassTypes.java:877)
>   at 
> org.apache.geode.redis.OutOfMemoryDUnitTest.addMultipleKeys(OutOfMemoryDUnitTest.java:210)
>   at 
> org.apache.geode.redis.OutOfMemoryDUnitTest.fillMemory(OutOfMemoryDUnitTest.java:201)
>   at 
> org.apache.geode.redis.OutOfMemoryDUnitTest.shouldAllowDeleteOperations_afterThresholdReached(OutOfMemoryDUnitTest.java:166)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>   at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
>   at 
> org.apache.geode.test.junit.rules.serializable.SerializableExternalResource$1.evaluate(SerializableExternalResource.java:38)
>   at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100)
>   at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:366)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:103)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:63)
>   at org.junit.runners.ParentRunner$4.run(ParentRunner.java:331)
>   at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79)
>   at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:329)
>   at org.junit.runners.ParentRunner.access$100(ParentRunner.java:66)
>   at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:293)
>   at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
>   at 
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
>   at 
> org.apache.geode.test.dunit.rules.ClusterStartupRule$1.evaluate(ClusterStartupRule.java:138)
>   at org.junit.rules.RunRules.evaluate(RunRules.java:20)
>   at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
>   at org.junit.runners.ParentRunner.run(ParentRunner.java:413)
>   

[jira] [Created] (GEODE-9845) Multiple tests in OutOfMemoryDUnitTest failed with ConnectException

2021-11-22 Thread Kamilla Aslami (Jira)
Kamilla Aslami created GEODE-9845:
-

 Summary: Multiple tests in OutOfMemoryDUnitTest failed with 
ConnectException
 Key: GEODE-9845
 URL: https://issues.apache.org/jira/browse/GEODE-9845
 Project: Geode
  Issue Type: Bug
  Components: redis
Affects Versions: 1.15.0
Reporter: Kamilla Aslami


Four tests in OutOfMemoryDUnitTest failed with `java.net.ConnectException: 
Connection refused`.
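The assertion failure below comes from checking the thrown exception's message: the test expects an "OOM command not allowed" error from the server, but Jedis exhausts its cluster retries first and throws `JedisClusterMaxAttemptsException` instead, which is consistent with the connections being refused. A stripped-down sketch of that message check using only the JDK (the real test uses AssertJ's `catchThrowable` and `hasMessageContaining`; names here are hypothetical):

```java
// Hypothetical sketch of the failing message check; the real test uses AssertJ.
public class MessageCheckDemo {

    // True if the throwable's message contains the expected fragment.
    public static boolean messageContains(Throwable t, String expected) {
        return t.getMessage() != null && t.getMessage().contains(expected);
    }

    public static void main(String[] args) {
        // The actual exception Jedis threw vs. the message the test expected.
        Throwable actual = new RuntimeException("No more cluster attempts left.");
        System.out.println(messageContains(actual, "OOM command not allowed"));
        // prints false -> the real test raises an AssertionError at this point
    }
}
```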
{noformat}
OutOfMemoryDUnitTest > shouldAllowDeleteOperations_afterThresholdReached FAILED
java.lang.AssertionError: 
Expecting throwable message:
  "No more cluster attempts left."
to contain:
  "OOM command not allowed"
but did not.

Throwable that failed the check:

redis.clients.jedis.exceptions.JedisClusterMaxAttemptsException: No more 
cluster attempts left.
  at 
redis.clients.jedis.JedisClusterCommand.runWithRetries(JedisClusterCommand.java:156)
  at 
redis.clients.jedis.JedisClusterCommand.run(JedisClusterCommand.java:45)
  at redis.clients.jedis.JedisCluster.set(JedisCluster.java:293)
  at 
org.apache.geode.redis.OutOfMemoryDUnitTest.setRedisKeyAndValue(OutOfMemoryDUnitTest.java:228)
  at 
org.apache.geode.redis.OutOfMemoryDUnitTest.lambda$addMultipleKeys$5(OutOfMemoryDUnitTest.java:212)
  at 
org.assertj.core.api.ThrowableAssert.catchThrowable(ThrowableAssert.java:62)
  at 
org.assertj.core.api.AssertionsForClassTypes.catchThrowable(AssertionsForClassTypes.java:877)
  at 
org.apache.geode.redis.OutOfMemoryDUnitTest.addMultipleKeys(OutOfMemoryDUnitTest.java:210)
  at 
org.apache.geode.redis.OutOfMemoryDUnitTest.fillMemory(OutOfMemoryDUnitTest.java:201)
  at 
org.apache.geode.redis.OutOfMemoryDUnitTest.shouldAllowDeleteOperations_afterThresholdReached(OutOfMemoryDUnitTest.java:166)
  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
  at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
  at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
  at java.lang.reflect.Method.invoke(Method.java:498)
  at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
  at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
  at 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
  at 
org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
  at 
org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
  at 
org.apache.geode.test.junit.rules.serializable.SerializableExternalResource$1.evaluate(SerializableExternalResource.java:38)
  at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
  at 
org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100)
  at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:366)
  at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:103)
  at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:63)
  at org.junit.runners.ParentRunner$4.run(ParentRunner.java:331)
  at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79)
  at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:329)
  at org.junit.runners.ParentRunner.access$100(ParentRunner.java:66)
  at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:293)
  at 
org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
  at 
org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
  at 
org.apache.geode.test.dunit.rules.ClusterStartupRule$1.evaluate(ClusterStartupRule.java:138)
  at org.junit.rules.RunRules.evaluate(RunRules.java:20)
  at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
  at org.junit.runners.ParentRunner.run(ParentRunner.java:413)
  at org.junit.runner.JUnitCore.run(JUnitCore.java:137)
  at org.junit.runner.JUnitCore.run(JUnitCore.java:115)
  at 
org.junit.vintage.engine.execution.RunnerExecutor.execute(RunnerExecutor.java:43)
  at java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:183)
  at 
java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
  at java.util.Iterator.forEachRemaining(Iterator.java:116)
  at 
java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1801)
  at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:482)
  at 
java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:472)
  at 
java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:150)
  at 
java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:173)
  at 

[jira] [Commented] (GEODE-9844) CI failure: RebalanceCommandDUnitTest.testWithTimeOut failed with AssertionError

2021-11-22 Thread Geode Integration (Jira)


[ 
https://issues.apache.org/jira/browse/GEODE-9844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17447704#comment-17447704
 ] 

Geode Integration commented on GEODE-9844:
--

Seen in [distributed-test-openjdk8 
#114|https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-develop-mass-test-run/jobs/distributed-test-openjdk8/builds/114]
 ... see [test 
results|http://files.apachegeode-ci.info/builds/apache-develop-mass-test-run/1.15.0-build.0682/test-results/distributedTest/1637385637/]
 or download 
[artifacts|http://files.apachegeode-ci.info/builds/apache-develop-mass-test-run/1.15.0-build.0682/test-artifacts/1637385637/distributedtestfiles-openjdk8-1.15.0-build.0682.tgz].

> CI failure: RebalanceCommandDUnitTest.testWithTimeOut failed with 
> AssertionError
> 
>
> Key: GEODE-9844
> URL: https://issues.apache.org/jira/browse/GEODE-9844
> Project: Geode
>  Issue Type: Bug
>  Components: gfsh
>Affects Versions: 1.15.0
>Reporter: Kamilla Aslami
>Priority: Major
>
> {noformat}
> RebalanceCommandDUnitTest > testWithTimeOut FAILED
> java.lang.AssertionError: 
> Expecting actual:
>   7
> to be less than or equal to:
>   1 
> at 
> org.apache.geode.management.internal.cli.commands.RebalanceCommandDUnitTest.assertRegionBalanced(RebalanceCommandDUnitTest.java:288)
> at 
> org.apache.geode.management.internal.cli.commands.RebalanceCommandDUnitTest.testWithTimeOut(RebalanceCommandDUnitTest.java:133)
> {noformat}



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (GEODE-9844) CI failure: RebalanceCommandDUnitTest.testWithTimeOut failed with AssertionError

2021-11-22 Thread Kamilla Aslami (Jira)
Kamilla Aslami created GEODE-9844:
-

 Summary: CI failure: RebalanceCommandDUnitTest.testWithTimeOut 
failed with AssertionError
 Key: GEODE-9844
 URL: https://issues.apache.org/jira/browse/GEODE-9844
 Project: Geode
  Issue Type: Bug
  Components: gfsh
Affects Versions: 1.15.0
Reporter: Kamilla Aslami


{noformat}
RebalanceCommandDUnitTest > testWithTimeOut FAILED
java.lang.AssertionError: 
Expecting actual:
  7
to be less than or equal to:
  1 
at 
org.apache.geode.management.internal.cli.commands.RebalanceCommandDUnitTest.assertRegionBalanced(RebalanceCommandDUnitTest.java:288)
at 
org.apache.geode.management.internal.cli.commands.RebalanceCommandDUnitTest.testWithTimeOut(RebalanceCommandDUnitTest.java:133)
{noformat}
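The assertion reads as a balance check: the spread between the most- and least-loaded members should be at most 1 after a rebalance, but was 7. A hypothetical sketch of such a check (the actual `assertRegionBalanced` logic may differ, and the bucket counts below are made up for illustration):

```java
// Hypothetical sketch of a region-balance assertion; not the actual
// RebalanceCommandDUnitTest code.
public class BalanceCheckDemo {

    // Difference between the largest and smallest per-member bucket count.
    public static int spread(int[] bucketCounts) {
        int min = Integer.MAX_VALUE;
        int max = Integer.MIN_VALUE;
        for (int c : bucketCounts) {
            min = Math.min(min, c);
            max = Math.max(max, c);
        }
        return max - min;
    }

    public static void main(String[] args) {
        int[] counts = {13, 6, 10}; // made-up per-member bucket counts
        // A spread of 7 exceeds the allowed maximum of 1, as in the failure.
        System.out.println(spread(counts)); // 7
    }
}
```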



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Assigned] (GEODE-9782) improve package organization of geode-for-redis

2021-11-22 Thread Donal Evans (Jira)


 [ 
https://issues.apache.org/jira/browse/GEODE-9782?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Donal Evans reassigned GEODE-9782:
--

Assignee: Donal Evans

> improve package organization of geode-for-redis
> ---
>
> Key: GEODE-9782
> URL: https://issues.apache.org/jira/browse/GEODE-9782
> Project: Geode
>  Issue Type: Improvement
>  Components: redis
>Affects Versions: 1.15.0
>Reporter: Darrel Schneider
>Assignee: Donal Evans
>Priority: Major
>
> It would be nice to improve how the internals of geode-for-redis are packaged 
> before it is released in 1.15. Try to do this when others are not actively 
> working on these classes, since it could cause a bunch of conflicts. Be aware 
> that a few of the internals may have dependencies outside of Geode, and those 
> will also need to be updated. Make sure to move the corresponding tests into 
> the same package. Here are some ideas:
>  # move the collections package into the data package
>  # move the delta package into the data package
>  # move all the Stripe classes in the services package into a new 
> services.locking package
>  # move RegionProvider into services
>  # move PassiveExpirationManager into services
>  # move RedisSanctionedSerializablesService into the services
>  # move SlotAdvisor into the cluster package
>  # move the cluster package into the services package (or leave it as is, 
> also consider moving pubsub and statics into services. The "services" package 
> is so generic lots of things could be put into it or we could get rid of it).
>  # create a new package named "commands"
>  # move Command, RedisCommandSupportLevel, and RedisCommandType into commands
>  # move parameters into commands
>  # move executor into commands



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (GEODE-9843) CI failure: DistributedSystemMXBeanWithAlertsDistributedTest.managerMissesAnyAlertsBeforeItStarts failed with TooFewActualInvocations

2021-11-22 Thread Geode Integration (Jira)


[ 
https://issues.apache.org/jira/browse/GEODE-9843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17447699#comment-17447699
 ] 

Geode Integration commented on GEODE-9843:
--

Seen in [distributed-test-openjdk8 
#139|https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-develop-mass-test-run/jobs/distributed-test-openjdk8/builds/139]
 ... see [test 
results|http://files.apachegeode-ci.info/builds/apache-develop-mass-test-run/1.15.0-build.0682/test-results/distributedTest/1637402577/]
 or download 
[artifacts|http://files.apachegeode-ci.info/builds/apache-develop-mass-test-run/1.15.0-build.0682/test-artifacts/1637402577/distributedtestfiles-openjdk8-1.15.0-build.0682.tgz].

> CI failure: 
> DistributedSystemMXBeanWithAlertsDistributedTest.managerMissesAnyAlertsBeforeItStarts
>  failed with TooFewActualInvocations
> -
>
> Key: GEODE-9843
> URL: https://issues.apache.org/jira/browse/GEODE-9843
> Project: Geode
>  Issue Type: Bug
>  Components: core, management
>Affects Versions: 1.15.0
>Reporter: Kamilla Aslami
>Priority: Major
>
> {noformat}
> DistributedSystemMXBeanWithAlertsDistributedTest > 
> managerMissesAnyAlertsBeforeItStarts FAILED
> org.apache.geode.test.dunit.RMIException: While invoking 
> org.apache.geode.management.DistributedSystemMXBeanWithAlertsDistributedTest$$Lambda$301/390135366.run
>  in VM 0 running on Host 
> heavy-lifter-993df0f4-3655-560f-82a7-0c09d04efdd9.c.apachegeode-ci.internal 
> with 4 VMs
> at org.apache.geode.test.dunit.VM.executeMethodOnObject(VM.java:631)
> at org.apache.geode.test.dunit.VM.invoke(VM.java:448)
> at 
> org.apache.geode.management.DistributedSystemMXBeanWithAlertsDistributedTest.managerMissesAnyAlertsBeforeItStarts(DistributedSystemMXBeanWithAlertsDistributedTest.java:379)
> Caused by:
> org.mockito.exceptions.verification.TooFewActualInvocations: 
> notificationListener.handleNotification(
> ,
> isNull()
> );
> Wanted 3 times:
> -> at 
> org.apache.geode.management.DistributedSystemMXBeanWithAlertsDistributedTest.captureAllNotifications(DistributedSystemMXBeanWithAlertsDistributedTest.java:439)
> But was 2 times:
> -> at 
> com.sun.jmx.interceptor.DefaultMBeanServerInterceptor$ListenerWrapper.handleNotification(DefaultMBeanServerInterceptor.java:1754)
> -> at 
> com.sun.jmx.interceptor.DefaultMBeanServerInterceptor$ListenerWrapper.handleNotification(DefaultMBeanServerInterceptor.java:1754)
> at 
> org.apache.geode.management.DistributedSystemMXBeanWithAlertsDistributedTest.captureAllNotifications(DistributedSystemMXBeanWithAlertsDistributedTest.java:439)
> at 
> org.apache.geode.management.DistributedSystemMXBeanWithAlertsDistributedTest.captureAllAlerts(DistributedSystemMXBeanWithAlertsDistributedTest.java:451)
> at 
> org.apache.geode.management.DistributedSystemMXBeanWithAlertsDistributedTest.lambda$managerMissesAnyAlertsBeforeItStarts$bb17a952$6(DistributedSystemMXBeanWithAlertsDistributedTest.java:380)
> {noformat}



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (GEODE-9843) CI failure: DistributedSystemMXBeanWithAlertsDistributedTest.managerMissesAnyAlertsBeforeItStarts failed with TooFewActualInvocations

2021-11-22 Thread Kamilla Aslami (Jira)
Kamilla Aslami created GEODE-9843:
-

 Summary: CI failure: 
DistributedSystemMXBeanWithAlertsDistributedTest.managerMissesAnyAlertsBeforeItStarts
 failed with TooFewActualInvocations
 Key: GEODE-9843
 URL: https://issues.apache.org/jira/browse/GEODE-9843
 Project: Geode
  Issue Type: Bug
  Components: core, management
Affects Versions: 1.15.0
Reporter: Kamilla Aslami


{noformat}
DistributedSystemMXBeanWithAlertsDistributedTest > 
managerMissesAnyAlertsBeforeItStarts FAILED
org.apache.geode.test.dunit.RMIException: While invoking 
org.apache.geode.management.DistributedSystemMXBeanWithAlertsDistributedTest$$Lambda$301/390135366.run
 in VM 0 running on Host 
heavy-lifter-993df0f4-3655-560f-82a7-0c09d04efdd9.c.apachegeode-ci.internal 
with 4 VMs
at org.apache.geode.test.dunit.VM.executeMethodOnObject(VM.java:631)
at org.apache.geode.test.dunit.VM.invoke(VM.java:448)
at 
org.apache.geode.management.DistributedSystemMXBeanWithAlertsDistributedTest.managerMissesAnyAlertsBeforeItStarts(DistributedSystemMXBeanWithAlertsDistributedTest.java:379)

Caused by:
org.mockito.exceptions.verification.TooFewActualInvocations: 
notificationListener.handleNotification(
,
isNull()
);
Wanted 3 times:
-> at 
org.apache.geode.management.DistributedSystemMXBeanWithAlertsDistributedTest.captureAllNotifications(DistributedSystemMXBeanWithAlertsDistributedTest.java:439)
But was 2 times:
-> at 
com.sun.jmx.interceptor.DefaultMBeanServerInterceptor$ListenerWrapper.handleNotification(DefaultMBeanServerInterceptor.java:1754)
-> at 
com.sun.jmx.interceptor.DefaultMBeanServerInterceptor$ListenerWrapper.handleNotification(DefaultMBeanServerInterceptor.java:1754)
at 
org.apache.geode.management.DistributedSystemMXBeanWithAlertsDistributedTest.captureAllNotifications(DistributedSystemMXBeanWithAlertsDistributedTest.java:439)
at 
org.apache.geode.management.DistributedSystemMXBeanWithAlertsDistributedTest.captureAllAlerts(DistributedSystemMXBeanWithAlertsDistributedTest.java:451)
at 
org.apache.geode.management.DistributedSystemMXBeanWithAlertsDistributedTest.lambda$managerMissesAnyAlertsBeforeItStarts$bb17a952$6(DistributedSystemMXBeanWithAlertsDistributedTest.java:380)
{noformat}



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (GEODE-9842) CI failure: PartitionedRegionSingleHopDUnitTest.testMetadataContents failed with AssertionFailedError

2021-11-22 Thread Geode Integration (Jira)


[ 
https://issues.apache.org/jira/browse/GEODE-9842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17447681#comment-17447681
 ] 

Geode Integration commented on GEODE-9842:
--

Seen in [distributed-test-openjdk8 
#170|https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-develop-mass-test-run/jobs/distributed-test-openjdk8/builds/170]
 ... see [test 
results|http://files.apachegeode-ci.info/builds/apache-develop-mass-test-run/1.15.0-build.0682/test-results/distributedTest/1637426775/]
 or download 
[artifacts|http://files.apachegeode-ci.info/builds/apache-develop-mass-test-run/1.15.0-build.0682/test-artifacts/1637426775/distributedtestfiles-openjdk8-1.15.0-build.0682.tgz].

> CI failure: PartitionedRegionSingleHopDUnitTest.testMetadataContents failed 
> with AssertionFailedError
> -
>
> Key: GEODE-9842
> URL: https://issues.apache.org/jira/browse/GEODE-9842
> Project: Geode
>  Issue Type: Bug
>  Components: client/server
>Affects Versions: 1.15.0
>Reporter: Kamilla Aslami
>Priority: Major
>
> {noformat}
> PartitionedRegionSingleHopDUnitTest > testMetadataContents FAILED
> org.opentest4j.AssertionFailedError: 
> Expecting value to be false but was true
> at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native 
> Method)
> at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
> at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
> at 
> org.apache.geode.internal.cache.PartitionedRegionSingleHopDUnitTest.testMetadataContents(PartitionedRegionSingleHopDUnitTest.java:272)
> {noformat}
> This issue might be related to GEODE-9617.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (GEODE-9842) CI failure: PartitionedRegionSingleHopDUnitTest.testMetadataContents failed with AssertionFailedError

2021-11-22 Thread Kamilla Aslami (Jira)
Kamilla Aslami created GEODE-9842:
-

 Summary: CI failure: 
PartitionedRegionSingleHopDUnitTest.testMetadataContents failed with 
AssertionFailedError
 Key: GEODE-9842
 URL: https://issues.apache.org/jira/browse/GEODE-9842
 Project: Geode
  Issue Type: Bug
  Components: client/server
Affects Versions: 1.15.0
Reporter: Kamilla Aslami


{noformat}
PartitionedRegionSingleHopDUnitTest > testMetadataContents FAILED
org.opentest4j.AssertionFailedError: 
Expecting value to be false but was true
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at 
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at 
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at 
org.apache.geode.internal.cache.PartitionedRegionSingleHopDUnitTest.testMetadataContents(PartitionedRegionSingleHopDUnitTest.java:272)
{noformat}
This issue might be related to GEODE-9617.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (GEODE-9737) CI failure in TomcatSessionBackwardsCompatibilityTomcat7079WithOldModulesMixedWithCurrentCanDoPutFromCurrentModuleTest

2021-11-22 Thread Wayne (Jira)


 [ 
https://issues.apache.org/jira/browse/GEODE-9737?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wayne updated GEODE-9737:
-
Labels: release-blocker  (was: )

> CI failure in 
> TomcatSessionBackwardsCompatibilityTomcat7079WithOldModulesMixedWithCurrentCanDoPutFromCurrentModuleTest
> --
>
> Key: GEODE-9737
> URL: https://issues.apache.org/jira/browse/GEODE-9737
> Project: Geode
>  Issue Type: Bug
>  Components: http session
>Affects Versions: 1.15.0
>Reporter: Kamilla Aslami
>Assignee: Benjamin P Ross
>Priority: Major
>  Labels: release-blocker
> Attachments: gemfire.log
>
>
> {noformat}
> TomcatSessionBackwardsCompatibilityTomcat7079WithOldModulesMixedWithCurrentCanDoPutFromCurrentModuleTest
>  > test[0] FAILED
> java.lang.RuntimeException: Something very bad happened when trying to 
> start container 
> TOMCAT7_client-server_test0_1_dd13a1a6-effb-4430-8ccd-ee6c9142938c_
> at 
> org.apache.geode.session.tests.ContainerManager.startContainer(ContainerManager.java:82)
> at 
> org.apache.geode.session.tests.ContainerManager.startContainers(ContainerManager.java:93)
> at 
> org.apache.geode.session.tests.ContainerManager.startAllInactiveContainers(ContainerManager.java:101)
> at 
> org.apache.geode.session.tests.TomcatSessionBackwardsCompatibilityTestBase.doPutAndGetSessionOnAllClients(TomcatSessionBackwardsCompatibilityTestBase.java:187)
> at 
> org.apache.geode.session.tests.TomcatSessionBackwardsCompatibilityTomcat7079WithOldModulesMixedWithCurrentCanDoPutFromCurrentModuleTest.test(TomcatSessionBackwardsCompatibilityTomcat7079WithOldModulesMixedWithCurrentCanDoPutFromCurrentModuleTest.java:36)
> Caused by:
> java.lang.RuntimeException: Something very bad happened to this 
> container when starting. Check the cargo_logs folder for container logs.
> at 
> org.apache.geode.session.tests.ServerContainer.start(ServerContainer.java:220)
> at 
> org.apache.geode.session.tests.ContainerManager.startContainer(ContainerManager.java:79)
> ... 4 more
> Caused by:
> org.codehaus.cargo.container.ContainerException: Deployable 
> [http://localhost:26322/cargocpc/index.html] failed to finish deploying 
> within the timeout period [12]. The Deployable state is thus unknown.
> at 
> org.codehaus.cargo.container.spi.deployer.DeployerWatchdog.watch(DeployerWatchdog.java:111)
> at 
> org.codehaus.cargo.container.spi.AbstractLocalContainer.waitForCompletion(AbstractLocalContainer.java:387)
> at 
> org.codehaus.cargo.container.spi.AbstractLocalContainer.start(AbstractLocalContainer.java:234)
> at 
> org.apache.geode.session.tests.ServerContainer.start(ServerContainer.java:218)
> ... 5 more
> {noformat}



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (GEODE-9838) Log key info for deserialization issue during index update

2021-11-22 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/GEODE-9838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17447672#comment-17447672
 ] 

ASF subversion and git services commented on GEODE-9838:


Commit 313fb24631ab16157452fc540b70d18a7aa1b10b in geode's branch 
refs/heads/feature/GEODE-9838 from zhouxh
[ https://gitbox.apache.org/repos/asf?p=geode.git;h=313fb24 ]

GEODE-9838: Log key info for deserialization issue while index update


> Log key info for deserialization issue during index update 
> ---
>
> Key: GEODE-9838
> URL: https://issues.apache.org/jira/browse/GEODE-9838
> Project: Geode
>  Issue Type: Improvement
>  Components: querying
>Affects Versions: 1.15.0
>Reporter: Anilkumar Gingade
>Assignee: Xiaojian Zhou
>Priority: Major
>  Labels: GeodeOperationAPI, pull-request-available
>
> When there is an issue in an index update (maintenance), the index is marked as 
> invalid and a warning is logged: 
> [warn 2021/11/11 07:39:28.215 CST pazrslsrv004  Processor 963> tid=0x124ecf] Updating the Index patientMemberIdentifier 
> failed. The index is corrupted and marked as invalid.
> org.apache.geode.cache.query.internal.index.IMQException
> Adding "key" information to the log helps in diagnosing the failure and in 
> adding or removing the entry in question. 
> Code path IndexManager.java:
> {noformat}
> void addIndexMapping(RegionEntry entry, IndexProtocol index) {
>   try {
>     index.addIndexMapping(entry);
>   } catch (Exception exception) {
>     index.markValid(false);
>     setPRIndexAsInvalid((AbstractIndex) index);
>     logger.warn(String.format(
>         "Updating the Index %s failed. The index is corrupted and marked as invalid.",
>         ((AbstractIndex) index).indexName), exception);
>   }
> }
>
> void removeIndexMapping(RegionEntry entry, IndexProtocol index, int opCode) {
>   try {
>     index.removeIndexMapping(entry, opCode);
>   } catch (Exception exception) {
>     index.markValid(false);
>     setPRIndexAsInvalid((AbstractIndex) index);
>     logger.warn(String.format(
>         "Updating the Index %s failed. The index is corrupted and marked as invalid.",
>         ((AbstractIndex) index).indexName), exception);
>   }
> }
> {noformat}
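
The improvement the ticket asks for can be sketched in a few lines: extend the quoted warning with the key of the entry that failed. The snippet below is a minimal, self-contained illustration; `RegionEntry` is stubbed as a tiny interface here, and the `getKey()` accessor and message wording are assumptions, not Geode's actual implementation.

```java
public class IndexWarningDemo {
    // Stand-in for Geode's RegionEntry; getKey() is assumed here for illustration.
    interface RegionEntry {
        Object getKey();
    }

    // Same warning text as IndexManager's, extended with the key of the failing entry
    // so the corrupt entry can be located and added/removed.
    static String indexWarning(String indexName, RegionEntry entry) {
        return String.format(
            "Updating the Index %s failed for key %s. The index is corrupted and marked as invalid.",
            indexName, entry.getKey());
    }

    public static void main(String[] args) {
        RegionEntry entry = () -> "member-42"; // hypothetical key
        System.out.println(indexWarning("patientMemberIdentifier", entry));
    }
}
```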



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (GEODE-9838) Log key info for deserialization issue during index update

2021-11-22 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/GEODE-9838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated GEODE-9838:
--
Labels: GeodeOperationAPI pull-request-available  (was: GeodeOperationAPI)

> Log key info for deserialization issue during index update 
> ---
>
> Key: GEODE-9838
> URL: https://issues.apache.org/jira/browse/GEODE-9838
> Project: Geode
>  Issue Type: Improvement
>  Components: querying
>Affects Versions: 1.15.0
>Reporter: Anilkumar Gingade
>Assignee: Xiaojian Zhou
>Priority: Major
>  Labels: GeodeOperationAPI, pull-request-available
>
> When there is an issue in an index update (maintenance), the index is marked as 
> invalid and a warning is logged: 
> [warn 2021/11/11 07:39:28.215 CST pazrslsrv004  Processor 963> tid=0x124ecf] Updating the Index patientMemberIdentifier 
> failed. The index is corrupted and marked as invalid.
> org.apache.geode.cache.query.internal.index.IMQException
> Adding "key" information to the log helps in diagnosing the failure and in 
> adding or removing the entry in question. 
> Code path IndexManager.java:
> {noformat}
> void addIndexMapping(RegionEntry entry, IndexProtocol index) {
>   try {
>     index.addIndexMapping(entry);
>   } catch (Exception exception) {
>     index.markValid(false);
>     setPRIndexAsInvalid((AbstractIndex) index);
>     logger.warn(String.format(
>         "Updating the Index %s failed. The index is corrupted and marked as invalid.",
>         ((AbstractIndex) index).indexName), exception);
>   }
> }
>
> void removeIndexMapping(RegionEntry entry, IndexProtocol index, int opCode) {
>   try {
>     index.removeIndexMapping(entry, opCode);
>   } catch (Exception exception) {
>     index.markValid(false);
>     setPRIndexAsInvalid((AbstractIndex) index);
>     logger.warn(String.format(
>         "Updating the Index %s failed. The index is corrupted and marked as invalid.",
>         ((AbstractIndex) index).indexName), exception);
>   }
> }
> {noformat}



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (GEODE-9841) Move internal packages to conform to new internal package structure

2021-11-22 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/GEODE-9841?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated GEODE-9841:
--
Labels: pull-request-available  (was: )

> Move internal packages to conform to new internal package structure
> ---
>
> Key: GEODE-9841
> URL: https://issues.apache.org/jira/browse/GEODE-9841
> Project: Geode
>  Issue Type: Improvement
>Reporter: Udo Kohlmeyer
>Assignee: Udo Kohlmeyer
>Priority: Major
>  Labels: pull-request-available
>
> Both ClassLoader and Deployment are in the `internal.classloader` and 
> `internal.deployment` package structure, which would make more sense (and 
> conform to newer thinking) to be `classloader.internal` and 
> `deployment.internal`.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Assigned] (GEODE-9841) Move internal packages to conform to new internal package structure

2021-11-22 Thread Udo Kohlmeyer (Jira)


 [ 
https://issues.apache.org/jira/browse/GEODE-9841?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Udo Kohlmeyer reassigned GEODE-9841:


Assignee: Udo Kohlmeyer

> Move internal packages to conform to new internal package structure
> ---
>
> Key: GEODE-9841
> URL: https://issues.apache.org/jira/browse/GEODE-9841
> Project: Geode
>  Issue Type: Improvement
>Reporter: Udo Kohlmeyer
>Assignee: Udo Kohlmeyer
>Priority: Major
>
> Both ClassLoader and Deployment are in the `internal.classloader` and 
> `internal.deployment` package structure, which would make more sense (and 
> conform to newer thinking) to be `classloader.internal` and 
> `deployment.internal`.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (GEODE-9841) Move internal packages to conform to new internal package structure

2021-11-22 Thread Udo Kohlmeyer (Jira)
Udo Kohlmeyer created GEODE-9841:


 Summary: Move internal packages to conform to new internal package 
structure
 Key: GEODE-9841
 URL: https://issues.apache.org/jira/browse/GEODE-9841
 Project: Geode
  Issue Type: Improvement
Reporter: Udo Kohlmeyer


Both ClassLoader and Deployment are in the `internal.classloader` and 
`internal.deployment` package structure, which would make more sense (and 
conform to newer thinking) to be `classloader.internal` and 
`deployment.internal`.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (GEODE-8644) SerialGatewaySenderQueueDUnitTest.unprocessedTokensMapShouldDrainCompletely() intermittently fails when queues drain too slowly

2021-11-22 Thread Geode Integration (Jira)


[ 
https://issues.apache.org/jira/browse/GEODE-8644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17447642#comment-17447642
 ] 

Geode Integration commented on GEODE-8644:
--

Seen in [distributed-test-openjdk8 
#21|https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-develop-main/jobs/distributed-test-openjdk8/builds/21]
 ... see [test 
results|http://files.apachegeode-ci.info/builds/apache-develop-main/1.15.0-build.0685/test-results/distributedTest/1637612413/]
 or download 
[artifacts|http://files.apachegeode-ci.info/builds/apache-develop-main/1.15.0-build.0685/test-artifacts/1637612413/distributedtestfiles-openjdk8-1.15.0-build.0685.tgz].

> SerialGatewaySenderQueueDUnitTest.unprocessedTokensMapShouldDrainCompletely() 
> intermittently fails when queues drain too slowly
> ---
>
> Key: GEODE-8644
> URL: https://issues.apache.org/jira/browse/GEODE-8644
> Project: Geode
>  Issue Type: Bug
>Affects Versions: 1.15.0
>Reporter: Benjamin P Ross
>Assignee: Mark Hanson
>Priority: Major
>  Labels: GeodeOperationAPI, needsTriage, pull-request-available
>
> Currently the test 
> SerialGatewaySenderQueueDUnitTest.unprocessedTokensMapShouldDrainCompletely() 
> relies on a 2 second delay to allow for queues to finish draining after 
> finishing the put operation. If queues take longer than 2 seconds to drain, 
> the test will fail. We should change the test to wait for the queues to be 
> empty with a long timeout in case the queues never fully drain.
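
The suggested fix (poll until the queues are empty, with a generous timeout, instead of a fixed 2-second sleep) can be sketched with a plain polling helper. The class, queue type, and timings below are illustrative, not the test's actual code:

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

// Sketch: wait for a queue to drain by polling, rather than sleeping a fixed 2 s.
public class AwaitDrain {
    // Returns true once the queue is empty, or false if the timeout elapses first.
    public static boolean awaitEmpty(Queue<?> queue, long timeoutMillis) {
        long deadline = System.currentTimeMillis() + timeoutMillis;
        while (!queue.isEmpty()) {
            if (System.currentTimeMillis() >= deadline) {
                return false; // give up: the queue never fully drained
            }
            try {
                Thread.sleep(50); // poll interval
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        Queue<Integer> q = new ConcurrentLinkedQueue<>();
        q.add(1);
        new Thread(() -> {
            try { Thread.sleep(200); } catch (InterruptedException ignored) {}
            q.clear(); // simulate the sender draining its queue
        }).start();
        System.out.println(awaitEmpty(q, 5000)); // drains well before the timeout
    }
}
```

In the real test, a library such as Awaitility provides the same pattern (`await().atMost(...).until(...)`) without hand-rolling the loop.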



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (GEODE-7739) JMX managers may fail to federate mbeans for other members

2021-11-22 Thread Geode Integration (Jira)


[ 
https://issues.apache.org/jira/browse/GEODE-7739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17447641#comment-17447641
 ] 

Geode Integration commented on GEODE-7739:
--

Seen in [distributed-test-openjdk8 
#22|https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-develop-main/jobs/distributed-test-openjdk8/builds/22]
 ... see [test 
results|http://files.apachegeode-ci.info/builds/apache-develop-main/1.15.0-build.0686/test-results/distributedTest/1637613716/]
 or download 
[artifacts|http://files.apachegeode-ci.info/builds/apache-develop-main/1.15.0-build.0686/test-artifacts/1637613716/distributedtestfiles-openjdk8-1.15.0-build.0686.tgz].

> JMX managers may fail to federate mbeans for other members
> --
>
> Key: GEODE-7739
> URL: https://issues.apache.org/jira/browse/GEODE-7739
> Project: Geode
>  Issue Type: Bug
>  Components: jmx
>Reporter: Kirk Lund
>Assignee: Kirk Lund
>Priority: Major
>  Labels: GeodeOperationAPI, pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> JMX Manager may fail to federate one or more MXBeans for other members 
> because of a race condition during startup. When ManagementCacheListener is 
> first constructed, it is in a state that will ignore all callbacks because 
> the field readyForEvents is false.
> 
> Debugging with JMXMBeanReconnectDUnitTest revealed this bug.
> The test starts two locators with jmx manager configured and started. 
> Locator1 always has all of locator2's mbeans, but locator2 is intermittently 
> missing the personal mbeans of locator1. 
> I think this is caused by some sort of race condition in the code that 
> creates the monitoring regions for other members in locator2.
> It's possible that the jmx manager that hits this bug might fail to have 
> mbeans for servers as well as other locators but I haven't seen a test case 
> for this scenario.
> The exposure of this bug means that a user running more than one locator 
> might have a locator that is missing one or more mbeans for the cluster.
> 
> Studying the JMX code also reveals the existence of *GEODE-8012*.
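
One way to close a startup race of this shape is to buffer callbacks that arrive while readyForEvents is false and replay them once the listener is ready, instead of silently ignoring them. The sketch below is an illustration of that pattern only; it is not Geode's ManagementCacheListener, and the event type and method names are hypothetical:

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

// Sketch: buffer events received before the listener is ready, then replay them in order.
public class BufferingListener {
    private final Queue<String> pending = new ArrayDeque<>();
    private final List<String> processed = new ArrayList<>();
    private boolean readyForEvents = false;

    public synchronized void onEvent(String event) {
        if (!readyForEvents) {
            pending.add(event); // don't silently drop early callbacks
        } else {
            processed.add(event);
        }
    }

    public synchronized void markReady() {
        readyForEvents = true;
        while (!pending.isEmpty()) {
            processed.add(pending.poll()); // replay buffered events in arrival order
        }
    }

    public synchronized List<String> processed() {
        return new ArrayList<>(processed);
    }

    public static void main(String[] args) {
        BufferingListener listener = new BufferingListener();
        listener.onEvent("mbean-created"); // arrives before the listener is ready
        listener.markReady();
        listener.onEvent("mbean-updated");
        System.out.println(listener.processed()); // [mbean-created, mbean-updated]
    }
}
```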



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (GEODE-9764) Request-Response Messaging Should Time Out

2021-11-22 Thread Bill Burcham (Jira)


 [ 
https://issues.apache.org/jira/browse/GEODE-9764?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bill Burcham updated GEODE-9764:

Description: 
There is a weakness in the P2P/DirectChannel messaging architecture, in that it 
never gives up on a request (in a request-response scenario). As a result a bug 
(software fault) anywhere from the point where the requesting thread hands off 
the {{DistributionMessage}} e.g. to 
{{{}ClusterDistributionManager.putOutgoing(DistributionMessage){}}}, to the 
point where that request is ultimately fulfilled on a (one) receiver, can 
result in a hang (of some task on the send side, which is waiting for a 
response).

Well it's a little worse than that because any code in the return (response) 
path can also cause disruption of the (response) flow, thereby leaving the 
requesting task hanging.

If the code in the request path (primarily in P2P messaging) and the code in 
the response path (P2P messaging and TBD higher-level code) were perfect this 
might not be a problem. But there is a fair amount of code there and we have 
some evidence that it is currently not perfect, nor do we expect it to become 
perfect and stay that way.

This is a sketch of the situation. The left-most column is the request path or 
the originating member. The middle column is the server-side of the 
request-response path. And the right-most column is the response path back on 
the originating member.

!image-2021-11-22-12-14-59-117.png!

You can see that Geode product code, JDK code, and hardware components all lie 
in the end-to-end request-response messaging path.

That being the case it seems prudent to institute response timeouts so that 
bugs of this sort (which disrupt request-response message flow) don't result in 
hangs.

It's TBD if we want to go a step further and institute retries. The latter 
would entail introducing duplicate-suppression (conflation) in P2P messaging. 
We might also add exponential backoff (open-loop) or back-pressure 
(closed-loop) to prevent a flood of retries when the system is at or near the 
point of thrashing.

But even without retries, a configurable timeout might have good ROI as a first 
step. This would entail:
 * adding a configuration parameter to specify the timeout value
 * changing ReplyProcessor21 and others TBD to "give up" after the timeout has 
elapsed
 * changing higher-level code dependent on request-reply messaging so it 
properly handles the situations where we might have to "give up"

This issue affects all versions of Geode.
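
The "give up after a configurable timeout" step can be sketched with a latch-based wait. ReplyWaiter and its methods are hypothetical stand-ins for illustration; this is not ReplyProcessor21's actual API:

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Sketch: a reply wait that gives up after a configurable timeout instead of
// blocking the requesting task forever.
public class ReplyWaiter {
    private final CountDownLatch reply = new CountDownLatch(1);

    // Invoked by the messaging layer when the response message arrives.
    public void onReply() {
        reply.countDown();
    }

    // Returns true if the reply arrived in time, false if we gave up at the timeout.
    public boolean awaitReply(long timeoutMillis) {
        try {
            return reply.await(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        ReplyWaiter waiter = new ReplyWaiter();
        waiter.onReply(); // reply arrives before we wait
        System.out.println(waiter.awaitReply(1000)); // true: no hang
        System.out.println(new ReplyWaiter().awaitReply(100)); // false: gave up
    }
}
```

A false return is where the higher-level code mentioned above would have to "give up" gracefully, e.g. by raising a timeout exception the caller can handle.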
h2. Counterpoint

Not everybody thinks timeouts are a good idea. This section has the highlights.
h3. Timeouts Will Result in Data-Inconsistency

If we leave most of the surrounding code as-is and introduce timeouts, then we 
risk data inconsistency. TODO: describe in detail why data inconsistency is 
_inherent_ in using timeouts.
h3. Narrow The Vulnerability Cross-Section Without Timeouts

The proposal (above) seeks to solve the problem using end-to-end timeouts since 
any component in the path can, in general, have faults. An alternative 
approach would be to assume that _some_ of the components can be made "good 
enough" (without adding timeouts) and that those "good enough" components can 
protect themselves (and user applications) from faults in the remaining 
components.

With this approach, the Cluster Distribution Manager, and P2P / TCP Conduit / 
Direct Channel framework would be enhanced so that it was less susceptible to 
bugs in:
 * the 341 Distribution Message classes
 * the 68 Reply Message classes
 * the 95 Reply Processor classes

The question is: what form would that enhancement take, and would it be 
sufficient to overcome faults in the remaining components (the JDK, and the 
host+network layers)?
h2. Alternatives Discussed

These alternatives have been discussed, to varying degrees.

Baseline: no timeouts; members waiting for replies do "the right thing" if 
recipient departs view

Give-up-after-timeout

Retry-after-timeout-and-eventually-give-up

Retry-after-forcing-receiver-out-of-view

  was:
There is a weakness in the P2P/DirectChannel messaging architecture, in that it 
never gives up on a request (in a request-response scenario). As a result a bug 
(software fault) anywhere from the point where the requesting thread hands off 
the {{DistributionMessage}} e.g. to 
{{{}ClusterDistributionManager.putOutgoing(DistributionMessage){}}}, to the 
point where that request is ultimately fulfilled on a (one) receiver, can 
result in a hang (of some task on the send side, which is waiting for a 
response).

Well it's a little worse than that because any code in the return (response) 
path can also cause disruption of the (response) flow, thereby leaving the 
requesting task hanging.

If the code in the request path (primarily in P2P messaging) and the code in 
the response path (P2P messaging and TBD higher-level code) were perfect this 
might 

[jira] [Updated] (GEODE-9764) Request-Response Messaging Should Time Out

2021-11-22 Thread Bill Burcham (Jira)


 [ 
https://issues.apache.org/jira/browse/GEODE-9764?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bill Burcham updated GEODE-9764:

Attachment: image-2021-11-22-12-14-59-117.png

> Request-Response Messaging Should Time Out
> --
>
> Key: GEODE-9764
> URL: https://issues.apache.org/jira/browse/GEODE-9764
> Project: Geode
>  Issue Type: Improvement
>  Components: messaging
>Reporter: Bill Burcham
>Assignee: Bill Burcham
>Priority: Major
> Attachments: image-2021-11-22-11-52-23-586.png, 
> image-2021-11-22-12-14-59-117.png
>
>
> There is a weakness in the P2P/DirectChannel messaging architecture, in that 
> it never gives up on a request (in a request-response scenario). As a result 
> a bug (software fault) anywhere from the point where the requesting thread 
> hands off the {{DistributionMessage}} e.g. to 
> {{{}ClusterDistributionManager.putOutgoing(DistributionMessage){}}}, to the 
> point where that request is ultimately fulfilled on a (one) receiver, can 
> result in a hang (of some task on the send side, which is waiting for a 
> response).
> Well it's a little worse than that because any code in the return (response) 
> path can also cause disruption of the (response) flow, thereby leaving the 
> requesting task hanging.
> If the code in the request path (primarily in P2P messaging) and the code in 
> the response path (P2P messaging and TBD higher-level code) were perfect this 
> might not be a problem. But there is a fair amount of code there and we have 
> some evidence that it is currently not perfect, nor do we expect it to become 
> perfect and stay that way. That being the case it seems prudent to institute 
> response timeouts so that bugs of this sort (which disrupt request-response 
> message flow) don't result in hangs.
> It's TBD if we want to go a step further and institute retries. The latter 
> would entail introducing duplicate-suppression (conflation) in P2P messaging. 
> We might also add exponential backoff (open-loop) or back-pressure 
> (closed-loop) to prevent a flood of retries when the system is at or near the 
> point of thrashing.
> But even without retries, a configurable timeout might have good ROI as a 
> first step. This would entail:
>  * adding a configuration parameter to specify the timeout value
>  * changing ReplyProcessor21 and others TBD to "give up" after the timeout 
> has elapsed
>  * changing higher-level code dependent on request-reply messaging so it 
> properly handles the situations where we might have to "give up"
> This issue affects all versions of Geode.
> h2. Counterpoint
> Not everybody thinks timeouts are a good idea. Here are some alternative ideas:
>  
> Make the request-response primitive better: make it so that only bugs in our core 
> messaging framework could cause a lack of response, rather than our current 
> approach where a bug in a class like “RemotePutMessage” could cause a lack of 
> a response.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (GEODE-9764) Request-Response Messaging Should Time Out

2021-11-22 Thread Bill Burcham (Jira)


 [ 
https://issues.apache.org/jira/browse/GEODE-9764?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bill Burcham updated GEODE-9764:

Description: 
There is a weakness in the P2P/DirectChannel messaging architecture, in that it 
never gives up on a request (in a request-response scenario). As a result a bug 
(software fault) anywhere from the point where the requesting thread hands off 
the {{DistributionMessage}} e.g. to 
{{{}ClusterDistributionManager.putOutgoing(DistributionMessage){}}}, to the 
point where that request is ultimately fulfilled on a (one) receiver, can 
result in a hang (of some task on the send side, which is waiting for a 
response).

Well it's a little worse than that because any code in the return (response) 
path can also cause disruption of the (response) flow, thereby leaving the 
requesting task hanging.

If the code in the request path (primarily in P2P messaging) and the code in 
the response path (P2P messaging and TBD higher-level code) were perfect this 
might not be a problem. But there is a fair amount of code there and we have 
some evidence that it is currently not perfect, nor do we expect it to become 
perfect and stay that way.

This is a sketch of the situation. The left-most column is the request path or 
the originating member. The middle column is the server-side of the 
request-response path. And the right-most column is the response path back on 
the originating member.

!image-2021-11-22-12-14-59-117.png!

You can see that Geode product code, JDK code, and hardware components all lie 
in the end-to-end request-response messaging path.

That being the case it seems prudent to institute response timeouts so that 
bugs of this sort (which disrupt request-response message flow) don't result in 
hangs.

It's TBD if we want to go a step further and institute retries. The latter 
would entail introducing duplicate-suppression (conflation) in P2P messaging. 
We might also add exponential backoff (open-loop) or back-pressure 
(closed-loop) to prevent a flood of retries when the system is at or near the 
point of thrashing.

But even without retries, a configurable timeout might have good ROI as a first 
step. This would entail:
 * adding a configuration parameter to specify the timeout value
 * changing ReplyProcessor21 and others TBD to "give up" after the timeout has 
elapsed
 * changing higher-level code dependent on request-reply messaging so it 
properly handles the situations where we might have to "give up"
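As an illustration of the "give up" step, here is a minimal plain-Java sketch. This is not the actual ReplyProcessor21 implementation; the class and method names are hypothetical. The point is just that the unbounded wait for a reply becomes a timed one whose caller can observe the timeout:

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch of a reply processor that gives up after a
// configurable timeout instead of waiting forever for a response.
class TimedReplyProcessor {
    private final CountDownLatch replyReceived = new CountDownLatch(1);

    // Called by the response path when the reply message arrives.
    void processReply() {
        replyReceived.countDown();
    }

    // Called by the requesting task; returns false if the timeout
    // elapsed before any reply arrived, so the caller can give up.
    boolean waitForReply(long timeout, TimeUnit unit) throws InterruptedException {
        return replyReceived.await(timeout, unit);
    }

    public static void main(String[] args) throws InterruptedException {
        TimedReplyProcessor p = new TimedReplyProcessor();
        // No reply ever arrives: the wait gives up after 100 ms.
        boolean gotReply = p.waitForReply(100, TimeUnit.MILLISECONDS);
        System.out.println(gotReply ? "reply" : "timed out"); // prints "timed out"
    }
}
```

The higher-level changes in the bullet list are the hard part: every caller of the hypothetical `waitForReply` has to handle the `false` case sensibly rather than assume a reply always comes.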

This issue affects all versions of Geode.
h2. Counterpoint

Not everybody thinks timeouts are a good idea. Here are some alternative ideas:

The proposal (above) seeks to solve the problem using end-to-end timeouts since 
any component in the path can, in general, have faults. An alternative 
approach would be to assume that _some_ of the components can be made "good 
enough" (without adding timeouts) and that those "good enough" components can 
protect themselves (and user applications) from faults in the remaining 
components.

With this approach, the Cluster Distribution Manager, and P2P / TCP Conduit / 
Direct Channel framework would be enhanced so that it was less susceptible to 
bugs in:
 * the 341 Distribution Message classes
 * the 68 Reply Message classes
 * the 95 Reply Processor classes

The question is: what form would that enhancement take, and would it be 
sufficient to overcome faults in the remaining components (the JDK, and the 
host+network layers)?

 

  was:
There is a weakness in the P2P/DirectChannel messaging architecture, in that it 
never gives up on a request (in a request-response scenario). As a result a bug 
(software fault) anywhere from the point where the requesting thread hands off 
the {{DistributionMessage}} e.g. to 
{{{}ClusterDistributionManager.putOutgoing(DistributionMessage){}}}, to the 
point where that request is ultimately fulfilled on a (one) receiver, can 
result in a hang (of some task on the send side, which is waiting for a 
response).

Well it's a little worse than that because any code in the return (response) 
path can also cause disruption of the (response) flow, thereby leaving the 
requesting task hanging.

If the code in the request path (primarily in P2P messaging) and the code in 
the response path (P2P messaging and TBD higher-level code) were perfect this 
might not be a problem. But there is a fair amount of code there and we have 
some evidence that it is currently not perfect, nor do we expect it to become 
perfect and stay that way. That being the case it seems prudent to institute 
response timeouts so that bugs of this sort (which disrupt request-response 
message flow) don't result in hangs.

It's TBD if we want to go a step further and institute retries. The latter 
would entail introducing duplicate-suppression (conflation) in P2P messaging. 
We might also add exponential backoff (open-loop) or back-pressure 
(closed-loop) to 

[jira] [Commented] (GEODE-7822) MemoryThresholdsOffHeapDUnitTest has failures

2021-11-22 Thread Geode Integration (Jira)


[ 
https://issues.apache.org/jira/browse/GEODE-7822?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17447599#comment-17447599
 ] 

Geode Integration commented on GEODE-7822:
--

Seen in [distributed-test-openjdk8 
#118|https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-develop-mass-test-run/jobs/distributed-test-openjdk8/builds/118]
 ... see [test 
results|http://files.apachegeode-ci.info/builds/apache-develop-mass-test-run/1.15.0-build.0682/test-results/distributedTest/1637385064/]
 or download 
[artifacts|http://files.apachegeode-ci.info/builds/apache-develop-mass-test-run/1.15.0-build.0682/test-artifacts/1637385064/distributedtestfiles-openjdk8-1.15.0-build.0682.tgz].

> MemoryThresholdsOffHeapDUnitTest has failures
> -
>
> Key: GEODE-7822
> URL: https://issues.apache.org/jira/browse/GEODE-7822
> Project: Geode
>  Issue Type: Bug
>  Components: tests
>Reporter: Mark Hanson
>Priority: Major
>  Labels: flaky
>
> These two failures were seen in mass test runs...
> {noformat}
>  testPRLoadRejection   
> https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-mass-test-run-main/jobs/DistributedTestOpenJDK8/builds/674
> {noformat}
> {noformat}
> org.apache.geode.cache.management.MemoryThresholdsOffHeapDUnitTest > 
> testPRLoadRejection FAILED
> org.apache.geode.test.dunit.RMIException: While invoking 
> org.apache.geode.cache.management.MemoryThresholdsOffHeapDUnitTest$31.call in 
> VM 2 running on Host a57bd8581b8d with 4 VMs
> at org.apache.geode.test.dunit.VM.executeMethodOnObject(VM.java:610)
> at org.apache.geode.test.dunit.VM.invoke(VM.java:462)
> at 
> org.apache.geode.cache.management.MemoryThresholdsOffHeapDUnitTest.testPRLoadRejection(MemoryThresholdsOffHeapDUnitTest.java:1045)
> Caused by:
> java.lang.AssertionError
> at org.junit.Assert.fail(Assert.java:86)
> at org.junit.Assert.assertTrue(Assert.java:41)
> at org.junit.Assert.assertFalse(Assert.java:64)
> at org.junit.Assert.assertFalse(Assert.java:74)
> at 
> org.apache.geode.cache.management.MemoryThresholdsOffHeapDUnitTest$31.call(MemoryThresholdsOffHeapDUnitTest.java:1077){noformat}
> =-=-=-=-=-=-=-=-=-=-=-=-=-=-= Test Results URI 
> =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
> [http://files.apachegeode-ci.info/builds/apache-mass-test-run-main/1.12.0-SNAPSHOT.0005/test-results/distributedTest/1582515952/]
> =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
> Test report artifacts from this job are available at:
> [http://files.apachegeode-ci.info/builds/apache-mass-test-run-main/1.12.0-SNAPSHOT.0005/test-artifacts/1582515952/distributedtestfiles-OpenJDK8-1.12.0-SNAPSHOT.0005.tgz]
> {noformat}
>  testDRLoadRejection   
> https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-mass-test-run-main/jobs/DistributedTestOpenJDK8/builds/742
>  {noformat}
> {noformat}
> org.apache.geode.cache.management.MemoryThresholdsOffHeapDUnitTest > 
> testDRLoadRejection FAILED
> org.apache.geode.test.dunit.RMIException: While invoking 
> org.apache.geode.cache.management.MemoryThresholdsOffHeapDUnitTest$18.call in 
> VM 2 running on Host b2c673017cde with 4 VMs
> at org.apache.geode.test.dunit.VM.executeMethodOnObject(VM.java:610)
> at org.apache.geode.test.dunit.VM.invoke(VM.java:462)
> at 
> org.apache.geode.cache.management.MemoryThresholdsOffHeapDUnitTest.testDRLoadRejection(MemoryThresholdsOffHeapDUnitTest.java:667)
> Caused by:
>java.lang.AssertionError
> at org.junit.Assert.fail(Assert.java:86)
> at org.junit.Assert.assertTrue(Assert.java:41)
> at org.junit.Assert.assertFalse(Assert.java:64)
> at org.junit.Assert.assertFalse(Assert.java:74)
> at 
> org.apache.geode.cache.management.MemoryThresholdsOffHeapDUnitTest$18.call(MemoryThresholdsOffHeapDUnitTest.java:673)
>  {noformat}
> =-=-=-=-=-=-=-=-=-=-=-=-=-=-= Test Results URI 
> =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
> http://files.apachegeode-ci.info/builds/apache-mass-test-run-main/1.12.0-SNAPSHOT.0005/test-results/distributedTest/1582626992/
> =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
> Test report artifacts from this job are available at:
> http://files.apachegeode-ci.info/builds/apache-mass-test-run-main/1.12.0-SNAPSHOT.0005/test-artifacts/1582626992/distributedtestfiles-OpenJDK8-1.12.0-SNAPSHOT.0005.tgz





[jira] [Commented] (GEODE-8644) SerialGatewaySenderQueueDUnitTest.unprocessedTokensMapShouldDrainCompletely() intermittently fails when queues drain too slowly

2021-11-22 Thread Geode Integration (Jira)


[ 
https://issues.apache.org/jira/browse/GEODE-8644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17447595#comment-17447595
 ] 

Geode Integration commented on GEODE-8644:
--

Seen in [distributed-test-openjdk8 
#144|https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-develop-mass-test-run/jobs/distributed-test-openjdk8/builds/144]
 ... see [test 
results|http://files.apachegeode-ci.info/builds/apache-develop-mass-test-run/1.15.0-build.0682/test-results/distributedTest/1637410069/]
 or download 
[artifacts|http://files.apachegeode-ci.info/builds/apache-develop-mass-test-run/1.15.0-build.0682/test-artifacts/1637410069/distributedtestfiles-openjdk8-1.15.0-build.0682.tgz].

> SerialGatewaySenderQueueDUnitTest.unprocessedTokensMapShouldDrainCompletely() 
> intermittently fails when queues drain too slowly
> ---
>
> Key: GEODE-8644
> URL: https://issues.apache.org/jira/browse/GEODE-8644
> Project: Geode
>  Issue Type: Bug
>Affects Versions: 1.15.0
>Reporter: Benjamin P Ross
>Assignee: Mark Hanson
>Priority: Major
>  Labels: GeodeOperationAPI, needsTriage, pull-request-available
>
> Currently the test 
> SerialGatewaySenderQueueDUnitTest.unprocessedTokensMapShouldDrainCompletely() 
> relies on a 2 second delay to allow for queues to finish draining after 
> finishing the put operation. If queues take longer than 2 seconds to drain 
> the test will fail. We should change the test to wait for the queues to be 
> empty with a long timeout in case the queues never fully drain.
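The suggested fix (wait for emptiness with a long timeout instead of a fixed 2-second sleep) can be sketched in plain Java as a bounded poll. Geode tests typically express this with an Awaitility-style helper; the queue-size supplier below is a stand-in, not the real test's API:

```java
import java.util.concurrent.TimeUnit;
import java.util.function.IntSupplier;

// Sketch: poll until a condition holds or a long timeout elapses,
// instead of sleeping a fixed 2 seconds and hoping the queue drained.
class AwaitQueueDrain {
    static boolean awaitEmpty(IntSupplier queueSize, long timeoutMillis)
            throws InterruptedException {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        while (System.nanoTime() < deadline) {
            if (queueSize.getAsInt() == 0) {
                return true;  // queue drained within the timeout
            }
            Thread.sleep(50);  // poll interval
        }
        return queueSize.getAsInt() == 0;  // last chance at the deadline
    }

    public static void main(String[] args) throws InterruptedException {
        // Simulated queue that drains over a few polls.
        int[] remaining = {3};
        boolean drained = awaitEmpty(() -> Math.max(0, remaining[0]--), 5_000);
        System.out.println(drained ? "drained" : "still not empty"); // prints "drained"
    }
}
```

With a generous timeout the test only pays the full wait when the queue genuinely fails to drain, which is exactly the failure it should report.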





[jira] [Commented] (GEODE-8644) SerialGatewaySenderQueueDUnitTest.unprocessedTokensMapShouldDrainCompletely() intermittently fails when queues drain too slowly

2021-11-22 Thread Geode Integration (Jira)


[ 
https://issues.apache.org/jira/browse/GEODE-8644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17447594#comment-17447594
 ] 

Geode Integration commented on GEODE-8644:
--

Seen in [distributed-test-openjdk8 
#154|https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-develop-mass-test-run/jobs/distributed-test-openjdk8/builds/154]
 ... see [test 
results|http://files.apachegeode-ci.info/builds/apache-develop-mass-test-run/1.15.0-build.0682/test-results/distributedTest/1637417234/]
 or download 
[artifacts|http://files.apachegeode-ci.info/builds/apache-develop-mass-test-run/1.15.0-build.0682/test-artifacts/1637417234/distributedtestfiles-openjdk8-1.15.0-build.0682.tgz].

> SerialGatewaySenderQueueDUnitTest.unprocessedTokensMapShouldDrainCompletely() 
> intermittently fails when queues drain too slowly
> ---
>
> Key: GEODE-8644
> URL: https://issues.apache.org/jira/browse/GEODE-8644
> Project: Geode
>  Issue Type: Bug
>Affects Versions: 1.15.0
>Reporter: Benjamin P Ross
>Assignee: Mark Hanson
>Priority: Major
>  Labels: GeodeOperationAPI, needsTriage, pull-request-available
>
> Currently the test 
> SerialGatewaySenderQueueDUnitTest.unprocessedTokensMapShouldDrainCompletely() 
> relies on a 2 second delay to allow for queues to finish draining after 
> finishing the put operation. If queues take longer than 2 seconds to drain 
> the test will fail. We should change the test to wait for the queues to be 
> empty with a long timeout in case the queues never fully drain.





[jira] [Updated] (GEODE-9764) Request-Response Messaging Should Time Out

2021-11-22 Thread Bill Burcham (Jira)


 [ 
https://issues.apache.org/jira/browse/GEODE-9764?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bill Burcham updated GEODE-9764:

Attachment: image-2021-11-22-11-52-23-586.png

> Request-Response Messaging Should Time Out
> --
>
> Key: GEODE-9764
> URL: https://issues.apache.org/jira/browse/GEODE-9764
> Project: Geode
>  Issue Type: Improvement
>  Components: messaging
>Reporter: Bill Burcham
>Assignee: Bill Burcham
>Priority: Major
> Attachments: image-2021-11-22-11-52-23-586.png
>
>
> There is a weakness in the P2P/DirectChannel messaging architecture, in that 
> it never gives up on a request (in a request-response scenario). As a result 
> a bug (software fault) anywhere from the point where the requesting thread 
> hands off the {{DistributionMessage}} e.g. to 
> {{{}ClusterDistributionManager.putOutgoing(DistributionMessage){}}}, to the 
> point where that request is ultimately fulfilled on a (one) receiver, can 
> result in a hang (of some task on the send side, which is waiting for a 
> response).
> Well it's a little worse than that because any code in the return (response) 
> path can also cause disruption of the (response) flow, thereby leaving the 
> requesting task hanging.
> If the code in the request path (primarily in P2P messaging) and the code in 
> the response path (P2P messaging and TBD higher-level code) were perfect this 
> might not be a problem. But there is a fair amount of code there and we have 
> some evidence that it is currently not perfect, nor do we expect it to become 
> perfect and stay that way. That being the case it seems prudent to institute 
> response timeouts so that bugs of this sort (which disrupt request-response 
> message flow) don't result in hangs.
> It's TBD if we want to go a step further and institute retries. The latter 
> would entail introducing duplicate-suppression (conflation) in P2P messaging. 
> We might also add exponential backoff (open-loop) or back-pressure 
> (closed-loop) to prevent a flood of retries when the system is at or near the 
> point of thrashing.
> But even without retries, a configurable timeout might have good ROI as a 
> first step. This would entail:
>  * adding a configuration parameter to specify the timeout value
>  * changing ReplyProcessor21 and others TBD to "give up" after the timeout 
> has elapsed
>  * changing higher-level code dependent on request-reply messaging so it 
> properly handles the situations where we might have to "give up"
> This issue affects all versions of Geode.
> h2. Counterpoint
> Not everybody thinks timeouts are a good idea. Here are some alternative ideas:
>  
> Make the request-response primitive better: make it so that only bugs in our 
> core messaging framework could cause a lack of response, rather than our 
> current approach where a bug in a class like “RemotePutMessage” could cause a 
> lack of a response.





[jira] [Commented] (GEODE-7739) JMX managers may fail to federate mbeans for other members

2021-11-22 Thread Geode Integration (Jira)


[ 
https://issues.apache.org/jira/browse/GEODE-7739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17447587#comment-17447587
 ] 

Geode Integration commented on GEODE-7739:
--

Seen in [distributed-test-openjdk8 
#178|https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-develop-mass-test-run/jobs/distributed-test-openjdk8/builds/178]
 ... see [test 
results|http://files.apachegeode-ci.info/builds/apache-develop-mass-test-run/1.15.0-build.0682/test-results/distributedTest/1637434380/]
 or download 
[artifacts|http://files.apachegeode-ci.info/builds/apache-develop-mass-test-run/1.15.0-build.0682/test-artifacts/1637434380/distributedtestfiles-openjdk8-1.15.0-build.0682.tgz].

> JMX managers may fail to federate mbeans for other members
> --
>
> Key: GEODE-7739
> URL: https://issues.apache.org/jira/browse/GEODE-7739
> Project: Geode
>  Issue Type: Bug
>  Components: jmx
>Reporter: Kirk Lund
>Assignee: Kirk Lund
>Priority: Major
>  Labels: GeodeOperationAPI, pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> JMX Manager may fail to federate one or more MXBeans for other members 
> because of a race condition during startup. When ManagementCacheListener is 
> first constructed, it is in a state that will ignore all callbacks because 
> the field readyForEvents is false.
> 
> Debugging with JMXMBeanReconnectDUnitTest revealed this bug.
> The test starts two locators with jmx manager configured and started. 
> Locator1 always has all of locator2's mbeans, but locator2 is intermittently 
> missing the personal mbeans of locator1. 
> I think this is caused by some sort of race condition in the code that 
> creates the monitoring regions for other members in locator2.
> It's possible that the jmx manager that hits this bug might fail to have 
> mbeans for servers as well as other locators but I haven't seen a test case 
> for this scenario.
> The exposure of this bug means that a user running more than one locator 
> might have a locator that is missing one or more mbeans for the cluster.
> 
> Studying the JMX code also reveals the existence of *GEODE-8012*.





[jira] [Updated] (GEODE-9764) Request-Response Messaging Should Time Out

2021-11-22 Thread Bill Burcham (Jira)


 [ 
https://issues.apache.org/jira/browse/GEODE-9764?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bill Burcham updated GEODE-9764:

Description: 
There is a weakness in the P2P/DirectChannel messaging architecture, in that it 
never gives up on a request (in a request-response scenario). As a result a bug 
(software fault) anywhere from the point where the requesting thread hands off 
the {{DistributionMessage}} e.g. to 
{{{}ClusterDistributionManager.putOutgoing(DistributionMessage){}}}, to the 
point where that request is ultimately fulfilled on a (one) receiver, can 
result in a hang (of some task on the send side, which is waiting for a 
response).

Well it's a little worse than that because any code in the return (response) 
path can also cause disruption of the (response) flow, thereby leaving the 
requesting task hanging.

If the code in the request path (primarily in P2P messaging) and the code in 
the response path (P2P messaging and TBD higher-level code) were perfect this 
might not be a problem. But there is a fair amount of code there and we have 
some evidence that it is currently not perfect, nor do we expect it to become 
perfect and stay that way. That being the case it seems prudent to institute 
response timeouts so that bugs of this sort (which disrupt request-response 
message flow) don't result in hangs.

It's TBD if we want to go a step further and institute retries. The latter 
would entail introducing duplicate-suppression (conflation) in P2P messaging. 
We might also add exponential backoff (open-loop) or back-pressure 
(closed-loop) to prevent a flood of retries when the system is at or near the 
point of thrashing.

But even without retries, a configurable timeout might have good ROI as a first 
step. This would entail:
 * adding a configuration parameter to specify the timeout value
 * changing ReplyProcessor21 and others TBD to "give up" after the timeout has 
elapsed
 * changing higher-level code dependent on request-reply messaging so it 
properly handles the situations where we might have to "give up"

This issue affects all versions of Geode.
h2. Counterpoint

Not everybody thinks timeouts are a good idea. Here are some alternative ideas:

 

Make the request-response primitive better: make it so that only bugs in our 
core messaging framework could cause a lack of response, rather than our 
current approach where a bug in a class like “RemotePutMessage” could cause a 
lack of a response.

  was:
There is a weakness in the P2P/DirectChannel messaging architecture, in that it 
never gives up on a request (in a request-response scenario). As a result a bug 
(software fault) anywhere from the point where the requesting thread hands off 
the {{DistributionMessage}} e.g. to 
{{ClusterDistributionManager.putOutgoing(DistributionMessage)}}, to the point 
where that request is ultimately fulfilled on a (one) receiver, can result in a 
hang (of some task on the send side, which is waiting for a response).

Well it's a little worse than that because any code in the return (response) 
path can also cause disruption of the (response) flow, thereby leaving the 
requesting task hanging.

If the code in the request path (primarily in P2P messaging) and the code in 
the response path (P2P messaging and TBD higher-level code) were perfect this 
might not be a problem. But there is a fair amount of code there and we have 
some evidence that it is currently not perfect, nor do we expect it to become 
perfect and stay that way. That being the case it seems prudent to institute 
response timeouts so that bugs of this sort (which disrupt request-response 
message flow) don't result in hangs.

It's TBD if we want to go a step further and institute retries. The latter 
would entail introducing duplicate-suppression (conflation) in P2P messaging. 
We might also add exponential backoff (open-loop) or back-pressure 
(closed-loop) to prevent a flood of retries when the system is at or near the 
point of thrashing.

But even without retries, a configurable timeout might have good ROI as a first 
step. This would entail:

* adding a configuration parameter to specify the timeout value
* changing ReplyProcessor21 and others TBD to "give up" after the timeout has 
elapsed
* changing higher-level code dependent on request-reply messaging so it 
properly handles the situations where we might have to "give up"

This issue affects all versions of Geode.


> Request-Response Messaging Should Time Out
> --
>
> Key: GEODE-9764
> URL: https://issues.apache.org/jira/browse/GEODE-9764
> Project: Geode
>  Issue Type: Improvement
>  Components: messaging
>Reporter: Bill Burcham
>Assignee: Bill Burcham
>Priority: Major
>
> There is a weakness in the P2P/DirectChannel messaging architecture, in that 
> it never gives up on a 

[jira] [Commented] (GEODE-8644) SerialGatewaySenderQueueDUnitTest.unprocessedTokensMapShouldDrainCompletely() intermittently fails when queues drain too slowly

2021-11-22 Thread Xiaojian Zhou (Jira)


[ 
https://issues.apache.org/jira/browse/GEODE-8644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17447573#comment-17447573
 ] 

Xiaojian Zhou commented on GEODE-8644:
--

The reproduced failures are all caused by the locator being disconnected:

[locator] [info 2021/11/13 08:43:12.331 UTC   
tid=0x46] Failed to connect to localhost/127.0.0.1:0

[locator] [warn 2021/11/13 08:43:12.331 UTC   
tid=0x46] Locator discovery task for locator 
heavy-lifter-ca6688de-b95d-5db6-9ac5-57db242f6302.c.apachegeode-ci.internal[34223]
 could not exchange locator information with localhost[0] after 45 retry 
attempts. Retrying in 1 ms.


> SerialGatewaySenderQueueDUnitTest.unprocessedTokensMapShouldDrainCompletely() 
> intermittently fails when queues drain too slowly
> ---
>
> Key: GEODE-8644
> URL: https://issues.apache.org/jira/browse/GEODE-8644
> Project: Geode
>  Issue Type: Bug
>Affects Versions: 1.15.0
>Reporter: Benjamin P Ross
>Assignee: Mark Hanson
>Priority: Major
>  Labels: GeodeOperationAPI, needsTriage, pull-request-available
>
> Currently the test 
> SerialGatewaySenderQueueDUnitTest.unprocessedTokensMapShouldDrainCompletely() 
> relies on a 2 second delay to allow for queues to finish draining after 
> finishing the put operation. If queues take longer than 2 seconds to drain 
> the test will fail. We should change the test to wait for the queues to be 
> empty with a long timeout in case the queues never fully drain.





[jira] [Commented] (GEODE-9770) CI Failure: ConflictingPersistentDataException in PersistentRecoveryOrderDUnitTest > testRecoverAfterConflict

2021-11-22 Thread Geode Integration (Jira)


[ 
https://issues.apache.org/jira/browse/GEODE-9770?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17447572#comment-17447572
 ] 

Geode Integration commented on GEODE-9770:
--

Seen in [distributed-test-openjdk8 
#196|https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-develop-mass-test-run/jobs/distributed-test-openjdk8/builds/196]
 ... see [test 
results|http://files.apachegeode-ci.info/builds/apache-develop-mass-test-run/1.15.0-build.0682/test-results/distributedTest/1637450065/]
 or download 
[artifacts|http://files.apachegeode-ci.info/builds/apache-develop-mass-test-run/1.15.0-build.0682/test-artifacts/1637450065/distributedtestfiles-openjdk8-1.15.0-build.0682.tgz].

> CI Failure: ConflictingPersistentDataException in 
> PersistentRecoveryOrderDUnitTest > testRecoverAfterConflict
> -
>
> Key: GEODE-9770
> URL: https://issues.apache.org/jira/browse/GEODE-9770
> Project: Geode
>  Issue Type: Bug
>  Components: persistence
>Affects Versions: 1.15.0
>Reporter: Nabarun Nag
>Assignee: Kirk Lund
>Priority: Major
>  Labels: GeodeOperationAPI, needsTriage
>
> This ConflictingPersistentDataException has popped up a number of times.
> GEODE-6975
> GEODE-7898
>  
> {noformat}
> PersistentRecoveryOrderDUnitTest > testRecoverAfterConflict FAILED
> org.apache.geode.test.dunit.RMIException: While invoking 
> org.apache.geode.internal.cache.persistence.PersistentRecoveryOrderDUnitTest$$Lambda$477/1255368072.run
>  in VM 0 running on Host 
> heavy-lifter-7860ae84-3be2-5775-9a40-47a7abc4e64d.c.apachegeode-ci.internal 
> with 4 VMs
> at org.apache.geode.test.dunit.VM.executeMethodOnObject(VM.java:631)
> at org.apache.geode.test.dunit.VM.invoke(VM.java:448)
> at 
> org.apache.geode.internal.cache.persistence.PersistentRecoveryOrderDUnitTest.testRecoverAfterConflict(PersistentRecoveryOrderDUnitTest.java:1328)
> Caused by:
> org.apache.geode.cache.CacheClosedException: Region 
> /PersistentRecoveryOrderDUnitTest_testRecoverAfterConflictRegion remote 
> member heavy-lifter-7860ae84-3be2-5775-9a40-47a7abc4e64d(585689):51002 
> with persistent data 
> /10.0.0.50:/tmp/junit4736556655757609006/rootDir-testRecoverAfterConflict/vm-1
>  created at timestamp 1635009815552 version 0 diskStoreId 
> bf4774f44f2e4dcd-aa6c79424132a2e4 name  was not part of the same distributed 
> system as the local data from 
> /10.0.0.50:/tmp/junit4736556655757609006/rootDir-testRecoverAfterConflict/vm-0
>  created at timestamp 1635009814986 version 0 diskStoreId 
> cc4c64d81e9d4119-9e7320b29f540199 name , caused by 
> org.apache.geode.cache.persistence.ConflictingPersistentDataException: Region 
> /PersistentRecoveryOrderDUnitTest_testRecoverAfterConflictRegion remote 
> member heavy-lifter-7860ae84-3be2-5775-9a40-47a7abc4e64d(585689):51002 
> with persistent data 
> /10.0.0.50:/tmp/junit4736556655757609006/rootDir-testRecoverAfterConflict/vm-1
>  created at timestamp 1635009815552 version 0 diskStoreId 
> bf4774f44f2e4dcd-aa6c79424132a2e4 name  was not part of the same distributed 
> system as the local data from 
> /10.0.0.50:/tmp/junit4736556655757609006/rootDir-testRecoverAfterConflict/vm-0
>  created at timestamp 1635009814986 version 0 diskStoreId 
> cc4c64d81e9d4119-9e7320b29f540199 name 
> at 
> org.apache.geode.internal.cache.GemFireCacheImpl$Stopper.generateCancelledException(GemFireCacheImpl.java:5223)
> at 
> org.apache.geode.CancelCriterion.checkCancelInProgress(CancelCriterion.java:83)
> at 
> org.apache.geode.internal.cache.GemFireCacheImpl.getInternalResourceManager(GemFireCacheImpl.java:4259)
> at 
> org.apache.geode.internal.cache.GemFireCacheImpl.getInternalResourceManager(GemFireCacheImpl.java:4253)
> at 
> org.apache.geode.internal.cache.DistributedRegion.getInitialImageAndRecovery(DistributedRegion.java:1175)
> at 
> org.apache.geode.internal.cache.DistributedRegion.initialize(DistributedRegion.java:1095)
> at 
> org.apache.geode.internal.cache.GemFireCacheImpl.createVMRegion(GemFireCacheImpl.java:3108)
> at 
> org.apache.geode.internal.cache.GemFireCacheImpl.basicCreateRegion(GemFireCacheImpl.java:3002)
> at 
> org.apache.geode.internal.cache.GemFireCacheImpl.createRegion(GemFireCacheImpl.java:2986)
> at 
> org.apache.geode.cache.RegionFactory.create(RegionFactory.java:773)
> at 
> org.apache.geode.internal.cache.InternalRegionFactory.create(InternalRegionFactory.java:75)
> at 
> org.apache.geode.internal.cache.persistence.PersistentRecoveryOrderDUnitTest.createReplicateRegion(PersistentRecoveryOrderDUnitTest.java:1358)
> at 

[jira] [Updated] (GEODE-9819) Client socket leak in CacheClientNotifier.registerClientInternal when error conditions occur for the durable client

2021-11-22 Thread Darrel Schneider (Jira)


 [ 
https://issues.apache.org/jira/browse/GEODE-9819?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Darrel Schneider updated GEODE-9819:

Affects Version/s: 1.13.4
   1.12.5

> Client socket leak in CacheClientNotifier.registerClientInternal when error 
> conditions occur for the durable client
> ---
>
> Key: GEODE-9819
> URL: https://issues.apache.org/jira/browse/GEODE-9819
> Project: Geode
>  Issue Type: Bug
>  Components: client/server, core
>Affects Versions: 1.12.5, 1.13.4, 1.14.0, 1.15.0
>Reporter: Leon Finker
>Priority: Critical
>  Labels: needsTriage
>
> In CacheClientNotifier.registerClientInternal, the client socket can be left 
> half open and not properly closed when error conditions occur, such as in 
> this case:
> {code:java}
> } else {
>   // The existing proxy is already running (which means that another
>   // client is already using this durable id.
>   unsuccessfulMsg =
>   String.format(
>   "The requested durable client has the same identifier ( %s ) as an 
> existing durable client ( %s ). Duplicate durable clients are not allowed.",
>   clientProxyMembershipID.getDurableId(), cacheClientProxy);
>   logger.warn(unsuccessfulMsg);
>   // Set the unsuccessful response byte.
>   responseByte = Handshake.REPLY_EXCEPTION_DUPLICATE_DURABLE_CLIENT;
> } {code}
> It considers the current client connect attempt to have failed. It writes 
> this response back to the client: REPLY_EXCEPTION_DUPLICATE_DURABLE_CLIENT. This 
> will cause the client to throw ServerRefusedConnectionException. What seems 
> wrong about this method is that even though it sets "unsuccessfulMsg" and 
> correctly sends back a handshake saying the client is rejected, it does not 
> throw an exception and it does not close "socket". I think right before it 
> calls performPostAuthorization it should do the following:
> {code:java}
> if (unsuccessfulMsg != null) {
>   try {
> socket.close();
>   } catch (IOException ignore) {
>   }
>  } else {
> performPostAuthorization(...)
> }{code}
> Full discussion details can be found at 
> https://markmail.org/thread/2gqmbq2m57pz7pxu



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (GEODE-1537) DurableRegistrationDUnitTest.testDurableClientWithRegistrationHA

2021-11-22 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/GEODE-1537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17447554#comment-17447554
 ] 

ASF subversion and git services commented on GEODE-1537:


Commit 5ec6a663b44f95d3a23dd7f040012f8b29d54701 in geode's branch 
refs/heads/develop from Jens Deppe
[ https://gitbox.apache.org/repos/asf?p=geode.git;h=5ec6a66 ]

GEODE-1537: Re-order ephemeral port acquisition to fix flaky 
DurableRegistrationDUnitTest (#7111)



> DurableRegistrationDUnitTest.testDurableClientWithRegistrationHA
> 
>
> Key: GEODE-1537
> URL: https://issues.apache.org/jira/browse/GEODE-1537
> Project: Geode
>  Issue Type: Bug
>  Components: client queues
>Reporter: Jinmei Liao
>Assignee: Jens Deppe
>Priority: Major
>  Labels: CI, pull-request-available
> Fix For: 1.15.0
>
>
> Geode_develop_DistributedTests/2883
> Error Message
> com.gemstone.gemfire.test.dunit.RMIException: While invoking 
> com.gemstone.gemfire.internal.cache.tier.sockets.DurableRegistrationDUnitTest$$Lambda$373/449639279.run
>  in VM 1 running on Host timor.gemstone.com with 4 VMs
> Stacktrace
> com.gemstone.gemfire.test.dunit.RMIException: While invoking 
> com.gemstone.gemfire.internal.cache.tier.sockets.DurableRegistrationDUnitTest$$Lambda$373/449639279.run
>  in VM 1 running on Host timor.gemstone.com with 4 VMs
>   at com.gemstone.gemfire.test.dunit.VM.invoke(VM.java:389)
>   at com.gemstone.gemfire.test.dunit.VM.invoke(VM.java:355)
>   at com.gemstone.gemfire.test.dunit.VM.invoke(VM.java:293)
>   at 
> com.gemstone.gemfire.internal.cache.tier.sockets.DurableRegistrationDUnitTest.testDurableClientWithRegistrationHA(DurableRegistrationDUnitTest.java:421)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:497)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>   at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
>   at 
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
>   at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:55)
>   at org.junit.rules.RunRules.evaluate(RunRules.java:20)
>   at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
>   at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
>   at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
>   at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
>   at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
>   at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
>   at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
>   at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
>   at 
> org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecuter.runTestClass(JUnitTestClassExecuter.java:112)
>   at 
> org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecuter.execute(JUnitTestClassExecuter.java:56)
>   at 
> org.gradle.api.internal.tasks.testing.junit.JUnitTestClassProcessor.processTestClass(JUnitTestClassProcessor.java:66)
>   at 
> org.gradle.api.internal.tasks.testing.SuiteTestClassProcessor.processTestClass(SuiteTestClassProcessor.java:51)
>   at sun.reflect.GeneratedMethodAccessor426.invoke(Unknown Source)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:497)
>   at 
> org.gradle.messaging.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:35)
>   at 
> org.gradle.messaging.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:24)
>   at 
> org.gradle.messaging.dispatch.ContextClassLoaderDispatch.dispatch(ContextClassLoaderDispatch.java:32)
>   at 
> org.gradle.messaging.dispatch.ProxyDispatchAdapter$DispatchingInvocationHandler.invoke(ProxyDispatchAdapter.java:93)
>   at 

[jira] [Commented] (GEODE-9764) Request-Response Messaging Should Time Out

2021-11-22 Thread Anthony Baker (Jira)


[ 
https://issues.apache.org/jira/browse/GEODE-9764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17447553#comment-17447553
 ] 

Anthony Baker commented on GEODE-9764:
--

I would argue that for certain messages like replication of values, timeouts 
alone are insufficient.  To maintain consistency, we have to replicate the 
change or revert it. I think that implies the need for timeouts as well as 
failure detection improvements.

> Request-Response Messaging Should Time Out
> --
>
> Key: GEODE-9764
> URL: https://issues.apache.org/jira/browse/GEODE-9764
> Project: Geode
>  Issue Type: Improvement
>  Components: messaging
>Reporter: Bill Burcham
>Assignee: Bill Burcham
>Priority: Major
>
> There is a weakness in the P2P/DirectChannel messaging architecture, in that 
> it never gives up on a request (in a request-response scenario). As a result, 
> a bug (software fault) anywhere from the point where the requesting thread 
> hands off the {{DistributionMessage}} e.g. to 
> {{ClusterDistributionManager.putOutgoing(DistributionMessage)}}, to the point 
> where that request is ultimately fulfilled on a (one) receiver, can result in 
> a hang (of some task on the send side, which is waiting for a response).
> Well, it's a little worse than that, because any code in the return (response) 
> path can also cause disruption of the (response) flow, thereby leaving the 
> requesting task hanging.
> If the code in the request path (primarily in P2P messaging) and the code in 
> the response path (P2P messaging and TBD higher-level code) were perfect this 
> might not be a problem. But there is a fair amount of code there and we have 
> some evidence that it is currently not perfect, nor do we expect it to become 
> perfect and stay that way. That being the case, it seems prudent to institute 
> response timeouts so that bugs of this sort (which disrupt request-response 
> message flow) don't result in hangs.
> It's TBD if we want to go a step further and institute retries. The latter 
> would entail introducing duplicate-suppression (conflation) in P2P messaging. 
> We might also add exponential backoff (open-loop) or back-pressure 
> (closed-loop) to prevent a flood of retries when the system is at or near the 
> point of thrashing.
> But even without retries, a configurable timeout might have good ROI as a 
> first step. This would entail:
> * adding a configuration parameter to specify the timeout value
> * changing ReplyProcessor21 and others TBD to "give up" after the timeout has 
> elapsed
> * changing higher-level code dependent on request-reply messaging so it 
> properly handles the situations where we might have to "give up"
> This issue affects all versions of Geode.
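The configurable "give up" step described above can be sketched with a stdlib latch. This is an illustration only, not ReplyProcessor21's actual mechanics; the class and method names are assumptions:

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Minimal stand-in for the timeout idea: the requesting thread blocks on a
// latch that the response handler counts down; if the response never arrives,
// the wait returns false instead of hanging forever.
public class ReplyWaitSketch {
  private final CountDownLatch replyArrived = new CountDownLatch(1);

  // Called on the receive path when the response message comes in.
  public void onReply() {
    replyArrived.countDown();
  }

  // Called by the requesting thread; returns false when we "gave up".
  public boolean awaitReply(long timeoutMillis) {
    try {
      return replyArrived.await(timeoutMillis, TimeUnit.MILLISECONDS);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      return false;
    }
  }

  public static void main(String[] args) {
    ReplyWaitSketch noReply = new ReplyWaitSketch();
    System.out.println("timed out: " + !noReply.awaitReply(50));

    ReplyWaitSketch withReply = new ReplyWaitSketch();
    withReply.onReply();
    System.out.println("got reply: " + withReply.awaitReply(50));
  }
}
```

Higher-level callers would then need a defined error path for the false case, which is the third bullet in the description.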





[jira] [Resolved] (GEODE-1537) DurableRegistrationDUnitTest.testDurableClientWithRegistrationHA

2021-11-22 Thread Jens Deppe (Jira)


 [ 
https://issues.apache.org/jira/browse/GEODE-1537?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jens Deppe resolved GEODE-1537.
---
Fix Version/s: 1.15.0
   Resolution: Fixed

> DurableRegistrationDUnitTest.testDurableClientWithRegistrationHA
> 
>
> Key: GEODE-1537
> URL: https://issues.apache.org/jira/browse/GEODE-1537
> Project: Geode
>  Issue Type: Bug
>  Components: client queues
>Reporter: Jinmei Liao
>Assignee: Jens Deppe
>Priority: Major
>  Labels: CI, pull-request-available
> Fix For: 1.15.0
>
>
> Geode_develop_DistributedTests/2883
> Error Message
> com.gemstone.gemfire.test.dunit.RMIException: While invoking 
> com.gemstone.gemfire.internal.cache.tier.sockets.DurableRegistrationDUnitTest$$Lambda$373/449639279.run
>  in VM 1 running on Host timor.gemstone.com with 4 VMs
> Stacktrace
> com.gemstone.gemfire.test.dunit.RMIException: While invoking 
> com.gemstone.gemfire.internal.cache.tier.sockets.DurableRegistrationDUnitTest$$Lambda$373/449639279.run
>  in VM 1 running on Host timor.gemstone.com with 4 VMs
>   at com.gemstone.gemfire.test.dunit.VM.invoke(VM.java:389)
>   at com.gemstone.gemfire.test.dunit.VM.invoke(VM.java:355)
>   at com.gemstone.gemfire.test.dunit.VM.invoke(VM.java:293)
>   at 
> com.gemstone.gemfire.internal.cache.tier.sockets.DurableRegistrationDUnitTest.testDurableClientWithRegistrationHA(DurableRegistrationDUnitTest.java:421)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:497)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>   at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
>   at 
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
>   at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:55)
>   at org.junit.rules.RunRules.evaluate(RunRules.java:20)
>   at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
>   at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
>   at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
>   at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
>   at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
>   at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
>   at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
>   at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
>   at 
> org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecuter.runTestClass(JUnitTestClassExecuter.java:112)
>   at 
> org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecuter.execute(JUnitTestClassExecuter.java:56)
>   at 
> org.gradle.api.internal.tasks.testing.junit.JUnitTestClassProcessor.processTestClass(JUnitTestClassProcessor.java:66)
>   at 
> org.gradle.api.internal.tasks.testing.SuiteTestClassProcessor.processTestClass(SuiteTestClassProcessor.java:51)
>   at sun.reflect.GeneratedMethodAccessor426.invoke(Unknown Source)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:497)
>   at 
> org.gradle.messaging.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:35)
>   at 
> org.gradle.messaging.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:24)
>   at 
> org.gradle.messaging.dispatch.ContextClassLoaderDispatch.dispatch(ContextClassLoaderDispatch.java:32)
>   at 
> org.gradle.messaging.dispatch.ProxyDispatchAdapter$DispatchingInvocationHandler.invoke(ProxyDispatchAdapter.java:93)
>   at com.sun.proxy.$Proxy2.processTestClass(Unknown Source)
>   at 
> org.gradle.api.internal.tasks.testing.worker.TestWorker.processTestClass(TestWorker.java:109)
>   at sun.reflect.GeneratedMethodAccessor425.invoke(Unknown Source)
>   at 
> 

[jira] [Updated] (GEODE-9819) Client socket leak in CacheClientNotifier.registerClientInternal when error conditions occur for the durable client

2021-11-22 Thread Darrel Schneider (Jira)


 [ 
https://issues.apache.org/jira/browse/GEODE-9819?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Darrel Schneider updated GEODE-9819:

Labels: needsTriage  (was: )

> Client socket leak in CacheClientNotifier.registerClientInternal when error 
> conditions occur for the durable client
> ---
>
> Key: GEODE-9819
> URL: https://issues.apache.org/jira/browse/GEODE-9819
> Project: Geode
>  Issue Type: Bug
>  Components: client/server, core
>Affects Versions: 1.14.0, 1.15.0
>Reporter: Leon Finker
>Priority: Critical
>  Labels: needsTriage
>
> In CacheClientNotifier.registerClientInternal the client socket can be left 
> half open and not properly closed when error conditions occur, such as:
> {code:java}
> } else {
>   // The existing proxy is already running (which means that another
>   // client is already using this durable id.
>   unsuccessfulMsg =
>   String.format(
>   "The requested durable client has the same identifier ( %s ) as an 
> existing durable client ( %s ). Duplicate durable clients are not allowed.",
>   clientProxyMembershipID.getDurableId(), cacheClientProxy);
>   logger.warn(unsuccessfulMsg);
>   // Set the unsuccessful response byte.
>   responseByte = Handshake.REPLY_EXCEPTION_DUPLICATE_DURABLE_CLIENT;
> } {code}
> It considers the current client connect attempt to have failed. It writes 
> this response back to client: REPLY_EXCEPTION_DUPLICATE_DURABLE_CLIENT. This 
> will cause the client to throw ServerRefusedConnectionException. What seems 
> wrong about this method is that even though it sets "unsuccessfulMsg" and 
> correctly sends back a handshake saying the client is rejected, it does not 
> throw an exception and it does not close "socket". I think right before it 
> calls performPostAuthorization it should do the following:
> {code:java}
> if (unsuccessfulMsg != null) {
>   try {
> socket.close();
>   } catch (IOException ignore) {
>   }
>  } else {
> performPostAuthorization(...)
> }{code}
> Full discussion details can be found at 
> https://markmail.org/thread/2gqmbq2m57pz7pxu





[jira] [Updated] (GEODE-9819) Client socket leak in CacheClientNotifier.registerClientInternal when error conditions occur for the durable client

2021-11-22 Thread Darrel Schneider (Jira)


 [ 
https://issues.apache.org/jira/browse/GEODE-9819?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Darrel Schneider updated GEODE-9819:

Affects Version/s: 1.15.0

> Client socket leak in CacheClientNotifier.registerClientInternal when error 
> conditions occur for the durable client
> ---
>
> Key: GEODE-9819
> URL: https://issues.apache.org/jira/browse/GEODE-9819
> Project: Geode
>  Issue Type: Bug
>  Components: client/server, core
>Affects Versions: 1.14.0, 1.15.0
>Reporter: Leon Finker
>Priority: Critical
>
> In CacheClientNotifier.registerClientInternal the client socket can be left 
> half open and not properly closed when error conditions occur, such as:
> {code:java}
> } else {
>   // The existing proxy is already running (which means that another
>   // client is already using this durable id.
>   unsuccessfulMsg =
>   String.format(
>   "The requested durable client has the same identifier ( %s ) as an 
> existing durable client ( %s ). Duplicate durable clients are not allowed.",
>   clientProxyMembershipID.getDurableId(), cacheClientProxy);
>   logger.warn(unsuccessfulMsg);
>   // Set the unsuccessful response byte.
>   responseByte = Handshake.REPLY_EXCEPTION_DUPLICATE_DURABLE_CLIENT;
> } {code}
> It considers the current client connect attempt to have failed. It writes 
> this response back to client: REPLY_EXCEPTION_DUPLICATE_DURABLE_CLIENT. This 
> will cause the client to throw ServerRefusedConnectionException. What seems 
> wrong about this method is that even though it sets "unsuccessfulMsg" and 
> correctly sends back a handshake saying the client is rejected, it does not 
> throw an exception and it does not close "socket". I think right before it 
> calls performPostAuthorization it should do the following:
> {code:java}
> if (unsuccessfulMsg != null) {
>   try {
> socket.close();
>   } catch (IOException ignore) {
>   }
>  } else {
> performPostAuthorization(...)
> }{code}
> Full discussion details can be found at 
> https://markmail.org/thread/2gqmbq2m57pz7pxu





[jira] [Resolved] (GEODE-9820) stopCQ does not trigger re-authentication

2021-11-22 Thread Jinmei Liao (Jira)


 [ 
https://issues.apache.org/jira/browse/GEODE-9820?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jinmei Liao resolved GEODE-9820.

Fix Version/s: 1.15.0
   Resolution: Fixed

> stopCQ does not trigger re-authentication
> -
>
> Key: GEODE-9820
> URL: https://issues.apache.org/jira/browse/GEODE-9820
> Project: Geode
>  Issue Type: Sub-task
>  Components: cq
>Affects Versions: 1.14.0
>Reporter: Jinmei Liao
>Assignee: Jinmei Liao
>Priority: Major
>  Labels: GeodeOperationAPI, pull-request-available
> Fix For: 1.15.0
>
>
> After the credential expires, when the user executes a `stopCQ` operation, 
> re-authentication does not get triggered.





[jira] [Commented] (GEODE-9820) stopCQ does not trigger re-authentication

2021-11-22 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/GEODE-9820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17447549#comment-17447549
 ] 

ASF subversion and git services commented on GEODE-9820:


Commit f61e32f566761e61cc12cd9b32e2ceaa05ccdc72 in geode's branch 
refs/heads/develop from Jinmei Liao
[ https://gitbox.apache.org/repos/asf?p=geode.git;h=f61e32f ]

GEODE-9820: stopCQ should handle general exception same way as ExecuteCQ61 
(#7122)



> stopCQ does not trigger re-authentication
> -
>
> Key: GEODE-9820
> URL: https://issues.apache.org/jira/browse/GEODE-9820
> Project: Geode
>  Issue Type: Sub-task
>  Components: cq
>Affects Versions: 1.14.0
>Reporter: Jinmei Liao
>Assignee: Jinmei Liao
>Priority: Major
>  Labels: GeodeOperationAPI, pull-request-available
>
> After the credential expires, when the user executes a `stopCQ` operation, 
> re-authentication does not get triggered.





[jira] [Assigned] (GEODE-9838) Log key info for deserialization issue while index update

2021-11-22 Thread Xiaojian Zhou (Jira)


 [ 
https://issues.apache.org/jira/browse/GEODE-9838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaojian Zhou reassigned GEODE-9838:


Assignee: Xiaojian Zhou

> Log key info for deserialization issue while index update 
> ---
>
> Key: GEODE-9838
> URL: https://issues.apache.org/jira/browse/GEODE-9838
> Project: Geode
>  Issue Type: Improvement
>  Components: querying
>Affects Versions: 1.15.0
>Reporter: Anilkumar Gingade
>Assignee: Xiaojian Zhou
>Priority: Major
>  Labels: GeodeOperationAPI
>
> When there is an issue in an index update (maintenance), the index is marked 
> as invalid and a warning is logged: 
> [warn 2021/11/11 07:39:28.215 CST pazrslsrv004  Processor 963> tid=0x124ecf] Updating the Index patientMemberIdentifier 
> failed. The index is corrupted and marked as invalid.
> org.apache.geode.cache.query.internal.index.IMQException
> Adding "key" information to the log helps in diagnosing the failure and in 
> adding or removing the entry in question. 
> Code path IndexManager.java:
> void addIndexMapping(RegionEntry entry, IndexProtocol index) {
>   try {
> index.addIndexMapping(entry);
>   } catch (Exception exception) {
> index.markValid(false);
> setPRIndexAsInvalid((AbstractIndex) index);
> logger.warn(String.format(
> "Updating the Index %s failed. The index is corrupted and marked as 
> invalid.",
> ((AbstractIndex) index).indexName), exception);
>   }
> }
> void removeIndexMapping(RegionEntry entry, IndexProtocol index, int opCode) {
> try {
>   index.removeIndexMapping(entry, opCode);
> } catch (Exception exception) {
>   index.markValid(false);
>   setPRIndexAsInvalid((AbstractIndex) index);
>   logger.warn(String.format(
>   "Updating the Index %s failed. The index is corrupted and marked as 
> invalid.",
>   ((AbstractIndex) index).indexName), exception);
> }
>   }
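One way to carry the entry key into the warning, sketched as a standalone helper. The buildWarning method is hypothetical; only the message shape mirrors the quoted IndexManager code:

```java
// Illustrative only: assumes the caller can supply the region entry's key
// (e.g. from RegionEntry.getKey()) alongside the index name.
public class IndexWarnSketch {

  public static String buildWarning(String indexName, Object key) {
    return String.format(
        "Updating the Index %s failed for key %s. The index is corrupted and marked as invalid.",
        indexName, key);
  }

  public static void main(String[] args) {
    System.out.println(buildWarning("patientMemberIdentifier", "member-42"));
  }
}
```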





[jira] [Created] (GEODE-9840) Improve Redundancy Level Log Message

2021-11-22 Thread Wayne (Jira)
Wayne created GEODE-9840:


 Summary: Improve Redundancy Level Log Message
 Key: GEODE-9840
 URL: https://issues.apache.org/jira/browse/GEODE-9840
 Project: Geode
  Issue Type: New Feature
  Components: redis
Reporter: Wayne


The current log message 
Configured redundancy of 2 copies has been restored to /GEODE_FOR_REDIS
 

is confusing and not intuitive.  This message should be changed to the 
following:

"Configured redundancy of 2 copies has been restored to /region (1 primary and 
1 secondary copy)"





[jira] [Updated] (GEODE-9839) Ability to Set Log Level for Specific Package

2021-11-22 Thread Wayne (Jira)


 [ 
https://issues.apache.org/jira/browse/GEODE-9839?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wayne updated GEODE-9839:
-
Summary: Ability to Set Log Level for Specific Package  (was: Ability to 
Set Log Live for Specific Package)

> Ability to Set Log Level for Specific Package
> -
>
> Key: GEODE-9839
> URL: https://issues.apache.org/jira/browse/GEODE-9839
> Project: Geode
>  Issue Type: New Feature
>  Components: redis
>Affects Versions: 1.15.0
>Reporter: Wayne
>Priority: Major
>
> As a user of Geode, I would like the ability to use the alter runtime 
> --log-level command to set the logging level for a specific package.  This 
> would allow me to turn on debug level logging for just the redis packages, 
> org.apache.geode.redis.
>  





[jira] [Created] (GEODE-9839) Ability to Set Log Live for Specific Package

2021-11-22 Thread Wayne (Jira)
Wayne created GEODE-9839:


 Summary: Ability to Set Log Live for Specific Package
 Key: GEODE-9839
 URL: https://issues.apache.org/jira/browse/GEODE-9839
 Project: Geode
  Issue Type: New Feature
  Components: redis
Affects Versions: 1.15.0
Reporter: Wayne


As a user of Geode, I would like the ability to use the alter runtime 
--log-level command to set the logging level for a specific package.  This 
would allow me to turn on debug level logging for just the redis packages, 
org.apache.geode.redis.
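Geode's logging is Log4j2-based, so the following is only a rough stdlib analogy in java.util.logging, showing the per-package logger hierarchy the request relies on: setting a level on `org.apache.geode.redis` affects that subtree without changing the rest of `org.apache.geode`.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

// Analogy only, not Geode's actual logging API: loggers form a dot-separated
// hierarchy, and a level set on one package applies to its children.
public class PackageLogLevelSketch {

  // Set a level on one package's logger and return what took effect.
  public static Level setPackageLevel(String pkg, Level level) {
    Logger logger = Logger.getLogger(pkg);
    logger.setLevel(level);
    return logger.getLevel();
  }

  public static void main(String[] args) {
    System.out.println("redis level: "
        + setPackageLevel("org.apache.geode.redis", Level.FINE));
  }
}
```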

 





[jira] [Assigned] (GEODE-9825) Disparate socket-buffer-size Results in "IOException: Unknown header byte" and Hangs

2021-11-22 Thread Bill Burcham (Jira)


 [ 
https://issues.apache.org/jira/browse/GEODE-9825?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bill Burcham reassigned GEODE-9825:
---

Assignee: Bill Burcham

> Disparate socket-buffer-size Results in "IOException: Unknown header byte" 
> and Hangs
> 
>
> Key: GEODE-9825
> URL: https://issues.apache.org/jira/browse/GEODE-9825
> Project: Geode
>  Issue Type: Bug
>  Components: messaging
>Affects Versions: 1.12.4, 1.15.0
>Reporter: Bill Burcham
>Assignee: Bill Burcham
>Priority: Major
>  Labels: pull-request-available
>
> GEODE-9141 introduced a bug that causes {{IOException: "Unknown header 
> byte..."}} and hangs if members are configured with different 
> {{socket-buffer-size}} settings.
> h2. Reproduction
> To reproduce this bug, turn off TLS, set socket-buffer-size on the sender to 
> 64KB, and set socket-buffer-size on the receiver to 32KB. See the associated 
> PR for an example.
> h2. Analysis
> In {{Connection.processInputBuffer()}}, when that method has read all the 
> messages it can from the current input buffer, it then considers whether the 
> buffer needs expansion. If it does, then:
> {code:java}
> inputBuffer = inputSharing.expandReadBufferIfNeeded(allocSize); {code}
> This line is executed and the method returns. The caller then expects to be 
> able to _write_ bytes into {{inputBuffer}}.
> The problem, it seems, is that 
> {{ByteBufferSharingInternalImpl.expandReadBufferIfNeeded()}} does not leave 
> the {{ByteBuffer}} in the proper state: it leaves the buffer ready to be 
> _read_, not written.
> Before the changes for GEODE-9141 were introduced, the line of code 
> referenced above used to be this snippet in 
> {{Connection.compactOrResizeBuffer(int messageLength)}} (that method has 
> since been removed):
> {code:java}
>      // need a bigger buffer
>     logger.info("Allocating larger network read buffer, new size is {} old 
> size was {}.",
>         allocSize, oldBufferSize);
>     ByteBuffer oldBuffer = inputBuffer;
>     inputBuffer = getBufferPool().acquireDirectReceiveBuffer(allocSize);    
> if (oldBuffer != null) {
>       int oldByteCount = oldBuffer.remaining();
>       inputBuffer.put(oldBuffer);
>       inputBuffer.position(oldByteCount);
>       getBufferPool().releaseReceiveBuffer(oldBuffer);
>     } {code}
> Notice how this method leaves {{inputBuffer}} ready to be _written_ to.
> But the code inside 
> {{ByteBufferSharingInternalImpl.expandReadBufferIfNeeded()}} is doing 
> something like:
> {code:java}
> newBuffer.clear();
> newBuffer.put(existing);
> newBuffer.flip();
> releaseBuffer(type, existing);
> return newBuffer; {code}
> A solution (shown in the associated PR) is to add logic after the call to 
> {{expandReadBufferIfNeeded(allocSize)}} that leaves the buffer in a _writeable_ 
> state:
> {code:java}
> inputBuffer = inputSharing.expandReadBufferIfNeeded(allocSize);
> // we're returning to the caller (done == true) so make buffer writeable
> inputBuffer.position(inputBuffer.limit());
> inputBuffer.limit(inputBuffer.capacity()); {code}
> h2. Resolution
> When this ticket is complete the bug will be fixed and 
> {{P2PMessagingConcurrencyDUnitTest}} will be enhanced to test at least these 
> combinations:
> [security, sender/locator socket-buffer-size, receiver socket-buffer-size]
> [TLS, (default), (default)]  this is what the test currently does
> [no TLS, 64 * 1024, 32 * 1024] *new: this illustrates this bug*
> [no TLS, (default), (default)] *new*
> We might want to mix in conserve-sockets true/false in there too while we're 
> at it (the test currently holds it at true).
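The position/limit fix above can be demonstrated outside Geode with plain java.nio; expandLikeGeode9141 and makeWritable are illustrative names, not methods from the codebase:

```java
import java.nio.ByteBuffer;

// Standalone illustration of the buffer-state issue: after copying leftover
// bytes and calling flip(), the buffer is positioned for *reading*. The
// two-line fix moves position to limit and limit to capacity so the caller
// can *write* the next network read after the preserved bytes.
public class BufferStateSketch {

  // Mimics the expandReadBufferIfNeeded shape: copy leftovers into a bigger
  // buffer, then flip -- leaving it read-ready, which is the bug.
  public static ByteBuffer expandLikeGeode9141(ByteBuffer existing, int newSize) {
    ByteBuffer newBuffer = ByteBuffer.allocate(newSize);
    newBuffer.put(existing);
    newBuffer.flip(); // position=0, limit=bytes copied: ready to READ
    return newBuffer;
  }

  // The proposed fix: restore a writable state without losing the data.
  public static void makeWritable(ByteBuffer buffer) {
    buffer.position(buffer.limit());
    buffer.limit(buffer.capacity());
  }

  public static void main(String[] args) {
    ByteBuffer old = ByteBuffer.wrap(new byte[] {1, 2, 3});
    ByteBuffer expanded = expandLikeGeode9141(old, 8);
    System.out.println("after flip: position=" + expanded.position()
        + " limit=" + expanded.limit()); // read state
    makeWritable(expanded);
    System.out.println("writable:   position=" + expanded.position()
        + " limit=" + expanded.limit()); // write state, data preserved
  }
}
```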





[jira] [Updated] (GEODE-9838) Log key info for deserialization issue while index update

2021-11-22 Thread Anilkumar Gingade (Jira)


 [ 
https://issues.apache.org/jira/browse/GEODE-9838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anilkumar Gingade updated GEODE-9838:
-
Description: 
When there is an issue in an index update (maintenance), the index is marked as 
invalid and a warning is logged: 

[warn 2021/11/11 07:39:28.215 CST pazrslsrv004  tid=0x124ecf] Updating the Index patientMemberIdentifier failed. 
The index is corrupted and marked as invalid.
org.apache.geode.cache.query.internal.index.IMQException

Adding "key" information to the log helps in diagnosing the failure and in 
adding or removing the entry in question. 


Code path IndexManager.java:

void addIndexMapping(RegionEntry entry, IndexProtocol index) {
  try {
    index.addIndexMapping(entry);
  } catch (Exception exception) {
    index.markValid(false);
    setPRIndexAsInvalid((AbstractIndex) index);
    logger.warn(String.format(
        "Updating the Index %s failed. The index is corrupted and marked as invalid.",
        ((AbstractIndex) index).indexName), exception);
  }
}

void removeIndexMapping(RegionEntry entry, IndexProtocol index, int opCode) {
  try {
    index.removeIndexMapping(entry, opCode);
  } catch (Exception exception) {
    index.markValid(false);
    setPRIndexAsInvalid((AbstractIndex) index);
    logger.warn(String.format(
        "Updating the Index %s failed. The index is corrupted and marked as invalid.",
        ((AbstractIndex) index).indexName), exception);
  }
}

  was:
When there is an issue in an index update (maintenance), the index is marked as 
invalid and a warning is logged: 

[warn 2021/11/11 07:39:28.215 CST pazrslsrv004  tid=0x124ecf] Updating the Index patientMemberIdentifier failed. 
The index is corrupted and marked as invalid.
org.apache.geode.cache.query.internal.index.IMQException

Adding "key" information to the log helps in diagnosing the failure and in 
adding or removing the entry in question. 



> Log key info for deserialization issue while index update 
> ---
>
> Key: GEODE-9838
> URL: https://issues.apache.org/jira/browse/GEODE-9838
> Project: Geode
>  Issue Type: Improvement
>  Components: querying
>Affects Versions: 1.15.0
>Reporter: Anilkumar Gingade
>Priority: Major
>  Labels: GeodeOperationAPI
>
> When there is an issue in an index update (maintenance), the index is marked 
> as invalid and a warning is logged: 
> [warn 2021/11/11 07:39:28.215 CST pazrslsrv004  Processor 963> tid=0x124ecf] Updating the Index patientMemberIdentifier 
> failed. The index is corrupted and marked as invalid.
> org.apache.geode.cache.query.internal.index.IMQException
> Adding "key" information to the log helps in diagnosing the failure and in 
> adding or removing the entry in question. 
> Code path IndexManager.java:
> void addIndexMapping(RegionEntry entry, IndexProtocol index) {
>   try {
> index.addIndexMapping(entry);
>   } catch (Exception exception) {
> index.markValid(false);
> setPRIndexAsInvalid((AbstractIndex) index);
> logger.warn(String.format(
> "Updating the Index %s failed. The index is corrupted and marked as 
> invalid.",
> ((AbstractIndex) index).indexName), exception);
>   }
> }
> void removeIndexMapping(RegionEntry entry, IndexProtocol index, int opCode) {
> try {
>   index.removeIndexMapping(entry, opCode);
> } catch (Exception exception) {
>   index.markValid(false);
>   setPRIndexAsInvalid((AbstractIndex) index);
>   logger.warn(String.format(
>   "Updating the Index %s failed. The index is corrupted and marked as 
> invalid.",
>   ((AbstractIndex) index).indexName), exception);
> }
>   }





[jira] [Updated] (GEODE-9838) Log key info for deserialization issue while index update

2021-11-22 Thread Anilkumar Gingade (Jira)


 [ 
https://issues.apache.org/jira/browse/GEODE-9838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anilkumar Gingade updated GEODE-9838:
-
Labels: GeodeOperationAPI  (was: )

> Log key info for deserialization issue while index update 
> ---
>
> Key: GEODE-9838
> URL: https://issues.apache.org/jira/browse/GEODE-9838
> Project: Geode
>  Issue Type: Improvement
>  Components: querying
>Affects Versions: 1.15.0
>Reporter: Anilkumar Gingade
>Priority: Major
>  Labels: GeodeOperationAPI
>
> When there is an issue in an index update (maintenance), the index is marked 
> as invalid and a warning is logged: 
> [warn 2021/11/11 07:39:28.215 CST pazrslsrv004  Processor 963> tid=0x124ecf] Updating the Index patientMemberIdentifier 
> failed. The index is corrupted and marked as invalid.
> org.apache.geode.cache.query.internal.index.IMQException
> Adding "key" information to the log helps in diagnosing the failure and in 
> adding or removing the entry in question. 





[jira] [Created] (GEODE-9838) Log key info for deserialization issue while index update

2021-11-22 Thread Anilkumar Gingade (Jira)
Anilkumar Gingade created GEODE-9838:


 Summary: Log key info for deserialization issue while index update 
 Key: GEODE-9838
 URL: https://issues.apache.org/jira/browse/GEODE-9838
 Project: Geode
  Issue Type: Improvement
  Components: querying
Reporter: Anilkumar Gingade


When there is an issue in an index update (maintenance), the index is marked as 
invalid and a warning is logged: 

[warn 2021/11/11 07:39:28.215 CST pazrslsrv004  tid=0x124ecf] Updating the Index patientMemberIdentifier failed. 
The index is corrupted and marked as invalid.
org.apache.geode.cache.query.internal.index.IMQException

Adding "key" information to the log helps in diagnosing the failure and in 
adding or removing the entry in question. 






[jira] [Updated] (GEODE-9838) Log key info for deserialization issue while index update

2021-11-22 Thread Anilkumar Gingade (Jira)


 [ 
https://issues.apache.org/jira/browse/GEODE-9838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anilkumar Gingade updated GEODE-9838:
-
Affects Version/s: 1.15.0

> Log key info for deserialization issue while index update 
> ---
>
> Key: GEODE-9838
> URL: https://issues.apache.org/jira/browse/GEODE-9838
> Project: Geode
>  Issue Type: Improvement
>  Components: querying
>Affects Versions: 1.15.0
>Reporter: Anilkumar Gingade
>Priority: Major
>
> When there is an issue in an index update (maintenance), the index is marked 
> as invalid and a warning is logged: 
> [warn 2021/11/11 07:39:28.215 CST pazrslsrv004  Processor 963> tid=0x124ecf] Updating the Index patientMemberIdentifier 
> failed. The index is corrupted and marked as invalid.
> org.apache.geode.cache.query.internal.index.IMQException
> Adding "key" information to the log helps in diagnosing the failure and in 
> adding or removing the entry in question. 


