[jira] [Created] (GEODE-7216) The ExportStackTraceCommand should include a timestamp similar to jstack
Barry Oglesby created GEODE-7216: Summary: The ExportStackTraceCommand should include a timestamp similar to jstack Key: GEODE-7216 URL: https://issues.apache.org/jira/browse/GEODE-7216 Project: Geode Issue Type: Bug Components: gfsh Reporter: Barry Oglesby Currently the ExportStackTraceCommand dumps stack traces with a header for each member like: {noformat} *** Stack-trace for member server3 *** {noformat} It would be nice for support purposes if it included a timestamp like: {noformat} *** Stack-trace for member server3 at 2019-09-16 10:39:57 *** {noformat} That would help correlate stack traces with logs and stats. Something like: {noformat} ps.append(STACK_TRACE_FOR_MEMBER).append(entry.getKey()).append(" at ") .append(new SimpleDateFormat("yyyy-MM-dd HH:mm:ss").format(new Date())).append(" ***") .append(System.lineSeparator()); {noformat} -- This message was sent by Atlassian Jira (v8.3.4#803005)
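The proposed header change can be sketched as a standalone method. Note this is a minimal illustration: the STACK_TRACE_FOR_MEMBER constant and the header(...) helper below are stand-ins for the real command's fields, not Geode's actual internals.

```java
import java.text.SimpleDateFormat;
import java.util.Date;

public class StackTraceHeader {
  // Stand-in for ExportStackTraceCommand's header prefix constant.
  static final String STACK_TRACE_FOR_MEMBER = "*** Stack-trace for member ";

  // Builds the proposed header, e.g.
  // "*** Stack-trace for member server3 at 2019-09-16 10:39:57 ***"
  static String header(String memberName, Date timestamp) {
    return STACK_TRACE_FOR_MEMBER
        + memberName
        + " at "
        + new SimpleDateFormat("yyyy-MM-dd HH:mm:ss").format(timestamp)
        + " ***";
  }

  public static void main(String[] args) {
    System.out.println(header("server3", new Date()));
  }
}
```

Using a fixed "yyyy-MM-dd HH:mm:ss" pattern matches the format jstack and the Geode logs use, which is what makes the correlation with logs and stats straightforward.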
[jira] [Commented] (GEODE-6586) ClientServerTransactionFailoverDistributedTest txCommitGetsAppliedOnAllTheReplicasAfterHostIsShutDownAndIfOneOfTheNodeHasCommitted failed
[ https://issues.apache.org/jira/browse/GEODE-6586?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16927135#comment-16927135 ] Barry Oglesby commented on GEODE-6586: -- This reoccurred: https://concourse.gemfire-ci.info/teams/main/pipelines/gemfire-develop-main/jobs/DistributedTestOpenJDK8/builds/959 =-=-=-=-=-=-=-=-=-=-=-=-=-=-= Test Results URI =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-= gs://gemfire-test-artifacts/builds/gemfire-develop-main/9.10.0-build.0108/test-results/distributedTest/1568159476/ =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-= Test report artifacts from this job are available at: gs://gemfire-test-artifacts/builds/gemfire-develop-main/9.10.0-build.0108/test-artifacts/1568159476/distributedtestfiles-OpenJDK8-9.10.0-build.0108.tgz > ClientServerTransactionFailoverDistributedTest > txCommitGetsAppliedOnAllTheReplicasAfterHostIsShutDownAndIfOneOfTheNodeHasCommitted > failed > - > > Key: GEODE-6586 > URL: https://issues.apache.org/jira/browse/GEODE-6586 > Project: Geode > Issue Type: Bug > Components: transactions >Reporter: xiaojian zhou >Assignee: Eric Shu >Priority: Major > > {noformat} > It's found in > https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-develop-main/jobs/DistributedTestOpenJDK8/builds/559 > org.apache.geode.internal.cache.ClientServerTransactionFailoverDistributedTest > > > txCommitGetsAppliedOnAllTheReplicasAfterHostIsShutDownAndIfOneOfTheNodeHasCommitted > FAILED > org.apache.geode.test.dunit.RMIException: While invoking > org.apache.geode.internal.cache.ClientServerTransactionFailoverDistributedTest$$Lambda$177/577249945.run > in VM 1 running on Host 09628b632eb3 with 4 VMs > at org.apache.geode.test.dunit.VM.executeMethodOnObject(VM.java:579) > at org.apache.geode.test.dunit.VM.invoke(VM.java:406) > at > 
org.apache.geode.internal.cache.ClientServerTransactionFailoverDistributedTest.txCommitGetsAppliedOnAllTheReplicasAfterHostIsShutDownAndIfOneOfTheNodeHasCommitted(ClientServerTransactionFailoverDistributedTest.java:437) > Caused by: > org.junit.ComparisonFailure: expected:<"TxValue-1"> but was: > at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native > Method) > at > sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) > at > sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) > at > org.apache.geode.internal.cache.ClientServerTransactionFailoverDistributedTest.lambda$txCommitGetsAppliedOnAllTheReplicasAfterHostIsShutDownAndIfOneOfTheNodeHasCommitted$bb17a952$7(ClientServerTransactionFailoverDistributedTest.java:439) > {noformat} -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Created] (GEODE-7189) CI Failure: ServerLauncherTest > startWaitsForStartupTasksToComplete failed
Barry Oglesby created GEODE-7189: Summary: CI Failure: ServerLauncherTest > startWaitsForStartupTasksToComplete failed Key: GEODE-7189 URL: https://issues.apache.org/jira/browse/GEODE-7189 Project: Geode Issue Type: Bug Components: gfsh Reporter: Barry Oglesby {noformat} org.apache.geode.distributed.ServerLauncherTest > startWaitsForStartupTasksToComplete FAILED org.awaitility.core.ConditionTimeoutException: Assertion condition defined as a lambda expression in org.apache.geode.distributed.ServerLauncherTest that uses java.util.concurrent.CompletableFuture Wanted but not invoked: completableFuture.thenRun(); -> at org.apache.geode.distributed.ServerLauncherTest.lambda$startWaitsForStartupTasksToComplete$14(ServerLauncherTest.java:428) Actually, there were zero interactions with this mock. within 300 seconds. at org.awaitility.core.ConditionAwaiter.await(ConditionAwaiter.java:145) at org.awaitility.core.AssertionCondition.await(AssertionCondition.java:122) at org.awaitility.core.AssertionCondition.await(AssertionCondition.java:32) at org.awaitility.core.ConditionFactory.until(ConditionFactory.java:902) at org.awaitility.core.ConditionFactory.untilAsserted(ConditionFactory.java:723) at org.apache.geode.distributed.ServerLauncherTest.startWaitsForStartupTasksToComplete(ServerLauncherTest.java:428) Caused by: Wanted but not invoked: completableFuture.thenRun(); -> at org.apache.geode.distributed.ServerLauncherTest.lambda$startWaitsForStartupTasksToComplete$14(ServerLauncherTest.java:428) Actually, there were zero interactions with this mock. 
at org.apache.geode.distributed.ServerLauncherTest.lambda$startWaitsForStartupTasksToComplete$14(ServerLauncherTest.java:428) {noformat} UnitTestOpenJDK11 #943: https://concourse.gemfire-ci.info/teams/main/pipelines/gemfire-develop-main/jobs/UnitTestOpenJDK11/builds/943 =-=-=-=-=-=-=-=-=-=-=-=-=-=-= Test Results URI =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-= gs://gemfire-test-artifacts/builds/gemfire-develop-main/9.10.0-build.0108/test-results/test/1568154432/ =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-= Test report artifacts from this job are available at: gs://gemfire-test-artifacts/builds/gemfire-develop-main/9.10.0-build.0108/test-artifacts/1568154432/unittestfiles-OpenJDK11-9.10.0-build.0108.tgz -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Created] (GEODE-7187) CI Failure: RollingUpgradeQueryReturnsCorrectResultsAfterClientAndServersAreRolledOverAllBucketsCreated.test[from_v120, with reindex=false] hung
Barry Oglesby created GEODE-7187: Summary: CI Failure: RollingUpgradeQueryReturnsCorrectResultsAfterClientAndServersAreRolledOverAllBucketsCreated.test[from_v120, with reindex=false] hung Key: GEODE-7187 URL: https://issues.apache.org/jira/browse/GEODE-7187 Project: Geode Issue Type: Bug Components: lucene Reporter: Barry Oglesby UpgradeTestOpenJDK8 #1052: https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-develop-main/jobs/UpgradeTestOpenJDK8/builds/1052 All three stack traces contain this thread closing the cache and waiting for replies: {noformat} "RMI TCP Connection(2)-172.17.0.43" #33 daemon prio=5 os_prio=0 tid=0x7f75f4001800 nid=0x867 waiting on condition [0x7f762a17f000] java.lang.Thread.State: TIMED_WAITING (parking) at sun.misc.Unsafe.park(Native Method) - parking to wait for <0xe120d5b0> (a java.util.concurrent.CountDownLatch$Sync) at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215) at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedNanos(AbstractQueuedSynchronizer.java:1037) at java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(AbstractQueuedSynchronizer.java:1328) at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:277) at org.apache.geode.internal.util.concurrent.StoppableCountDownLatch.await(StoppableCountDownLatch.java:64) at org.apache.geode.distributed.internal.ReplyProcessor21.basicWait(ReplyProcessor21.java:718) at org.apache.geode.distributed.internal.ReplyProcessor21.waitForReplies(ReplyProcessor21.java:644) at org.apache.geode.distributed.internal.ReplyProcessor21.waitForReplies(ReplyProcessor21.java:624) at org.apache.geode.distributed.internal.ReplyProcessor21.waitForReplies(ReplyProcessor21.java:519) at org.apache.geode.internal.cache.GemFireCacheImpl.close(GemFireCacheImpl.java:2314) - locked <0xe001b568> (a java.lang.Class for org.apache.geode.internal.cache.GemFireCacheImpl) at 
org.apache.geode.internal.cache.GemFireCacheImpl.close(GemFireCacheImpl.java:1937) at org.apache.geode.internal.cache.GemFireCacheImpl.close(GemFireCacheImpl.java:1927) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.apache.geode.cache.lucene.LuceneSearchWithRollingUpgradeDUnit.closeCache(LuceneSearchWithRollingUpgradeDUnit.java:859) at org.apache.geode.cache.lucene.LuceneSearchWithRollingUpgradeDUnit.access$1100(LuceneSearchWithRollingUpgradeDUnit.java:67) at org.apache.geode.cache.lucene.LuceneSearchWithRollingUpgradeDUnit$12.run2(LuceneSearchWithRollingUpgradeDUnit.java:672) at org.apache.geode.cache30.CacheSerializableRunnable.run(CacheSerializableRunnable.java:53) {noformat} I don't see anything processing that CloseCacheMessage. -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Commented] (GEODE-7016) CI failure: ServerStartupRedundancyRecoveryNotificationTest > startupReportsOnlineOnlyAfterRedundancyRestored FAILED
[ https://issues.apache.org/jira/browse/GEODE-7016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16926847#comment-16926847 ] Barry Oglesby commented on GEODE-7016: -- This reoccurred: https://concourse.gemfire-ci.info/teams/main/pipelines/gemfire-develop-main/jobs/AcceptanceTestOpenJDK8/builds/940 =-=-=-=-=-=-=-=-=-=-=-=-=-=-= Test Results URI =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-= gs://gemfire-test-artifacts/builds/gemfire-develop-main/9.10.0-build.0104/test-results/acceptanceTest/1568065877/ =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-= Test report artifacts from this job are available at: gs://gemfire-test-artifacts/builds/gemfire-develop-main/9.10.0-build.0104/test-artifacts/1568065877/acceptancetestfiles-OpenJDK8-9.10.0-build.0104.tgz > CI failure: ServerStartupRedundancyRecoveryNotificationTest > > startupReportsOnlineOnlyAfterRedundancyRestored FAILED > > > Key: GEODE-7016 > URL: https://issues.apache.org/jira/browse/GEODE-7016 > Project: Geode > Issue Type: Bug > Components: gfsh >Affects Versions: 1.10.0 >Reporter: Anilkumar Gingade >Priority: Major > > {noformat} > org.apache.geode.launchers.ServerStartupRedundancyRecoveryNotificationTest > > startupReportsOnlineOnlyAfterRedundancyRestored FAILED > org.junit.ComparisonFailure: expected:<[0]> but was:<[1]> > at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native > Method) > at > sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) > at > sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) > at > org.apache.geode.test.junit.rules.gfsh.GfshExecution.awaitTermination(GfshExecution.java:125) > at > org.apache.geode.test.junit.rules.gfsh.GfshRule.execute(GfshRule.java:125) > at > org.apache.geode.test.junit.rules.gfsh.GfshRule.execute(GfshRule.java:112) > at > 
org.apache.geode.launchers.ServerStartupRedundancyRecoveryNotificationTest.startupReportsOnlineOnlyAfterRedundancyRestored(ServerStartupRedundancyRecoveryNotificationTest.java:142) > org.junit.ComparisonFailure: expected:<[0]> but was:<[1]> > at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native > Method) > at > sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) > at > sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) > at > org.apache.geode.test.junit.rules.gfsh.GfshExecution.awaitTermination(GfshExecution.java:125) > at > org.apache.geode.test.junit.rules.gfsh.GfshRule.execute(GfshRule.java:125) > at > org.apache.geode.test.junit.rules.gfsh.GfshRule.execute(GfshRule.java:112) > at > org.apache.geode.launchers.ServerStartupRedundancyRecoveryNotificationTest.stopAllMembers(ServerStartupRedundancyRecoveryNotificationTest.java:128) > {noformat} > https://concourse.gemfire-ci.info/teams/main/pipelines/gemfire-develop-main/jobs/AcceptanceTestOpenJDK8/builds/797 > Test report artifacts from this job are available at: > gs://gemfire-test-artifacts/builds/gemfire-develop-main/9.9.0-build.0258/test-artifacts/1564078711/acceptancetestfiles-OpenJDK8-9.9.0-build.0258.tgz -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Created] (GEODE-7183) CI Failure: ClientServerFunctionExecutionDUnitTest > testServerExecution_SocketTimeOut_WithoutRegister[ExecuteFunctionById] failed with AssertionError
Barry Oglesby created GEODE-7183: Summary: CI Failure: ClientServerFunctionExecutionDUnitTest > testServerExecution_SocketTimeOut_WithoutRegister[ExecuteFunctionById] failed with AssertionError Key: GEODE-7183 URL: https://issues.apache.org/jira/browse/GEODE-7183 Project: Geode Issue Type: Bug Components: functions Reporter: Barry Oglesby {noformat} org.apache.geode.internal.cache.execute.ClientServerFunctionExecutionDUnitTest > testServerExecution_SocketTimeOut_WithoutRegister[ExecuteFunctionById] FAILED org.apache.geode.test.dunit.RMIException: While invoking org.apache.geode.internal.cache.execute.ClientServerFunctionExecutionDUnitTest$$Lambda$68/1900027546.run in VM 3 running on Host 6c6dc0c2627c with 4 VMs at org.apache.geode.test.dunit.VM.executeMethodOnObject(VM.java:579) at org.apache.geode.test.dunit.VM.invoke(VM.java:406) at org.apache.geode.internal.cache.execute.ClientServerFunctionExecutionDUnitTest.testServerExecution_SocketTimeOut_WithoutRegister(ClientServerFunctionExecutionDUnitTest.java:339) Caused by: java.lang.AssertionError: Test failed after the execute operation at org.junit.Assert.fail(Assert.java:88) at org.apache.geode.internal.cache.execute.ClientServerFunctionExecutionDUnitTest.allServerExecution(ClientServerFunctionExecutionDUnitTest.java:891) at org.apache.geode.internal.cache.execute.ClientServerFunctionExecutionDUnitTest.lambda$testServerExecution_SocketTimeOut_WithoutRegister$bb17a952$2(ClientServerFunctionExecutionDUnitTest.java:339) {noformat} The test logs this exception right before the failure: {noformat} [vm3] [info 2019/09/09 18:00:10.793 GMT RMI TCP Connection(26)-172.17.0.19 tid=0xb1] Exception : [vm3] org.apache.geode.cache.client.ServerConnectivityException: Pool unexpected SocketException connection=Pooled Connection to 6c6dc0c2627c:25980: Connection[DESTROYED]). 
Server unreachable: could not connect after 1 attempts [vm3] at org.apache.geode.cache.client.internal.OpExecutorImpl.handleException(OpExecutorImpl.java:659) [vm3] at org.apache.geode.cache.client.internal.OpExecutorImpl.handleException(OpExecutorImpl.java:501) [vm3] at org.apache.geode.cache.client.internal.OpExecutorImpl.executeOnServer(OpExecutorImpl.java:331) [vm3] at org.apache.geode.cache.client.internal.OpExecutorImpl.executeOn(OpExecutorImpl.java:300) [vm3] at org.apache.geode.cache.client.internal.PoolImpl.executeOn(PoolImpl.java:814) [vm3] at org.apache.geode.cache.client.internal.SingleHopOperationCallable.call(SingleHopOperationCallable.java:52) [vm3] at java.util.concurrent.FutureTask.run(FutureTask.java:266) [vm3] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [vm3] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [vm3] at java.lang.Thread.run(Thread.java:748) [vm3] Caused by: java.net.SocketException: Socket is closed [vm3] at java.net.Socket.setSoTimeout(Socket.java:1137) [vm3] at org.apache.geode.cache.client.internal.AbstractOpWithTimeout.attempt(AbstractOpWithTimeout.java:48) [vm3] at org.apache.geode.cache.client.internal.ConnectionImpl.execute(ConnectionImpl.java:263) [vm3] at org.apache.geode.cache.client.internal.pooling.PooledConnection.execute(PooledConnection.java:353) [vm3] at org.apache.geode.cache.client.internal.OpExecutorImpl.executeWithPossibleReAuthentication(OpExecutorImpl.java:750) [vm3] at org.apache.geode.cache.client.internal.OpExecutorImpl.executeOnServer(OpExecutorImpl.java:329) [vm3] ... 
7 more {noformat} https://concourse.gemfire-ci.info/teams/main/pipelines/gemfire-develop-main/jobs/DistributedTestOpenJDK8/builds/952 =-=-=-=-=-=-=-=-=-=-=-=-=-=-= Test Results URI =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-= gs://gemfire-test-artifacts/builds/gemfire-develop-main/9.10.0-build.0101/test-results/distributedTest/1568054303/ =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-= Test report artifacts from this job are available at: gs://gemfire-test-artifacts/builds/gemfire-develop-main/9.10.0-build.0101/test-artifacts/1568054303/distributedtestfiles-OpenJDK8-9.10.0-build.0101.tgz -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Created] (GEODE-7181) CI Failure: WANRollingUpgradeEventProcessingMixedSiteOneCurrentSiteTwo > EventProcessingMixedSiteOneCurrentSiteTwo[from_v140] failed with BindException
Barry Oglesby created GEODE-7181: Summary: CI Failure: WANRollingUpgradeEventProcessingMixedSiteOneCurrentSiteTwo > EventProcessingMixedSiteOneCurrentSiteTwo[from_v140] failed with BindException Key: GEODE-7181 URL: https://issues.apache.org/jira/browse/GEODE-7181 Project: Geode Issue Type: Bug Components: wan Reporter: Barry Oglesby {noformat} org.apache.geode.cache.wan.WANRollingUpgradeEventProcessingMixedSiteOneCurrentSiteTwo > EventProcessingMixedSiteOneCurrentSiteTwo[from_v140] FAILED org.apache.geode.test.dunit.RMIException: While invoking org.apache.geode.cache.wan.WANRollingUpgradeEventProcessingMixedSiteOneCurrentSiteTwo$$Lambda$49/702999041.run in VM 4 running on Host 25462cccf035 with 7 VMs at org.apache.geode.test.dunit.VM.executeMethodOnObject(VM.java:579) at org.apache.geode.test.dunit.VM.invoke(VM.java:406) at org.apache.geode.cache.wan.WANRollingUpgradeEventProcessingMixedSiteOneCurrentSiteTwo.EventProcessingMixedSiteOneCurrentSiteTwo(WANRollingUpgradeEventProcessingMixedSiteOneCurrentSiteTwo.java:76) Caused by: java.net.BindException: Failed to create server socket on 25462cccf035/172.17.0.27[24363] at org.apache.geode.internal.net.SocketCreator.createServerSocket(SocketCreator.java:722) at org.apache.geode.internal.net.SocketCreator.createServerSocket(SocketCreator.java:680) at org.apache.geode.internal.net.SocketCreator.createServerSocket(SocketCreator.java:647) at org.apache.geode.distributed.internal.tcpserver.TcpServer.initializeServerSocket(TcpServer.java:226) at org.apache.geode.distributed.internal.tcpserver.TcpServer.startServerThread(TcpServer.java:216) at org.apache.geode.distributed.internal.tcpserver.TcpServer.start(TcpServer.java:211) at org.apache.geode.distributed.internal.InternalLocator.startTcpServer(InternalLocator.java:560) at org.apache.geode.distributed.internal.InternalLocator.startPeerLocation(InternalLocator.java:617) at org.apache.geode.distributed.internal.InternalLocator.startLocator(InternalLocator.java:373) at 
org.apache.geode.distributed.internal.InternalLocator.startLocator(InternalLocator.java:328) at org.apache.geode.distributed.Locator.startLocator(Locator.java:252) at org.apache.geode.distributed.Locator.startLocatorAndDS(Locator.java:139) at org.apache.geode.cache.wan.WANRollingUpgradeDUnitTest.startLocator(WANRollingUpgradeDUnitTest.java:105) at org.apache.geode.cache.wan.WANRollingUpgradeDUnitTest.startLocator(WANRollingUpgradeDUnitTest.java:97) at org.apache.geode.cache.wan.WANRollingUpgradeEventProcessingMixedSiteOneCurrentSiteTwo.lambda$EventProcessingMixedSiteOneCurrentSiteTwo$67afc7f8$1(WANRollingUpgradeEventProcessingMixedSiteOneCurrentSiteTwo.java:78) Caused by: java.net.BindException: Address already in use (Bind failed) at java.net.PlainSocketImpl.socketBind(Native Method) at java.net.AbstractPlainSocketImpl.bind(AbstractPlainSocketImpl.java:387) at java.net.ServerSocket.bind(ServerSocket.java:375) at org.apache.geode.internal.net.SocketCreator.createServerSocket(SocketCreator.java:719) ... 14 more {noformat} UpgradeTestOpenJDK8 #1054: https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-develop-main/jobs/UpgradeTestOpenJDK8/builds/1054 =-=-=-=-=-=-=-=-=-=-=-=-=-=-= Test Results URI =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-= http://files.apachegeode-ci.info/builds/apache-develop-main/1.11.0-SNAPSHOT.0108/test-results/upgradeTest/1568110268/ =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-= Test report artifacts from this job are available at: http://files.apachegeode-ci.info/builds/apache-develop-main/1.11.0-SNAPSHOT.0108/test-artifacts/1568110268/upgradetestfiles-OpenJDK8-1.11.0-SNAPSHOT.0108.tgz This same exception has been reported two other times in GEODE-6454, but the original exception in that JIRA was different. -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Updated] (GEODE-7066) Events can be lost in a gateway batch containing duplicate non-conflatable events with conflation enabled
[ https://issues.apache.org/jira/browse/GEODE-7066?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Barry Oglesby updated GEODE-7066: - Fix Version/s: (was: 1.10.0) 1.11.0 > Events can be lost in a gateway batch containing duplicate non-conflatable > events with conflation enabled > - > > Key: GEODE-7066 > URL: https://issues.apache.org/jira/browse/GEODE-7066 > Project: Geode > Issue Type: Bug > Components: wan >Affects Versions: 1.9.0 >Reporter: Barry Oglesby >Assignee: Barry Oglesby >Priority: Major > Fix For: 1.11.0 > > Time Spent: 1h > Remaining Estimate: 0h > > If a batch contains duplicate CREATE and DESTROY events on key 1736 like > below and conflation is enabled, the earlier events will be overwritten by > the later events. > {noformat} > GatewaySenderEventImpl[id=EventID[id=31bytes;threadID=0x30004|6;sequenceID=6072];operation=CREATE;region=/SESSIONS;key=6079], > GatewaySenderEventImpl[id=EventID[id=31bytes;threadID=0x30004|6;sequenceID=6073];operation=UPDATE;region=/SESSIONS;key=6079], > GatewaySenderEventImpl[id=EventID[id=31bytes;threadID=0x30004|5;sequenceID=6009];operation=CREATE;region=/SESSIONS;key=1736], > GatewaySenderEventImpl[id=EventID[id=31bytes;threadID=0x30004|6;sequenceID=6074];operation=DESTROY;region=/SESSIONS;key=6079], > GatewaySenderEventImpl[id=EventID[id=31bytes;threadID=0x30004|5;sequenceID=6011];operation=DESTROY;region=/SESSIONS;key=1736], > GatewaySenderEventImpl[id=EventID[id=31bytes;threadID=0x30004|6;sequenceID=6087];operation=CREATE;region=/SESSIONS;key=1736], > GatewaySenderEventImpl[id=EventID[id=31bytes;threadID=0x30004|6;sequenceID=6089];operation=DESTROY;region=/SESSIONS;key=1736] > {noformat} > The batch will look like this after conflation: > {noformat} > GatewaySenderEventImpl[id=EventID[id=31 > bytes;threadID=0x30004|6;sequenceID=6072];operation=CREATE;region=/SESSIONS;key=6079], > > GatewaySenderEventImpl[id=EventID[id=31 > 
bytes;threadID=0x30004|6;sequenceID=6073];operation=UPDATE;region=/SESSIONS;key=6079], > > GatewaySenderEventImpl[id=EventID[id=31 > bytes;threadID=0x30004|6;sequenceID=6087];operation=CREATE;region=/SESSIONS;key=1736], > > GatewaySenderEventImpl[id=EventID[id=31 > bytes;threadID=0x30004|6;sequenceID=6074];operation=DESTROY;region=/SESSIONS;key=6079], > > GatewaySenderEventImpl[id=EventID[id=31 > bytes;threadID=0x30004|6;sequenceID=6089];operation=DESTROY;region=/SESSIONS;key=1736] > {noformat} > All the events from threadID=0x30004|5 are gone. -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Resolved] (GEODE-7066) Events can be lost in a gateway batch containing duplicate non-conflatable events with conflation enabled
[ https://issues.apache.org/jira/browse/GEODE-7066?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Barry Oglesby resolved GEODE-7066. -- Fix Version/s: 1.10.0 Resolution: Fixed > Events can be lost in a gateway batch containing duplicate non-conflatable > events with conflation enabled > - > > Key: GEODE-7066 > URL: https://issues.apache.org/jira/browse/GEODE-7066 > Project: Geode > Issue Type: Bug > Components: wan >Affects Versions: 1.9.0 >Reporter: Barry Oglesby >Assignee: Barry Oglesby >Priority: Major > Fix For: 1.10.0 > > Time Spent: 1h > Remaining Estimate: 0h > > If a batch contains duplicate CREATE and DESTROY events on key 1736 like > below and conflation is enabled, the earlier events will be overwritten by > the later events. > {noformat} > GatewaySenderEventImpl[id=EventID[id=31bytes;threadID=0x30004|6;sequenceID=6072];operation=CREATE;region=/SESSIONS;key=6079], > GatewaySenderEventImpl[id=EventID[id=31bytes;threadID=0x30004|6;sequenceID=6073];operation=UPDATE;region=/SESSIONS;key=6079], > GatewaySenderEventImpl[id=EventID[id=31bytes;threadID=0x30004|5;sequenceID=6009];operation=CREATE;region=/SESSIONS;key=1736], > GatewaySenderEventImpl[id=EventID[id=31bytes;threadID=0x30004|6;sequenceID=6074];operation=DESTROY;region=/SESSIONS;key=6079], > GatewaySenderEventImpl[id=EventID[id=31bytes;threadID=0x30004|5;sequenceID=6011];operation=DESTROY;region=/SESSIONS;key=1736], > GatewaySenderEventImpl[id=EventID[id=31bytes;threadID=0x30004|6;sequenceID=6087];operation=CREATE;region=/SESSIONS;key=1736], > GatewaySenderEventImpl[id=EventID[id=31bytes;threadID=0x30004|6;sequenceID=6089];operation=DESTROY;region=/SESSIONS;key=1736] > {noformat} > The batch will look like this after conflation: > {noformat} > GatewaySenderEventImpl[id=EventID[id=31 > bytes;threadID=0x30004|6;sequenceID=6072];operation=CREATE;region=/SESSIONS;key=6079], > > GatewaySenderEventImpl[id=EventID[id=31 > 
bytes;threadID=0x30004|6;sequenceID=6073];operation=UPDATE;region=/SESSIONS;key=6079], > > GatewaySenderEventImpl[id=EventID[id=31 > bytes;threadID=0x30004|6;sequenceID=6087];operation=CREATE;region=/SESSIONS;key=1736], > > GatewaySenderEventImpl[id=EventID[id=31 > bytes;threadID=0x30004|6;sequenceID=6074];operation=DESTROY;region=/SESSIONS;key=6079], > > GatewaySenderEventImpl[id=EventID[id=31 > bytes;threadID=0x30004|6;sequenceID=6089];operation=DESTROY;region=/SESSIONS;key=1736] > {noformat} > All the events from threadID=0x30004|5 are gone. -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Updated] (GEODE-7066) Events can be lost in a gateway batch containing duplicate non-conflatable events with conflation enabled
[ https://issues.apache.org/jira/browse/GEODE-7066?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Barry Oglesby updated GEODE-7066: - Affects Version/s: 1.9.0 > Events can be lost in a gateway batch containing duplicate non-conflatable > events with conflation enabled > - > > Key: GEODE-7066 > URL: https://issues.apache.org/jira/browse/GEODE-7066 > Project: Geode > Issue Type: Bug > Components: wan >Affects Versions: 1.9.0 >Reporter: Barry Oglesby >Priority: Major > > If a batch contains duplicate CREATE and DESTROY events on key 1736 like > below and conflation is enabled, the earlier events will be overwritten by > the later events. > {noformat} > GatewaySenderEventImpl[id=EventID[id=31bytes;threadID=0x30004|6;sequenceID=6072];operation=CREATE;region=/SESSIONS;key=6079], > GatewaySenderEventImpl[id=EventID[id=31bytes;threadID=0x30004|6;sequenceID=6073];operation=UPDATE;region=/SESSIONS;key=6079], > GatewaySenderEventImpl[id=EventID[id=31bytes;threadID=0x30004|5;sequenceID=6009];operation=CREATE;region=/SESSIONS;key=1736], > GatewaySenderEventImpl[id=EventID[id=31bytes;threadID=0x30004|6;sequenceID=6074];operation=DESTROY;region=/SESSIONS;key=6079], > GatewaySenderEventImpl[id=EventID[id=31bytes;threadID=0x30004|5;sequenceID=6011];operation=DESTROY;region=/SESSIONS;key=1736], > GatewaySenderEventImpl[id=EventID[id=31bytes;threadID=0x30004|6;sequenceID=6087];operation=CREATE;region=/SESSIONS;key=1736], > GatewaySenderEventImpl[id=EventID[id=31bytes;threadID=0x30004|6;sequenceID=6089];operation=DESTROY;region=/SESSIONS;key=1736] > {noformat} > The batch will look like this after conflation: > {noformat} > GatewaySenderEventImpl[id=EventID[id=31 > bytes;threadID=0x30004|6;sequenceID=6072];operation=CREATE;region=/SESSIONS;key=6079], > > GatewaySenderEventImpl[id=EventID[id=31 > bytes;threadID=0x30004|6;sequenceID=6073];operation=UPDATE;region=/SESSIONS;key=6079], > > GatewaySenderEventImpl[id=EventID[id=31 > 
bytes;threadID=0x30004|6;sequenceID=6087];operation=CREATE;region=/SESSIONS;key=1736], > > GatewaySenderEventImpl[id=EventID[id=31 > bytes;threadID=0x30004|6;sequenceID=6074];operation=DESTROY;region=/SESSIONS;key=6079], > > GatewaySenderEventImpl[id=EventID[id=31 > bytes;threadID=0x30004|6;sequenceID=6089];operation=DESTROY;region=/SESSIONS;key=1736] > {noformat} > All the events from threadID=0x30004|5 are gone. -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Assigned] (GEODE-7066) Events can be lost in a gateway batch containing duplicate non-conflatable events with conflation enabled
[ https://issues.apache.org/jira/browse/GEODE-7066?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Barry Oglesby reassigned GEODE-7066: Assignee: Barry Oglesby > Events can be lost in a gateway batch containing duplicate non-conflatable > events with conflation enabled > - > > Key: GEODE-7066 > URL: https://issues.apache.org/jira/browse/GEODE-7066 > Project: Geode > Issue Type: Bug > Components: wan >Affects Versions: 1.9.0 >Reporter: Barry Oglesby >Assignee: Barry Oglesby >Priority: Major > > If a batch contains duplicate CREATE and DESTROY events on key 1736 like > below and conflation is enabled, the earlier events will be overwritten by > the later events. > {noformat} > GatewaySenderEventImpl[id=EventID[id=31bytes;threadID=0x30004|6;sequenceID=6072];operation=CREATE;region=/SESSIONS;key=6079], > GatewaySenderEventImpl[id=EventID[id=31bytes;threadID=0x30004|6;sequenceID=6073];operation=UPDATE;region=/SESSIONS;key=6079], > GatewaySenderEventImpl[id=EventID[id=31bytes;threadID=0x30004|5;sequenceID=6009];operation=CREATE;region=/SESSIONS;key=1736], > GatewaySenderEventImpl[id=EventID[id=31bytes;threadID=0x30004|6;sequenceID=6074];operation=DESTROY;region=/SESSIONS;key=6079], > GatewaySenderEventImpl[id=EventID[id=31bytes;threadID=0x30004|5;sequenceID=6011];operation=DESTROY;region=/SESSIONS;key=1736], > GatewaySenderEventImpl[id=EventID[id=31bytes;threadID=0x30004|6;sequenceID=6087];operation=CREATE;region=/SESSIONS;key=1736], > GatewaySenderEventImpl[id=EventID[id=31bytes;threadID=0x30004|6;sequenceID=6089];operation=DESTROY;region=/SESSIONS;key=1736] > {noformat} > The batch will look like this after conflation: > {noformat} > GatewaySenderEventImpl[id=EventID[id=31 > bytes;threadID=0x30004|6;sequenceID=6072];operation=CREATE;region=/SESSIONS;key=6079], > > GatewaySenderEventImpl[id=EventID[id=31 > bytes;threadID=0x30004|6;sequenceID=6073];operation=UPDATE;region=/SESSIONS;key=6079], > > GatewaySenderEventImpl[id=EventID[id=31 > 
bytes;threadID=0x30004|6;sequenceID=6087];operation=CREATE;region=/SESSIONS;key=1736], > > GatewaySenderEventImpl[id=EventID[id=31 > bytes;threadID=0x30004|6;sequenceID=6074];operation=DESTROY;region=/SESSIONS;key=6079], > > GatewaySenderEventImpl[id=EventID[id=31 > bytes;threadID=0x30004|6;sequenceID=6089];operation=DESTROY;region=/SESSIONS;key=1736] > {noformat} > All the events from threadID=0x30004|5 are gone. -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Created] (GEODE-7066) Events can be lost in a gateway batch containing duplicate non-conflatable events with conflation enabled
Barry Oglesby created GEODE-7066: Summary: Events can be lost in a gateway batch containing duplicate non-conflatable events with conflation enabled Key: GEODE-7066 URL: https://issues.apache.org/jira/browse/GEODE-7066 Project: Geode Issue Type: Bug Components: wan Reporter: Barry Oglesby If a batch contains duplicate CREATE and DESTROY events on key 1736 like below and conflation is enabled, the earlier events will be overwritten by the later events. {noformat} GatewaySenderEventImpl[id=EventID[id=31bytes;threadID=0x30004|6;sequenceID=6072];operation=CREATE;region=/SESSIONS;key=6079], GatewaySenderEventImpl[id=EventID[id=31bytes;threadID=0x30004|6;sequenceID=6073];operation=UPDATE;region=/SESSIONS;key=6079], GatewaySenderEventImpl[id=EventID[id=31bytes;threadID=0x30004|5;sequenceID=6009];operation=CREATE;region=/SESSIONS;key=1736], GatewaySenderEventImpl[id=EventID[id=31bytes;threadID=0x30004|6;sequenceID=6074];operation=DESTROY;region=/SESSIONS;key=6079], GatewaySenderEventImpl[id=EventID[id=31bytes;threadID=0x30004|5;sequenceID=6011];operation=DESTROY;region=/SESSIONS;key=1736], GatewaySenderEventImpl[id=EventID[id=31bytes;threadID=0x30004|6;sequenceID=6087];operation=CREATE;region=/SESSIONS;key=1736], GatewaySenderEventImpl[id=EventID[id=31bytes;threadID=0x30004|6;sequenceID=6089];operation=DESTROY;region=/SESSIONS;key=1736] {noformat} The batch will look like this after conflation: {noformat} GatewaySenderEventImpl[id=EventID[id=31 bytes;threadID=0x30004|6;sequenceID=6072];operation=CREATE;region=/SESSIONS;key=6079], GatewaySenderEventImpl[id=EventID[id=31 bytes;threadID=0x30004|6;sequenceID=6073];operation=UPDATE;region=/SESSIONS;key=6079], GatewaySenderEventImpl[id=EventID[id=31 bytes;threadID=0x30004|6;sequenceID=6087];operation=CREATE;region=/SESSIONS;key=1736], GatewaySenderEventImpl[id=EventID[id=31 bytes;threadID=0x30004|6;sequenceID=6074];operation=DESTROY;region=/SESSIONS;key=6079], GatewaySenderEventImpl[id=EventID[id=31 
bytes;threadID=0x30004|6;sequenceID=6089];operation=DESTROY;region=/SESSIONS;key=1736] {noformat} All the events from threadID=0x30004|5 are gone. -- This message was sent by Atlassian JIRA (v7.6.14#76016)
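The loss described above can be reproduced with a minimal sketch of key-based conflation. This is not Geode's actual BatchConflator code; the Event record and conflate method are hypothetical, but they show how conflating by (key, operation) lets later duplicates silently replace earlier events from another thread:

```java
import java.util.*;

// Hypothetical stand-in for GatewaySenderEventImpl; the real class carries much more state.
record Event(String threadId, int sequenceId, String op, int key) {}

public class ConflationSketch {
    // Conflate by (key, op): a later duplicate overwrites the earlier event,
    // even for CREATE/DESTROY operations that should not be conflatable.
    static List<Event> conflate(List<Event> batch) {
        Map<String, Event> lastByKeyAndOp = new LinkedHashMap<>();
        for (Event e : batch) {
            lastByKeyAndOp.put(e.key() + ":" + e.op(), e); // overwrite on duplicate
        }
        return new ArrayList<>(lastByKeyAndOp.values());
    }

    public static void main(String[] args) {
        List<Event> batch = List.of(
            new Event("0x30004|5", 6009, "CREATE", 1736),
            new Event("0x30004|5", 6011, "DESTROY", 1736),
            new Event("0x30004|6", 6087, "CREATE", 1736),
            new Event("0x30004|6", 6089, "DESTROY", 1736));
        List<Event> conflated = conflate(batch);
        // Both events from threadID=0x30004|5 are gone; only the |6 duplicates survive.
        System.out.println(conflated.size()); // 2
        conflated.forEach(e -> System.out.println(e.threadId()));
    }
}
```

Running the sketch with the four key-1736 events from the batch above leaves only the two threadID=0x30004|6 events, mirroring the reported data loss.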
[jira] [Commented] (GEODE-6933) Gateway sender alert-threshold not working
[ https://issues.apache.org/jira/browse/GEODE-6933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16883376#comment-16883376 ] Barry Oglesby commented on GEODE-6933: -- AbstractGatewaySenderEventProcessor handleSuccessfulBatchDispatch is the only place the warning is logged. That method is only called in the case of a GatewaySenderEventCallbackDispatcher (which is the AsyncEventQueue dispatcher). It's not called in AbstractGatewaySenderEventProcessor handleSuccessBatchAck, which is the GatewaySenderEventRemoteDispatcher case. The GatewaySenderEventRemoteDispatcher is used by the GatewaySender (wan) case. Further, alert-threshold is only supported: - on gateway-sender in the xsd - in the create gateway-sender gfsh command So, it's not even configurable on an AsyncEventQueue. If I hack AbstractGatewaySender and set alertThreshold > 0, I can see a warning in the AsyncEventQueue case: {noformat} [warn 2019/07/11 15:20:38.531 PDT tid=0x36] CREATE event for region=/data key=TradeKey[id=0] value=Trade[id=0; ...] was in the queue for 10124 milliseconds {noformat} AbstractGatewaySenderEventProcessor handleSuccessBatchAck should be changed to log the warning just like handleSuccessfulBatchDispatch does. And either AsyncEventQueue should be modified to support alert-threshold, or the warning code should be removed from AbstractGatewaySenderEventProcessor handleSuccessfulBatchDispatch. > Gateway sender alert-threshold not working > -- > > Key: GEODE-6933 > URL: https://issues.apache.org/jira/browse/GEODE-6933 > Project: Geode > Issue Type: Bug > Components: wan >Reporter: Mario Ivanac >Assignee: Mario Ivanac >Priority: Major > > When the alert-threshold function is activated in gateway senders (set > alert-threshold to a non-zero value), > the event is never raised for entries which are in the queue longer than the > alert-threshold value. 
> > Printout from logs: > Monitor = GatewaySenderMXBeanMonitor descriptor = > eventsExceedingAlertThreshold And value = 0 > > It seems that reporting of events which exceed the alert threshold (class > AbstractGatewaySenderEventProcessor) is working only if the dispatcher is an > instance of GatewaySenderEventCallbackDispatcher. > With deeper analysis, I came to the conclusion that for a GatewaySender the dispatcher > is an instance of GatewaySenderEventRemoteDispatcher. > So this function is only working for > AsyncEventQueue, for which the dispatcher is an instance of > GatewaySenderEventCallbackDispatcher. > > The other problem is that the _getEventsExceedingAlertThreshold()_ > _method of GatewaySenderMBean is always returning a hardcoded value > (0)._ -- This message was sent by Atlassian JIRA (v7.6.14#76016)
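The dwell-time check discussed in the comment above can be sketched in a few lines. This is not Geode's actual AbstractGatewaySenderEventProcessor API; the method and parameter names are hypothetical, but the predicate matches the described behavior (warn when an event sat in the queue longer than alert-threshold, with a threshold of 0 disabling the feature):

```java
// Minimal sketch of an alert-threshold dwell-time check; names are illustrative.
public class AlertThresholdSketch {
    static boolean exceedsAlertThreshold(long enqueuedAtMillis, long nowMillis,
                                         int alertThresholdMillis) {
        // alert-threshold <= 0 means the feature is disabled
        return alertThresholdMillis > 0
                && (nowMillis - enqueuedAtMillis) >= alertThresholdMillis;
    }

    public static void main(String[] args) {
        long enqueued = 1_000L;
        // Event was in the queue for 10124 ms against a 10000 ms threshold -> warn
        System.out.println(exceedsAlertThreshold(enqueued, enqueued + 10_124, 10_000)); // true
        // Threshold of 0 (the default) disables the check entirely
        System.out.println(exceedsAlertThreshold(enqueued, enqueued + 10_124, 0)); // false
    }
}
```

The fix suggested above amounts to invoking this kind of check from the batch-ack path (handleSuccessBatchAck) as well, not only from handleSuccessfulBatchDispatch.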
[jira] [Commented] (GEODE-6953) CI failure: RedundancyLevelPart1DUnitTest testRedundancySpecifiedNonPrimaryEPFailsDetectionByPut failed with ComparisonFailure
[ https://issues.apache.org/jira/browse/GEODE-6953?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16881644#comment-16881644 ] Barry Oglesby commented on GEODE-6953: -- Both the primary and secondary connections are established. Secondary: {noformat} [vm1] [info 2019/07/09 23:28:28.287 GMT tid=0x106] :Cache server: Initializing secondary server-to-client communication socket: Socket[addr=/172.17.0.2,port=59586,localport=29482] [info 2019/07/09 23:28:28.303 GMT :41002 port 29482> tid=0x84] Cache Client Updater Thread on e01ec2d18901(155):41002 port 29482 (e01ec2d18901:29482) : ready to process messages. {noformat} Primary: {noformat} [vm0] [info 2019/07/09 23:28:28.305 GMT tid=0x108] :Cache server: Initializing primary server-to-client communication socket: Socket[addr=/172.17.0.2,port=47704,localport=22651] [info 2019/07/09 23:28:28.327 GMT :41001 port 22651> tid=0x85] Cache Client Updater Thread on e01ec2d18901(151):41001 port 22651 (e01ec2d18901:22651) : ready to process messages. {noformat} Immediately after that, there is an AsynchronousCloseException in vm0 (the primary): {noformat} [vm0] [info 2019/07/09 23:28:28.338 GMT tid=0x105] Connection: shared=true ordered=true handshake failed to connect to peer 172.17.0.2(155):41002 because: java.nio.channels.AsynchronousCloseException {noformat} Then, a timeout occurs. I'm not sure if this is the registerInterest call: {noformat} [warn 2019/07/09 23:28:28.585 GMT tid=0x1b] Pool unexpected socket timed out on client connection=Pooled Connection to e01ec2d18901:22651: Connection[e01ec2d18901:22651]@789257760) [warn 2019/07/09 23:28:28.621 GMT tid=0x1b] Usage of registerInterest(List) has been deprecated. Please use registerInterestForKeys(Iterable) {noformat} Then, the secondary connection crashes. {noformat} [info 2019/07/09 23:28:28.873 GMT tid=0x1b] Redundant subscription endpoint e01ec2d18901:29482 crashed. Scheduling recovery. 
[info 2019/07/09 23:28:28.876 GMT tid=0x86] SubscriptionManager redundancy satisfier - redundant endpoint has been lost. Attempting to recover. [warn 2019/07/09 23:28:28.876 GMT tid=0x1b] Pool unexpected socket timed out on client connection=SubscriptionConnectionImpl[e01ec2d18901:29482:closed]) [info 2019/07/09 23:28:28.876 GMT :41002 port 29482> tid=0x84] Cache client updater for Queue on endpoint e01ec2d18901:29482 exiting. Scheduling recovery. {noformat} And the cache is closed: {noformat} [info 2019/07/09 23:28:29.000 GMT tid=0x1b] GemFireCache[id = 1192776621; isClosing = true; isShutDownAll = false; created = Tue Jul 09 23:28:28 GMT 2019; server = false; copyOnRead = false; lockLease = 120; lockTimeout = 60]: Now closing. {noformat} > CI failure: RedundancyLevelPart1DUnitTest > testRedundancySpecifiedNonPrimaryEPFailsDetectionByPut failed with > ComparisonFailure > -- > > Key: GEODE-6953 > URL: https://issues.apache.org/jira/browse/GEODE-6953 > Project: Geode > Issue Type: Bug > Components: client/server >Reporter: Barry Oglesby >Priority: Major > > RedundancyLevelPart1DUnitTest.testRedundancySpecifiedNonPrimaryEPFailsDetectionByPut > failed in DistributedTestOpenJDK11 build 762: > https://concourse.gemfire-ci.info/teams/main/pipelines/gemfire-develop-main/jobs/DistributedTestOpenJDK11/builds/762 > {noformat} > org.junit.ComparisonFailure: expected:<[1]> but was:<[0]> > at > jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) > at > jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) > at > jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) > at > org.apache.geode.internal.cache.tier.sockets.RedundancyLevelPart1DUnitTest.testRedundancySpecifiedNonPrimaryEPFailsDetectionByPut(RedundancyLevelPart1DUnitTest.java:304) > at jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > 
jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:566) > at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50) > at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) > at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47) > at > org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) > at > org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26) > at >
[jira] [Created] (GEODE-6953) CI failure: RedundancyLevelPart1DUnitTest testRedundancySpecifiedNonPrimaryEPFailsDetectionByPut failed with ComparisonFailure
Barry Oglesby created GEODE-6953: Summary: CI failure: RedundancyLevelPart1DUnitTest testRedundancySpecifiedNonPrimaryEPFailsDetectionByPut failed with ComparisonFailure Key: GEODE-6953 URL: https://issues.apache.org/jira/browse/GEODE-6953 Project: Geode Issue Type: Bug Components: client/server Reporter: Barry Oglesby RedundancyLevelPart1DUnitTest.testRedundancySpecifiedNonPrimaryEPFailsDetectionByPut failed in DistributedTestOpenJDK11 build 762: https://concourse.gemfire-ci.info/teams/main/pipelines/gemfire-develop-main/jobs/DistributedTestOpenJDK11/builds/762 {noformat} org.junit.ComparisonFailure: expected:<[1]> but was:<[0]> at jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) at jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) at org.apache.geode.internal.cache.tier.sockets.RedundancyLevelPart1DUnitTest.testRedundancySpecifiedNonPrimaryEPFailsDetectionByPut(RedundancyLevelPart1DUnitTest.java:304) at jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:566) at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50) at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47) at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26) at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27) at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:55) at 
org.junit.rules.RunRules.evaluate(RunRules.java:20) at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325) at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78) at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57) at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290) at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71) at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288) at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58) at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268) at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26) at org.junit.runners.ParentRunner.run(ParentRunner.java:363) at org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.runTestClass(JUnitTestClassExecutor.java:110) at org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.execute(JUnitTestClassExecutor.java:58) at org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.execute(JUnitTestClassExecutor.java:38) at org.gradle.api.internal.tasks.testing.junit.AbstractJUnitTestClassProcessor.processTestClass(AbstractJUnitTestClassProcessor.java:62) at org.gradle.api.internal.tasks.testing.SuiteTestClassProcessor.processTestClass(SuiteTestClassProcessor.java:51) at jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:566) at org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:35) at org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:24) at org.gradle.internal.dispatch.ContextClassLoaderDispatch.dispatch(ContextClassLoaderDispatch.java:32) at 
org.gradle.internal.dispatch.ProxyDispatchAdapter$DispatchingInvocationHandler.invoke(ProxyDispatchAdapter.java:93) at com.sun.proxy.$Proxy2.processTestClass(Unknown Source) at org.gradle.api.internal.tasks.testing.worker.TestWorker.processTestClass(TestWorker.java:118) at jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
[jira] [Resolved] (GEODE-6929) In the case of a ConcurrentCacheModificationException that occurs while processing a RemotePutMessage, the reply is attempted to be sent twice causing an InternalGemFire
[ https://issues.apache.org/jira/browse/GEODE-6929?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Barry Oglesby resolved GEODE-6929. -- Resolution: Fixed Fix Version/s: 1.10.0 > In the case of a ConcurrentCacheModificationException that occurs while > processing a RemotePutMessage, the reply is attempted to be sent twice > causing an InternalGemFireError > -- > > Key: GEODE-6929 > URL: https://issues.apache.org/jira/browse/GEODE-6929 > Project: Geode > Issue Type: Bug > Components: messaging >Reporter: Barry Oglesby >Assignee: Barry Oglesby >Priority: Major > Fix For: 1.10.0 > > Time Spent: 10m > Remaining Estimate: 0h > > The RemotePutMessage operateOnRegion method attempts to sendReply twice if > result is false: > {noformat} > try { > result = r.getDataView().putEntry(event, this.ifNew, this.ifOld, > this.expectedOldValue, > this.requireOldValue, this.lastModified, true); > if (!this.result) { // make sure the region hasn't gone away > r.checkReadiness(); > if (!this.ifNew && !this.ifOld) { > // no reason to be throwing an exception, so let's retry > RemoteOperationException ex = new RemoteOperationException( > "unable to perform put, but operation should not fail"); > 1 -> sendReply(getSender(), getProcessorId(), dm, new ReplyException(ex), r, > startTime); > } > } > ... 
> if (sendReply) { > 2-> sendReply(getSender(), getProcessorId(), dm, null, r, startTime, event); > } > {noformat} > This causes this fatal InternalGemFireError: > {noformat} > [fatal 2019/06/28 15:33:01.005 PDT 192.168.1.2(gateway-ny-proxy-1:77395):41003 unshared ordered uid=12 dom > #1 port=58836> tid=0x4c] Uncaught exception processing > tx.RemotePutMessage(regionPath=/TradeDateCalendar; > sender=192.168.1.2(gateway-ny-proxy-1:77395):41003; recipients=[null]; > processorId=0; key=87; value=(5 bytes); > callback=GatewaySenderEventCallbackArgument > [originalCallbackArg=null;originatingSenderId=2;recipientGatewayReceivers={1}]; > op=UPDATE; > bridgeContext=identity(192.168.1.2(gateway-ln-data-1:77388):41002,connection=1; > eventId=EventID[id=31 bytes;threadID=0x30001|1;sequenceID=731]; ifOld=false; > ifNew=false; op=UPDATE; hadOldValue=false; deserializationPolicy=LAZY; > hasDelta=false; sendDelta=false; isDeltaApplied=false ,distTx=false) > org.apache.geode.InternalGemFireError: Trying to reply twice to a message > at org.apache.geode.internal.Assert.throwError(Assert.java:89) > at org.apache.geode.internal.Assert.assertTrue(Assert.java:107) > at > org.apache.geode.internal.tcp.DirectReplySender.putOutgoing(DirectReplySender.java:55) > at > org.apache.geode.internal.cache.tx.RemotePutMessage$PutReplyMessage.send(RemotePutMessage.java:791) > at > org.apache.geode.internal.cache.tx.RemotePutMessage.sendReply(RemotePutMessage.java:675) > at > org.apache.geode.internal.cache.tx.RemoteOperationMessage.process(RemoteOperationMessage.java:266) > at > org.apache.geode.distributed.internal.DistributionMessage.scheduleAction(DistributionMessage.java:369) > at > org.apache.geode.distributed.internal.DistributionMessage.schedule(DistributionMessage.java:425) > at > org.apache.geode.distributed.internal.ClusterDistributionManager.scheduleIncomingMessage(ClusterDistributionManager.java:2891) > at > 
org.apache.geode.distributed.internal.ClusterDistributionManager.handleIncomingDMsg(ClusterDistributionManager.java:2571) > at > org.apache.geode.distributed.internal.ClusterDistributionManager.access$1400(ClusterDistributionManager.java:110) > at > org.apache.geode.distributed.internal.ClusterDistributionManager$DMListener.messageReceived(ClusterDistributionManager.java:3430) > at > org.apache.geode.distributed.internal.membership.gms.mgr.GMSMembershipManager.dispatchMessage(GMSMembershipManager.java:1108) > at > org.apache.geode.distributed.internal.membership.gms.mgr.GMSMembershipManager.handleOrDeferMessage(GMSMembershipManager.java:1027) > at > org.apache.geode.distributed.internal.membership.gms.mgr.GMSMembershipManager$MyDCReceiver.messageReceived(GMSMembershipManager.java:407) > at > org.apache.geode.distributed.internal.direct.DirectChannel.receive(DirectChannel.java:701) > at > org.apache.geode.internal.tcp.TCPConduit.messageReceived(TCPConduit.java:702) > at > org.apache.geode.internal.tcp.Connection.dispatchMessage(Connection.java:3427) > at > org.apache.geode.internal.tcp.Connection.readMessage(Connection.java:3164) > at >
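The "Trying to reply twice" assert above fires because both the error branch and the unconditional branch call sendReply. A reply-once guard is one way to express the fix; this is a sketch only, with illustrative names, not the actual change made in RemotePutMessage/DirectReplySender:

```java
// Sketch of a reply-once guard for the double sendReply described above.
// The class, field, and method names are illustrative.
public class ReplyOnceSketch {
    private boolean replySent = false;
    private int repliesDelivered = 0;

    // Reply at most once; the second call site checks the flag instead of
    // tripping DirectReplySender's "Trying to reply twice to a message" assert.
    void sendReplyOnce(String payload) {
        if (replySent) {
            return; // error reply was already sent from the !result branch
        }
        replySent = true;
        repliesDelivered++;
    }

    public static void main(String[] args) {
        ReplyOnceSketch message = new ReplyOnceSketch();
        message.sendReplyOnce("error reply");  // branch 1: put failed
        message.sendReplyOnce("normal reply"); // branch 2: unconditional reply
        System.out.println(message.repliesDelivered); // 1
    }
}
```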
[jira] [Created] (GEODE-6931) A failed RemotePutMessage can cause a PersistentReplicatesOfflineException to be thrown when no persistent members are offline
Barry Oglesby created GEODE-6931: Summary: A failed RemotePutMessage can cause a PersistentReplicatesOfflineException to be thrown when no persistent members are offline Key: GEODE-6931 URL: https://issues.apache.org/jira/browse/GEODE-6931 Project: Geode Issue Type: Bug Components: messaging Reporter: Barry Oglesby One of the places that RemotePutMessage is sent is DistributedRegion virtualPut. It's sent from this method in this case: - 2 wan sites - the member in the receiving site that processes the batch defines the region as replicate proxy - other receiving site members define the region as replicate persistent DistributedRegion virtualPut is invoked by the GatewayReceiverCommand here: {noformat} java.lang.Exception: Stack trace at java.lang.Thread.dumpStack(Thread.java:1333) at org.apache.geode.internal.cache.DistributedRegion.virtualPut(DistributedRegion.java:341) at org.apache.geode.internal.cache.LocalRegionDataView.putEntry(LocalRegionDataView.java:162) at org.apache.geode.internal.cache.LocalRegion.basicUpdate(LocalRegion.java:5549) at org.apache.geode.internal.cache.LocalRegion.basicBridgePut(LocalRegion.java:5200) at org.apache.geode.internal.cache.tier.sockets.command.GatewayReceiverCommand.cmdExecute(GatewayReceiverCommand.java:429) {noformat} In this case, requiresOneHopForMissingEntry called by virtualPut returns true since a proxy region with other persistent replicates can't generate a version tag. This causes RemotePutMessage.distribute to be called. If didDistribute returns false from RemotePutMessage.distribute (meaning the distribution failed), a PersistentReplicatesOfflineException is thrown regardless of the actual exception on the remote member: {noformat} if (!generateVersionTag && !didDistribute) { throw new PersistentReplicatesOfflineException(); } {noformat} One of the ways that didDistribute can be false is if both the remote wan site and local wan site are updating the same key at the same time. 
In that case a ConcurrentCacheModificationException can occur in the replicate persistent member (the one processing the RemotePutMessage). This exception is not logged anywhere, and RemotePutMessage operateOnRegion doesn't know anything about it. RemotePutMessage operateOnRegion running in the replicate persistent member calls: {noformat} result = r.getDataView().putEntry(event, this.ifNew, this.ifOld, this.expectedOldValue, this.requireOldValue, this.lastModified, true); {noformat} If putEntry returns false, it throws a RemoteOperationException which is sent back to the caller and causes didDistribute to be false. The result can be false in the RemotePutMessage operateOnRegion method because of a ConcurrentCacheModificationException: {noformat} org.apache.geode.internal.cache.versions.ConcurrentCacheModificationException: conflicting WAN event detected at org.apache.geode.internal.cache.entries.AbstractRegionEntry.processGatewayTag(AbstractRegionEntry.java:1924) at org.apache.geode.internal.cache.entries.AbstractRegionEntry.processVersionTag(AbstractRegionEntry.java:1443) at org.apache.geode.internal.cache.entries.AbstractOplogDiskRegionEntry.processVersionTag(AbstractOplogDiskRegionEntry.java:165) at org.apache.geode.internal.cache.entries.VersionedThinDiskLRURegionEntryHeapStringKey1.processVersionTag(VersionedThinDiskLRURegionEntryHeapStringKey1.java:378) at org.apache.geode.internal.cache.AbstractRegionMap.processVersionTag(AbstractRegionMap.java:527) at org.apache.geode.internal.cache.map.RegionMapPut.updateEntry(RegionMapPut.java:484) at org.apache.geode.internal.cache.map.RegionMapPut.createOrUpdateEntry(RegionMapPut.java:256) at org.apache.geode.internal.cache.map.AbstractRegionMapPut.doPutAndDeliverEvent(AbstractRegionMapPut.java:300) at org.apache.geode.internal.cache.map.AbstractRegionMapPut.runWithIndexUpdatingInProgress(AbstractRegionMapPut.java:308) at 
org.apache.geode.internal.cache.map.AbstractRegionMapPut.doPutIfPreconditionsSatisified(AbstractRegionMapPut.java:296) at org.apache.geode.internal.cache.map.AbstractRegionMapPut.doPutOnSynchronizedRegionEntry(AbstractRegionMapPut.java:282) at org.apache.geode.internal.cache.map.AbstractRegionMapPut.doPutOnRegionEntryInMap(AbstractRegionMapPut.java:273) at org.apache.geode.internal.cache.map.AbstractRegionMapPut.addRegionEntryToMapAndDoPut(AbstractRegionMapPut.java:251) at org.apache.geode.internal.cache.map.AbstractRegionMapPut.doPutRetryingIfNeeded(AbstractRegionMapPut.java:216) at org.apache.geode.internal.cache.map.AbstractRegionMapPut.doWithIndexInUpdateMode(AbstractRegionMapPut.java:198) at
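The misleading exception mapping described above can be sketched as follows. The exception class here is a stand-in for Geode's real one; the point is that collapsing every failed distribution into PersistentReplicatesOfflineException hides the real reason (here, the remote ConcurrentCacheModificationException), and keeping the remote failure as the cause at least preserves it:

```java
// Sketch of the didDistribute failure mapping; class and method names are illustrative.
public class DistributeSketch {
    static class PersistentReplicatesOffline extends RuntimeException {
        PersistentReplicatesOffline(Throwable cause) {
            super("persistent replicates offline", cause);
        }
    }

    static void afterDistribute(boolean generateVersionTag, boolean didDistribute,
                                Throwable remoteFailure) {
        if (!generateVersionTag && !didDistribute) {
            // Attaching the remote failure as the cause preserves the real reason,
            // instead of implying that persistent members are offline.
            throw new PersistentReplicatesOffline(remoteFailure);
        }
    }

    public static void main(String[] args) {
        Throwable remote = new RuntimeException("conflicting WAN event detected");
        try {
            afterDistribute(false, false, remote);
        } catch (PersistentReplicatesOffline e) {
            System.out.println(e.getCause().getMessage());
        }
    }
}
```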
[jira] [Updated] (GEODE-6929) In the case of a ConcurrentCacheModificationException that occurs while processing a RemotePutMessage, the reply is attempted to be sent twice causing an InternalGemFireE
[ https://issues.apache.org/jira/browse/GEODE-6929?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Barry Oglesby updated GEODE-6929: - Description: The RemotePutMessage operateOnRegion method attempts to sendReply twice if result is false: {noformat} try { result = r.getDataView().putEntry(event, this.ifNew, this.ifOld, this.expectedOldValue, this.requireOldValue, this.lastModified, true); if (!this.result) { // make sure the region hasn't gone away r.checkReadiness(); if (!this.ifNew && !this.ifOld) { // no reason to be throwing an exception, so let's retry RemoteOperationException ex = new RemoteOperationException( "unable to perform put, but operation should not fail"); 1 -> sendReply(getSender(), getProcessorId(), dm, new ReplyException(ex), r, startTime); } } ... if (sendReply) { 2-> sendReply(getSender(), getProcessorId(), dm, null, r, startTime, event); } {noformat} This causes this fatal InternalGemFireError: {noformat} [fatal 2019/06/28 15:33:01.005 PDT :41003 unshared ordered uid=12 dom #1 port=58836> tid=0x4c] Uncaught exception processing tx.RemotePutMessage(regionPath=/TradeDateCalendar; sender=192.168.1.2(gateway-ny-proxy-1:77395):41003; recipients=[null]; processorId=0; key=87; value=(5 bytes); callback=GatewaySenderEventCallbackArgument [originalCallbackArg=null;originatingSenderId=2;recipientGatewayReceivers={1}]; op=UPDATE; bridgeContext=identity(192.168.1.2(gateway-ln-data-1:77388):41002,connection=1; eventId=EventID[id=31 bytes;threadID=0x30001|1;sequenceID=731]; ifOld=false; ifNew=false; op=UPDATE; hadOldValue=false; deserializationPolicy=LAZY; hasDelta=false; sendDelta=false; isDeltaApplied=false ,distTx=false) org.apache.geode.InternalGemFireError: Trying to reply twice to a message at org.apache.geode.internal.Assert.throwError(Assert.java:89) at org.apache.geode.internal.Assert.assertTrue(Assert.java:107) at org.apache.geode.internal.tcp.DirectReplySender.putOutgoing(DirectReplySender.java:55) at 
org.apache.geode.internal.cache.tx.RemotePutMessage$PutReplyMessage.send(RemotePutMessage.java:791) at org.apache.geode.internal.cache.tx.RemotePutMessage.sendReply(RemotePutMessage.java:675) at org.apache.geode.internal.cache.tx.RemoteOperationMessage.process(RemoteOperationMessage.java:266) at org.apache.geode.distributed.internal.DistributionMessage.scheduleAction(DistributionMessage.java:369) at org.apache.geode.distributed.internal.DistributionMessage.schedule(DistributionMessage.java:425) at org.apache.geode.distributed.internal.ClusterDistributionManager.scheduleIncomingMessage(ClusterDistributionManager.java:2891) at org.apache.geode.distributed.internal.ClusterDistributionManager.handleIncomingDMsg(ClusterDistributionManager.java:2571) at org.apache.geode.distributed.internal.ClusterDistributionManager.access$1400(ClusterDistributionManager.java:110) at org.apache.geode.distributed.internal.ClusterDistributionManager$DMListener.messageReceived(ClusterDistributionManager.java:3430) at org.apache.geode.distributed.internal.membership.gms.mgr.GMSMembershipManager.dispatchMessage(GMSMembershipManager.java:1108) at org.apache.geode.distributed.internal.membership.gms.mgr.GMSMembershipManager.handleOrDeferMessage(GMSMembershipManager.java:1027) at org.apache.geode.distributed.internal.membership.gms.mgr.GMSMembershipManager$MyDCReceiver.messageReceived(GMSMembershipManager.java:407) at org.apache.geode.distributed.internal.direct.DirectChannel.receive(DirectChannel.java:701) at org.apache.geode.internal.tcp.TCPConduit.messageReceived(TCPConduit.java:702) at org.apache.geode.internal.tcp.Connection.dispatchMessage(Connection.java:3427) at org.apache.geode.internal.tcp.Connection.readMessage(Connection.java:3164) at org.apache.geode.internal.tcp.Connection.processInputBuffer(Connection.java:2959) at org.apache.geode.internal.tcp.Connection.readMessages(Connection.java:1743) at org.apache.geode.internal.tcp.Connection.run(Connection.java:1579) at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) {noformat} was: The RemotePutMessage operateOnRegion method attempts to sendReply twice if result is false: ``` try { result = r.getDataView().putEntry(event, this.ifNew, this.ifOld, this.expectedOldValue, this.requireOldValue, this.lastModified, true); if (!this.result) { // make sure the region hasn't gone away r.checkReadiness(); if (!this.ifNew && !this.ifOld) { // no reason to be throwing an exception, so let's retry RemoteOperationException ex = new
[jira] [Assigned] (GEODE-6929) In the case of a ConcurrentCacheModificationException that occurs while processing a RemotePutMessage, the reply is attempted to be sent twice causing an InternalGemFire
[ https://issues.apache.org/jira/browse/GEODE-6929?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Barry Oglesby reassigned GEODE-6929: Assignee: Barry Oglesby > In the case of a ConcurrentCacheModificationException that occurs while > processing a RemotePutMessage, the reply is attempted to be sent twice > causing an InternalGemFireError > -- > > Key: GEODE-6929 > URL: https://issues.apache.org/jira/browse/GEODE-6929 > Project: Geode > Issue Type: Bug > Components: messaging >Reporter: Barry Oglesby >Assignee: Barry Oglesby >Priority: Major > > The RemotePutMessage operateOnRegion method attempts to sendReply twice if > result is false: > {noformat} > try { > result = r.getDataView().putEntry(event, this.ifNew, this.ifOld, > this.expectedOldValue, > this.requireOldValue, this.lastModified, true); > if (!this.result) { // make sure the region hasn't gone away > r.checkReadiness(); > if (!this.ifNew && !this.ifOld) { > // no reason to be throwing an exception, so let's retry > RemoteOperationException ex = new RemoteOperationException( > "unable to perform put, but operation should not fail"); > 1 -> sendReply(getSender(), getProcessorId(), dm, new ReplyException(ex), r, > startTime); > } > } > ... 
> if (sendReply) { > 2-> sendReply(getSender(), getProcessorId(), dm, null, r, startTime, event); > } > {noformat} > This causes this fatal InternalGemFireError: > {noformat} > [fatal 2019/06/28 15:33:01.005 PDT 192.168.1.2(gateway-ny-proxy-1:77395):41003 unshared ordered uid=12 dom > #1 port=58836> tid=0x4c] Uncaught exception processing > tx.RemotePutMessage(regionPath=/TradeDateCalendar; > sender=192.168.1.2(gateway-ny-proxy-1:77395):41003; recipients=[null]; > processorId=0; key=87; value=(5 bytes); > callback=GatewaySenderEventCallbackArgument > [originalCallbackArg=null;originatingSenderId=2;recipientGatewayReceivers={1}]; > op=UPDATE; > bridgeContext=identity(192.168.1.2(gateway-ln-data-1:77388):41002,connection=1; > eventId=EventID[id=31 bytes;threadID=0x30001|1;sequenceID=731]; ifOld=false; > ifNew=false; op=UPDATE; hadOldValue=false; deserializationPolicy=LAZY; > hasDelta=false; sendDelta=false; isDeltaApplied=false ,distTx=false) > org.apache.geode.InternalGemFireError: Trying to reply twice to a message > at org.apache.geode.internal.Assert.throwError(Assert.java:89) > at org.apache.geode.internal.Assert.assertTrue(Assert.java:107) > at > org.apache.geode.internal.tcp.DirectReplySender.putOutgoing(DirectReplySender.java:55) > at > org.apache.geode.internal.cache.tx.RemotePutMessage$PutReplyMessage.send(RemotePutMessage.java:791) > at > org.apache.geode.internal.cache.tx.RemotePutMessage.sendReply(RemotePutMessage.java:675) > at > org.apache.geode.internal.cache.tx.RemoteOperationMessage.process(RemoteOperationMessage.java:266) > at > org.apache.geode.distributed.internal.DistributionMessage.scheduleAction(DistributionMessage.java:369) > at > org.apache.geode.distributed.internal.DistributionMessage.schedule(DistributionMessage.java:425) > at > org.apache.geode.distributed.internal.ClusterDistributionManager.scheduleIncomingMessage(ClusterDistributionManager.java:2891) > at > 
org.apache.geode.distributed.internal.ClusterDistributionManager.handleIncomingDMsg(ClusterDistributionManager.java:2571) > at > org.apache.geode.distributed.internal.ClusterDistributionManager.access$1400(ClusterDistributionManager.java:110) > at > org.apache.geode.distributed.internal.ClusterDistributionManager$DMListener.messageReceived(ClusterDistributionManager.java:3430) > at > org.apache.geode.distributed.internal.membership.gms.mgr.GMSMembershipManager.dispatchMessage(GMSMembershipManager.java:1108) > at > org.apache.geode.distributed.internal.membership.gms.mgr.GMSMembershipManager.handleOrDeferMessage(GMSMembershipManager.java:1027) > at > org.apache.geode.distributed.internal.membership.gms.mgr.GMSMembershipManager$MyDCReceiver.messageReceived(GMSMembershipManager.java:407) > at > org.apache.geode.distributed.internal.direct.DirectChannel.receive(DirectChannel.java:701) > at > org.apache.geode.internal.tcp.TCPConduit.messageReceived(TCPConduit.java:702) > at > org.apache.geode.internal.tcp.Connection.dispatchMessage(Connection.java:3427) > at > org.apache.geode.internal.tcp.Connection.readMessage(Connection.java:3164) > at > org.apache.geode.internal.tcp.Connection.processInputBuffer(Connection.java:2959) > at >
[jira] [Created] (GEODE-6929) In the case of a ConcurrentCacheModificationException that occurs while processing a RemotePutMessage, the reply is attempted to be sent twice causing an InternalGemFireE
Barry Oglesby created GEODE-6929: Summary: In the case of a ConcurrentCacheModificationException that occurs while processing a RemotePutMessage, the reply is attempted to be sent twice causing an InternalGemFireError Key: GEODE-6929 URL: https://issues.apache.org/jira/browse/GEODE-6929 Project: Geode Issue Type: Bug Components: messaging Reporter: Barry Oglesby The RemotePutMessage operateOnRegion method attempts to sendReply twice if result is false: {noformat} try { result = r.getDataView().putEntry(event, this.ifNew, this.ifOld, this.expectedOldValue, this.requireOldValue, this.lastModified, true); if (!this.result) { // make sure the region hasn't gone away r.checkReadiness(); if (!this.ifNew && !this.ifOld) { // no reason to be throwing an exception, so let's retry RemoteOperationException ex = new RemoteOperationException( "unable to perform put, but operation should not fail"); 1 -> sendReply(getSender(), getProcessorId(), dm, new ReplyException(ex), r, startTime); } } ... if (sendReply) { 2-> sendReply(getSender(), getProcessorId(), dm, null, r, startTime, event); } {noformat} This causes this fatal InternalGemFireError: {noformat} [fatal 2019/06/28 15:33:01.005 PDT :41003 unshared ordered uid=12 dom #1 port=58836> tid=0x4c] Uncaught exception processing tx.RemotePutMessage(regionPath=/TradeDateCalendar; sender=192.168.1.2(gateway-ny-proxy-1:77395):41003; recipients=[null]; processorId=0; key=87; value=(5 bytes); callback=GatewaySenderEventCallbackArgument [originalCallbackArg=null;originatingSenderId=2;recipientGatewayReceivers={1}]; op=UPDATE; bridgeContext=identity(192.168.1.2(gateway-ln-data-1:77388):41002,connection=1; eventId=EventID[id=31 bytes;threadID=0x30001|1;sequenceID=731]; ifOld=false; ifNew=false; op=UPDATE; hadOldValue=false; deserializationPolicy=LAZY; hasDelta=false; sendDelta=false; isDeltaApplied=false ,distTx=false) org.apache.geode.InternalGemFireError: Trying to reply twice to a message at 
org.apache.geode.internal.Assert.throwError(Assert.java:89) at org.apache.geode.internal.Assert.assertTrue(Assert.java:107) at org.apache.geode.internal.tcp.DirectReplySender.putOutgoing(DirectReplySender.java:55) at org.apache.geode.internal.cache.tx.RemotePutMessage$PutReplyMessage.send(RemotePutMessage.java:791) at org.apache.geode.internal.cache.tx.RemotePutMessage.sendReply(RemotePutMessage.java:675) at org.apache.geode.internal.cache.tx.RemoteOperationMessage.process(RemoteOperationMessage.java:266) at org.apache.geode.distributed.internal.DistributionMessage.scheduleAction(DistributionMessage.java:369) at org.apache.geode.distributed.internal.DistributionMessage.schedule(DistributionMessage.java:425) at org.apache.geode.distributed.internal.ClusterDistributionManager.scheduleIncomingMessage(ClusterDistributionManager.java:2891) at org.apache.geode.distributed.internal.ClusterDistributionManager.handleIncomingDMsg(ClusterDistributionManager.java:2571) at org.apache.geode.distributed.internal.ClusterDistributionManager.access$1400(ClusterDistributionManager.java:110) at org.apache.geode.distributed.internal.ClusterDistributionManager$DMListener.messageReceived(ClusterDistributionManager.java:3430) at org.apache.geode.distributed.internal.membership.gms.mgr.GMSMembershipManager.dispatchMessage(GMSMembershipManager.java:1108) at org.apache.geode.distributed.internal.membership.gms.mgr.GMSMembershipManager.handleOrDeferMessage(GMSMembershipManager.java:1027) at org.apache.geode.distributed.internal.membership.gms.mgr.GMSMembershipManager$MyDCReceiver.messageReceived(GMSMembershipManager.java:407) at org.apache.geode.distributed.internal.direct.DirectChannel.receive(DirectChannel.java:701) at org.apache.geode.internal.tcp.TCPConduit.messageReceived(TCPConduit.java:702) at org.apache.geode.internal.tcp.Connection.dispatchMessage(Connection.java:3427) at org.apache.geode.internal.tcp.Connection.readMessage(Connection.java:3164) at 
org.apache.geode.internal.tcp.Connection.processInputBuffer(Connection.java:2959) at org.apache.geode.internal.tcp.Connection.readMessages(Connection.java:1743) at org.apache.geode.internal.tcp.Connection.run(Connection.java:1579) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) {noformat} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
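A hedged sketch of the guard that would avoid the double reply. The class and method names below are hypothetical, not Geode APIs: the idea is simply that once a reply has been sent for a message, any later sendReply attempt should become a no-op rather than tripping the "Trying to reply twice" assertion.

```java
public class ReplyOnceSketch {
    // Illustrative guard; not actual Geode code. The real
    // RemoteOperationMessage machinery would need an equivalent
    // "already replied" check before its second sendReply call.
    private boolean replySent = false;

    // Sends at most one reply; returns true only for the call that sent it.
    public boolean sendReplyOnce(String reply) {
        if (replySent) {
            // Second attempt becomes a no-op instead of an
            // InternalGemFireError ("Trying to reply twice to a message").
            return false;
        }
        replySent = true;
        System.out.println("sending reply: " + reply);
        return true;
    }

    public static void main(String[] args) {
        ReplyOnceSketch message = new ReplyOnceSketch();
        boolean first = message.sendReplyOnce("error reply");   // sent
        boolean second = message.sendReplyOnce("normal reply"); // skipped
        System.out.println(first + " " + second); // prints "true false"
    }
}
```

Both call sites in operateOnRegion would then route through the single guarded method, so whichever fires first wins and the other silently returns.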
[jira] [Comment Edited] (GEODE-3718) The InternalResourceManager fails to shutdown if a redundancy recovery task is scheduled but hasn't fired yet
[ https://issues.apache.org/jira/browse/GEODE-3718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16871818#comment-16871818 ] Barry Oglesby edited comment on GEODE-3718 at 6/24/19 9:52 PM: --- The issue is that when a member is stopped, the shutdown hook attempts to stop the InternalResourceManager's scheduledExecutor. If that executor has any pending tasks (including ones in the future), the shutdown hook blocks waiting for them to fire. If I run this test: 1. Start 3 servers defining a partitioned region with recovery-delay > 0 2. Load some data into the partitioned region 3. Kill one server using kill -9 4. The remaining servers schedule the recovery task 5. Stop the remaining servers normally (the JVMs do not stop) The shutdown hook thread is waiting here for the InternalResourceManager's scheduledExecutor to terminate: {noformat} "Distributed system shutdown hook" #12 prio=5 os_prio=31 tid=0x7ff0f8c33000 nid=0x12307 waiting on condition [0x7bc11000] java.lang.Thread.State: TIMED_WAITING (parking) at sun.misc.Unsafe.park(Native Method) - parking to wait for <0x000766c00038> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject) at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215) at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2078) at java.util.concurrent.ThreadPoolExecutor.awaitTermination(ThreadPoolExecutor.java:1465) at org.apache.geode.internal.cache.control.InternalResourceManager.stopExecutor(InternalResourceManager.java:343) at org.apache.geode.internal.cache.control.InternalResourceManager.close(InternalResourceManager.java:156) at org.apache.geode.internal.cache.GemFireCacheImpl.close(GemFireCacheImpl.java:2079) - locked <0x00075b58> (a java.lang.Class for org.apache.geode.internal.cache.GemFireCacheImpl) at org.apache.geode.distributed.internal.InternalDistributedSystem.disconnect(InternalDistributedSystem.java:1516) - 
locked <0x00075b58> (a java.lang.Class for org.apache.geode.internal.cache.GemFireCacheImpl) at org.apache.geode.distributed.internal.InternalDistributedSystem.lambda$static$6(InternalDistributedSystem.java:2181) at org.apache.geode.distributed.internal.InternalDistributedSystem$$Lambda$8/111900554.run(Unknown Source) at java.lang.Thread.run(Thread.java:745) {noformat} InternalResourceManager.stopExecutor waits up to 120 seconds to stop: {noformat} void stopExecutor(ExecutorService executor) { if (executor == null) { return; } executor.shutdown(); final int secToWait = Integer .getInteger(DistributionConfig.GEMFIRE_PREFIX + "prrecovery-close-timeout", 120).intValue(); try { executor.awaitTermination(secToWait, TimeUnit.SECONDS); } catch (InterruptedException x) { Thread.currentThread().interrupt(); logger.debug("Failed in interrupting the Resource Manager Thread due to interrupt"); } if (!executor.isTerminated()) { logger.warn("Failed to stop resource manager threads in {} seconds", secToWait); } } {noformat} I added some logging that shows the sequence of events. The logging shows the redundancy recovery task being scheduled when the first server is killed (steps 3 and 4 above): {noformat} [warn 2019/06/24 14:09:21.014 PDT tid=0x15] PRHARedundancyProvider.scheduleRedundancyRecovery about to schedule task with delay=3 {noformat} Then, the shutdown hook is invoked when the server is stopped which causes the InternalResourceManager to wait. The 1 task below is the redundancy recovery task: {noformat} [info 2019/06/24 14:09:25.161 PDT tid=0xc] VM is exiting - shutting down distributed system [info 2019/06/24 14:09:25.173 PDT tid=0xc] GemFireCache[id = 1850680894; isClosing = true; isShutDownAll = false; created = Mon Jun 24 14:08:40 PDT 2019; server = false; copyOnRead = false; lockLease = 120; lockTimeout = 60]: Now closing. 
[warn 2019/06/24 14:09:25.174 PDT tid=0xc] InternalResourceManager.stopExecutor waiting up to 120 seconds to terminate ScheduledThreadPoolExecutor containing 1 task {noformat} Then, the redundancy recovery task attempts to execute, but a CacheClosedException is thrown, so it doesn't do anything: {noformat} [warn 2019/06/24 14:09:51.016 PDT tid=0xb2] PRHARedundancyProvider.run2 about to start recovery [warn 2019/06/24 14:09:51.016 PDT tid=0xb2] PRHARedundancyProvider.run2 caught: org.apache.geode.cache.CacheClosedException: The cache is closed. at org.apache.geode.internal.cache.GemFireCacheImpl$Stopper.generateCancelledException(GemFireCacheImpl.java:1482) at org.apache.geode.CancelCriterion.checkCancelInProgress(CancelCriterion.java:83) at
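The blocking described above is standard ScheduledThreadPoolExecutor behavior and can be demonstrated outside Geode: by default, delayed tasks scheduled before shutdown() still run at their scheduled time, so awaitTermination waits for them. One JDK-level remedy, shown here purely as an illustration (not as the Geode fix), is setExecuteExistingDelayedTasksAfterShutdownPolicy(false), which cancels not-yet-fired delayed tasks at shutdown:

```java
import java.util.concurrent.ScheduledThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class ShutdownPolicyDemo {
    // Demonstrates JDK behavior, not Geode code: without the policy change,
    // the pool keeps running after shutdown() until already-scheduled
    // delayed tasks fire, which is why awaitTermination blocks on the
    // pending redundancy recovery task.
    static boolean demo() throws InterruptedException {
        ScheduledThreadPoolExecutor executor = new ScheduledThreadPoolExecutor(1);
        // Cancel not-yet-fired delayed tasks at shutdown() instead of
        // waiting for them to run.
        executor.setExecuteExistingDelayedTasksAfterShutdownPolicy(false);

        // Stand-in for the recovery task scheduled with recovery-delay.
        executor.schedule(() -> System.out.println("recovery task ran"),
            1, TimeUnit.HOURS);

        executor.shutdown();
        // With the policy set to false this returns true almost immediately;
        // with the default (true) it would wait out the full timeout.
        return executor.awaitTermination(5, TimeUnit.SECONDS);
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("terminated promptly: " + demo());
    }
}
```

Whether dropping pending recovery tasks at cache close is acceptable is a Geode design question; the demo only shows why the shutdown hook waits and which executor knob changes that.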
[jira] [Commented] (GEODE-3718) The InternalResourceManager fails to shutdown if a redundancy recovery task is scheduled but hasn't fired yet
[ https://issues.apache.org/jira/browse/GEODE-3718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16871818#comment-16871818 ] Barry Oglesby commented on GEODE-3718: -- This code change is after the issue has already happened. We can do this, but it doesn't address the issue. The issue is that when a member is stopped, the shutdown hook attempts to stop the InternalResourceManager's scheduledExecutor. If that executor has any pending tasks (including ones in the future), the shutdown hook blocks waiting for them to fire. If I run this test: 1. Start 3 servers defining a partitioned region with recovery-delay > 0 2. Load some data into the partitioned region 3. Kill one server using kill -9 4. The remaining servers schedule the recovery task 5. Stop the remaining servers normally (the JVMs do not stop) The shutdown hook thread is waiting here for the InternalResourceManager's scheduledExecutor to terminate: {noformat} "Distributed system shutdown hook" #12 prio=5 os_prio=31 tid=0x7ff0f8c33000 nid=0x12307 waiting on condition [0x7bc11000] java.lang.Thread.State: TIMED_WAITING (parking) at sun.misc.Unsafe.park(Native Method) - parking to wait for <0x000766c00038> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject) at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215) at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2078) at java.util.concurrent.ThreadPoolExecutor.awaitTermination(ThreadPoolExecutor.java:1465) at org.apache.geode.internal.cache.control.InternalResourceManager.stopExecutor(InternalResourceManager.java:343) at org.apache.geode.internal.cache.control.InternalResourceManager.close(InternalResourceManager.java:156) at org.apache.geode.internal.cache.GemFireCacheImpl.close(GemFireCacheImpl.java:2079) - locked <0x00075b58> (a java.lang.Class for org.apache.geode.internal.cache.GemFireCacheImpl) at 
org.apache.geode.distributed.internal.InternalDistributedSystem.disconnect(InternalDistributedSystem.java:1516) - locked <0x00075b58> (a java.lang.Class for org.apache.geode.internal.cache.GemFireCacheImpl) at org.apache.geode.distributed.internal.InternalDistributedSystem.lambda$static$6(InternalDistributedSystem.java:2181) at org.apache.geode.distributed.internal.InternalDistributedSystem$$Lambda$8/111900554.run(Unknown Source) at java.lang.Thread.run(Thread.java:745) {noformat} InternalResourceManager.stopExecutor waits up to 120 seconds to stop: {noformat} void stopExecutor(ExecutorService executor) { if (executor == null) { return; } executor.shutdown(); final int secToWait = Integer .getInteger(DistributionConfig.GEMFIRE_PREFIX + "prrecovery-close-timeout", 120).intValue(); try { executor.awaitTermination(secToWait, TimeUnit.SECONDS); } catch (InterruptedException x) { Thread.currentThread().interrupt(); logger.debug("Failed in interrupting the Resource Manager Thread due to interrupt"); } if (!executor.isTerminated()) { logger.warn("Failed to stop resource manager threads in {} seconds", secToWait); } } {noformat} I added some logging that shows the sequence of events. The logging shows the redundancy recovery task being scheduled when the first server is killed (steps 3 and 4 above): {noformat} [warn 2019/06/24 14:09:21.014 PDT tid=0x15] PRHARedundancyProvider.scheduleRedundancyRecovery about to schedule task with delay=3 {noformat} Then, the shutdown hook is invoked when the server is stopped which causes the InternalResourceManager to wait. The 1 task below is the redundancy recovery task: {noformat} [info 2019/06/24 14:09:25.161 PDT tid=0xc] VM is exiting - shutting down distributed system [info 2019/06/24 14:09:25.173 PDT tid=0xc] GemFireCache[id = 1850680894; isClosing = true; isShutDownAll = false; created = Mon Jun 24 14:08:40 PDT 2019; server = false; copyOnRead = false; lockLease = 120; lockTimeout = 60]: Now closing. 
[warn 2019/06/24 14:09:25.174 PDT tid=0xc] InternalResourceManager.stopExecutor waiting up to 120 seconds to terminate ScheduledThreadPoolExecutor containing 1 task {noformat} Then, the redundancy recovery task attempts to execute, but a CacheClosedException is thrown, so it doesn't do anything: {noformat} [warn 2019/06/24 14:09:51.016 PDT tid=0xb2] PRHARedundancyProvider.run2 about to start recovery [warn 2019/06/24 14:09:51.016 PDT tid=0xb2] PRHARedundancyProvider.run2 caught: org.apache.geode.cache.CacheClosedException: The cache is closed. at org.apache.geode.internal.cache.GemFireCacheImpl$Stopper.generateCancelledException(GemFireCacheImpl.java:1482) at
[jira] [Issue Comment Deleted] (GEODE-3718) The InternalResourceManager fails to shutdown if a redundancy recovery task is scheduled but hasn't fired yet
[ https://issues.apache.org/jira/browse/GEODE-3718?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Barry Oglesby updated GEODE-3718: - Comment: was deleted (was: Lynn and I have a test that reproduces this issue. 3 servers configured like: server 1 - replicate region server 2 - replicate persistent region server 3 - replicate persistent region Server 1 is configured like: {noformat} {noformat} Servers 2 and 3 are configured like: {noformat} {noformat} This matches the customer's proxy and data groups members. Server1 is in the proxy group and servers 2 and 3 are in the data group. Note: You must start the persistent servers first. kill -9 one of the servers with a replicate persistent region. When synchronization occurs (after maximumTimeBetweenPings - 6ms), a message like this will be logged in each member: [info 2019/06/24 09:46:13.133 PDT tid=0x2b] Region UTLatest is requesting synchronization with 192.168.1.2(server3:51729):41002 for 192.168.1.2(server2:51722):41001 The member with the replicate region will also throw the ToDataException. A couple work-arounds are: - Instead of using a replicate region in server1, use a replicate proxy region - Use replicate persistent regions in all members) > The InternalResourceManager fails to shutdown if a redundancy recovery task > is scheduled but hasn't fired yet > - > > Key: GEODE-3718 > URL: https://issues.apache.org/jira/browse/GEODE-3718 > Project: Geode > Issue Type: Bug > Components: core >Reporter: Barry Oglesby >Assignee: Mario Ivanac >Priority: Major > Labels: needs-review, pull-request-available, recovery > Time Spent: 0.5h > Remaining Estimate: 0h > > This happens with recovery-delay or startup-recovery-delay > 0. 
> The thread gets stuck here: > {noformat} > "Thread-20" #133 prio=10 os_prio=31 tid=0x7fa85b886000 nid=0x890b waiting > on condition [0x70001269e000] >java.lang.Thread.State: TIMED_WAITING (parking) > at sun.misc.Unsafe.park(Native Method) > - parking to wait for <0x0007bc408900> (a > java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject) > at > java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215) > at > java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2078) > at > java.util.concurrent.ThreadPoolExecutor.awaitTermination(ThreadPoolExecutor.java:1465) > at > org.apache.geode.internal.cache.control.InternalResourceManager.stopExecutor(InternalResourceManager.java:375) > at > org.apache.geode.internal.cache.control.InternalResourceManager.close(InternalResourceManager.java:187) > at > org.apache.geode.internal.cache.GemFireCacheImpl.close(GemFireCacheImpl.java:2161) > - locked <0x0007bc0bc520> (a java.lang.Class for > org.apache.geode.internal.cache.GemFireCacheImpl) > at > org.apache.geode.distributed.internal.InternalDistributedSystem.disconnect(InternalDistributedSystem.java:1367) > - locked <0x0007bc0bc520> (a java.lang.Class for > org.apache.geode.internal.cache.GemFireCacheImpl) > at > org.apache.geode.distributed.internal.InternalDistributedSystem.disconnect(InternalDistributedSystem.java:1017) > at > org.apache.geode.management.internal.beans.MemberMBeanBridge$1.run(MemberMBeanBridge.java:986) > at java.lang.Thread.run(Thread.java:745) > {noformat} > The InternalResourceManager is waiting for the termination of its > scheduledExecutor. 
> The PRHARedundancyProvider initializes its recoveryExecutor using the > InternalResourceManager's scheduledExecutor: > {noformat} > recoveryExecutor = new OneTaskOnlyExecutor(resourceManager.getExecutor(), > new OneTaskOnlyExecutor.ConflatedTaskListener() { > public void taskDropped() { > InternalResourceManager.getResourceObserver().recoveryConflated(region); > } > }); > {noformat} > The scheduleRedundancyRecovery method schedules a RecoveryRunnable if > necessary. > If that task hasn't fired yet, the InternalResourceManager doesn't close, and > the JVM stays up. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (GEODE-6901) If a region is replicate and replicate persistent in different members and a replicate persistent member crashes, the replicate members throw a ToDataException attempti
[ https://issues.apache.org/jira/browse/GEODE-6901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16871517#comment-16871517 ] Barry Oglesby commented on GEODE-6901: -- Lynn and I have a test that reproduces this issue. 3 servers configured like: server 1 - replicate region server 2 - replicate persistent region server 3 - replicate persistent region Server 1 is configured like: {noformat} {noformat} Servers 2 and 3 are configured like: {noformat} {noformat} This matches the customer's proxy and data groups members. Server1 is in the proxy group and servers 2 and 3 are in the data group. Note: You must start the persistent servers first. kill -9 one of the servers with a replicate persistent region. When synchronization occurs (after maximumTimeBetweenPings - 6ms), a message like this will be logged in each member: [info 2019/06/24 09:46:13.133 PDT tid=0x2b] Region UTLatest is requesting synchronization with 192.168.1.2(server3:51729):41002 for 192.168.1.2(server2:51722):41001 The member with the replicate region will also throw the ToDataException. 
A couple work-arounds are: - Instead of using a replicate region in server1, use a replicate proxy region - Use replicate persistent regions in all members > If a region is replicate and replicate persistent in different members and a > replicate persistent member crashes, the replicate members throw a > ToDataException attempting to synchronize the region > > > Key: GEODE-6901 > URL: https://issues.apache.org/jira/browse/GEODE-6901 > Project: Geode > Issue Type: Bug > Components: persistence, regions >Reporter: Barry Oglesby >Priority: Major > > If a region is replicate and replicate persistent in different members and a > replicate persistent member crashes, the replicate members throw a > ToDataException attempting to synchronize the region > In this case, an exception like this is thrown in the replicate member: > {noformat} > [warn 2019/06/21 17:06:33.516 PDT tid=0x2b] Timer task > encountered > exception > org.apache.geode.ToDataException: class > org.apache.geode.internal.cache.versions.VMRegionVersionVector > at > org.apache.geode.internal.InternalDataSerializer.invokeToData(InternalDataSerializer.java:2331) > at > org.apache.geode.internal.InternalDataSerializer.writeDSFID(InternalDataSerializer.java:1492) > at > org.apache.geode.internal.InternalDataSerializer.basicWriteObject(InternalDataSerializer.java:2067) > at org.apache.geode.DataSerializer.writeObject(DataSerializer.java:2943) > at > org.apache.geode.internal.cache.InitialImageOperation$RequestImageMessage.toData(InitialImageOperation.java:2135) > at > org.apache.geode.internal.InternalDataSerializer.invokeToData(InternalDataSerializer.java:2300) > at > org.apache.geode.internal.InternalDataSerializer.writeDSFID(InternalDataSerializer.java:1492) > at > org.apache.geode.internal.tcp.MsgStreamer.writeMessage(MsgStreamer.java:242) > at > org.apache.geode.distributed.internal.direct.DirectChannel.sendToMany(DirectChannel.java:385) > at > 
org.apache.geode.distributed.internal.direct.DirectChannel.sendToOne(DirectChannel.java:241) > at > org.apache.geode.distributed.internal.direct.DirectChannel.send(DirectChannel.java:596) > at > org.apache.geode.distributed.internal.membership.gms.mgr.GMSMembershipManager.directChannelSend(GMSMembershipManager.java:1711) > at > org.apache.geode.distributed.internal.membership.gms.mgr.GMSMembershipManager.send(GMSMembershipManager.java:1892) > at > org.apache.geode.distributed.internal.ClusterDistributionManager.sendViaMembershipManager(ClusterDistributionManager.java:2852) > at > org.apache.geode.distributed.internal.ClusterDistributionManager.sendOutgoing(ClusterDistributionManager.java:2779) > at > org.apache.geode.distributed.internal.ClusterDistributionManager.sendMessage(ClusterDistributionManager.java:2816) > at > org.apache.geode.distributed.internal.ClusterDistributionManager.putOutgoing(ClusterDistributionManager.java:1526) > at > org.apache.geode.internal.cache.InitialImageOperation.synchronizeWith(InitialImageOperation.java:649) > at > org.apache.geode.internal.cache.DistributedRegion.synchronizeWith(DistributedRegion.java:1321) > at > org.apache.geode.internal.cache.DistributedRegion.synchronizeForLostMember(DistributedRegion.java:1310) > at > org.apache.geode.internal.cache.DistributedRegion.performSynchronizeForLostMemberTask(DistributedRegion.java:1295) > at > org.apache.geode.internal.cache.DistributedRegion$1.run2(DistributedRegion.java:1285) > at >
[jira] [Commented] (GEODE-3718) The InternalResourceManager fails to shutdown if a redundancy recovery task is scheduled but hasn't fired yet
[ https://issues.apache.org/jira/browse/GEODE-3718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16871516#comment-16871516 ] Barry Oglesby commented on GEODE-3718: -- Lynn and I have a test that reproduces this issue. 3 servers configured like: server 1 - replicate region server 2 - replicate persistent region server 3 - replicate persistent region Server 1 is configured like: {noformat} {noformat} Servers 2 and 3 are configured like: {noformat} {noformat} This matches the customer's proxy and data groups members. Server1 is in the proxy group and servers 2 and 3 are in the data group. Note: You must start the persistent servers first. kill -9 one of the servers with a replicate persistent region. When synchronization occurs (after maximumTimeBetweenPings - 6ms), a message like this will be logged in each member: [info 2019/06/24 09:46:13.133 PDT tid=0x2b] Region UTLatest is requesting synchronization with 192.168.1.2(server3:51729):41002 for 192.168.1.2(server2:51722):41001 The member with the replicate region will also throw the ToDataException. A couple work-arounds are: - Instead of using a replicate region in server1, use a replicate proxy region - Use replicate persistent regions in all members > The InternalResourceManager fails to shutdown if a redundancy recovery task > is scheduled but hasn't fired yet > - > > Key: GEODE-3718 > URL: https://issues.apache.org/jira/browse/GEODE-3718 > Project: Geode > Issue Type: Bug > Components: core >Reporter: Barry Oglesby >Assignee: Mario Ivanac >Priority: Major > Labels: needs-review, pull-request-available, recovery > Time Spent: 0.5h > Remaining Estimate: 0h > > This happens with recovery-delay or startup-recovery-delay > 0. 
> The thread gets stuck here: > {noformat} > "Thread-20" #133 prio=10 os_prio=31 tid=0x7fa85b886000 nid=0x890b waiting > on condition [0x70001269e000] >java.lang.Thread.State: TIMED_WAITING (parking) > at sun.misc.Unsafe.park(Native Method) > - parking to wait for <0x0007bc408900> (a > java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject) > at > java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215) > at > java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2078) > at > java.util.concurrent.ThreadPoolExecutor.awaitTermination(ThreadPoolExecutor.java:1465) > at > org.apache.geode.internal.cache.control.InternalResourceManager.stopExecutor(InternalResourceManager.java:375) > at > org.apache.geode.internal.cache.control.InternalResourceManager.close(InternalResourceManager.java:187) > at > org.apache.geode.internal.cache.GemFireCacheImpl.close(GemFireCacheImpl.java:2161) > - locked <0x0007bc0bc520> (a java.lang.Class for > org.apache.geode.internal.cache.GemFireCacheImpl) > at > org.apache.geode.distributed.internal.InternalDistributedSystem.disconnect(InternalDistributedSystem.java:1367) > - locked <0x0007bc0bc520> (a java.lang.Class for > org.apache.geode.internal.cache.GemFireCacheImpl) > at > org.apache.geode.distributed.internal.InternalDistributedSystem.disconnect(InternalDistributedSystem.java:1017) > at > org.apache.geode.management.internal.beans.MemberMBeanBridge$1.run(MemberMBeanBridge.java:986) > at java.lang.Thread.run(Thread.java:745) > {noformat} > The InternalResourceManager is waiting for the termination of its > scheduledExecutor. 
> The PRHARedundancyProvider initializes its recoveryExecutor using the > InternalResourceManager's scheduledExecutor: > {noformat} > recoveryExecutor = new OneTaskOnlyExecutor(resourceManager.getExecutor(), > new OneTaskOnlyExecutor.ConflatedTaskListener() { > public void taskDropped() { > InternalResourceManager.getResourceObserver().recoveryConflated(region); > } > }); > {noformat} > The scheduleRedundancyRecovery method schedules a RecoveryRunnable if > necessary. > If that task hasn't fired yet, the InternalResourceManager doesn't close, and > the JVM stays up. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (GEODE-6901) If a region is replicate and replicate persistent in different members and a replicate persistent member crashes, the replicate members throw a ToDataException attempting
Barry Oglesby created GEODE-6901: Summary: If a region is replicate and replicate persistent in different members and a replicate persistent member crashes, the replicate members throw a ToDataException attempting to synchronize the region Key: GEODE-6901 URL: https://issues.apache.org/jira/browse/GEODE-6901 Project: Geode Issue Type: Bug Components: persistence, regions Reporter: Barry Oglesby If a region is replicate and replicate persistent in different members and a replicate persistent member crashes, the replicate members throw a ToDataException attempting to synchronize the region. In this case, an exception like this is thrown in the replicate member: {noformat} [warn 2019/06/21 17:06:33.516 PDT tid=0x2b] Timer task encountered exception org.apache.geode.ToDataException: class org.apache.geode.internal.cache.versions.VMRegionVersionVector at org.apache.geode.internal.InternalDataSerializer.invokeToData(InternalDataSerializer.java:2331) at org.apache.geode.internal.InternalDataSerializer.writeDSFID(InternalDataSerializer.java:1492) at org.apache.geode.internal.InternalDataSerializer.basicWriteObject(InternalDataSerializer.java:2067) at org.apache.geode.DataSerializer.writeObject(DataSerializer.java:2943) at org.apache.geode.internal.cache.InitialImageOperation$RequestImageMessage.toData(InitialImageOperation.java:2135) at org.apache.geode.internal.InternalDataSerializer.invokeToData(InternalDataSerializer.java:2300) at org.apache.geode.internal.InternalDataSerializer.writeDSFID(InternalDataSerializer.java:1492) at org.apache.geode.internal.tcp.MsgStreamer.writeMessage(MsgStreamer.java:242) at org.apache.geode.distributed.internal.direct.DirectChannel.sendToMany(DirectChannel.java:385) at org.apache.geode.distributed.internal.direct.DirectChannel.sendToOne(DirectChannel.java:241) at org.apache.geode.distributed.internal.direct.DirectChannel.send(DirectChannel.java:596) at 
org.apache.geode.distributed.internal.membership.gms.mgr.GMSMembershipManager.directChannelSend(GMSMembershipManager.java:1711) at org.apache.geode.distributed.internal.membership.gms.mgr.GMSMembershipManager.send(GMSMembershipManager.java:1892) at org.apache.geode.distributed.internal.ClusterDistributionManager.sendViaMembershipManager(ClusterDistributionManager.java:2852) at org.apache.geode.distributed.internal.ClusterDistributionManager.sendOutgoing(ClusterDistributionManager.java:2779) at org.apache.geode.distributed.internal.ClusterDistributionManager.sendMessage(ClusterDistributionManager.java:2816) at org.apache.geode.distributed.internal.ClusterDistributionManager.putOutgoing(ClusterDistributionManager.java:1526) at org.apache.geode.internal.cache.InitialImageOperation.synchronizeWith(InitialImageOperation.java:649) at org.apache.geode.internal.cache.DistributedRegion.synchronizeWith(DistributedRegion.java:1321) at org.apache.geode.internal.cache.DistributedRegion.synchronizeForLostMember(DistributedRegion.java:1310) at org.apache.geode.internal.cache.DistributedRegion.performSynchronizeForLostMemberTask(DistributedRegion.java:1295) at org.apache.geode.internal.cache.DistributedRegion$1.run2(DistributedRegion.java:1285) at org.apache.geode.internal.SystemTimer$SystemTimerTask.run(SystemTimer.java:445) at java.util.TimerThread.mainLoop(Timer.java:555) at java.util.TimerThread.run(Timer.java:505) Caused by: java.lang.ClassCastException: org.apache.geode.internal.cache.persistence.DiskStoreID cannot be cast to org.apache.geode.distributed.internal.membership.InternalDistributedMember at org.apache.geode.internal.cache.versions.VMRegionVersionVector.writeMember(VMRegionVersionVector.java:31) at org.apache.geode.internal.cache.versions.RegionVersionVector.toData(RegionVersionVector.java:1204) at org.apache.geode.internal.InternalDataSerializer.invokeToData(InternalDataSerializer.java:2300) ... 
24 more {noformat} RegionVersionVector.java:1204 is here:
{noformat}
for (Map.Entry> entry : this.memberToVersion.entrySet()) {
->  writeMember(entry.getKey(), out);
    InternalDataSerializer.invokeToData(entry.getValue(), out);
}
{noformat}
VMRegionVersionVector expects the entries of the memberToVersion map to be keyed by InternalDistributedMembers:
{noformat}
protected void writeMember(InternalDistributedMember member, DataOutput out) throws IOException {
{noformat}
Logging in RegionVersionVector.toData shows that the RegionVersionVector in this member is a VMRegionVersionVector and that its memberToVersion map contains DiskStoreIDs. This causes the ClassCastException.
{noformat}
This RegionVersionVector's (class=VMRegionVersionVector) memberToVersion map contains the following 1 entries: member=402d383b29fa4c31-8597a3b72674bf5d; class=DiskStoreID
{noformat}
The documentation
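The ClassCastException itself is ordinary generic-erasure behavior and can be reproduced outside Geode. The classes below are illustrative stand-ins, not Geode types: a raw map accepts keys of the "wrong" type with no compile-time error, and the failure only surfaces when an entry's key is implicitly cast at use time, just as at RegionVersionVector.java:1204.

```java
import java.util.HashMap;
import java.util.Map;

public class ErasureCastDemo {
    // Stand-ins for DiskStoreID and InternalDistributedMember
    // (illustrative only, not the Geode classes).
    static class DiskStoreID {}
    static class Member {}

    static void writeMember(Member m) {
        // In the real code this would serialize the member to a DataOutput.
    }

    // Returns true when iterating the mistyped map throws ClassCastException.
    static boolean triggersCce() {
        // A raw map admits the wrong key type, mirroring a memberToVersion
        // map holding DiskStoreID keys while the writer expects
        // InternalDistributedMember keys.
        Map raw = new HashMap();
        raw.put(new DiskStoreID(), "version-holder");

        @SuppressWarnings("unchecked")
        Map<Member, String> memberToVersion = (Map<Member, String>) raw;
        try {
            for (Map.Entry<Member, String> entry : memberToVersion.entrySet()) {
                writeMember(entry.getKey()); // implicit checkcast to Member fails here
            }
            return false;
        } catch (ClassCastException expected) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("ClassCastException thrown: " + triggersCce());
    }
}
```

Note that the unchecked cast itself succeeds; only the later getKey() use fails, which is why the exception appears deep inside toData rather than where the wrong keys entered the map.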
[jira] [Resolved] (GEODE-6854) GatewaySender batch conflation can incorrectly conflate events causing out of order processing
[ https://issues.apache.org/jira/browse/GEODE-6854?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Barry Oglesby resolved GEODE-6854. -- Resolution: Fixed Fix Version/s: 1.10.0 > GatewaySender batch conflation can incorrectly conflate events causing out of > order processing > -- > > Key: GEODE-6854 > URL: https://issues.apache.org/jira/browse/GEODE-6854 > Project: Geode > Issue Type: Bug > Components: wan >Reporter: Barry Oglesby >Assignee: Barry Oglesby >Priority: Major > Fix For: 1.10.0 > > Time Spent: 2h > Remaining Estimate: 0h > > If a batch contains 2 equal update events, > {{AbstractGatewaySenderEventProcessor conflate}} will remove the original > event and add the later event at the end of the list. Depending on the other > events in the list, this could cause the batch to contain events that are out > of order. > For example, in this batch containing 6 events before conflation, the last > two events are duplicates of earlier events: > {noformat} > SenderEventImpl[id=EventID[threadID=0x10059|104;sequenceID=2;bucketId=89];action=1;operation=UPDATE;region=/dataStoreRegion;key=Object_6079;shadowKey=16587] > SenderEventImpl[id=EventID[threadID=0x10059|104;sequenceID=3;bucketId=89];action=2;operation=DESTROY;region=/dataStoreRegion;key=Object_6079;shadowKey=16700] > SenderEventImpl[id=EventID[threadID=0x10059|112;sequenceID=9;bucketId=89];action=1;operation=PUTALL_UPDATE;region=/dataStoreRegion;key=Object_7731;shadowKey=16813] > SenderEventImpl[id=EventID[threadID=0x10059|112;sequenceID=12;bucketId=89];action=1;operation=PUTALL_UPDATE;region=/dataStoreRegion;key=Object_6591;shadowKey=16926] > SenderEventImpl[id=EventID[threadID=0x10059|104;sequenceID=3;bucketId=89];action=2;operation=DESTROY;region=/dataStoreRegion;key=Object_6079;shadowKey=16700] > SenderEventImpl[id=EventID[threadID=0x10059|112;sequenceID=9;bucketId=89];action=1;operation=PUTALL_UPDATE;region=/dataStoreRegion;key=Object_7731;shadowKey=16813] > {noformat} > Conflating this 
batch results in these 4 events: > {noformat} > SenderEventImpl[id=EventID[threadID=0x10059|104;sequenceID=2;bucketId=89];action=1;operation=UPDATE;region=/dataStoreRegion;key=Object_6079;shadowKey=16587] > SenderEventImpl[id=EventID[threadID=0x10059|104;sequenceID=3;bucketId=89];action=2;operation=DESTROY;region=/dataStoreRegion;key=Object_6079;shadowKey=16700] > SenderEventImpl[id=EventID[threadID=0x10059|112;sequenceID=12;bucketId=89];action=1;operation=PUTALL_UPDATE;region=/dataStoreRegion;key=Object_6591;shadowKey=16926] > SenderEventImpl[id=EventID[threadID=0x10059|112;sequenceID=9;bucketId=89];action=1;operation=PUTALL_UPDATE;region=/dataStoreRegion;key=Object_7731;shadowKey=16813] > {noformat} > Notice the shadowKeys and sequenceIds are out of order after the conflation. > Conflation should produce this batch: > {noformat} > SenderEventImpl[id=EventID[threadID=0x10059|104;sequenceID=2;bucketId=89];action=1;operation=UPDATE;region=/dataStoreRegion;key=Object_6079;shadowKey=16587] > SenderEventImpl[id=EventID[threadID=0x10059|104;sequenceID=3;bucketId=89];action=2;operation=DESTROY;region=/dataStoreRegion;key=Object_6079;shadowKey=16700] > SenderEventImpl[id=EventID[threadID=0x10059|112;sequenceID=9;bucketId=89];action=1;operation=PUTALL_UPDATE;region=/dataStoreRegion;key=Object_7731;shadowKey=16813] > SenderEventImpl[id=EventID[threadID=0x10059|112;sequenceID=12;bucketId=89];action=1;operation=PUTALL_UPDATE;region=/dataStoreRegion;key=Object_6591;shadowKey=16926] > {noformat} > This is similar to GEODE-4704, but not exactly the same. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
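The order-preserving conflation described above can be sketched with a plain Java data structure. This is a hypothetical illustration, not Geode's actual AbstractGatewaySenderEventProcessor code: a LinkedHashMap keyed by the event's conflation key replaces an earlier duplicate in place, because put() on an existing key updates the value without moving the entry to the end of the iteration order, so shadowKey/sequenceID ordering is preserved.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch of order-preserving batch conflation (names are
// hypothetical, not Geode's API). The later duplicate event survives, but it
// stays at the ORIGINAL event's position in the batch instead of being
// appended to the end of the list.
public class OrderPreservingConflation {

  // an event reduced to its conflation key plus an opaque payload
  static final class Event {
    final String conflationKey;
    final String payload;

    Event(String conflationKey, String payload) {
      this.conflationKey = conflationKey;
      this.payload = payload;
    }
  }

  static List<String> conflate(List<Event> batch) {
    Map<String, String> surviving = new LinkedHashMap<>();
    for (Event e : batch) {
      // put() on an existing key replaces the value in place;
      // LinkedHashMap does not move the entry to the end
      surviving.put(e.conflationKey, e.payload);
    }
    return new ArrayList<>(surviving.values());
  }
}
```

Feeding this six events where the last two duplicate earlier ones yields four events in their original shadowKey order, matching the expected batch above.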
[jira] [Commented] (GEODE-6859) Destroying a parallel gateway sender attached to a region causes other senders attached to that same region to no longer queue events
[ https://issues.apache.org/jira/browse/GEODE-6859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16863350#comment-16863350 ] Barry Oglesby commented on GEODE-6859: -- Here is some additional logging showing the behavior: The shadow PR for GatewaySender mysender is created: {noformat} [warn 2019/06/13 10:24:36.546 PDT tid=0x3a] XXX ParallelGatewaySenderQueue.addShadowPartitionedRegionForUserPR senderId=mysender; userPR=/test [warn 2019/06/13 10:24:36.546 PDT tid=0x3a] XXX ParallelGatewaySenderQueue.addShadowPartitionedRegionForUserPR senderId=mysender; prQName=mysender_PARALLEL_GATEWAY_SENDER_QUEUE; prQ=null [warn 2019/06/13 10:24:36.597 PDT tid=0x3a] XXX ParallelGatewaySenderQueue.addShadowPartitionedRegionForUserPR created queue senderId=mysender; prQName=mysender_PARALLEL_GATEWAY_SENDER_QUEUE; prQ=Partitioned Region @7951061f [path='/mysender_PARALLEL_GATEWAY_SENDER_QUEUE'; dataPolicy=PARTITION; prId=2; isDestroyed=false; isClosed=false; retryTimeout=360; serialNumber=125; partition attributes=PartitionAttributes@639507262[redundantCopies=0;localMaxMemory=100;totalMaxMemory=2147483647;totalNumBuckets=113;partitionResolver=null;colocatedWith=/test;recoveryDelay=-1;startupRecoveryDelay=0;FixedPartitionAttributes=null;partitionListeners=null]; on VM 192.168.1.2(server:4637):41001] {noformat} The shadow PR for GatewaySender mysender2 is created: {noformat} [warn 2019/06/13 10:24:43.064 PDT tid=0x3a] XXX ParallelGatewaySenderQueue.addShadowPartitionedRegionForUserPR senderId=mysender2; userPR=/test [warn 2019/06/13 10:24:43.064 PDT tid=0x3a] XXX ParallelGatewaySenderQueue.addShadowPartitionedRegionForUserPR senderId=mysender2; prQName=mysender2_PARALLEL_GATEWAY_SENDER_QUEUE; prQ=null [warn 2019/06/13 10:24:43.069 PDT tid=0x3a] XXX ParallelGatewaySenderQueue.addShadowPartitionedRegionForUserPR created queue senderId=mysender2; prQName=mysender2_PARALLEL_GATEWAY_SENDER_QUEUE; prQ=Partitioned Region @1c5b3979 
[path='/mysender2_PARALLEL_GATEWAY_SENDER_QUEUE'; dataPolicy=PARTITION; prId=3; isDestroyed=false; isClosed=false; retryTimeout=360; serialNumber=466; partition attributes=PartitionAttributes@635010394[redundantCopies=0;localMaxMemory=100;totalMaxMemory=2147483647;totalNumBuckets=113;partitionResolver=null;colocatedWith=/test;recoveryDelay=-1;startupRecoveryDelay=0;FixedPartitionAttributes=null;partitionListeners=null]; on VM 192.168.1.2(server:4637):41001] {noformat} GatewaySender mysender is destroyed: {noformat} [warn 2019/06/13 10:24:43.889 PDT tid=0x3a] XXX AbstractGatewaySender.destroy region=/mysender_PARALLEL_GATEWAY_SENDER_QUEUE [warn 2019/06/13 10:24:43.889 PDT tid=0x3a] XXX PartitionedRegion.destroyRegion region=/mysender_PARALLEL_GATEWAY_SENDER_QUEUE {noformat} That causes PartitionedRegionDataStore.cleanUp to set shadowBucketDestroyed to true for all the buckets of the test region: {noformat} [warn 2019/06/13 10:24:43.890 PDT tid=0x3a] XXX PartitionedRegionDataStore.cleanUp region=/mysender_PARALLEL_GATEWAY_SENDER_QUEUE [warn 2019/06/13 10:24:43.895 PDT tid=0x3a] XXX BucketAdvisor.setShadowBucketDestroyed region=/test; bucket=0; destroyed=true [warn 2019/06/13 10:24:43.896 PDT tid=0x3a] XXX BucketAdvisor.setShadowBucketDestroyed region=/test; bucket=1; destroyed=true [warn 2019/06/13 10:24:43.897 PDT tid=0x3a] XXX BucketAdvisor.setShadowBucketDestroyed region=/test; bucket=2; destroyed=true [warn 2019/06/13 10:24:43.898 PDT tid=0x3a] XXX BucketAdvisor.setShadowBucketDestroyed region=/test; bucket=3; destroyed=true [warn 2019/06/13 10:24:43.899 PDT tid=0x3a] XXX BucketAdvisor.setShadowBucketDestroyed region=/test; bucket=4; destroyed=true [warn 2019/06/13 10:24:43.899 PDT tid=0x3a] ... [warn 2019/06/13 10:24:43.942 PDT tid=0x3a] XXX BucketAdvisor.setShadowBucketDestroyed region=/test; bucket=51; destroyed=true [warn 2019/06/13 10:24:43.942 PDT tid=0x3a] ... 
[warn 2019/06/13 10:24:43.959 PDT tid=0x3a] XXX PartitionedRegionDataStore.cleanUp complete region=/mysender_PARALLEL_GATEWAY_SENDER_QUEUE {noformat} The put is delivered to the ParallelGatewaySenderQueue, but shadowBucketDestroyed is true from the cleanUp above so the put is dropped: {noformat} [warn 2019/06/13 10:24:44.011 PDT tid=0x3a] XXX ParallelGatewaySenderQueue.put brq=/__PR/_B__mysender2__PARALLEL__GATEWAY__SENDER__QUEUE_51 [warn 2019/06/13 10:24:44.012 PDT tid=0x3a] XXX ParallelGatewaySenderQueue.put brq=/__PR/_B__mysender2__PARALLEL__GATEWAY__SENDER__QUEUE_51; shadowBucketDestroyed=true [warn 2019/06/13 10:24:44.012 PDT tid=0x3a] XXX ParallelGatewaySenderQueue.put not putting entry into queue as shadowPR bucket is destroyed: key=164; value=GatewaySenderEventImpl[id=EventID[id=24 bytes;threadID=0x1010033|1;sequenceID=122;bucketId=51];action=0;operation=CREATE;region=/test;key=3;value=3;valueIsObject=1;numberOfParts=9;callbackArgument=GatewaySenderEventCallbackArgument
[jira] [Commented] (GEODE-6859) Destroying a parallel gateway sender attached to a region causes other senders attached to that same region to no longer queue events
[ https://issues.apache.org/jira/browse/GEODE-6859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16862571#comment-16862571 ] Barry Oglesby commented on GEODE-6859: -- As a work-around, the region can be altered to remove the original sender before it is destroyed: {noformat} create gateway-sender --id=mysender --remote-distributed-system-id=2 --enable-persistence --parallel create region --name=test --gateway-sender-id=mysender --type=PARTITION_PERSISTENT create gateway-sender --id=mysender2 --remote-distributed-system-id=4 --enable-persistence --parallel -> alter region --name=test --gateway-sender-id='' destroy gateway-sender --id=mysender alter region --name=test --gateway-sender-id=mysender2 put --region=test --key="3" --value="3" list gateways {noformat} List gateway shows a queued event with this work-around: {noformat} GatewaySender Id | Member | Remote Cluster Id | Type | Status | Queued Events | Receiver Location | --- | - | | --- | - | - mysender2 | 192.168.1.2(server:34608):41001 | 4 | Parallel | Running | 1 | {noformat} > Destroying a parallel gateway sender attached to a region causes other > senders attached to that same region to no longer queue events > - > > Key: GEODE-6859 > URL: https://issues.apache.org/jira/browse/GEODE-6859 > Project: Geode > Issue Type: Bug > Components: wan >Reporter: Barry Oglesby >Priority: Major > > This scenario causes the event to not be put into the queue: > - create gateway sender sender1 > - create region attached to sender1 > - create gateway sender sender2 > - alter region to be attached to sender2 > - destroy sender1 > - put an entry into region > Here are the steps using gfsh: > {noformat} > gfsh>create gateway-sender --id=mysender --remote-distributed-system-id=2 > --enable-persistence --parallel > gfsh>create region --name=test --gateway-sender-id=mysender > --type=PARTITION_PERSISTENT > gfsh>create gateway-sender --id=mysender2 --remote-distributed-system-id=4 > --enable-persistence 
--parallel > gfsh>alter region --name=test --gateway-sender-id=mysender2 > gfsh>destroy gateway-sender --id=mysender > gfsh>put --region=test --key="3" --value="3" > {noformat} > Debug logging shows: > {noformat} > [debug 2019/06/11 17:45:03.678 PDT tid=0x3a] > ParallelGatewaySenderOrderedQueue not putting key 164 : Value : > GatewaySenderEventImpl[id=EventID[192.168.1.2(server):41001;threadID=0x1010033|2;sequenceID=125;bucketID=51];action=0;operation=CREATE;region=/test;key=3;value=3;...] > as shadowPR bucket is destroyed. > {noformat} > It comes down to this call in ParallelGatewaySenderQueue.put: > {noformat} > thisbucketDestroyed = > ((PartitionedRegion) prQ.getColocatedWithRegion()).getRegionAdvisor() > .getBucketAdvisor(bucketId).getShadowBucketDestroyed() || brq.isDestroyed(); > {noformat} > The first condition is true. > Here is a stack that shows where shadowBucketDestroyed is set to true: > {noformat} > [warn 2019/06/12 16:32:47.066 PDT tid=0x3a] > XXX BucketAdvisor.setShadowBucketDestroyed destroyed=true > java.lang.Exception > at > org.apache.geode.internal.cache.BucketAdvisor.setShadowBucketDestroyed(BucketAdvisor.java:2820) > at > org.apache.geode.internal.cache.PartitionedRegionDataStore.cleanUp(PartitionedRegionDataStore.java:1417) > at > org.apache.geode.internal.cache.PartitionedRegion.destroyPartitionedRegionLocally(PartitionedRegion.java:7520) > at > org.apache.geode.internal.cache.PartitionedRegion.destroyPartitionedRegionGlobally(PartitionedRegion.java:7376) > at > org.apache.geode.internal.cache.PartitionedRegion.destroyPartitionedRegion(PartitionedRegion.java:7301) > at > org.apache.geode.internal.cache.PartitionedRegion.postDestroyRegion(PartitionedRegion.java:7630) > at > org.apache.geode.internal.cache.LocalRegion.recursiveDestroyRegion(LocalRegion.java:2732) > at > org.apache.geode.internal.cache.LocalRegion.basicDestroyRegion(LocalRegion.java:6299) > at > 
org.apache.geode.internal.cache.LocalRegion.basicDestroyRegion(LocalRegion.java:6251) > at > org.apache.geode.internal.cache.PartitionedRegion.destroyRegion(PartitionedRegion.java:7077) > at > org.apache.geode.internal.cache.AbstractRegion.destroyRegion(AbstractRegion.java:453) > at > org.apache.geode.internal.cache.wan.AbstractGatewaySender.destroy(AbstractGatewaySender.java:599) > at > org.apache.geode.internal.cache.wan.AbstractGatewaySender.destroy(AbstractGatewaySender.java:555) > at > org.apache.geode.management.internal.cli.functions.GatewaySenderDestroyFunction.execute(GatewaySenderDestroyFunction.java:60) > at >
[jira] [Updated] (GEODE-6859) Destroying a parallel gateway sender attached to a region causes other senders attached to that same region to no longer queue events
[ https://issues.apache.org/jira/browse/GEODE-6859?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Barry Oglesby updated GEODE-6859: - Description: This scenario causes the event to not be put into the queue: - create gateway sender sender1 - create region attached to sender1 - create gateway sender sender2 - alter region to be attached to sender2 - destroy sender1 - put an entry into region Here are the steps using gfsh: {noformat} gfsh>create gateway-sender --id=mysender --remote-distributed-system-id=2 --enable-persistence --parallel gfsh>create region --name=test --gateway-sender-id=mysender --type=PARTITION_PERSISTENT gfsh>create gateway-sender --id=mysender2 --remote-distributed-system-id=4 --enable-persistence --parallel gfsh>alter region --name=test --gateway-sender-id=mysender2 gfsh>destroy gateway-sender --id=mysender gfsh>put --region=test --key="3" --value="3" {noformat} Debug logging shows: {noformat} [debug 2019/06/11 17:45:03.678 PDT tid=0x3a] ParallelGatewaySenderOrderedQueue not putting key 164 : Value : GatewaySenderEventImpl[id=EventID[192.168.1.2(server):41001;threadID=0x1010033|2;sequenceID=125;bucketID=51];action=0;operation=CREATE;region=/test;key=3;value=3;...] as shadowPR bucket is destroyed. {noformat} It comes down to this call in ParallelGatewaySenderQueue.put: {noformat} thisbucketDestroyed = ((PartitionedRegion) prQ.getColocatedWithRegion()).getRegionAdvisor() .getBucketAdvisor(bucketId).getShadowBucketDestroyed() || brq.isDestroyed(); {noformat} The first condition is true. 
Here is a stack that shows where shadowBucketDestroyed is set to true: {noformat} [warn 2019/06/12 16:32:47.066 PDT tid=0x3a] XXX BucketAdvisor.setShadowBucketDestroyed destroyed=true java.lang.Exception at org.apache.geode.internal.cache.BucketAdvisor.setShadowBucketDestroyed(BucketAdvisor.java:2820) at org.apache.geode.internal.cache.PartitionedRegionDataStore.cleanUp(PartitionedRegionDataStore.java:1417) at org.apache.geode.internal.cache.PartitionedRegion.destroyPartitionedRegionLocally(PartitionedRegion.java:7520) at org.apache.geode.internal.cache.PartitionedRegion.destroyPartitionedRegionGlobally(PartitionedRegion.java:7376) at org.apache.geode.internal.cache.PartitionedRegion.destroyPartitionedRegion(PartitionedRegion.java:7301) at org.apache.geode.internal.cache.PartitionedRegion.postDestroyRegion(PartitionedRegion.java:7630) at org.apache.geode.internal.cache.LocalRegion.recursiveDestroyRegion(LocalRegion.java:2732) at org.apache.geode.internal.cache.LocalRegion.basicDestroyRegion(LocalRegion.java:6299) at org.apache.geode.internal.cache.LocalRegion.basicDestroyRegion(LocalRegion.java:6251) at org.apache.geode.internal.cache.PartitionedRegion.destroyRegion(PartitionedRegion.java:7077) at org.apache.geode.internal.cache.AbstractRegion.destroyRegion(AbstractRegion.java:453) at org.apache.geode.internal.cache.wan.AbstractGatewaySender.destroy(AbstractGatewaySender.java:599) at org.apache.geode.internal.cache.wan.AbstractGatewaySender.destroy(AbstractGatewaySender.java:555) at org.apache.geode.management.internal.cli.functions.GatewaySenderDestroyFunction.execute(GatewaySenderDestroyFunction.java:60) at org.apache.geode.internal.cache.MemberFunctionStreamingMessage.process(MemberFunctionStreamingMessage.java:193) at org.apache.geode.distributed.internal.DistributionMessage.scheduleAction(DistributionMessage.java:369) at org.apache.geode.distributed.internal.DistributionMessage$1.run(DistributionMessage.java:435) at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at org.apache.geode.distributed.internal.ClusterDistributionManager.runUntilShutdown(ClusterDistributionManager.java:960) at org.apache.geode.distributed.internal.ClusterDistributionManager.doFunctionExecutionThread(ClusterDistributionManager.java:814) at org.apache.geode.internal.logging.LoggingThreadFactory.lambda$newThread$0(LoggingThreadFactory.java:121) at java.lang.Thread.run(Thread.java:745) {noformat} PartitionedRegionDataStore.cleanUp is doing this: {noformat} // Fix for defect #49012 if (buk instanceof AbstractBucketRegionQueue && buk.getPartitionedRegion().isShadowPR()) { if (buk.getPartitionedRegion().getColocatedWithRegion() != null) { buk.getPartitionedRegion().getColocatedWithRegion().getRegionAdvisor() .getBucketAdvisor(bucketId).setShadowBucketDestroyed(true); } } {noformat} The {{buk.getPartitionedRegion().getColocatedWithRegion()}} is the data region. It can have more than one shadow region. So, either this code has to check whether there are other shadow regions before making the call to setShadowBucketDestroyed or the BucketAdvisor shadowBucketDestroyed has to be maintained per shadow region rather than be a single boolean. was: This scenario causes the event to not
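The second option above (maintaining shadowBucketDestroyed per shadow region rather than as a single boolean) can be sketched as follows. This is a hypothetical illustration, not Geode's actual BucketAdvisor API: the advisor tracks destroyed state keyed by shadow-region path, so destroying mysender's shadow PR leaves mysender2's shadow PR unaffected.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Illustrative per-shadow-region destroyed-state tracking (names are
// hypothetical). A set of destroyed shadow-region paths replaces the single
// boolean, so one sender's destroy cannot make another sender drop events.
public class ShadowBucketState {
  // paths of shadow regions whose bucket for this advisor was destroyed
  private final Set<String> destroyedShadowRegions = ConcurrentHashMap.newKeySet();

  public void setShadowBucketDestroyed(String shadowRegionPath, boolean destroyed) {
    if (destroyed) {
      destroyedShadowRegions.add(shadowRegionPath);
    } else {
      destroyedShadowRegions.remove(shadowRegionPath);
    }
  }

  public boolean getShadowBucketDestroyed(String shadowRegionPath) {
    // only the destroyed sender's own queue region reports true
    return destroyedShadowRegions.contains(shadowRegionPath);
  }
}
```

With this shape, ParallelGatewaySenderQueue.put would query the state for its own queue region's path, so the scenario in this issue would still queue the event for mysender2.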
[jira] [Updated] (GEODE-6859) Destroying a parallel gateway sender attached to a region causes other senders attached to that same region to no longer queue events
[ https://issues.apache.org/jira/browse/GEODE-6859?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Barry Oglesby updated GEODE-6859: - Description: This scenario causes the event to not be put into the queue: - create gateway sender sender1 - create region attached to sender1 - create gateway sender sender2 - alter region to be attached to sender2 - destroy sender1 - put an entry into region Here are the steps using gfsh: {noformat} gfsh>create gateway-sender --id=mysender --remote-distributed-system-id=2 --enable-persistence --parallel gfsh>create region --name=test --gateway-sender-id=mysender --type=PARTITION_PERSISTENT gfsh>create gateway-sender --id=mysender2 --remote-distributed-system-id=4 --enable-persistence --parallel gfsh>alter region --name=test --gateway-sender-id=mysender2 gfsh>destroy gateway-sender --id=mysender gfsh>put --region=test --key="3" --value="3" {noformat} Debug logging shows: {noformat} [debug 2019/06/11 17:45:03.678 PDT tid=0x3a] ParallelGatewaySenderOrderedQueue not putting key 164 : Value : GatewaySenderEventImpl[id=EventID[192.168.1.2(server):41001;threadID=0x1010033|2;sequenceID=125;bucketID=51];action=0;operation=CREATE;region=/test;key=3;value=3;...] as shadowPR bucket is destroyed. {noformat} It comes down to this call in ParallelGatewaySenderQueue.put: {noformat} thisbucketDestroyed = ((PartitionedRegion) prQ.getColocatedWithRegion()).getRegionAdvisor() .getBucketAdvisor(bucketId).getShadowBucketDestroyed() || brq.isDestroyed(); {noformat} The first condition is true. 
Here is a stack that shows where shadowBucketDestroyed is set to true: {noformat} [warn 2019/06/12 16:32:47.066 PDT tid=0x3a] XXX BucketAdvisor.setShadowBucketDestroyed destroyed=true java.lang.Exception at org.apache.geode.internal.cache.BucketAdvisor.setShadowBucketDestroyed(BucketAdvisor.java:2820) at org.apache.geode.internal.cache.PartitionedRegionDataStore.cleanUp(PartitionedRegionDataStore.java:1417) at org.apache.geode.internal.cache.PartitionedRegion.destroyPartitionedRegionLocally(PartitionedRegion.java:7520) at org.apache.geode.internal.cache.PartitionedRegion.destroyPartitionedRegionGlobally(PartitionedRegion.java:7376) at org.apache.geode.internal.cache.PartitionedRegion.destroyPartitionedRegion(PartitionedRegion.java:7301) at org.apache.geode.internal.cache.PartitionedRegion.postDestroyRegion(PartitionedRegion.java:7630) at org.apache.geode.internal.cache.LocalRegion.recursiveDestroyRegion(LocalRegion.java:2732) at org.apache.geode.internal.cache.LocalRegion.basicDestroyRegion(LocalRegion.java:6299) at org.apache.geode.internal.cache.LocalRegion.basicDestroyRegion(LocalRegion.java:6251) at org.apache.geode.internal.cache.PartitionedRegion.destroyRegion(PartitionedRegion.java:7077) at org.apache.geode.internal.cache.AbstractRegion.destroyRegion(AbstractRegion.java:453) at org.apache.geode.internal.cache.wan.AbstractGatewaySender.destroy(AbstractGatewaySender.java:599) at org.apache.geode.internal.cache.wan.AbstractGatewaySender.destroy(AbstractGatewaySender.java:555) at org.apache.geode.management.internal.cli.functions.GatewaySenderDestroyFunction.execute(GatewaySenderDestroyFunction.java:60) at org.apache.geode.internal.cache.MemberFunctionStreamingMessage.process(MemberFunctionStreamingMessage.java:193) at org.apache.geode.distributed.internal.DistributionMessage.scheduleAction(DistributionMessage.java:369) at org.apache.geode.distributed.internal.DistributionMessage$1.run(DistributionMessage.java:435) at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at org.apache.geode.distributed.internal.ClusterDistributionManager.runUntilShutdown(ClusterDistributionManager.java:960) at org.apache.geode.distributed.internal.ClusterDistributionManager.doFunctionExecutionThread(ClusterDistributionManager.java:814) at org.apache.geode.internal.logging.LoggingThreadFactory.lambda$newThread$0(LoggingThreadFactory.java:121) at java.lang.Thread.run(Thread.java:745) {noformat} PartitionedRegionDataStore.cleanUp is doing this: {noformat} // Fix for defect #49012 if (buk instanceof AbstractBucketRegionQueue && buk.getPartitionedRegion().isShadowPR()) { if (buk.getPartitionedRegion().getColocatedWithRegion() != null) { buk.getPartitionedRegion().getColocatedWithRegion().getRegionAdvisor() .getBucketAdvisor(bucketId).setShadowBucketDestroyed(true); } } {noformat} The {{buk.getPartitionedRegion().getColocatedWithRegion()}} is the data region. It can have more than one shadow region. So, either this code has to check whether there are other shadow regions before making the call to setShadowBucketDestroyed or the BucketAdvisor shadowBucketDestroyed has to be maintained per shadow region rather than be a single boolean. was: This scenario causes the event to not
[jira] [Created] (GEODE-6859) Destroying a parallel gateway sender attached to a region causes other senders attached to that same region to no longer queue events
Barry Oglesby created GEODE-6859: Summary: Destroying a parallel gateway sender attached to a region causes other senders attached to that same region to no longer queue events Key: GEODE-6859 URL: https://issues.apache.org/jira/browse/GEODE-6859 Project: Geode Issue Type: Bug Components: wan Reporter: Barry Oglesby This scenario causes the event to not be put into the queue: - create gateway sender sender1 - create region attached to sender1 - create gateway sender sender2 - alter region to be attached to sender2 - destroy sender1 - put an entry into region Here are the steps using gfsh: {noformat} gfsh>create gateway-sender --id=mysender --remote-distributed-system-id=2 --enable-persistence --parallel gfsh>create region --name=test --gateway-sender-id=mysender --type=PARTITION_PERSISTENT gfsh>create gateway-sender --id=mysender2 --remote-distributed-system-id=4 --enable-persistence --parallel gfsh>alter region --name=test --gateway-sender-id=mysender2 gfsh>destroy gateway-sender --id=mysender gfsh>put --region=test --key="3" --value="3" {noformat} Debug logging shows: {noformat} [debug 2019/06/11 17:45:03.678 PDT tid=0x3a] ParallelGatewaySenderOrderedQueue not putting key 164 : Value : GatewaySenderEventImpl[id=EventID[192.168.1.2(server):41001;threadID=0x1010033|2;sequenceID=125;bucketID=51];action=0;operation=CREATE;region=/test;key=3;value=3;...] as shadowPR bucket is destroyed. {noformat} It comes down to this call in ParallelGatewaySenderQueue.put: {noformat} thisbucketDestroyed = ((PartitionedRegion) prQ.getColocatedWithRegion()).getRegionAdvisor() .getBucketAdvisor(bucketId).getShadowBucketDestroyed() || brq.isDestroyed(); {noformat} The first condition is true. 
Here is a stack that shows where shadowBucketDestroyed is set to true: {noformat} [warn 2019/06/12 16:32:47.066 PDT tid=0x3a] XXX BucketAdvisor.setShadowBucketDestroyed destroyed=true java.lang.Exception at org.apache.geode.internal.cache.BucketAdvisor.setShadowBucketDestroyed(BucketAdvisor.java:2820) at org.apache.geode.internal.cache.PartitionedRegionDataStore.cleanUp(PartitionedRegionDataStore.java:1417) at org.apache.geode.internal.cache.PartitionedRegion.destroyPartitionedRegionLocally(PartitionedRegion.java:7520) at org.apache.geode.internal.cache.PartitionedRegion.destroyPartitionedRegionGlobally(PartitionedRegion.java:7376) at org.apache.geode.internal.cache.PartitionedRegion.destroyPartitionedRegion(PartitionedRegion.java:7301) at org.apache.geode.internal.cache.PartitionedRegion.postDestroyRegion(PartitionedRegion.java:7630) at org.apache.geode.internal.cache.LocalRegion.recursiveDestroyRegion(LocalRegion.java:2732) at org.apache.geode.internal.cache.LocalRegion.basicDestroyRegion(LocalRegion.java:6299) at org.apache.geode.internal.cache.LocalRegion.basicDestroyRegion(LocalRegion.java:6251) at org.apache.geode.internal.cache.PartitionedRegion.destroyRegion(PartitionedRegion.java:7077) at org.apache.geode.internal.cache.AbstractRegion.destroyRegion(AbstractRegion.java:453) at org.apache.geode.internal.cache.wan.AbstractGatewaySender.destroy(AbstractGatewaySender.java:599) at org.apache.geode.internal.cache.wan.AbstractGatewaySender.destroy(AbstractGatewaySender.java:555) at org.apache.geode.management.internal.cli.functions.GatewaySenderDestroyFunction.execute(GatewaySenderDestroyFunction.java:60) at org.apache.geode.internal.cache.MemberFunctionStreamingMessage.process(MemberFunctionStreamingMessage.java:193) at org.apache.geode.distributed.internal.DistributionMessage.scheduleAction(DistributionMessage.java:369) at org.apache.geode.distributed.internal.DistributionMessage$1.run(DistributionMessage.java:435) at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at org.apache.geode.distributed.internal.ClusterDistributionManager.runUntilShutdown(ClusterDistributionManager.java:960) at org.apache.geode.distributed.internal.ClusterDistributionManager.doFunctionExecutionThread(ClusterDistributionManager.java:814) at org.apache.geode.internal.logging.LoggingThreadFactory.lambda$newThread$0(LoggingThreadFactory.java:121) at java.lang.Thread.run(Thread.java:745) {noformat} PartitionedRegionDataStore.cleanUp is doing this: {noformat} // Fix for defect #49012 if (buk instanceof AbstractBucketRegionQueue && buk.getPartitionedRegion().isShadowPR()) { if (buk.getPartitionedRegion().getColocatedWithRegion() != null) { buk.getPartitionedRegion().getColocatedWithRegion().getRegionAdvisor() .getBucketAdvisor(bucketId).setShadowBucketDestroyed(true); } } {noformat} The {{buk.getPartitionedRegion().getColocatedWithRegion()}} is the data region. It can have more than one shadow region. So, either this code has to check whether there are other
[jira] [Assigned] (GEODE-6854) GatewaySender batch conflation can incorrectly conflate events causing out of order processing
[ https://issues.apache.org/jira/browse/GEODE-6854?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Barry Oglesby reassigned GEODE-6854: Assignee: Barry Oglesby > GatewaySender batch conflation can incorrectly conflate events causing out of > order processing > -- > > Key: GEODE-6854 > URL: https://issues.apache.org/jira/browse/GEODE-6854 > Project: Geode > Issue Type: Bug > Components: wan >Reporter: Barry Oglesby >Assignee: Barry Oglesby >Priority: Major > > If a batch contains 2 equal update events, > {{AbstractGatewaySenderEventProcessor conflate}} will remove the original > event and add the later event at the end of the list. Depending on the other > events in the list, this could cause the batch to contain events that are out > of order. > For example, in this batch containing 6 events before conflation, the last > two events are duplicates of earlier events: > {noformat} > SenderEventImpl[id=EventID[threadID=0x10059|104;sequenceID=2;bucketId=89];action=1;operation=UPDATE;region=/dataStoreRegion;key=Object_6079;shadowKey=16587] > SenderEventImpl[id=EventID[threadID=0x10059|104;sequenceID=3;bucketId=89];action=2;operation=DESTROY;region=/dataStoreRegion;key=Object_6079;shadowKey=16700] > SenderEventImpl[id=EventID[threadID=0x10059|112;sequenceID=9;bucketId=89];action=1;operation=PUTALL_UPDATE;region=/dataStoreRegion;key=Object_7731;shadowKey=16813] > SenderEventImpl[id=EventID[threadID=0x10059|112;sequenceID=12;bucketId=89];action=1;operation=PUTALL_UPDATE;region=/dataStoreRegion;key=Object_6591;shadowKey=16926] > SenderEventImpl[id=EventID[threadID=0x10059|104;sequenceID=3;bucketId=89];action=2;operation=DESTROY;region=/dataStoreRegion;key=Object_6079;shadowKey=16700] > SenderEventImpl[id=EventID[threadID=0x10059|112;sequenceID=9;bucketId=89];action=1;operation=PUTALL_UPDATE;region=/dataStoreRegion;key=Object_7731;shadowKey=16813] > {noformat} > Conflating this batch results in these 4 events: > {noformat} > 
SenderEventImpl[id=EventID[threadID=0x10059|104;sequenceID=2;bucketId=89];action=1;operation=UPDATE;region=/dataStoreRegion;key=Object_6079;shadowKey=16587] > SenderEventImpl[id=EventID[threadID=0x10059|104;sequenceID=3;bucketId=89];action=2;operation=DESTROY;region=/dataStoreRegion;key=Object_6079;shadowKey=16700] > SenderEventImpl[id=EventID[threadID=0x10059|112;sequenceID=12;bucketId=89];action=1;operation=PUTALL_UPDATE;region=/dataStoreRegion;key=Object_6591;shadowKey=16926] > SenderEventImpl[id=EventID[threadID=0x10059|112;sequenceID=9;bucketId=89];action=1;operation=PUTALL_UPDATE;region=/dataStoreRegion;key=Object_7731;shadowKey=16813] > {noformat} > Notice the shadowKeys and sequenceIds are out of order after the conflation. > Conflation should produce this batch: > {noformat} > SenderEventImpl[id=EventID[threadID=0x10059|104;sequenceID=2;bucketId=89];action=1;operation=UPDATE;region=/dataStoreRegion;key=Object_6079;shadowKey=16587] > SenderEventImpl[id=EventID[threadID=0x10059|104;sequenceID=3;bucketId=89];action=2;operation=DESTROY;region=/dataStoreRegion;key=Object_6079;shadowKey=16700] > SenderEventImpl[id=EventID[threadID=0x10059|112;sequenceID=9;bucketId=89];action=1;operation=PUTALL_UPDATE;region=/dataStoreRegion;key=Object_7731;shadowKey=16813] > SenderEventImpl[id=EventID[threadID=0x10059|112;sequenceID=12;bucketId=89];action=1;operation=PUTALL_UPDATE;region=/dataStoreRegion;key=Object_6591;shadowKey=16926] > {noformat} > This is similar to GEODE-4704, but not exactly the same. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (GEODE-6854) GatewaySender batch conflation can incorrectly conflate events causing out of order processing
Barry Oglesby created GEODE-6854:

Summary: GatewaySender batch conflation can incorrectly conflate events causing out of order processing
Key: GEODE-6854
URL: https://issues.apache.org/jira/browse/GEODE-6854
Project: Geode
Issue Type: Bug
Components: wan
Reporter: Barry Oglesby

If a batch contains 2 equal update events, {{AbstractGatewaySenderEventProcessor.conflate}} will remove the original event and add the later event at the end of the list. Depending on the other events in the list, this could cause the batch to contain events that are out of order. For example, in this batch containing 6 events before conflation, the last two events are duplicates of earlier events:
{noformat}
SenderEventImpl[id=EventID[threadID=0x10059|104;sequenceID=2;bucketId=89];action=1;operation=UPDATE;region=/dataStoreRegion;key=Object_6079;shadowKey=16587]
SenderEventImpl[id=EventID[threadID=0x10059|104;sequenceID=3;bucketId=89];action=2;operation=DESTROY;region=/dataStoreRegion;key=Object_6079;shadowKey=16700]
SenderEventImpl[id=EventID[threadID=0x10059|112;sequenceID=9;bucketId=89];action=1;operation=PUTALL_UPDATE;region=/dataStoreRegion;key=Object_7731;shadowKey=16813]
SenderEventImpl[id=EventID[threadID=0x10059|112;sequenceID=12;bucketId=89];action=1;operation=PUTALL_UPDATE;region=/dataStoreRegion;key=Object_6591;shadowKey=16926]
SenderEventImpl[id=EventID[threadID=0x10059|104;sequenceID=3;bucketId=89];action=2;operation=DESTROY;region=/dataStoreRegion;key=Object_6079;shadowKey=16700]
SenderEventImpl[id=EventID[threadID=0x10059|112;sequenceID=9;bucketId=89];action=1;operation=PUTALL_UPDATE;region=/dataStoreRegion;key=Object_7731;shadowKey=16813]
{noformat}
Conflating this batch results in these 4 events:
{noformat}
SenderEventImpl[id=EventID[threadID=0x10059|104;sequenceID=2;bucketId=89];action=1;operation=UPDATE;region=/dataStoreRegion;key=Object_6079;shadowKey=16587]
SenderEventImpl[id=EventID[threadID=0x10059|104;sequenceID=3;bucketId=89];action=2;operation=DESTROY;region=/dataStoreRegion;key=Object_6079;shadowKey=16700]
SenderEventImpl[id=EventID[threadID=0x10059|112;sequenceID=12;bucketId=89];action=1;operation=PUTALL_UPDATE;region=/dataStoreRegion;key=Object_6591;shadowKey=16926]
SenderEventImpl[id=EventID[threadID=0x10059|112;sequenceID=9;bucketId=89];action=1;operation=PUTALL_UPDATE;region=/dataStoreRegion;key=Object_7731;shadowKey=16813]
{noformat}
Notice the shadowKeys and sequenceIds are out of order after the conflation. Conflation should produce this batch:
{noformat}
SenderEventImpl[id=EventID[threadID=0x10059|104;sequenceID=2;bucketId=89];action=1;operation=UPDATE;region=/dataStoreRegion;key=Object_6079;shadowKey=16587]
SenderEventImpl[id=EventID[threadID=0x10059|104;sequenceID=3;bucketId=89];action=2;operation=DESTROY;region=/dataStoreRegion;key=Object_6079;shadowKey=16700]
SenderEventImpl[id=EventID[threadID=0x10059|112;sequenceID=9;bucketId=89];action=1;operation=PUTALL_UPDATE;region=/dataStoreRegion;key=Object_7731;shadowKey=16813]
SenderEventImpl[id=EventID[threadID=0x10059|112;sequenceID=12;bucketId=89];action=1;operation=PUTALL_UPDATE;region=/dataStoreRegion;key=Object_6591;shadowKey=16926]
{noformat}
This is similar to GEODE-4704, but not exactly the same.

--
This message was sent by Atlassian JIRA (v7.6.3#76005)
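For illustration, here is a minimal order-preserving conflation sketch. This is hypothetical and not Geode's actual AbstractGatewaySenderEventProcessor code; the Event class and the choice of conflation key are illustrative, and the key (entry key plus shadowKey) deliberately collapses only exact duplicates, matching the batch above. The point is that a LinkedHashMap keeps a re-inserted key at its original position, so the surviving event stays where its first occurrence was instead of being re-appended at the tail.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not Geode's implementation: conflate a batch while
// preserving the original event order.
public class OrderPreservingConflation {

    // Minimal stand-in for a queued sender event.
    public static final class Event {
        public final String key;      // region entry key
        public final long shadowKey;  // queue ordering key
        public Event(String key, long shadowKey) {
            this.key = key;
            this.shadowKey = shadowKey;
        }
    }

    // LinkedHashMap.put() replaces the value of an existing key WITHOUT moving
    // it to the tail, so a later duplicate collapses into the position of its
    // first occurrence instead of being removed and re-appended at the end.
    public static List<Event> conflate(List<Event> batch) {
        Map<String, Event> conflated = new LinkedHashMap<>();
        for (Event e : batch) {
            conflated.put(e.key + "|" + e.shadowKey, e);
        }
        return new ArrayList<>(conflated.values());
    }
}
```

Applied to the six events above, this yields four events with shadowKeys in ascending order (16587, 16700, 16813, 16926), i.e. the expected conflated batch.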
[jira] [Commented] (GEODE-6616) Flaky: AutoConnectionSourceDUnitTest > testClientDynamicallyDropsStoppedLocator FAILED
[ https://issues.apache.org/jira/browse/GEODE-6616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16858170#comment-16858170 ] Barry Oglesby commented on GEODE-6616: -- This issue happened again: [https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-develop-main/jobs/DistributedTestOpenJDK8/builds/781] > Flaky: AutoConnectionSourceDUnitTest > > testClientDynamicallyDropsStoppedLocator FAILED > -- > > Key: GEODE-6616 > URL: https://issues.apache.org/jira/browse/GEODE-6616 > Project: Geode > Issue Type: Bug > Components: tests >Reporter: Mark Hanson >Priority: Minor > > Failed connection.. > {noformat} > [vm3] [info 2019/04/09 06:48:44.919 UTC > tid=0x20] Got result: EXCEPTION_OCCURRED > [vm3] org.apache.geode.cache.client.ServerOperationException: remote server > on 16f27a14ad79(255:loner):52816:5f2bdb00: : While performing a remote put > [vm3] at > org.apache.geode.cache.client.internal.PutOp$PutOpImpl.processAck(PutOp.java:389) > [vm3] at > org.apache.geode.cache.client.internal.PutOp$PutOpImpl.processResponse(PutOp.java:313) > [vm3] at > org.apache.geode.cache.client.internal.PutOp$PutOpImpl.attemptReadResponse(PutOp.java:454) > [vm3] at > org.apache.geode.cache.client.internal.AbstractOp.attempt(AbstractOp.java:387) > [vm3] at > org.apache.geode.cache.client.internal.ConnectionImpl.execute(ConnectionImpl.java:289) > [vm3] at > org.apache.geode.cache.client.internal.pooling.PooledConnection.execute(PooledConnection.java:351) > [vm3] at > org.apache.geode.cache.client.internal.OpExecutorImpl.executeWithPossibleReAuthentication(OpExecutorImpl.java:908) > [vm3] at > org.apache.geode.cache.client.internal.OpExecutorImpl.execute(OpExecutorImpl.java:172) > [vm3] at > org.apache.geode.cache.client.internal.OpExecutorImpl.execute(OpExecutorImpl.java:130) > [vm3] at > org.apache.geode.cache.client.internal.PoolImpl.execute(PoolImpl.java:792) > [vm3] at > org.apache.geode.cache.client.internal.PutOp.execute(PutOp.java:90) > [vm3] at > 
org.apache.geode.cache.client.internal.ServerRegionProxy.put(ServerRegionProxy.java:155) > [vm3] at > org.apache.geode.internal.cache.LocalRegion.serverPut(LocalRegion.java:3070) > [vm3] at > org.apache.geode.internal.cache.LocalRegion.cacheWriteBeforePut(LocalRegion.java:3222) > [vm3] at > org.apache.geode.internal.cache.map.RegionMapPut.invokeCacheWriter(RegionMapPut.java:230) > [vm3] at > org.apache.geode.internal.cache.map.AbstractRegionMapPut.doPutIfPreconditionsSatisified(AbstractRegionMapPut.java:295) > [vm3] at > org.apache.geode.internal.cache.map.AbstractRegionMapPut.doPutOnSynchronizedRegionEntry(AbstractRegionMapPut.java:282) > [vm3] at > org.apache.geode.internal.cache.map.AbstractRegionMapPut.doPutOnRegionEntryInMap(AbstractRegionMapPut.java:273) > [vm3] at > org.apache.geode.internal.cache.map.AbstractRegionMapPut.addRegionEntryToMapAndDoPut(AbstractRegionMapPut.java:251) > [vm3] at > org.apache.geode.internal.cache.map.AbstractRegionMapPut.doPutRetryingIfNeeded(AbstractRegionMapPut.java:216) > [vm3] at > org.apache.geode.internal.cache.map.AbstractRegionMapPut.doWithIndexInUpdateMode(AbstractRegionMapPut.java:198) > [vm3] at > org.apache.geode.internal.cache.map.AbstractRegionMapPut.doPut(AbstractRegionMapPut.java:180) > [vm3] at > org.apache.geode.internal.cache.map.AbstractRegionMapPut.runWhileLockedForCacheModification(AbstractRegionMapPut.java:119) > [vm3] at > org.apache.geode.internal.cache.map.RegionMapPut.runWhileLockedForCacheModification(RegionMapPut.java:150) > [vm3] at > org.apache.geode.internal.cache.map.AbstractRegionMapPut.put(AbstractRegionMapPut.java:169) > [vm3] at > org.apache.geode.internal.cache.AbstractRegionMap.basicPut(AbstractRegionMap.java:2044) > [vm3] at > org.apache.geode.internal.cache.LocalRegion.virtualPut(LocalRegion.java:5695) > [vm3] at > org.apache.geode.internal.cache.LocalRegionDataView.putEntry(LocalRegionDataView.java:162) > [vm3] at > 
org.apache.geode.internal.cache.LocalRegion.basicPut(LocalRegion.java:5123) > [vm3] at > org.apache.geode.internal.cache.LocalRegion.validatedPut(LocalRegion.java:1652) > [vm3] at > org.apache.geode.internal.cache.LocalRegion.lambda$put$3(LocalRegion.java:1638) > [vm3] at > io.micrometer.core.instrument.composite.CompositeTimer.record(CompositeTimer.java:57) > [vm3] at > org.apache.geode.internal.cache.LocalRegion.put(LocalRegion.java:1634) > [vm3] at > org.apache.geode.internal.cache.AbstractRegion.put(AbstractRegion.java:425) > [vm3]
[jira] [Resolved] (GEODE-6821) Multiple Serial GatewaySenders that are primary in different members can cause a distributed deadlock
[ https://issues.apache.org/jira/browse/GEODE-6821?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Barry Oglesby resolved GEODE-6821. -- Resolution: Fixed Fix Version/s: 1.10.0 > Multiple Serial GatewaySenders that are primary in different members can > cause a distributed deadlock > - > > Key: GEODE-6821 > URL: https://issues.apache.org/jira/browse/GEODE-6821 > Project: Geode > Issue Type: Bug > Components: messaging, wan >Reporter: Barry Oglesby >Assignee: Barry Oglesby >Priority: Major > Fix For: 1.10.0 > > Time Spent: 1h 10m > Remaining Estimate: 0h > > A test with this scenario causes a distributed deadlock. > 3 servers each with: > - a function that performs a random region operation on the input region > - a replicated region on which the function is executed > - two regions each with a serial AEQ (the type of region could be either > replicate or partitioned) > 1 multi-threaded client that repeatedly executes the function with random > region names and operations. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (GEODE-6836) CI Failure: ReconnectDUnitTest.testReconnectWithRoleLoss fails with GemFireConfigException
Barry Oglesby created GEODE-6836: Summary: CI Failure: ReconnectDUnitTest.testReconnectWithRoleLoss fails with GemFireConfigException Key: GEODE-6836 URL: https://issues.apache.org/jira/browse/GEODE-6836 Project: Geode Issue Type: Bug Components: membership Reporter: Barry Oglesby DistributedTestOpenJDK8 build 775: https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-develop-main/jobs/DistributedTestOpenJDK8/builds/775 Failure: {noformat} org.apache.geode.cache30.ReconnectDUnitTest > testReconnectWithRoleLoss FAILED java.lang.RuntimeException: org.apache.geode.GemFireConfigException: Unable to join the distributed system. Operation either timed out, was stopped or Locator does not exist. at org.apache.geode.test.dunit.cache.internal.JUnit4CacheTestCase.finishCacheXml(JUnit4CacheTestCase.java:176) at org.apache.geode.cache30.ReconnectDUnitTest.postSetUp(ReconnectDUnitTest.java:174) Caused by: org.apache.geode.GemFireConfigException: Unable to join the distributed system. Operation either timed out, was stopped or Locator does not exist. 
at org.apache.geode.distributed.internal.membership.gms.mgr.GMSMembershipManager.join(GMSMembershipManager.java:663) at org.apache.geode.distributed.internal.membership.gms.mgr.GMSMembershipManager.joinDistributedSystem(GMSMembershipManager.java:743) at org.apache.geode.distributed.internal.membership.gms.Services.start(Services.java:176) at org.apache.geode.distributed.internal.membership.gms.GMSMemberFactory.newMembershipManager(GMSMemberFactory.java:106) at org.apache.geode.distributed.internal.membership.MemberFactory.newMembershipManager(MemberFactory.java:93) at org.apache.geode.distributed.internal.ClusterDistributionManager.(ClusterDistributionManager.java:781) at org.apache.geode.distributed.internal.ClusterDistributionManager.(ClusterDistributionManager.java:899) at org.apache.geode.distributed.internal.ClusterDistributionManager.create(ClusterDistributionManager.java:541) at org.apache.geode.distributed.internal.InternalDistributedSystem.initialize(InternalDistributedSystem.java:756) at org.apache.geode.distributed.internal.InternalDistributedSystem.access$200(InternalDistributedSystem.java:135) at org.apache.geode.distributed.internal.InternalDistributedSystem$Builder.build(InternalDistributedSystem.java:2997) at org.apache.geode.distributed.internal.InternalDistributedSystem.connectInternal(InternalDistributedSystem.java:251) at org.apache.geode.distributed.DistributedSystem.connect(DistributedSystem.java:158) at org.apache.geode.test.dunit.internal.JUnit4DistributedTestCase.getSystem(JUnit4DistributedTestCase.java:181) at org.apache.geode.test.dunit.internal.JUnit4DistributedTestCase.getSystem(JUnit4DistributedTestCase.java:257) at org.apache.geode.test.dunit.cache.internal.JUnit4CacheTestCase.createCache(JUnit4CacheTestCase.java:118) at org.apache.geode.test.dunit.cache.internal.JUnit4CacheTestCase.createCache(JUnit4CacheTestCase.java:104) at org.apache.geode.test.dunit.cache.internal.JUnit4CacheTestCase.createCache(JUnit4CacheTestCase.java:100) at 
org.apache.geode.test.dunit.cache.internal.JUnit4CacheTestCase.finishCacheXml(JUnit4CacheTestCase.java:174) {noformat} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (GEODE-6832) CI Failure: geode-assembly:test task failed with an EXCEPTION_ACCESS_VIOLATION
Barry Oglesby created GEODE-6832: Summary: CI Failure: geode-assembly:test task failed with an EXCEPTION_ACCESS_VIOLATION Key: GEODE-6832 URL: https://issues.apache.org/jira/browse/GEODE-6832 Project: Geode Issue Type: Bug Reporter: Barry Oglesby WindowsUnitTestOpenJDK11 build 552 Task :geode-assembly:test failed with an EXCEPTION_ACCESS_VIOLATION. https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-develop-main/jobs/WindowsUnitTestOpenJDK11/builds/552 Failure: {noformat} > Task :geode-assembly:test # # A fatal error has been detected by the Java Runtime Environment: # # EXCEPTION_ACCESS_VIOLATION (0xc005) at pc=0x022a2862e835, pid=4916, tid=5752 # # JRE version: OpenJDK Runtime Environment (11.0.3+7) (build 11.0.3+7) # Java VM: OpenJDK 64-Bit Server VM (11.0.3+7, mixed mode, tiered, compressed oops, g1 gc, windows-amd64) # Problematic frame: # j net.bytebuddy.description.type.TypeList$Generic$ForDetachedTypes$OfTypeVariables.size()I+4 # # Core dump will be written. Default location: C:\Users\geode\geode\geode-assembly\build\test\hs_err_pid4916.mdmp # # An error report file with more information is saved as: # C:\Users\geode\geode\geode-assembly\build\test\hs_err_pid4916.log Compiled method (c1) 11957 2157 3 net.bytebuddy.description.type.TypeList$Generic$AbstractBase:: (5 bytes) total in heap [0x022a29176810,0x022a29176da8] = 1432 relocation [0x022a29176988,0x022a291769d0] = 72 main code [0x022a291769e0,0x022a29176c20] = 576 stub code [0x022a29176c20,0x022a29176cb8] = 152 oops [0x022a29176cb8,0x022a29176cc0] = 8 metadata [0x022a29176cc0,0x022a29176cf0] = 48 scopes data [0x022a29176cf0,0x022a29176d30] = 64 scopes pcs [0x022a29176d30,0x022a29176da0] = 112 dependencies [0x022a29176da0,0x022a29176da8] = 8 Could not load hsdis-amd64.dll; library not loadable; PrintAssembly is disabled # # If you would like to submit a bug report, please visit: # https://github.com/AdoptOpenJDK/openjdk-build/issues # WARNING: An illegal reflective access operation has 
occurred WARNING: Illegal reflective access by org.mockito.internal.util.reflection.AccessibilityChanger (file:/C:/Users/geode/.gradle/caches/modules-2/files-2.1/org.mockito/mockito-core/2.23.0/497ddb32fd5d01f9dbe99a2ec790aeb931dff1b1/mockito-core-2.23.0.jar) to field java.io.File.path WARNING: Please consider reporting this to the maintainers of org.mockito.internal.util.reflection.AccessibilityChanger WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations WARNING: All illegal access operations will be denied in a future release Unexpected exception thrown. org.gradle.internal.remote.internal.MessageIOException: Could not write '/127.0.0.1:50173'. at org.gradle.internal.remote.internal.inet.SocketConnection.flush(SocketConnection.java:135) at org.gradle.internal.remote.internal.hub.MessageHub$ConnectionDispatch.run(MessageHub.java:325) at org.gradle.internal.concurrent.ExecutorPolicy$CatchAndRecordFailures.onExecute(ExecutorPolicy.java:63) at org.gradle.internal.concurrent.ManagedExecutorImpl$1.run(ManagedExecutorImpl.java:46) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at org.gradle.internal.concurrent.ThreadFactoryImpl$ManagedThreadRunnable.run(ThreadFactoryImpl.java:55) at java.lang.Thread.run(Thread.java:748) Caused by: java.io.IOException: An existing connection was forcibly closed by the remote host at sun.nio.ch.SocketDispatcher.write0(Native Method) at sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:51) at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:93) at sun.nio.ch.IOUtil.write(IOUtil.java:51) at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:471) at org.gradle.internal.remote.internal.inet.SocketConnection$SocketOutputStream.writeWithNonBlockingRetry(SocketConnection.java:273) at 
org.gradle.internal.remote.internal.inet.SocketConnection$SocketOutputStream.writeBufferToChannel(SocketConnection.java:261) at org.gradle.internal.remote.internal.inet.SocketConnection$SocketOutputStream.flush(SocketConnection.java:255) at org.gradle.internal.remote.internal.inet.SocketConnection.flush(SocketConnection.java:133) ... 7 more > Task :geode-assembly:test FAILED {noformat} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (GEODE-6821) Multiple Serial GatewaySenders that are primary in different members can cause a distributed deadlock
[ https://issues.apache.org/jira/browse/GEODE-6821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16852457#comment-16852457 ] Barry Oglesby commented on GEODE-6821: -- The key to this deadlock is a shared P2P message reader waiting in SerialGatewaySenderQueue.put for a WriteLock like: {noformat} "P2P message reader for 192.168.1.2(server-3:54808):41005 shared ordered uid=6 port=62566" tid=0x4e owned by "Function Execution Processor7" tid=0x66 java.lang.Thread.State: WAITING at sun.misc.Unsafe.park(Native Method) - waiting on java.util.concurrent.locks.ReentrantReadWriteLock$FairSync@42117a4e at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175) at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836) at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:870) at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1199) at java.util.concurrent.locks.ReentrantReadWriteLock$WriteLock.lock(ReentrantReadWriteLock.java:943) at org.apache.geode.internal.cache.wan.serial.SerialGatewaySenderQueue.put(SerialGatewaySenderQueue.java:220) at org.apache.geode.internal.cache.wan.serial.SerialGatewaySenderEventProcessor.queuePrimaryEvent(SerialGatewaySenderEventProcessor.java:477) at org.apache.geode.internal.cache.wan.serial.SerialGatewaySenderEventProcessor.enqueueEvent(SerialGatewaySenderEventProcessor.java:445) at org.apache.geode.internal.cache.wan.AbstractGatewaySender.distribute(AbstractGatewaySender.java:1033) at org.apache.geode.internal.cache.LocalRegion.notifyGatewaySender(LocalRegion.java:6138) at org.apache.geode.internal.cache.LocalRegion.basicPutPart2(LocalRegion.java:5768) at org.apache.geode.internal.cache.map.RegionMapPut.doBeforeCompletionActions(RegionMapPut.java:282) at org.apache.geode.internal.cache.map.AbstractRegionMapPut.doPutAndDeliverEvent(AbstractRegionMapPut.java:301) 
at org.apache.geode.internal.cache.map.AbstractRegionMapPut$$Lambda$163/1504099933.run(Unknown Source) at org.apache.geode.internal.cache.map.AbstractRegionMapPut.runWithIndexUpdatingInProgress(AbstractRegionMapPut.java:308) at org.apache.geode.internal.cache.map.AbstractRegionMapPut.doPutIfPreconditionsSatisified(AbstractRegionMapPut.java:296) at org.apache.geode.internal.cache.map.AbstractRegionMapPut.doPutOnSynchronizedRegionEntry(AbstractRegionMapPut.java:282) at org.apache.geode.internal.cache.map.AbstractRegionMapPut.doPutOnRegionEntryInMap(AbstractRegionMapPut.java:273) - locked org.apache.geode.internal.cache.entries.VersionedThinRegionEntryHeapIntKey@7bbbc992 at org.apache.geode.internal.cache.map.AbstractRegionMapPut.addRegionEntryToMapAndDoPut(AbstractRegionMapPut.java:251) - locked org.apache.geode.internal.cache.entries.VersionedThinRegionEntryHeapIntKey@7bbbc992 at org.apache.geode.internal.cache.map.AbstractRegionMapPut.doPutRetryingIfNeeded(AbstractRegionMapPut.java:216) at org.apache.geode.internal.cache.map.AbstractRegionMapPut$$Lambda$162/754294637.run(Unknown Source) at org.apache.geode.internal.cache.map.AbstractRegionMapPut.doWithIndexInUpdateMode(AbstractRegionMapPut.java:198) at org.apache.geode.internal.cache.map.AbstractRegionMapPut.doPut(AbstractRegionMapPut.java:180) at org.apache.geode.internal.cache.map.AbstractRegionMapPut$$Lambda$161/453331027.run(Unknown Source) at org.apache.geode.internal.cache.map.AbstractRegionMapPut.runWhileLockedForCacheModification(AbstractRegionMapPut.java:119) at org.apache.geode.internal.cache.map.RegionMapPut.runWhileLockedForCacheModification(RegionMapPut.java:161) at org.apache.geode.internal.cache.map.AbstractRegionMapPut.put(AbstractRegionMapPut.java:169) at org.apache.geode.internal.cache.AbstractRegionMap.basicPut(AbstractRegionMap.java:2044) at org.apache.geode.internal.cache.LocalRegion.virtualPut(LocalRegion.java:5599) at 
org.apache.geode.internal.cache.DistributedRegion.virtualPut(DistributedRegion.java:377) at org.apache.geode.internal.cache.LocalRegionDataView.putEntry(LocalRegionDataView.java:162) at org.apache.geode.internal.cache.LocalRegion.basicUpdate(LocalRegion.java:5579) at org.apache.geode.internal.cache.AbstractUpdateOperation.doPutOrCreate(AbstractUpdateOperation.java:150) at org.apache.geode.internal.cache.AbstractUpdateOperation$AbstractUpdateMessage.basicOperateOnRegion(AbstractUpdateOperation.java:285) at org.apache.geode.internal.cache.AbstractUpdateOperation$AbstractUpdateMessage.operateOnRegion(AbstractUpdateOperation.java:256) at org.apache.geode.internal.cache.DistributedCacheOperation$CacheOperationMessage.basicProcess(DistributedCacheOperation.java:1200) at org.apache.geode.internal.cache.DistributedCacheOperation$CacheOperationMessage.process(DistributedCacheOperation.java:1100) at
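The thread dump above shows one half of a classic lock-order inversion: the P2P message reader holds a region-entry lock and waits for the queue's WriteLock, while the "Function Execution Processor" thread that owns the WriteLock is blocked elsewhere wanting the entry. A deterministic, self-contained sketch of that pattern with plain ReentrantLocks (illustrative only, not Geode code; the lock names are stand-ins):

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

// Illustrative sketch (not Geode code) of the lock-order inversion behind
// this deadlock: one thread takes the entry lock then wants the queue lock,
// while the other holds the queue lock and wants the entry lock.
public class LockInversionDemo {
    public static boolean neitherSideProgresses() {
        ReentrantLock entryLock = new ReentrantLock(); // stand-in for the RegionEntry sync
        ReentrantLock queueLock = new ReentrantLock(); // stand-in for the queue's WriteLock
        CountDownLatch entryHeld = new CountDownLatch(1);
        CountDownLatch mainTried = new CountDownLatch(1);
        boolean[] readerGotQueue = new boolean[1];
        boolean mainGotEntry = false;

        queueLock.lock(); // this thread plays the batch processor: queue lock first
        Thread p2pReader = new Thread(() -> {
            entryLock.lock(); // P2P reader path: entry lock first
            entryHeld.countDown();
            try {
                // Blocked: the main thread holds the queue lock the whole time.
                readerGotQueue[0] = queueLock.tryLock(50, TimeUnit.MILLISECONDS);
                if (readerGotQueue[0]) queueLock.unlock();
                mainTried.await(); // keep the entry lock until main has tried it
            } catch (InterruptedException ignored) {
            } finally {
                entryLock.unlock();
            }
        });
        try {
            p2pReader.start();
            entryHeld.await();
            // Blocked: the reader holds the entry lock until mainTried fires.
            mainGotEntry = entryLock.tryLock(50, TimeUnit.MILLISECONDS);
            if (mainGotEntry) entryLock.unlock();
            mainTried.countDown();
            p2pReader.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            queueLock.unlock();
        }
        return !mainGotEntry && !readerGotQueue[0]; // both lock orders blocked
    }
}
```

With fixed acquisition order (or a tryLock-with-timeout on the second lock), one side would back off instead of waiting forever; that is the general remedy for this pattern.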
[jira] [Assigned] (GEODE-6821) Multiple Serial GatewaySenders that are primary in different members can cause a distributed deadlock
[ https://issues.apache.org/jira/browse/GEODE-6821?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Barry Oglesby reassigned GEODE-6821: Assignee: Barry Oglesby > Multiple Serial GatewaySenders that are primary in different members can > cause a distributed deadlock > - > > Key: GEODE-6821 > URL: https://issues.apache.org/jira/browse/GEODE-6821 > Project: Geode > Issue Type: Bug > Components: messaging, wan >Reporter: Barry Oglesby >Assignee: Barry Oglesby >Priority: Major > > A test with this scenario causes a distributed deadlock. > 3 servers each with: > - a function that performs a random region operation on the input region > - a replicated region on which the function is executed > - two regions each with a serial AEQ (the type of region could be either > replicate or partitioned) > 1 multi-threaded client that repeatedly executes the function with random > region names and operations. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (GEODE-6821) Multiple Serial GatewaySenders that are primary in different members can cause a distributed deadlock
Barry Oglesby created GEODE-6821:

Summary: Multiple Serial GatewaySenders that are primary in different members can cause a distributed deadlock
Key: GEODE-6821
URL: https://issues.apache.org/jira/browse/GEODE-6821
Project: Geode
Issue Type: Bug
Components: messaging, wan
Reporter: Barry Oglesby

A test with this scenario causes a distributed deadlock.

3 servers each with:
- a function that performs a random region operation on the input region
- a replicated region on which the function is executed
- two regions each with a serial AEQ (the type of region could be either replicate or partitioned)

1 multi-threaded client that repeatedly executes the function with random region names and operations.

--
This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (GEODE-6748) Client invalidate operations never use single hop
[ https://issues.apache.org/jira/browse/GEODE-6748?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Barry Oglesby updated GEODE-6748: - Affects Version/s: 1.9.0 > Client invalidate operations never use single hop > - > > Key: GEODE-6748 > URL: https://issues.apache.org/jira/browse/GEODE-6748 > Project: Geode > Issue Type: Bug > Components: client/server >Affects Versions: 1.9.0 >Reporter: Barry Oglesby >Priority: Major > > InvalidateOp.execute does: > {noformat} > public static void execute(ExecutablePool pool, String region, EntryEventImpl > event) { > AbstractOp op = new InvalidateOpImpl(region, event); > pool.execute(op); > }{noformat} > That is the non-single-hop way of executing an operation. > It should use the single-hop way of executing an operation if > pr-single-hop-enabled=true. > The execute methods in PutOp and GetOp show examples of that. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (GEODE-6748) Client invalidate operations never use single hop
Barry Oglesby created GEODE-6748:

Summary: Client invalidate operations never use single hop
Key: GEODE-6748
URL: https://issues.apache.org/jira/browse/GEODE-6748
Project: Geode
Issue Type: Bug
Components: client/server
Reporter: Barry Oglesby

InvalidateOp.execute does:
{noformat}
public static void execute(ExecutablePool pool, String region, EntryEventImpl event) {
  AbstractOp op = new InvalidateOpImpl(region, event);
  pool.execute(op);
}
{noformat}
That is the non-single-hop way of executing an operation. It should use the single-hop way of executing an operation if pr-single-hop-enabled=true. The execute methods in PutOp and GetOp show examples of that.

--
This message was sent by Atlassian JIRA (v7.6.3#76005)
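For contrast, here is a sketch of the single-hop dispatch pattern that PutOp and GetOp follow. All interface and method names below are hypothetical, not Geode's actual client internals: the idea is to resolve the server owning the key's bucket from cached partition metadata and execute directly against it, falling back to the plain pool path when metadata is cold or single hop is disabled.

```java
// Illustrative sketch of single-hop dispatch; names are hypothetical, not
// Geode's actual client API.
public class SingleHopSketch {

    interface Op { /* a serialized client/server operation */ }

    interface Pool {
        Object execute(Op op);                  // any pooled connection (server may forward)
        Object executeOn(String server, Op op); // direct to a specific server
    }

    interface PartitionMetadata {
        // Cached bucket-to-server mapping; null when metadata is not yet warm.
        String serverForKey(String region, Object key);
    }

    // Route the op to the bucket owner when single hop is enabled and the
    // metadata is warm; otherwise fall back to the non-single-hop path.
    static Object execute(Pool pool, PartitionMetadata metadata, boolean singleHopEnabled,
                          String region, Object key, Op op) {
        if (singleHopEnabled) {
            String owner = metadata.serverForKey(region, key);
            if (owner != null) {
                return pool.executeOn(owner, op); // one network hop
            }
        }
        return pool.execute(op); // may be forwarded server-side to the bucket owner
    }
}
```

The fallback branch is what the current InvalidateOp.execute always does; single hop only changes which connection the op is written to, not the op itself.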
[jira] [Resolved] (GEODE-6186) Reduce the number of EntryNotFoundExceptions during AsyncEventQueue batch processing with conflation enabled
[ https://issues.apache.org/jira/browse/GEODE-6186?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Barry Oglesby resolved GEODE-6186. -- Resolution: Fixed Fix Version/s: 1.9.0 > Reduce the number of EntryNotFoundExceptions during AsyncEventQueue batch > processing with conflation enabled > > > Key: GEODE-6186 > URL: https://issues.apache.org/jira/browse/GEODE-6186 > Project: Geode > Issue Type: Bug > Components: wan >Reporter: Barry Oglesby >Priority: Major > Labels: pull-request-available > Fix For: 1.9.0 > > Time Spent: 20m > Remaining Estimate: 0h > > Reduce the number of EntryNotFoundExceptions during AsyncEventQueue batch > processing with conflation enabled > This test: > 3000 iterations of putAlls with the same 1500 keys into a partitioned region > attached to async-event-queue: > dispatcher-threads="1" parallel="true" enable-batch-conflation="true"> > Produces these numbers in the current code (4 different runs): > {noformat} > numBatches=645; numENFEs=8622196; totalPeekTime=178517; averagePeekTime=276; > totalProcessBatchTime=38936; averageProcessBatchTime=60 > numBatches=660; numENFEs=8467986; totalPeekTime=182985; averagePeekTime=277; > totalProcessBatchTime=34335; averageProcessBatchTime=52 > numBatches=646; numENFEs=8563364; totalPeekTime=179624; averagePeekTime=278; > totalProcessBatchTime=37342; averageProcessBatchTime=57 > numBatches=632; numENFEs=8716942; totalPeekTime=175570; averagePeekTime=277; > totalProcessBatchTime=39732; averageProcessBatchTime=62 > {noformat} > After some changes mainly in BucketRegionQueue: > {noformat} > numBatches=782; numENFEs=3621039; totalPeekTime=195760; averagePeekTime=250; > totalProcessBatchTime=18724; averageProcessBatchTime=23 > numBatches=791; numENFEs=3604933; totalPeekTime=197980; averagePeekTime=250; > totalProcessBatchTime=18587; averageProcessBatchTime=23 > numBatches=790; numENFEs=3600038; totalPeekTime=197774; averagePeekTime=250; > totalProcessBatchTime=18611; 
averageProcessBatchTime=23 > numBatches=795; numENFEs=3584490; totalPeekTime=199060; averagePeekTime=250; > totalProcessBatchTime=18063; averageProcessBatchTime=22 > {noformat} > numBatches is the number of batches peeked > numENFEs is the number of EntryNotFoundExceptions thrown > totalPeekTime is the total time to peek all batches > averagePeekTime is the average time to peek a batch > totalProcessBatchTime is the total time to process all batches > averageProcessBatchTime is the average time to process a batch (includes > listener callback and remove from queue) -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (GEODE-6572) MemberMBeanAttributesDUnitTest testConfigAttributes failed with SIGSEGV
Barry Oglesby created GEODE-6572: Summary: MemberMBeanAttributesDUnitTest testConfigAttributes failed with SIGSEGV Key: GEODE-6572 URL: https://issues.apache.org/jira/browse/GEODE-6572 Project: Geode Issue Type: Bug Components: jmx Reporter: Barry Oglesby The MemberMBeanAttributesDUnitTest testConfigAttributes test failed in DistributedTestOpenJDK11 CI run 547: https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-develop-main/jobs/DistributedTestOpenJDK11/builds/547 With this exception: {noformat} org.apache.geode.management.MemberMBeanAttributesDUnitTest > testConfigAttributes FAILED org.apache.geode.test.dunit.RMIException: While invoking org.apache.geode.test.dunit.rules.DistributedRestoreSystemProperties$$Lambda$161/0x000840231c40.run in VM 0 running on Host f32ed46bb850 with 4 VMs Caused by: java.rmi.ConnectException: Connection refused to host: 172.17.0.13; nested exception is: java.net.ConnectException: Connection refused (Connection refused) Caused by: java.net.ConnectException: Connection refused (Connection refused) {noformat} The failure is actually a SIGSEGV which is logged in the MemberMBeanAttributesDUnitTest.html file: {noformat} [vm0] # [vm0] # A fatal error has been detected by the Java Runtime Environment: [vm0] # [vm0] # SIGSEGV (0xb) at pc=0x7f38b7abe1a0, pid=153, tid=222 [vm0] # [vm0] # JRE version: OpenJDK Runtime Environment (11.0.3+1) (build 11.0.3+1-Debian-1) [vm0] # Java VM: OpenJDK 64-Bit Server VM (11.0.3+1-Debian-1, mixed mode, sharing, tiered, compressed oops, g1 gc, linux-amd64) [vm0] # Problematic frame: [vm0] # V [libjvm.so+0x5271a0] [vm0] # [vm0] # Core dump will be written. 
Default location: /home/geode/geode/geode-core/build/distributedTest421/dunit/vm0/core [vm0] # [vm0] # An error report file with more information is saved as: [vm0] # /home/geode/geode/geode-core/build/distributedTest421/dunit/vm0/hs_err_pid153.log [vm0] # [vm0] # Compiler replay data is saved as: [vm0] # /home/geode/geode/geode-core/build/distributedTest421/dunit/vm0/replay_pid153.log [vm0] # [vm0] # If you would like to submit a bug report, please visit: [vm0] # http://bugreport.java.com/bugreport/crash.jsp [vm0] # {noformat} Unfortunately, I don't see the hs_err_pid153.log file in the tar. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (GEODE-6566) CI failure: MemberMBeanAttributesDUnitTest testReplRegionAttributes failed with suspect string
[ https://issues.apache.org/jira/browse/GEODE-6566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16803131#comment-16803131 ] Barry Oglesby commented on GEODE-6566: -- Normally, SampleCollector.sample is only invoked by the StatSampler thread. It is the only one that iterates the resourceInstMap. In this test, that thread is running normally. In addition, the test calls SampleCollector.sample. This now means two threads are executing that method and potentially iterating the resourceInstMap simultaneously. This is the normal StatSampler thread: {noformat} [vm2] [warning 2019/03/27 10:35:14.150 PDT tid=45] SampleCollector.sample invoked [vm2] java.lang.Exception [vm2] at org.apache.geode.internal.statistics.SampleCollector.sample(SampleCollector.java:219) [vm2] at org.apache.geode.internal.statistics.HostStatSampler.run(HostStatSampler.java:232) [vm2] at java.lang.Thread.run(Thread.java:745) {noformat} This is the test method thread: {noformat} [vm2] [warning 2019/03/27 10:35:15.285 PDT tid=19] SampleCollector.sample invoked [vm2] java.lang.Exception [vm2] at org.apache.geode.internal.statistics.SampleCollector.sample(SampleCollector.java:219) [vm2] at org.apache.geode.management.MemberMBeanAttributesDUnitTest.lambda$sampleStatistics$b6506259$1(MemberMBeanAttributesDUnitTest.java:89) {noformat} > CI failure: MemberMBeanAttributesDUnitTest testReplRegionAttributes failed > with suspect string > -- > > Key: GEODE-6566 > URL: https://issues.apache.org/jira/browse/GEODE-6566 > Project: Geode > Issue Type: Bug > Components: tests >Reporter: Barry Oglesby >Priority: Major > > CI run: > https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-develop-main/jobs/DistributedTestOpenJDK8/builds/543 > Suspect strings: > {noformat} > org.apache.geode.management.MemberMBeanAttributesDUnitTest > > testReplRegionAttributes FAILED > java.lang.AssertionError: Suspicious strings were written to the log during > this run. 
> Fix the strings or use IgnoredException.addIgnoredException to ignore. > --- > Found suspect string in log4j at line 2383 > [fatal 2019/03/27 01:09:04.965 UTC tid=195] null > java.util.ConcurrentModificationException > at java.util.HashMap$HashIterator.nextNode(HashMap.java:1442) > at java.util.HashMap$ValueIterator.next(HashMap.java:1471) > at > org.apache.geode.internal.statistics.SampleCollector.sample(SampleCollector.java:231) > at > org.apache.geode.internal.statistics.HostStatSampler.run(HostStatSampler.java:232) > at java.lang.Thread.run(Thread.java:748) > --- > Found suspect string in log4j at line 2396 > [fatal 2019/03/27 01:09:04.972 UTC tid=195] Uncaught exception > in thread Thread[StatSampler,10,RMI Runtime] > java.util.ConcurrentModificationException > at java.util.HashMap$HashIterator.nextNode(HashMap.java:1442) > at java.util.HashMap$ValueIterator.next(HashMap.java:1471) > at > org.apache.geode.internal.statistics.SampleCollector.sample(SampleCollector.java:231) > at > org.apache.geode.internal.statistics.HostStatSampler.run(HostStatSampler.java:232) > at java.lang.Thread.run(Thread.java:748) > {noformat} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
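The race described above is easy to reproduce in miniature: HashMap's iterators are fail-fast, so any structural modification while an iteration is in flight makes the iterator's next() throw ConcurrentModificationException. A minimal single-threaded demo of the same failure mode (deterministic, unlike the two-thread race on resourceInstMap; the map name here is just illustrative):

```java
import java.util.ConcurrentModificationException;
import java.util.HashMap;
import java.util.Map;

// Deterministic demo of the fail-fast behavior behind the suspect string:
// structurally modifying a HashMap while iterating its values throws CME
// on the iterator's next call.
public class CmeDemo {
    public static boolean structuralModificationDuringIterationThrows() {
        Map<Integer, String> resourceInstMap = new HashMap<>();
        for (int i = 0; i < 8; i++) {
            resourceInstMap.put(i, "resource" + i);
        }
        try {
            for (String ignored : resourceInstMap.values()) {
                // Adding a NEW key is a structural change and bumps modCount,
                // so the loop's next iterator step throws.
                resourceInstMap.put(100, "added-mid-iteration");
            }
        } catch (ConcurrentModificationException expected) {
            return true; // the iterator detected the modification
        }
        return false;
    }
}
```

In the test, the second caller of SampleCollector.sample plays the role of the modifier; guarding the method (synchronization, or a concurrent/copy-on-iterate collection) removes the race.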
[jira] [Created] (GEODE-6566) CI failure: MemberMBeanAttributesDUnitTest testReplRegionAttributes failed with suspect string
Barry Oglesby created GEODE-6566: Summary: CI failure: MemberMBeanAttributesDUnitTest testReplRegionAttributes failed with suspect string Key: GEODE-6566 URL: https://issues.apache.org/jira/browse/GEODE-6566 Project: Geode Issue Type: Bug Components: tests Reporter: Barry Oglesby CI run: https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-develop-main/jobs/DistributedTestOpenJDK8/builds/543 Suspect strings: {noformat} org.apache.geode.management.MemberMBeanAttributesDUnitTest > testReplRegionAttributes FAILED java.lang.AssertionError: Suspicious strings were written to the log during this run. Fix the strings or use IgnoredException.addIgnoredException to ignore. --- Found suspect string in log4j at line 2383 [fatal 2019/03/27 01:09:04.965 UTC tid=195] null java.util.ConcurrentModificationException at java.util.HashMap$HashIterator.nextNode(HashMap.java:1442) at java.util.HashMap$ValueIterator.next(HashMap.java:1471) at org.apache.geode.internal.statistics.SampleCollector.sample(SampleCollector.java:231) at org.apache.geode.internal.statistics.HostStatSampler.run(HostStatSampler.java:232) at java.lang.Thread.run(Thread.java:748) --- Found suspect string in log4j at line 2396 [fatal 2019/03/27 01:09:04.972 UTC tid=195] Uncaught exception in thread Thread[StatSampler,10,RMI Runtime] java.util.ConcurrentModificationException at java.util.HashMap$HashIterator.nextNode(HashMap.java:1442) at java.util.HashMap$ValueIterator.next(HashMap.java:1471) at org.apache.geode.internal.statistics.SampleCollector.sample(SampleCollector.java:231) at org.apache.geode.internal.statistics.HostStatSampler.run(HostStatSampler.java:232) at java.lang.Thread.run(Thread.java:748) {noformat} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (GEODE-6564) Clearing a replicated region with expiration causes a memory leak
Barry Oglesby created GEODE-6564: Summary: Clearing a replicated region with expiration causes a memory leak Key: GEODE-6564 URL: https://issues.apache.org/jira/browse/GEODE-6564 Project: Geode Issue Type: Bug Components: regions Reporter: Barry Oglesby Clearing a replicated region with expiration causes a memory leak Both the RegionEntries and EntryExpiryTasks are still live after loading entries into the region and then clearing it. Server Startup: {noformat} num #instances #bytes class name -- 1: 29856 2797840 [C 4: 2038 520600 [B Total 187711 10089624 {noformat} Load 100 entries with 600k payload (representing a session): {noformat} num #instances #bytes class name -- 1: 2496 60666440 [B 2: 30157 2828496 [C 73: 100 7200 org.apache.geode.internal.cache.entries.VersionedStatsRegionEntryHeapStringKey1 93: 100 4800 org.apache.geode.internal.cache.EntryExpiryTask Total 190737 70240472 {noformat} Clear region: {noformat} num #instances #bytes class name -- 1: 2398 60505944 [B 2: 30448 2849456 [C 74: 100 7200 org.apache.geode.internal.cache.entries.VersionedStatsRegionEntryHeapStringKey1 100: 100 4800 org.apache.geode.internal.cache.EntryExpiryTask Total 192199 70373048 {noformat} Load and clear another 100 entries: {noformat} num #instances #bytes class name -- 1: 2503 120511688 [B 2: 30506 2854384 [C 46: 200 14400 org.apache.geode.internal.cache.entries.VersionedStatsRegionEntryHeapStringKey1 61: 200 9600 org.apache.geode.internal.cache.EntryExpiryTask Total 193272 130421432 {noformat} Load and clear another 100 entries: {noformat} num #instances #bytes class name -- 1: 2600 180517240 [B 2: 30562 2859584 [C 33: 300 21600 org.apache.geode.internal.cache.entries.VersionedStatsRegionEntryHeapStringKey1 47: 300 14400 org.apache.geode.internal.cache.EntryExpiryTask Total 194310 190468176 {noformat} A heap dump shows the VersionedStatsRegionEntryHeapStringKey1 instances are referenced by the DistributedRegion entryExpiryTasks: {noformat} --> 
org.apache.geode.internal.cache.DistributedRegion@0x76adbbb88 (816 bytes) (field entryExpiryTasks:) --> java.util.concurrent.ConcurrentHashMap@0x76adbc028 (100 bytes) (field table:) --> [Ljava.util.concurrent.ConcurrentHashMap$Node;@0x76ee85358 (4112 bytes) (Element 276 of [Ljava.util.concurrent.ConcurrentHashMap$Node;@0x76ee85358:) --> java.util.concurrent.ConcurrentHashMap$Node@0x76edc4e20 (44 bytes) (field next:) --> java.util.concurrent.ConcurrentHashMap$Node@0x76edc32f0 (44 bytes) (field key:) --> org.apache.geode.internal.cache.entries.VersionedStatsRegionEntryHeapStringKey1@0x76edc3210 (86 bytes) {noformat} LocalRegion.cancelAllEntryExpiryTasks is called when the region is cleared: {noformat} java.lang.Exception: Stack trace at java.lang.Thread.dumpStack(Thread.java:1333) at org.apache.geode.internal.cache.LocalRegion.cancelAllEntryExpiryTasks(LocalRegion.java:8202) at org.apache.geode.internal.cache.LocalRegion.clearRegionLocally(LocalRegion.java:9094) at org.apache.geode.internal.cache.DistributedRegion.cmnClearRegion(DistributedRegion.java:1962) at org.apache.geode.internal.cache.LocalRegion.basicClear(LocalRegion.java:8998) at org.apache.geode.internal.cache.DistributedRegion.basicClear(DistributedRegion.java:1939) at org.apache.geode.internal.cache.LocalRegion.basicBridgeClear(LocalRegion.java:8988) at org.apache.geode.internal.cache.tier.sockets.command.ClearRegion.cmdExecute(ClearRegion.java:123) {noformat} But it doesn't clear the entryExpiryTasks map: {noformat} LocalRegion.clearRegionLocally before cancelAllEntryExpiryTasks entryExpiryTasks=100 LocalRegion.clearRegionLocally after cancelAllEntryExpiryTasks entryExpiryTasks=100 {noformat} As a test, I added this call to the bottom of the cancelAllEntryExpiryTasks method: {noformat} this.entryExpiryTasks.clear(); {noformat} This addressed the leak: {noformat} Server Startup: Total 182414 9855616 Load/Clear 1: Total 191049 10315832 Load/Clear 2: Total 191978 10329664 Load/Clear 3: Total 192638 
10360360 {noformat} As a workaround, a Function that clears the region using removeAll on batches of keys also addresses the leak: {noformat} Server Startup: Total 182297 9849312 Load/Clear 1: Total 185932 10019248 Load/Clear 2: Total 191855 10278816 Load/Clear 3: Total 192511 10313168 Load/Clear 4: Total 193424 10352008 {noformat} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
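The removeAll workaround mentioned above depends on splitting the region's key set into batches. A sketch of just the batching step (plain Java, illustrative only; in the real workaround each batch would be passed to Region.removeAll from inside a Function):

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

public class BatchKeys {
    // Splits keys into fixed-size batches. In the Geode workaround, each
    // batch would be handed to Region.removeAll(batch), which removes the
    // entries and (unlike clear) does not leave EntryExpiryTasks behind.
    static <K> List<List<K>> partition(Collection<K> keys, int batchSize) {
        List<List<K>> batches = new ArrayList<>();
        List<K> current = new ArrayList<>(batchSize);
        for (K key : keys) {
            current.add(key);
            if (current.size() == batchSize) {
                batches.add(current);
                current = new ArrayList<>(batchSize);
            }
        }
        if (!current.isEmpty()) {
            batches.add(current); // final partial batch
        }
        return batches;
    }

    public static void main(String[] args) {
        List<Integer> keys = new ArrayList<>();
        for (int i = 0; i < 10; i++) {
            keys.add(i);
        }
        // 10 keys with batchSize 4 -> batches of 4, 4, and 2.
        System.out.println(partition(keys, 4).size());
    }
}
```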
[jira] [Commented] (GEODE-6376) PersistentRecoveryOrderDUnitTest > testCrashDuringPreparePersistentId FAILED
[ https://issues.apache.org/jira/browse/GEODE-6376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16788317#comment-16788317 ] Barry Oglesby commented on GEODE-6376: -- Re-occurred in CI: [https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-develop-main/jobs/DistributedTestOpenJDK11/builds/484] {noformat} org.apache.geode.internal.cache.persistence.PersistentRecoveryOrderDUnitTest > testCrashDuringPreparePersistentId FAILED java.lang.RuntimeException: java.lang.IllegalStateException: Disk store PersistentRecoveryOrderDUnitTest_testCrashDuringPreparePersistentIdRegion not found at org.apache.geode.internal.cache.persistence.PersistentReplicatedTestBase._createPersistentRegion(PersistentReplicatedTestBase.java:194) at org.apache.geode.internal.cache.persistence.PersistentReplicatedTestBase.createPersistentRegion(PersistentReplicatedTestBase.java:180) at org.apache.geode.internal.cache.persistence.PersistentRecoveryOrderDUnitTest.testCrashDuringPreparePersistentId(PersistentRecoveryOrderDUnitTest.java:1325) Caused by: java.lang.IllegalStateException: Disk store PersistentRecoveryOrderDUnitTest_testCrashDuringPreparePersistentIdRegion not found 8346 tests completed, 1 failed, 495 skipped {noformat} =-=-=-=-=-=-=-=-=-=-=-=-=-=-= Test Results URI =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-= http://files.apachegeode-ci.info/builds/apache-develop-main/1.10.0-SNAPSHOT.0034/test-results/distributedTest/1552078010/ =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-= Test report artifacts from this job are available at: http://files.apachegeode-ci.info/builds/apache-develop-main/1.10.0-SNAPSHOT.0034/test-artifacts/1552078010/distributedtestfiles-OpenJDK11-1.10.0-SNAPSHOT.0034.tgz > PersistentRecoveryOrderDUnitTest > testCrashDuringPreparePersistentId FAILED > > > Key: GEODE-6376 > URL: https://issues.apache.org/jira/browse/GEODE-6376 > Project: Geode > Issue Type: Bug > Components: tests >Reporter: Mark Hanson >Priority: 
Major > > Failure Link > https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-develop-main/jobs/DistributedTestOpenJDK11/builds/373 > Log Archives: > =-=-=-=-=-=-=-=-=-=-=-=-=-=-= Test Results URI > =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-= > http://files.apachegeode-ci.info/builds/apache-develop-main/1.9.0-SNAPSHOT.0412/test-results/distributedTest/1549403523/ > =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-= > Test report artifacts from this job are available at: > http://files.apachegeode-ci.info/builds/apache-develop-main/1.9.0-SNAPSHOT.0412/test-artifacts/1549403523/distributedtestfiles-OpenJDK11-1.9.0-SNAPSHOT.0412.tgz > Stack Trace: > {code} > java.lang.RuntimeException: java.lang.IllegalStateException: Disk store > PersistentRecoveryOrderDUnitTest_testCrashDuringPreparePersistentIdRegion not > found > at > org.apache.geode.internal.cache.persistence.PersistentReplicatedTestBase._createPersistentRegion(PersistentReplicatedTestBase.java:194) > at > org.apache.geode.internal.cache.persistence.PersistentReplicatedTestBase.createPersistentRegion(PersistentReplicatedTestBase.java:180) > at > org.apache.geode.internal.cache.persistence.PersistentRecoveryOrderDUnitTest.testCrashDuringPreparePersistentId(PersistentRecoveryOrderDUnitTest.java:1325) > at jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:566) > at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50) > at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) > at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47) > at > org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) > at > 
org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26) > at > org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27) > at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:55) > at org.junit.rules.RunRules.evaluate(RunRules.java:20) > at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325) > at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78) > at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57) > at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290) > at
[jira] [Commented] (GEODE-4263) GEODE-4263 : [CI Failure] ResourceManagerWithQueryMonitorDUnitTest. testRMAndTimeoutSet
[ https://issues.apache.org/jira/browse/GEODE-4263?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16788271#comment-16788271 ] Barry Oglesby commented on GEODE-4263: -- ResourceManagerWithQueryMonitorDUnitTest.doTestCriticalHeapAndQueryTimeout: - creates a DefaultQuery test hook: - asynchronously invokes the query (which calls doTestHook to start the test hook waiting) - sleeps 1 second - simulates a CRITICAL_HEAP_USED event - sleeps another 4 seconds - releases the test hook Meanwhile, the test hook has been waiting for 8 seconds. The above steps didn't happen in 8 seconds, so the test failed with the 'query was never unlatched' message. Maybe 8 seconds isn't long enough to wait. > GEODE-4263 : [CI Failure] ResourceManagerWithQueryMonitorDUnitTest. > testRMAndTimeoutSet > --- > > Key: GEODE-4263 > URL: https://issues.apache.org/jira/browse/GEODE-4263 > Project: Geode > Issue Type: Bug > Components: querying >Reporter: nabarun >Priority: Major > > {noformat} > java.lang.AssertionError: queryExecution.getResult() threw Exception > java.lang.AssertionError: An exception occurred during asynchronous > invocation. 
> at org.junit.Assert.fail(Assert.java:88) > at > org.apache.geode.cache.query.dunit.ResourceManagerWithQueryMonitorDUnitTest.doTestCriticalHeapAndQueryTimeout(ResourceManagerWithQueryMonitorDUnitTest.java:738) > at > org.apache.geode.cache.query.dunit.ResourceManagerWithQueryMonitorDUnitTest.doCriticalMemoryHitTest(ResourceManagerWithQueryMonitorDUnitTest.java:321) > at > org.apache.geode.cache.query.dunit.ResourceManagerWithQueryMonitorDUnitTest.testRMAndTimeoutSet(ResourceManagerWithQueryMonitorDUnitTest.java:157) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50) > at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) > at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47) > at > org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) > at > org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26) > at > org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27) > at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:55) > at org.junit.rules.RunRules.evaluate(RunRules.java:20) > at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325) > at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78) > at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57) > at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290) > at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71) > at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288) > at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58) > at 
org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268) > at > org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26) > at org.junit.runners.ParentRunner.run(ParentRunner.java:363) > at > org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecuter.runTestClass(JUnitTestClassExecuter.java:114) > at > org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecuter.execute(JUnitTestClassExecuter.java:57) > at > org.gradle.api.internal.tasks.testing.junit.JUnitTestClassProcessor.processTestClass(JUnitTestClassProcessor.java:66) > at > org.gradle.api.internal.tasks.testing.SuiteTestClassProcessor.processTestClass(SuiteTestClassProcessor.java:51) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:35) > at > org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:24) > at > org.gradle.internal.dispatch.ContextClassLoaderDispatch.dispatch(ContextClassLoaderDispatch.java:32) > at >
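The timing race described in the comment above can be sketched with a CountDownLatch: if the releasing side (the test's sleeps plus the simulated CRITICAL_HEAP_USED event) takes longer than the hook's await timeout, the hook gives up first and the test fails, analogous to the 'query was never unlatched' assertion. This is not the actual PauseTestHook code; names and durations are hypothetical:

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

public class HookTimeoutDemo {
    // The "hook" waits up to timeoutMs for release; a separate thread
    // releases the latch after releaseAfterMs. Returns false when the
    // timeout elapses first -- i.e. "query was never unlatched".
    static boolean awaitRelease(CountDownLatch latch, long timeoutMs,
                                long releaseAfterMs) throws InterruptedException {
        Thread releaser = new Thread(() -> {
            try {
                Thread.sleep(releaseAfterMs); // test-side work before release
            } catch (InterruptedException ignored) {
            }
            latch.countDown();
        });
        releaser.start();
        boolean released = latch.await(timeoutMs, TimeUnit.MILLISECONDS);
        releaser.join();
        return released;
    }

    public static void main(String[] args) throws InterruptedException {
        // Release arrives after the hook's timeout: the hook gives up first.
        System.out.println(awaitRelease(new CountDownLatch(1), 100, 300));
        // Release arrives in time: the hook is unlatched normally.
        System.out.println(awaitRelease(new CountDownLatch(1), 300, 100));
    }
}
```

A longer await timeout (or better, an await with no fixed timeout plus a test-level timeout rule) would remove the sensitivity to slow CI machines.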
[jira] [Reopened] (GEODE-4263) GEODE-4263 : [CI Failure] ResourceManagerWithQueryMonitorDUnitTest. testRMAndTimeoutSet
[ https://issues.apache.org/jira/browse/GEODE-4263?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Barry Oglesby reopened GEODE-4263: -- Reopening this JIRA since the ResourceManagerWithQueryMonitorDUnitTest.doTestCriticalHeapAndQueryTimeout method failed in CI run: https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-develop-main/jobs/DistributedTestOpenJDK8/builds/483 {noformat} java.lang.AssertionError: queryExecution.getResult() threw Exception java.lang.AssertionError: An exception occurred during asynchronous invocation. at org.junit.Assert.fail(Assert.java:88) at org.apache.geode.cache.query.dunit.ResourceManagerWithQueryMonitorDUnitTest.doTestCriticalHeapAndQueryTimeout(ResourceManagerWithQueryMonitorDUnitTest.java:847) at org.apache.geode.cache.query.dunit.ResourceManagerWithQueryMonitorDUnitTest.doCriticalMemoryHitTestWithMultipleServers(ResourceManagerWithQueryMonitorDUnitTest.java:628) at org.apache.geode.cache.query.dunit.ResourceManagerWithQueryMonitorDUnitTest.testPRGatherCancellation(ResourceManagerWithQueryMonitorDUnitTest.java:249) java.lang.AssertionError: Suspicious strings were written to the log during this run. Fix the strings or use IgnoredException.addIgnoredException to ignore. 
--- Found suspect string in log4j at line 2187 [fatal 2019/03/08 18:40:17.385 UTC tid=162] Server connection from [identity(172.17.0.5(283:loner):43932:82f99a5e,connection=1; port=43932] : Unexpected Error on server java.lang.AssertionError: query was never unlatched at org.junit.Assert.fail(Assert.java:88) at org.apache.geode.cache.query.dunit.ResourceManagerWithQueryMonitorDUnitTest$PauseTestHook.doTestHook(ResourceManagerWithQueryMonitorDUnitTest.java:1304) at org.apache.geode.cache.query.internal.DefaultQuery.execute(DefaultQuery.java:257) at org.apache.geode.cache.query.internal.DefaultQuery.execute(DefaultQuery.java:217) at org.apache.geode.internal.cache.tier.sockets.BaseCommandQuery.processQueryUsingParams(BaseCommandQuery.java:105) at org.apache.geode.internal.cache.tier.sockets.BaseCommandQuery.processQuery(BaseCommandQuery.java:58) at org.apache.geode.internal.cache.tier.sockets.command.Query.cmdExecute(Query.java:94) at org.apache.geode.internal.cache.tier.sockets.BaseCommand.execute(BaseCommand.java:183) at org.apache.geode.internal.cache.tier.sockets.ServerConnection.doNormalMessage(ServerConnection.java:851) at org.apache.geode.internal.cache.tier.sockets.OriginalServerConnection.doOneMessage(OriginalServerConnection.java:75) at org.apache.geode.internal.cache.tier.sockets.ServerConnection.run(ServerConnection.java:1227) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at org.apache.geode.internal.cache.tier.sockets.AcceptorImpl.lambda$initializeServerConnectionThreadPool$3(AcceptorImpl.java:616) at org.apache.geode.internal.logging.LoggingThreadFactory.lambda$newThread$0(LoggingThreadFactory.java:121) at java.lang.Thread.run(Thread.java:748) 8379 tests completed, 1 failed, 499 skipped {noformat} =-=-=-=-=-=-=-=-=-=-=-=-=-=-= Test Results URI =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-= 
http://files.apachegeode-ci.info/builds/apache-develop-main/1.10.0-SNAPSHOT.0033/test-results/distributedTest/1552074270/ =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-= Test report artifacts from this job are available at: http://files.apachegeode-ci.info/builds/apache-develop-main/1.10.0-SNAPSHOT.0033/test-artifacts/1552074270/distributedtestfiles-OpenJDK8-1.10.0-SNAPSHOT.0033.tgz > GEODE-4263 : [CI Failure] ResourceManagerWithQueryMonitorDUnitTest. > testRMAndTimeoutSet > --- > > Key: GEODE-4263 > URL: https://issues.apache.org/jira/browse/GEODE-4263 > Project: Geode > Issue Type: Bug > Components: querying >Reporter: nabarun >Priority: Major > > {noformat} > java.lang.AssertionError: queryExecution.getResult() threw Exception > java.lang.AssertionError: An exception occurred during asynchronous > invocation. > at org.junit.Assert.fail(Assert.java:88) > at > org.apache.geode.cache.query.dunit.ResourceManagerWithQueryMonitorDUnitTest.doTestCriticalHeapAndQueryTimeout(ResourceManagerWithQueryMonitorDUnitTest.java:738) > at > org.apache.geode.cache.query.dunit.ResourceManagerWithQueryMonitorDUnitTest.doCriticalMemoryHitTest(ResourceManagerWithQueryMonitorDUnitTest.java:321) > at > org.apache.geode.cache.query.dunit.ResourceManagerWithQueryMonitorDUnitTest.testRMAndTimeoutSet(ResourceManagerWithQueryMonitorDUnitTest.java:157) > at
[jira] [Commented] (GEODE-6498) CI failure: DescribeClientCommandDUnitTest describeClient failed with ComparisonFailure
[ https://issues.apache.org/jira/browse/GEODE-6498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16788073#comment-16788073 ] Barry Oglesby commented on GEODE-6498: -- This re-occurred in: [https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-develop-main/jobs/WindowsGfshDistributedTestOpenJDK11/builds/322] {noformat} org.apache.geode.management.internal.cli.commands.DescribeClientCommandDUnitTest > describeClient FAILED org.junit.ComparisonFailure: expected:<"10.0.0.93([locator-0:1868:locator):41002 [Coordinator]]"> but was:<"10.0.0.93([server-1:7776):41004]"> at jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) at jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) at org.apache.geode.management.internal.cli.commands.DescribeClientCommandDUnitTest.validateResults(DescribeClientCommandDUnitTest.java:152) at org.apache.geode.management.internal.cli.commands.DescribeClientCommandDUnitTest.describeClient(DescribeClientCommandDUnitTest.java:101) 10 tests completed, 1 failed {noformat} =-=-=-=-=-=-=-=-=-=-=-=-=-=-= Test Results URI =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-= http://files.apachegeode-ci.info/builds/apache-develop-main/1.10.0-SNAPSHOT.0027/test-results/distributedTest/1552027081/ =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-= Test report artifacts from this job are available at: http://files.apachegeode-ci.info/builds/apache-develop-main/1.10.0-SNAPSHOT.0027/test-artifacts/1552027081/windows-gfshdistributedtest-OpenJDK11-1.10.0-SNAPSHOT.0027.tgz > CI failure: DescribeClientCommandDUnitTest describeClient failed with > ComparisonFailure > --- > > Key: GEODE-6498 > URL: https://issues.apache.org/jira/browse/GEODE-6498 > Project: Geode > Issue Type: Bug > Components: gfsh >Reporter: Barry Oglesby >Assignee: 
Jinmei Liao >Priority: Major > > https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-develop-main/jobs/WindowsGfshDistributedTestOpenJDK8/builds/318 > {noformat} > org.apache.geode.management.internal.cli.commands.DescribeClientCommandDUnitTest > > describeClient FAILED > org.junit.ComparisonFailure: > expected:<"10.0.0.148(server-[2:8068):41004]"> but > was:<"10.0.0.148(server-[1:1656):41006]"> > at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) > at > sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) > at > sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) > at > org.apache.geode.management.internal.cli.commands.DescribeClientCommandDUnitTest.validateResults(DescribeClientCommandDUnitTest.java:152) > at > org.apache.geode.management.internal.cli.commands.DescribeClientCommandDUnitTest.describeClient(DescribeClientCommandDUnitTest.java:101) > 10 tests completed, 1 failed > {noformat} > =-=-=-=-=-=-=-=-=-=-=-=-=-=-= Test Results URI > =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-= > http://files.apachegeode-ci.info/builds/apache-develop-main/1.10.0-SNAPSHOT.0024/test-results/distributedTest/1551991519/ > =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-= > Test report artifacts from this job are available at: > http://files.apachegeode-ci.info/builds/apache-develop-main/1.10.0-SNAPSHOT.0024/test-artifacts/1551991519/windows-gfshdistributedtest-OpenJDK8-1.10.0-SNAPSHOT.0024.tgz -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (GEODE-6498) CI failure: DescribeClientCommandDUnitTest describeClient failed with ComparisonFailure
[ https://issues.apache.org/jira/browse/GEODE-6498?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Barry Oglesby reassigned GEODE-6498: Assignee: Jinmei Liao > CI failure: DescribeClientCommandDUnitTest describeClient failed with > ComparisonFailure > --- > > Key: GEODE-6498 > URL: https://issues.apache.org/jira/browse/GEODE-6498 > Project: Geode > Issue Type: Bug > Components: gfsh >Reporter: Barry Oglesby >Assignee: Jinmei Liao >Priority: Major > > https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-develop-main/jobs/WindowsGfshDistributedTestOpenJDK8/builds/318 > {noformat} > org.apache.geode.management.internal.cli.commands.DescribeClientCommandDUnitTest > > describeClient FAILED > org.junit.ComparisonFailure: > expected:<"10.0.0.148(server-[2:8068):41004]"> but > was:<"10.0.0.148(server-[1:1656):41006]"> > at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) > at > sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) > at > sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) > at > org.apache.geode.management.internal.cli.commands.DescribeClientCommandDUnitTest.validateResults(DescribeClientCommandDUnitTest.java:152) > at > org.apache.geode.management.internal.cli.commands.DescribeClientCommandDUnitTest.describeClient(DescribeClientCommandDUnitTest.java:101) > 10 tests completed, 1 failed > {noformat} > =-=-=-=-=-=-=-=-=-=-=-=-=-=-= Test Results URI > =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-= > http://files.apachegeode-ci.info/builds/apache-develop-main/1.10.0-SNAPSHOT.0024/test-results/distributedTest/1551991519/ > =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-= > Test report artifacts from this job are available at: > http://files.apachegeode-ci.info/builds/apache-develop-main/1.10.0-SNAPSHOT.0024/test-artifacts/1551991519/windows-gfshdistributedtest-OpenJDK8-1.10.0-SNAPSHOT.0024.tgz -- This 
message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (GEODE-6498) CI failure: DescribeClientCommandDUnitTest describeClient failed with ComparisonFailure
Barry Oglesby created GEODE-6498: Summary: CI failure: DescribeClientCommandDUnitTest describeClient failed with ComparisonFailure Key: GEODE-6498 URL: https://issues.apache.org/jira/browse/GEODE-6498 Project: Geode Issue Type: Bug Components: gfsh Reporter: Barry Oglesby https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-develop-main/jobs/WindowsGfshDistributedTestOpenJDK8/builds/318 {noformat} org.apache.geode.management.internal.cli.commands.DescribeClientCommandDUnitTest > describeClient FAILED org.junit.ComparisonFailure: expected:<"10.0.0.148(server-[2:8068):41004]"> but was:<"10.0.0.148(server-[1:1656):41006]"> at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) at org.apache.geode.management.internal.cli.commands.DescribeClientCommandDUnitTest.validateResults(DescribeClientCommandDUnitTest.java:152) at org.apache.geode.management.internal.cli.commands.DescribeClientCommandDUnitTest.describeClient(DescribeClientCommandDUnitTest.java:101) 10 tests completed, 1 failed {noformat} =-=-=-=-=-=-=-=-=-=-=-=-=-=-= Test Results URI =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-= http://files.apachegeode-ci.info/builds/apache-develop-main/1.10.0-SNAPSHOT.0024/test-results/distributedTest/1551991519/ =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-= Test report artifacts from this job are available at: http://files.apachegeode-ci.info/builds/apache-develop-main/1.10.0-SNAPSHOT.0024/test-artifacts/1551991519/windows-gfshdistributedtest-OpenJDK8-1.10.0-SNAPSHOT.0024.tgz -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (GEODE-6419) CI Failure: ClusterConfigurationDUnitTest.testStartServerAndExecuteCommands fails with BindException
[ https://issues.apache.org/jira/browse/GEODE-6419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16787224#comment-16787224 ] Barry Oglesby commented on GEODE-6419: -- This failure reproduced during CI: https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-develop-main/jobs/WindowsGfshDistributedTestOpenJDK11/builds/319 {noformat} org.apache.geode.management.internal.cli.commands.ClusterConfigurationDUnitTest > testStartServerAndExecuteCommands[0] FAILED java.lang.AssertionError: Suspicious strings were written to the log during this run. Fix the strings or use IgnoredException.addIgnoredException to ignore. --- Found suspect string in log4j at line 1141 [error 2019/03/07 19:32:24.218 GMT tid=64] Jmx manager could not be started because HTTP service failed to start org.apache.geode.management.ManagementException: HTTP service failed to start at org.apache.geode.management.internal.ManagementAgent.loadWebApplications(ManagementAgent.java:240) at org.apache.geode.management.internal.ManagementAgent.startAgent(ManagementAgent.java:127) at org.apache.geode.management.internal.SystemManagementService.startManager(SystemManagementService.java:432) at org.apache.geode.management.internal.beans.ManagementAdapter.handleCacheCreation(ManagementAdapter.java:181) at org.apache.geode.management.internal.beans.ManagementListener.handleEvent(ManagementListener.java:127) at org.apache.geode.distributed.internal.InternalDistributedSystem.notifyResourceEventListeners(InternalDistributedSystem.java:2162) at org.apache.geode.distributed.internal.InternalDistributedSystem.handleResourceEvent(InternalDistributedSystem.java:704) at org.apache.geode.internal.cache.GemFireCacheImpl.initialize(GemFireCacheImpl.java:1182) at org.apache.geode.internal.cache.InternalCacheBuilder.create(InternalCacheBuilder.java:181) at org.apache.geode.internal.cache.InternalCacheBuilder.create(InternalCacheBuilder.java:147) at 
org.apache.geode.cache.CacheFactory.create(CacheFactory.java:138) at org.apache.geode.distributed.internal.InternalLocator.startCache(InternalLocator.java:672) at org.apache.geode.distributed.internal.InternalLocator.startDistributedSystem(InternalLocator.java:659) at org.apache.geode.distributed.internal.InternalLocator.startLocator(InternalLocator.java:343) at org.apache.geode.distributed.Locator.startLocator(Locator.java:252) at org.apache.geode.distributed.Locator.startLocatorAndDS(Locator.java:139) at org.apache.geode.test.junit.rules.LocatorStarterRule.startLocator(LocatorStarterRule.java:85) at org.apache.geode.test.junit.rules.LocatorStarterRule.before(LocatorStarterRule.java:66) at org.apache.geode.test.dunit.rules.ClusterStartupRule.lambda$startLocatorVM$22d9b8a8$1(ClusterStartupRule.java:239) at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.base/java.lang.reflect.Method.invoke(Method.java:566) at org.apache.geode.test.dunit.internal.MethodInvoker.executeObject(MethodInvoker.java:123) at org.apache.geode.test.dunit.internal.RemoteDUnitVM.executeMethodOnObject(RemoteDUnitVM.java:69) at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.base/java.lang.reflect.Method.invoke(Method.java:566) at java.rmi/sun.rmi.server.UnicastServerRef.dispatch(UnicastServerRef.java:359) at java.rmi/sun.rmi.transport.Transport$1.run(Transport.java:200) at java.rmi/sun.rmi.transport.Transport$1.run(Transport.java:197) at java.base/java.security.AccessController.doPrivileged(Native Method) at 
java.rmi/sun.rmi.transport.Transport.serviceCall(Transport.java:196) at java.rmi/sun.rmi.transport.tcp.TCPTransport.handleMessages(TCPTransport.java:562) at java.rmi/sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run0(TCPTransport.java:796) at java.rmi/sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.lambda$run$0(TCPTransport.java:677) at java.base/java.security.AccessController.doPrivileged(Native Method) at java.rmi/sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run(TCPTransport.java:676) at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) at java.base/java.lang.Thread.run(Thread.java:834) Caused by: java.net.BindException: Address already in use: bind at
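The root cause of the failure above is the final {{Caused by: java.net.BindException: Address already in use: bind}}. A minimal, standalone way to probe whether a port is bindable, and to let the OS hand out a free ephemeral port (the usual way tests sidestep this class of failure), can be sketched with plain {{java.net.ServerSocket}}. This is an illustration only, not the fix applied to the Geode test:

```java
import java.io.IOException;
import java.net.ServerSocket;

// Standalone sketch: probe port availability the way a test harness might
// before starting a service that binds a fixed port.
class PortCheck {
    // Returns true if the port could be bound (i.e. it was free at that instant).
    static boolean isFree(int port) {
        try (ServerSocket socket = new ServerSocket(port)) {
            return true;
        } catch (IOException bindFailed) {
            return false; // "Address already in use" lands here
        }
    }

    // Asking for port 0 lets the OS pick a free ephemeral port, which is the
    // common way tests avoid hard-coded ports colliding across parallel runs.
    static int ephemeralPort() {
        try (ServerSocket socket = new ServerSocket(0)) {
            return socket.getLocalPort();
        } catch (IOException e) {
            return -1; // no ephemeral port available
        }
    }
}
```

Note that an isFree-then-bind sequence is inherently racy; binding port 0 and using the returned port directly is the more robust pattern.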
[jira] [Resolved] (GEODE-2968) Provide an API to set identity field(s) on JSON objects
[ https://issues.apache.org/jira/browse/GEODE-2968?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Barry Oglesby resolved GEODE-2968. -- Resolution: Duplicate > Provide an API to set identity field(s) on JSON objects > --- > > Key: GEODE-2968 > URL: https://issues.apache.org/jira/browse/GEODE-2968 > Project: Geode > Issue Type: Improvement > Components: rest (dev), serialization >Reporter: Barry Oglesby >Priority: Major > > I have a JSON object with 53 fields. The identity of that object is one > specific field (the {{Unique_Key}} field in this case), but I can't specify > that when loading the object. This causes {{PdxInstanceImpl equals}} and > {{hashCode}} to use all 53 fields in their determinations and is especially > bad for OQL queries. > I hacked {{PdxInstanceHelper addIntField}} to set an identity field like: > {noformat} > if (fieldName.equals("Unique_Key")) { > m_pdxInstanceFactory.markIdentityField(fieldName); > } > {noformat} > Here are some queries before and after this change: > Before: > {noformat} > Totals query=SELECT * FROM /data WHERE Agency = 'NYPD'; resultSize=1890; > iterations=1000; totalTime=30529 ms; averagePerQuery=30.529 ms > Totals query=SELECT * FROM /data WHERE Incident_Address LIKE '%AVENUE%'; > resultSize=2930; iterations=1000; totalTime=62723 ms; averagePerQuery=62.723 > ms > Totals query=SELECT * FROM /data; resultSize=1; iterations=1000; > totalTime=87673 ms; averagePerQuery=87.673 ms > {noformat} > After: > {noformat} > Totals query=SELECT * FROM /data WHERE Agency = 'NYPD'; resultSize=1890; > iterations=1000; totalTime=12417 ms; averagePerQuery=12.417 ms > Totals query=SELECT * FROM /data WHERE Incident_Address LIKE '%AVENUE%'; > resultSize=2930; iterations=1000; totalTime=29517 ms; averagePerQuery=29.517 > ms > Totals query=SELECT * FROM /data; resultSize=1; iterations=1000; > totalTime=44127 ms; averagePerQuery=44.127 ms > {noformat} > Here is an example of the JSON object: > {noformat} > { >"Unique_Key": 
25419013, >"Created_Date": "04/24/2013 12:00:00 AM", >"Closed_Date": "04/25/2013 12:00:00 AM", >"Agency": "HPD", >"Agency_Name": "Department of Housing Preservation and Development", >"Complaint_Type": "PLUMBING", >"Descriptor": "WATER-SUPPLY", >"Location_Type": "RESIDENTIAL BUILDING", >"Incident_Zip": "11372", >"Incident_Address": "37-37 88 STREET", >"Street_Name": "88 STREET", >"Cross_Street_1": "37 AVENUE", >"Cross_Street_2": "ROOSEVELT AVENUE", >"Intersection_Street_1": "", >"Intersection_Street_2": "", >"Address_Type": "ADDRESS", >"City": "Jackson Heights", >"Landmark": "", >"Facility_Type": "N/A", >"Status": "Closed", >"Due_Date": "", >"Resolution_Description": "The Department of Housing Preservation and > Development inspected the following conditions. No violations were issued. > The complaint has been closed.", >"Resolution_Action_Updated_Date": "04/25/2013 12:00:00 AM", >"Community_Board": "03 QUEENS", >"Borough": "QUEENS", >"X_Coordinate_State_Plane": 1017897, >"Y_Coordinate_State_Plane": 212354, >"Park_Facility_Name": "Unspecified", >"Park_Borough": "QUEENS", >"School_Name": "Unspecified", >"School_Number": "Unspecified", >"School_Region": "Unspecified", >"School_Code": "Unspecified", >"School_Phone_Number": "Unspecified", >"School_Address": "Unspecified", >"School_City": "Unspecified", >"School_State": "Unspecified", >"School_Zip": "Unspecified", >"School_Not_Found": "", >"School_or_Citywide_Complaint": "", >"Vehicle_Type": "", >"Taxi_Company_Borough": "", >"Taxi_Pick_Up_Location": "", >"Bridge_Highway_Name": "", >"Bridge_Highway_Direction": "", >"Road_Ramp": "", >"Bridge_Highway_Segment": "", >"Garage_Lot_Name": "", >"Ferry_Direction": "", >"Ferry_Terminal_Name": "", >"Latitude": 40.74947521870806, >"Longitude": -73.87856355000383, >"Location": "(40.74947521870806, -73.87856355000383)" > } > {noformat} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
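The query speedup reported above comes from narrowing {{equals}}/{{hashCode}} to the single marked identity field instead of all 53 JSON fields. The effect can be sketched in plain Java; this is a standalone illustration of the idea, not Geode's {{PdxInstanceImpl}}, and the class and field names here are hypothetical:

```java
import java.util.Map;
import java.util.Objects;

// Standalone sketch: an object whose equals/hashCode consult only a
// designated identity field, the way markIdentityField() narrows
// PdxInstance comparisons from all fields to the marked one(s).
class JsonRecord {
    private final Map<String, Object> fields;
    private final String identityField; // e.g. "Unique_Key"

    JsonRecord(Map<String, Object> fields, String identityField) {
        this.fields = fields;
        this.identityField = identityField;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof JsonRecord)) return false;
        JsonRecord other = (JsonRecord) o;
        // Compare only the identity field, not every field in the map.
        return Objects.equals(fields.get(identityField),
                other.fields.get(identityField));
    }

    @Override
    public int hashCode() {
        // Hash only the identity field.
        return Objects.hashCode(fields.get(identityField));
    }
}
```

Two records with the same {{Unique_Key}} compare equal even if every other field differs, which is exactly what makes query-time comparisons cheap.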
[jira] [Commented] (GEODE-4240) DeprecatedCacheServerLauncherIntegrationTest fails sporadically with execution timeout
[ https://issues.apache.org/jira/browse/GEODE-4240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16786999#comment-16786999 ] Barry Oglesby commented on GEODE-4240: -- Reproduced in CI: https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-develop-main/jobs/WindowsIntegrationTestOpenJDK8/builds/323 org.apache.geode.internal.cache.DeprecatedCacheServerLauncherIntegrationTest > testServerPortOneCacheServer FAILED java.lang.AssertionError: Timed out waiting for output "CacheServer pid: \\d+ status: running" after 12 ms. Output: Starting CacheServer with pid: 0 at org.junit.Assert.fail(Assert.java:88) at org.apache.geode.test.process.ProcessWrapper.waitForOutputToMatch(ProcessWrapper.java:240) at org.apache.geode.internal.cache.DeprecatedCacheServerLauncherIntegrationTest.execAndValidate(DeprecatedCacheServerLauncherIntegrationTest.java:438) at org.apache.geode.internal.cache.DeprecatedCacheServerLauncherIntegrationTest.testServerPortOneCacheServer(DeprecatedCacheServerLauncherIntegrationTest.java:334) 4564 tests completed, 1 failed, 95 skipped =-=-=-=-=-=-=-=-=-=-=-=-=-=-= Test Results URI =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-= http://files.apachegeode-ci.info/builds/apache-develop-main/1.10.0-SNAPSHOT.0022/test-results/integrationTest/1551928468/ =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-= Test report artifacts from this job are available at: http://files.apachegeode-ci.info/builds/apache-develop-main/1.10.0-SNAPSHOT.0022/test-artifacts/1551928468/windows-integrationtestfiles-OpenJDK8-1.10.0-SNAPSHOT.0022.tgz > DeprecatedCacheServerLauncherIntegrationTest fails sporadically with > execution timeout > -- > > Key: GEODE-4240 > URL: https://issues.apache.org/jira/browse/GEODE-4240 > Project: Geode > Issue Type: Bug >Reporter: Patrick Rhomberg >Assignee: Dan Smith >Priority: Major > Time Spent: 10m > Remaining Estimate: 0h > > While possibly unrelated, it is worth noting other recent failures due to > 
startup timeouts. > ([GEODE-4236](https://issues.apache.org/jira/browse/GEODE-4236) comes to > mind.) > I have recently seen a failure in this test timing out with the following > stacktrace: > {noformat} > java.lang.AssertionError: Timed out waiting for output "CacheServer pid: \d+ > status: running" after 12 ms. Output: > Starting CacheServer with pid: 0 > at org.junit.Assert.fail(Assert.java:88) > at > org.apache.geode.test.process.ProcessWrapper.waitForOutputToMatch(ProcessWrapper.java:222) > at > org.apache.geode.internal.cache.DeprecatedCacheServerLauncherIntegrationTest.execAndValidate(DeprecatedCacheServerLauncherIntegrationTest.java:437) > at > org.apache.geode.internal.cache.DeprecatedCacheServerLauncherIntegrationTest.testStartStatusStop(DeprecatedCacheServerLauncherIntegrationTest.java:164) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50) > at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) > at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47) > at > org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) > at > org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26) > at > org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27) > at org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:48) > at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:55) > at org.junit.rules.RunRules.evaluate(RunRules.java:20) > at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325) > at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78) > at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57) > at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290) > at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71) > at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288) > at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58) > at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268) > at > org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27) > at
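The failure above boils down to {{ProcessWrapper.waitForOutputToMatch}} never seeing a line matching the expected pattern before its timeout. The pattern check itself can be sketched with {{java.util.regex}}, using the pattern string from the failed assertion; ProcessWrapper's actual polling and timeout machinery is omitted:

```java
import java.util.regex.Pattern;

// Sketch: does captured process output contain the expected status line?
// The pattern mirrors the one in the failed assertion above.
class OutputMatcher {
    static final Pattern RUNNING =
        Pattern.compile("CacheServer pid: \\d+ status: running");

    static boolean matches(String output) {
        // find() looks for the pattern anywhere in the output, which is what
        // a wait-for-output loop needs (as opposed to matches(), which
        // requires the whole string to match).
        return RUNNING.matcher(output).find();
    }
}
```

In the failed run the only output produced was "Starting CacheServer with pid: 0", which this pattern correctly rejects.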
[jira] [Resolved] (GEODE-6435) CI Failure: SerialGatewaySenderQueueDUnitTest.testCreateMaximumSenders assertion fails
[ https://issues.apache.org/jira/browse/GEODE-6435?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Barry Oglesby resolved GEODE-6435. -- Resolution: Fixed Fix Version/s: 1.9.0 > CI Failure: SerialGatewaySenderQueueDUnitTest.testCreateMaximumSenders > assertion fails > -- > > Key: GEODE-6435 > URL: https://issues.apache.org/jira/browse/GEODE-6435 > Project: Geode > Issue Type: Bug >Reporter: Patrick Rhomberg >Assignee: Barry Oglesby >Priority: Major > Fix For: 1.9.0 > > Attachments: jdk11_OldGen_usedMemory_failure.gif, > jdk11_OldGen_usedMemory_success.gif, jdk8_OldGen_usedMemory_success.gif > > Time Spent: 0.5h > Remaining Estimate: 0h > > {noformat} > org.apache.geode.internal.cache.wan.serial.SerialGatewaySenderQueueDUnitTest > > testCreateMaximumSenders FAILED > org.apache.geode.test.dunit.RMIException: While invoking > org.apache.geode.internal.cache.wan.serial.SerialGatewaySenderQueueDUnitTest$$Lambda$390/0x000840596840.run > in VM 2 running on Host 2e6d9f20266c with 8 VMs > at org.apache.geode.test.dunit.VM.executeMethodOnObject(VM.java:579) > at org.apache.geode.test.dunit.VM.invoke(VM.java:406) > at > org.apache.geode.internal.cache.wan.serial.SerialGatewaySenderQueueDUnitTest.testCreateMaximumSenders(SerialGatewaySenderQueueDUnitTest.java:370) > Caused by: > org.awaitility.core.ConditionTimeoutException: Condition with lambda > expression in org.apache.geode.internal.cache.wan.WANTestBase that uses long > was not fulfilled within 300 seconds. 
> at > org.awaitility.core.ConditionAwaiter.await(ConditionAwaiter.java:145) > at > org.awaitility.core.CallableCondition.await(CallableCondition.java:79) > at > org.awaitility.core.CallableCondition.await(CallableCondition.java:27) > at > org.awaitility.core.ConditionFactory.until(ConditionFactory.java:902) > at > org.awaitility.core.ConditionFactory.until(ConditionFactory.java:860) > at > org.apache.geode.internal.cache.wan.WANTestBase.verifyListenerEvents(WANTestBase.java:3542) > at > org.apache.geode.internal.cache.wan.serial.SerialGatewaySenderQueueDUnitTest.lambda$testCreateMaximumSenders$faf964a3$1(SerialGatewaySenderQueueDUnitTest.java:370) > java.lang.AssertionError: An exception occurred during asynchronous > invocation. > Caused by: > java.rmi.ConnectIOException: error during JRMP connection > establishment; nested exception is: > java.net.SocketTimeoutException: Read timed out > Caused by: > java.net.SocketTimeoutException: Read timed out > {noformat} > > See pipeline failure here: > [https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-develop-main/jobs/DistributedTestOpenJDK11/builds/413] > Find test results here: > [http://files.apachegeode-ci.info/builds/apache-develop-main/1.9.0-SNAPSHOT.0456/test-results/distributedTest/1550631451/] > Find artifacts here: > [http://files.apachegeode-ci.info/builds/apache-develop-main/1.9.0-SNAPSHOT.0456/test-artifacts/1550631451/distributedtestfiles-OpenJDK11-1.9.0-SNAPSHOT.0456.tgz] > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Comment Edited] (GEODE-6435) CI Failure: SerialGatewaySenderQueueDUnitTest.testCreateMaximumSenders assertion fails
[ https://issues.apache.org/jira/browse/GEODE-6435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16774543#comment-16774543 ] Barry Oglesby edited comment on GEODE-6435 at 2/21/19 10:12 PM: I wrote a simple test that just creates 500 connections to a server. With no heap settings, I see pretty much the same heap usage as JDK 8. With these heap settings (which are what dunit jvms use): {{-Xms512m -Xmx512m -XX:MetaspaceSize=512m -XX:SoftRefLRUPolicyMSPerMB=1}}, I see double the heap usage (which causes the OOME). No heap settings: Live histogram: {noformat} num #instances #bytes class name (module) --- 1: 33610 268237040 [B (java.base@11.0.2) 2: 4441 1618264 [I (java.base@11.0.2) 3: 30274 726576 java.lang.String (java.base@11.0.2) 4: 5664 684952 java.lang.Class (java.base@11.0.2) 5: 7556 417352 [Ljava.lang.Object; (java.base@11.0.2) 6: 618 368872 [J (java.base@11.0.2) 7: 10848 347136 java.util.concurrent.ConcurrentHashMap$Node (java.base@11.0.2) 8: 10114 323648 java.util.HashMap$Node (java.base@11.0.2) 9: 2130 220952 [Ljava.util.HashMap$Node; (java.base@11.0.2) 10: 2373 208824 java.lang.reflect.Method (java.base@11.0.2) Total 188802 276340528\{noformat} vsd: {noformat} currentMaxMemory=8,589,934,592 currentUsedMemory=279,838,728\{noformat} With heap settings: Live histogram: {noformat} num #instances #bytes class name (module) --- 1: 30977 266997720 [B (java.base@11.0.2) 2: 4934 262202008 [I (java.base@11.0.2) 3: 28018 672432 java.lang.String (java.base@11.0.2) 4: 5435 659304 java.lang.Class (java.base@11.0.2) 5: 614 367784 [J (java.base@11.0.2) 6: 11023 352736 java.util.concurrent.ConcurrentHashMap$Node (java.base@11.0.2) 7: 4969 335792 [Ljava.lang.Object; (java.base@11.0.2) 8: 9651 308832 java.util.HashMap$Node (java.base@11.0.2) 9: 1728 180440 [Ljava.util.HashMap$Node; (java.base@11.0.2) 10: 1464 128832 java.lang.reflect.Method (java.base@11.0.2) Total 165373 534770560\{noformat} vsd: {noformat} currentMaxMemory=536,870,912 
currentUsedMemory=535,659,832\{noformat} was (Author: barry.oglesby): I wrote a simple test that just creates 500 connections to a server. With no heap settings, I see pretty much the same heap usage as JDK 8. With these heap settings (which are what dunit jvms use): {{-Xms512m -Xmx512m -XX:MetaspaceSize=512m -XX:SoftRefLRUPolicyMSPerMB=1}}, I see double the heap usage (which causes the OOME). No heap settings: Live histogram: {noconfig} num #instances #bytes class name (module) --- 1: 33610 268237040 [B (java.base@11.0.2) 2: 44411618264 [I (java.base@11.0.2) 3: 30274 726576 java.lang.String (java.base@11.0.2) 4: 5664 684952 java.lang.Class (java.base@11.0.2) 5: 7556 417352 [Ljava.lang.Object; (java.base@11.0.2) 6: 618 368872 [J (java.base@11.0.2) 7: 10848 347136 java.util.concurrent.ConcurrentHashMap$Node (java.base@11.0.2) 8: 10114 323648 java.util.HashMap$Node (java.base@11.0.2) 9: 2130 220952 [Ljava.util.HashMap$Node; (java.base@11.0.2) 10: 2373 208824 java.lang.reflect.Method (java.base@11.0.2) Total188802 276340528 {noconfig} vsd: {noconfig} currentMaxMemory=8,589,934,592 currentUsedMemory=279,838,728 {noconfig} With heap settings: Live histogram: {noconfig} num #instances #bytes class name (module) --- 1: 30977 266997720 [B (java.base@11.0.2) 2: 4934 262202008 [I (java.base@11.0.2) 3: 28018 672432 java.lang.String (java.base@11.0.2) 4: 5435 659304 java.lang.Class (java.base@11.0.2) 5: 614 367784 [J (java.base@11.0.2) 6: 11023 352736 java.util.concurrent.ConcurrentHashMap$Node (java.base@11.0.2) 7: 4969 335792 [Ljava.lang.Object; (java.base@11.0.2) 8: 9651 308832 java.util.HashMap$Node (java.base@11.0.2) 9: 1728 180440 [Ljava.util.HashMap$Node; (java.base@11.0.2) 10: 1464 128832 java.lang.reflect.Method (java.base@11.0.2) Total165373 534770560 {noconfig} vsd: {noconfig} currentMaxMemory=536,870,912 currentUsedMemory=535,659,832 {noconfig} > CI Failure: SerialGatewaySenderQueueDUnitTest.testCreateMaximumSenders > assertion fails > -- > > Key: GEODE-6435 > URL: 
https://issues.apache.org/jira/browse/GEODE-6435 > Project: Geode > Issue Type:
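The {{currentMaxMemory=536,870,912}} value in the vsd stats above corresponds exactly to the {{-Xmx512m}} dunit setting, since 512 MB is 512 × 1024 × 1024 bytes. A one-line check of that arithmetic:

```java
// 512 MB expressed in bytes matches the currentMaxMemory=536,870,912
// reported by vsd for the -Xmx512m run above.
class HeapMath {
    static long megabytesToBytes(long mb) {
        return mb * 1024L * 1024L;
    }
}
```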
[jira] [Comment Edited] (GEODE-6435) CI Failure: SerialGatewaySenderQueueDUnitTest.testCreateMaximumSenders assertion fails
[ https://issues.apache.org/jira/browse/GEODE-6435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16774543#comment-16774543 ] Barry Oglesby edited comment on GEODE-6435 at 2/21/19 10:13 PM: I wrote a simple test that just creates 500 connections to a server. With no heap settings, I see pretty much the same heap usage as JDK 8. With these heap settings (which are what dunit jvms use): {{-Xms512m -Xmx512m -XX:MetaspaceSize=512m -XX:SoftRefLRUPolicyMSPerMB=1}}, I see double the heap usage (which causes the OOME). No heap settings: Live histogram: {noformat} num #instances #bytes class name (module) --- 1: 33610 268237040 [B (java.base@11.0.2) 2: 4441 1618264 [I (java.base@11.0.2) 3: 30274 726576 java.lang.String (java.base@11.0.2) 4: 5664 684952 java.lang.Class (java.base@11.0.2) 5: 7556 417352 [Ljava.lang.Object; (java.base@11.0.2) 6: 618 368872 [J (java.base@11.0.2) 7: 10848 347136 java.util.concurrent.ConcurrentHashMap$Node (java.base@11.0.2) 8: 10114 323648 java.util.HashMap$Node (java.base@11.0.2) 9: 2130 220952 [Ljava.util.HashMap$Node; (java.base@11.0.2) 10: 2373 208824 java.lang.reflect.Method (java.base@11.0.2) Total 188802 276340528 {noformat} vsd: {noformat} currentMaxMemory=8,589,934,592 currentUsedMemory=279,838,728 {noformat} With heap settings: Live histogram: {noformat} num #instances #bytes class name (module) --- 1: 30977 266997720 [B (java.base@11.0.2) 2: 4934 262202008 [I (java.base@11.0.2) 3: 28018 672432 java.lang.String (java.base@11.0.2) 4: 5435 659304 java.lang.Class (java.base@11.0.2) 5: 614 367784 [J (java.base@11.0.2) 6: 11023 352736 java.util.concurrent.ConcurrentHashMap$Node (java.base@11.0.2) 7: 4969 335792 [Ljava.lang.Object; (java.base@11.0.2) 8: 9651 308832 java.util.HashMap$Node (java.base@11.0.2) 9: 1728 180440 [Ljava.util.HashMap$Node; (java.base@11.0.2) 10: 1464 128832 java.lang.reflect.Method (java.base@11.0.2) Total 165373 534770560 {noformat} vsd: {noformat} currentMaxMemory=536,870,912 
currentUsedMemory=535,659,832 {noformat} was (Author: barry.oglesby): I wrote a simple test that just creates 500 connections to a server. With no heap settings, I see pretty much the same heap usage as JDK 8. With these heap settings (which are what dunit jvms use): {{-Xms512m -Xmx512m -XX:MetaspaceSize=512m -XX:SoftRefLRUPolicyMSPerMB=1}}, I see double the heap usage (which causes the OOME). No heap settings: Live histogram: {noformat} num #instances #bytes class name (module) --- 1: 33610 268237040 [B (java.base@11.0.2) 2: 4441 1618264 [I (java.base@11.0.2) 3: 30274 726576 java.lang.String (java.base@11.0.2) 4: 5664 684952 java.lang.Class (java.base@11.0.2) 5: 7556 417352 [Ljava.lang.Object; (java.base@11.0.2) 6: 618 368872 [J (java.base@11.0.2) 7: 10848 347136 java.util.concurrent.ConcurrentHashMap$Node (java.base@11.0.2) 8: 10114 323648 java.util.HashMap$Node (java.base@11.0.2) 9: 2130 220952 [Ljava.util.HashMap$Node; (java.base@11.0.2) 10: 2373 208824 java.lang.reflect.Method (java.base@11.0.2) Total 188802 276340528\{noformat} vsd: {noformat} currentMaxMemory=8,589,934,592 currentUsedMemory=279,838,728\{noformat} With heap settings: Live histogram: {noformat} num #instances #bytes class name (module) --- 1: 30977 266997720 [B (java.base@11.0.2) 2: 4934 262202008 [I (java.base@11.0.2) 3: 28018 672432 java.lang.String (java.base@11.0.2) 4: 5435 659304 java.lang.Class (java.base@11.0.2) 5: 614 367784 [J (java.base@11.0.2) 6: 11023 352736 java.util.concurrent.ConcurrentHashMap$Node (java.base@11.0.2) 7: 4969 335792 [Ljava.lang.Object; (java.base@11.0.2) 8: 9651 308832 java.util.HashMap$Node (java.base@11.0.2) 9: 1728 180440 [Ljava.util.HashMap$Node; (java.base@11.0.2) 10: 1464 128832 java.lang.reflect.Method (java.base@11.0.2) Total 165373 534770560\{noformat} vsd: {noformat} currentMaxMemory=536,870,912 currentUsedMemory=535,659,832\{noformat} > CI Failure: SerialGatewaySenderQueueDUnitTest.testCreateMaximumSenders > assertion fails > -- > > Key: GEODE-6435 > 
URL: https://issues.apache.org/jira/browse/GEODE-6435 > Project: Geode > Issue Type: Bug >Reporter: Patrick Rhomberg >Assignee: Barry Oglesby >Priority: Major > Attachments: jdk11_OldGen_usedMemory_failure.gif, > jdk11_OldGen_usedMemory_success.gif, jdk8_OldGen_usedMemory_success.gif > > > {noformat} > org.apache.geode.internal.cache.wan.serial.SerialGatewaySenderQueueDUnitTest > > testCreateMaximumSenders FAILED > org.apache.geode.test.dunit.RMIException:
[jira] [Commented] (GEODE-6435) CI Failure: SerialGatewaySenderQueueDUnitTest.testCreateMaximumSenders assertion fails
[ https://issues.apache.org/jira/browse/GEODE-6435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16774543#comment-16774543 ] Barry Oglesby commented on GEODE-6435: -- I wrote a simple test that just creates 500 connections to a server. With no heap settings, I see pretty much the same heap usage as JDK 8. With these heap settings (which are what dunit jvms use): {{-Xms512m -Xmx512m -XX:MetaspaceSize=512m -XX:SoftRefLRUPolicyMSPerMB=1}}, I see double the heap usage (which causes the OOME). No heap settings: Live histogram: {noconfig} num #instances #bytes class name (module) --- 1: 33610 268237040 [B (java.base@11.0.2) 2: 44411618264 [I (java.base@11.0.2) 3: 30274 726576 java.lang.String (java.base@11.0.2) 4: 5664 684952 java.lang.Class (java.base@11.0.2) 5: 7556 417352 [Ljava.lang.Object; (java.base@11.0.2) 6: 618 368872 [J (java.base@11.0.2) 7: 10848 347136 java.util.concurrent.ConcurrentHashMap$Node (java.base@11.0.2) 8: 10114 323648 java.util.HashMap$Node (java.base@11.0.2) 9: 2130 220952 [Ljava.util.HashMap$Node; (java.base@11.0.2) 10: 2373 208824 java.lang.reflect.Method (java.base@11.0.2) Total188802 276340528 {noconfig} vsd: {noconfig} currentMaxMemory=8,589,934,592 currentUsedMemory=279,838,728 {noconfig} With heap settings: Live histogram: {noconfig} num #instances #bytes class name (module) --- 1: 30977 266997720 [B (java.base@11.0.2) 2: 4934 262202008 [I (java.base@11.0.2) 3: 28018 672432 java.lang.String (java.base@11.0.2) 4: 5435 659304 java.lang.Class (java.base@11.0.2) 5: 614 367784 [J (java.base@11.0.2) 6: 11023 352736 java.util.concurrent.ConcurrentHashMap$Node (java.base@11.0.2) 7: 4969 335792 [Ljava.lang.Object; (java.base@11.0.2) 8: 9651 308832 java.util.HashMap$Node (java.base@11.0.2) 9: 1728 180440 [Ljava.util.HashMap$Node; (java.base@11.0.2) 10: 1464 128832 java.lang.reflect.Method (java.base@11.0.2) Total165373 534770560 {noconfig} vsd: {noconfig} currentMaxMemory=536,870,912 currentUsedMemory=535,659,832 {noconfig} > 
CI Failure: SerialGatewaySenderQueueDUnitTest.testCreateMaximumSenders > assertion fails > -- > > Key: GEODE-6435 > URL: https://issues.apache.org/jira/browse/GEODE-6435 > Project: Geode > Issue Type: Bug >Reporter: Patrick Rhomberg >Assignee: Barry Oglesby >Priority: Major > Attachments: jdk11_OldGen_usedMemory_failure.gif, > jdk11_OldGen_usedMemory_success.gif, jdk8_OldGen_usedMemory_success.gif > > > {noformat} > org.apache.geode.internal.cache.wan.serial.SerialGatewaySenderQueueDUnitTest > > testCreateMaximumSenders FAILED > org.apache.geode.test.dunit.RMIException: While invoking > org.apache.geode.internal.cache.wan.serial.SerialGatewaySenderQueueDUnitTest$$Lambda$390/0x000840596840.run > in VM 2 running on Host 2e6d9f20266c with 8 VMs > at org.apache.geode.test.dunit.VM.executeMethodOnObject(VM.java:579) > at org.apache.geode.test.dunit.VM.invoke(VM.java:406) > at > org.apache.geode.internal.cache.wan.serial.SerialGatewaySenderQueueDUnitTest.testCreateMaximumSenders(SerialGatewaySenderQueueDUnitTest.java:370) > Caused by: > org.awaitility.core.ConditionTimeoutException: Condition with lambda > expression in org.apache.geode.internal.cache.wan.WANTestBase that uses long > was not fulfilled within 300 seconds. > at > org.awaitility.core.ConditionAwaiter.await(ConditionAwaiter.java:145) > at > org.awaitility.core.CallableCondition.await(CallableCondition.java:79) > at > org.awaitility.core.CallableCondition.await(CallableCondition.java:27) > at > org.awaitility.core.ConditionFactory.until(ConditionFactory.java:902) > at > org.awaitility.core.ConditionFactory.until(ConditionFactory.java:860) > at > org.apache.geode.internal.cache.wan.WANTestBase.verifyListenerEvents(WANTestBase.java:3542) > at > org.apache.geode.internal.cache.wan.serial.SerialGatewaySenderQueueDUnitTest.lambda$testCreateMaximumSenders$faf964a3$1(SerialGatewaySenderQueueDUnitTest.java:370) > java.lang.AssertionError: An exception occurred during asynchronous > invocation. 
> Caused by: > java.rmi.ConnectIOException: error
[jira] [Commented] (GEODE-6435) CI Failure: SerialGatewaySenderQueueDUnitTest.testCreateMaximumSenders assertion fails
[ https://issues.apache.org/jira/browse/GEODE-6435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16773584#comment-16773584 ] Barry Oglesby commented on GEODE-6435: -- Several threads in vm4 are throwing OutOfMemoryErrors in this test. {noformat} [vm4] java.lang.OutOfMemoryError: Java heap space [vm4] Dumping heap to java_pid367.hprof ... [vm4] Heap dump file created [558837052 bytes in 1.626 secs] [vm4] [fatal 2019/02/20 01:51:58.219 UTC tid=0x8ba] Fatal error from asynchronous flusher thread [vm4] java.lang.OutOfMemoryError: Java heap space [vm4] [error 2019/02/20 01:51:58.232 UTC tid=0x6ac] JGRP000190: failed receiving packet [vm4] java.lang.OutOfMemoryError: Java heap space [vm4] [fatal 2019/02/20 01:51:59.303 UTC tid=0x87a] Fatal error from asynchronous flusher thread [vm4] java.lang.OutOfMemoryError: Java heap space [vm4] [fatal 2019/02/20 01:51:59.303 UTC tid=0x760] Fatal error from asynchronous flusher thread [vm4] java.lang.OutOfMemoryError: Java heap space [vm4] [fatal 2019/02/20 01:51:59.303 UTC tid=0x882] Fatal error from asynchronous flusher thread [vm4] java.lang.OutOfMemoryError: Java heap space [vm4] [fatal 2019/02/20 01:51:59.304 UTC tid=0xa23] Fatal error from asynchronous flusher thread [vm4] java.lang.OutOfMemoryError: Java heap space {noformat} When I run the test on my local machine, it is successful. The difference is that my test runs in ~7 seconds. The failed test took ~10 seconds. This is enough time for the PingTask to run. This creates an additional connection (so it doubles the connections). 
A stack trace shows the PingTask creating a Connection: {noformat} [vm4] java.lang.Exception: Stack trace [vm4] at java.base/java.lang.Thread.dumpStack(Thread.java:1387) [vm4] at org.apache.geode.cache.client.internal.ConnectionImpl.connect(ConnectionImpl.java:112) [vm4] at org.apache.geode.cache.client.internal.ConnectionConnector.connectClientToServer(ConnectionConnector.java:75) [vm4] at org.apache.geode.cache.client.internal.ConnectionFactoryImpl.createClientToServerConnection(ConnectionFactoryImpl.java:111) [vm4] at org.apache.geode.cache.client.internal.pooling.ConnectionManagerImpl.borrowConnection(ConnectionManagerImpl.java:321) [vm4] at org.apache.geode.cache.client.internal.OpExecutorImpl.executeOnServer(OpExecutorImpl.java:395) [vm4] at org.apache.geode.cache.client.internal.OpExecutorImpl.executeOn(OpExecutorImpl.java:362) [vm4] at org.apache.geode.cache.client.internal.PoolImpl.executeOn(PoolImpl.java:835) [vm4] at org.apache.geode.cache.client.internal.PingOp.execute(PingOp.java:36) [vm4] at org.apache.geode.cache.client.internal.LiveServerPinger$PingTask.run2(LiveServerPinger.java:90) [vm4] at org.apache.geode.cache.client.internal.PoolImpl$PoolTask.run(PoolImpl.java:1371) [vm4] at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515) [vm4] at java.base/java.util.concurrent.FutureTask.runAndReset(FutureTask.java:305) [vm4] at org.apache.geode.internal.ScheduledThreadPoolExecutorWithKeepAlive$DelegatingScheduledFuture.run(ScheduledThreadPoolExecutorWithKeepAlive.java:276) [vm4] at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [vm4] at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [vm4] at java.base/java.lang.Thread.run(Thread.java:834) {noformat} In the failed test, there are log messages from the poolTimer threads like below. These are coming from the PingTask threads. 
{noformat} [vm4] [info 2019/02/20 01:51:02.547 UTC tid=0xaa0] Socket receive buffer size is 212992 instead of the requested 524288. [vm4] [info 2019/02/20 01:51:02.547 UTC tid=0xaa0] Socket send buffer size is 212992 instead of the requested 524288. [vm4] [info 2019/02/20 01:51:02.607 UTC tid=0xaa1] Socket receive buffer size is 212992 instead of the requested 524288. [vm4] [info 2019/02/20 01:51:02.608 UTC tid=0xaa1] Socket send buffer size is 212992 instead of the requested 524288. [vm4] [info 2019/02/20 01:51:02.657 UTC tid=0xaa2] Socket receive buffer size is 212992 instead of the requested 524288. [vm4] [info 2019/02/20 01:51:02.657 UTC tid=0xaa2] Socket send buffer size is 212992 instead of the requested 524288. [vm4] [info 2019/02/20 01:51:02.727 UTC tid=0xaa4] Socket receive buffer size is 212992 instead of the requested 524288. [vm4] [info 2019/02/20 01:51:02.727 UTC tid=0xaa4] Socket send buffer size is 212992 instead of the requested 524288. ... {noformat} If I add a sleep in the test to mimic the CI test run, my test throws the same OutOfMemoryErrors. This test does not behave the same way in JDK 8. It uses much less memory. I attached a few charts showing the memory usage in both JDK 8 and JDK 11 with successful and failed tests. > CI Failure: SerialGatewaySenderQueueDUnitTest.testCreateMaximumSenders > assertion fails >
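The timing effect described above — the test passing at ~7 seconds but failing at ~10 seconds because the PingTask had time to fire and double the connection count — can be reduced to a simple model. This is an illustrative sketch only; the ping interval used in the test below is hypothetical, and LiveServerPinger's real scheduling and connection pooling are more involved:

```java
// Illustrative model: if at least one ping interval elapses during the test,
// the pinger opens one additional connection per pooled connection,
// roughly doubling the total. Not Geode's actual implementation.
class PingEstimate {
    static int totalConnections(int pooled, long testDurationMs, long pingIntervalMs) {
        boolean pingFired = testDurationMs >= pingIntervalMs;
        return pingFired ? pooled * 2 : pooled;
    }
}
```

With a hypothetical 9-second ping interval, a 7-second run keeps 500 connections while a 10-second run ends up with 1000, matching the observed pass/fail boundary.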
[jira] [Updated] (GEODE-6435) CI Failure: SerialGatewaySenderQueueDUnitTest.testCreateMaximumSenders assertion fails
[ https://issues.apache.org/jira/browse/GEODE-6435?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Barry Oglesby updated GEODE-6435: - Attachment: jdk11_OldGen_usedMemory_success.gif jdk11_OldGen_usedMemory_failure.gif jdk8_OldGen_usedMemory_success.gif > CI Failure: SerialGatewaySenderQueueDUnitTest.testCreateMaximumSenders > assertion fails > -- > > Key: GEODE-6435 > URL: https://issues.apache.org/jira/browse/GEODE-6435 > Project: Geode > Issue Type: Bug >Reporter: Patrick Rhomberg >Assignee: Barry Oglesby >Priority: Major > Attachments: jdk11_OldGen_usedMemory_failure.gif, > jdk11_OldGen_usedMemory_success.gif, jdk8_OldGen_usedMemory_success.gif > > > {noformat} > org.apache.geode.internal.cache.wan.serial.SerialGatewaySenderQueueDUnitTest > > testCreateMaximumSenders FAILED > org.apache.geode.test.dunit.RMIException: While invoking > org.apache.geode.internal.cache.wan.serial.SerialGatewaySenderQueueDUnitTest$$Lambda$390/0x000840596840.run > in VM 2 running on Host 2e6d9f20266c with 8 VMs > at org.apache.geode.test.dunit.VM.executeMethodOnObject(VM.java:579) > at org.apache.geode.test.dunit.VM.invoke(VM.java:406) > at > org.apache.geode.internal.cache.wan.serial.SerialGatewaySenderQueueDUnitTest.testCreateMaximumSenders(SerialGatewaySenderQueueDUnitTest.java:370) > Caused by: > org.awaitility.core.ConditionTimeoutException: Condition with lambda > expression in org.apache.geode.internal.cache.wan.WANTestBase that uses long > was not fulfilled within 300 seconds. 
> at > org.awaitility.core.ConditionAwaiter.await(ConditionAwaiter.java:145) > at > org.awaitility.core.CallableCondition.await(CallableCondition.java:79) > at > org.awaitility.core.CallableCondition.await(CallableCondition.java:27) > at > org.awaitility.core.ConditionFactory.until(ConditionFactory.java:902) > at > org.awaitility.core.ConditionFactory.until(ConditionFactory.java:860) > at > org.apache.geode.internal.cache.wan.WANTestBase.verifyListenerEvents(WANTestBase.java:3542) > at > org.apache.geode.internal.cache.wan.serial.SerialGatewaySenderQueueDUnitTest.lambda$testCreateMaximumSenders$faf964a3$1(SerialGatewaySenderQueueDUnitTest.java:370) > java.lang.AssertionError: An exception occurred during asynchronous > invocation. > Caused by: > java.rmi.ConnectIOException: error during JRMP connection > establishment; nested exception is: > java.net.SocketTimeoutException: Read timed out > Caused by: > java.net.SocketTimeoutException: Read timed out > {noformat} > > See pipeline failure here: > [https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-develop-main/jobs/DistributedTestOpenJDK11/builds/413] > Find test results here: > [http://files.apachegeode-ci.info/builds/apache-develop-main/1.9.0-SNAPSHOT.0456/test-results/distributedTest/1550631451/] > Find artifacts here: > [http://files.apachegeode-ci.info/builds/apache-develop-main/1.9.0-SNAPSHOT.0456/test-artifacts/1550631451/distributedtestfiles-OpenJDK11-1.9.0-SNAPSHOT.0456.tgz] > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (GEODE-6356) CI failure: PersistentPartitionedRegionRegressionTest.doesNotWaitForPreviousInstanceOfOnlineServer fails with suspect string
[ https://issues.apache.org/jira/browse/GEODE-6356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16760238#comment-16760238 ] Barry Oglesby commented on GEODE-6356: -- =-=-=-=-=-=-=-=-=-=-=-=-=-=-= Test Results URI =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-= http://files.apachegeode-ci.info/builds/apache-develop-main/1.9.0-SNAPSHOT.0401/test-results/distributedTest/1549307890/ =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-= Test report artifacts from this job are available at: http://files.apachegeode-ci.info/builds/apache-develop-main/1.9.0-SNAPSHOT.0401/test-artifacts/1549307890/distributedtestfiles-OpenJDK8-1.9.0-SNAPSHOT.0401.tgz > CI failure: > PersistentPartitionedRegionRegressionTest.doesNotWaitForPreviousInstanceOfOnlineServer > fails with suspect string > > > Key: GEODE-6356 > URL: https://issues.apache.org/jira/browse/GEODE-6356 > Project: Geode > Issue Type: Bug > Components: persistence, tests >Reporter: Barry Oglesby >Priority: Major > > PersistentPartitionedRegionRegressionTest.doesNotWaitForPreviousInstanceOfOnlineServer > failed in DistributedTestOpenJDK8 CI run > [361|https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-develop-main/jobs/DistributedTestOpenJDK8/builds/361] > with this suspect string: > {noformat} > org.apache.geode.internal.cache.partitioned.PersistentPartitionedRegionRegressionTest > > doesNotWaitForPreviousInstanceOfOnlineServer FAILED > java.lang.AssertionError: Suspicious strings were written to the log > during this run. > Fix the strings or use IgnoredException.addIgnoredException to ignore. > --- > Found suspect string in log4j at line 2052 > [error 2019/02/04 18:49:32.561 UTC > tid=32] org.apache.geode.distributed.DistributedSystemDisconnectedException: > This connection to a distributed system has been disconnected. > {noformat} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (GEODE-6356) CI failure: PersistentPartitionedRegionRegressionTest.doesNotWaitForPreviousInstanceOfOnlineServer fails with suspect string
Barry Oglesby created GEODE-6356: Summary: CI failure: PersistentPartitionedRegionRegressionTest.doesNotWaitForPreviousInstanceOfOnlineServer fails with suspect string Key: GEODE-6356 URL: https://issues.apache.org/jira/browse/GEODE-6356 Project: Geode Issue Type: Bug Components: persistence, tests Reporter: Barry Oglesby PersistentPartitionedRegionRegressionTest.doesNotWaitForPreviousInstanceOfOnlineServer failed in DistributedTestOpenJDK8 CI run [361|https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-develop-main/jobs/DistributedTestOpenJDK8/builds/361] with this suspect string: {noformat} org.apache.geode.internal.cache.partitioned.PersistentPartitionedRegionRegressionTest > doesNotWaitForPreviousInstanceOfOnlineServer FAILED java.lang.AssertionError: Suspicious strings were written to the log during this run. Fix the strings or use IgnoredException.addIgnoredException to ignore. --- Found suspect string in log4j at line 2052 [error 2019/02/04 18:49:32.561 UTC tid=32] org.apache.geode.distributed.DistributedSystemDisconnectedException: This connection to a distributed system has been disconnected. {noformat} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (GEODE-6287) When a client connects, registers interest and disconnects normally, its ClientProxyMembershipID is not cleaned up and a memory leak occurs
[ https://issues.apache.org/jira/browse/GEODE-6287?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Barry Oglesby resolved GEODE-6287. -- Resolution: Fixed Fix Version/s: 1.9.0 > When a client connects, registers interest and disconnects normally, its > ClientProxyMembershipID is not cleaned up and a memory leak occurs > --- > > Key: GEODE-6287 > URL: https://issues.apache.org/jira/browse/GEODE-6287 > Project: Geode > Issue Type: Bug > Components: client queues, client/server >Reporter: Barry Oglesby >Assignee: Barry Oglesby >Priority: Major > Labels: pull-request-available > Fix For: 1.9.0 > > Time Spent: 0.5h > Remaining Estimate: 0h > > When a client connects to a distributed system and registers interest, the > Region's FilterProfile's clientMap (an IDMap) registers the > ClientProxyMembershipID in both the realIDs and wireIDs like: > {noformat} > realIDs={identity(192.168.2.12(client-register:52879:loner):63013:2327c553:client-register,connection=2=1}; > wireIDs={1=identity(192.168.2.12(client-register:52879:loner):63013:2327c553:client-register,connection=2} > {noformat} > When the client leaves normally, the UnregisterInterest command is invoked > which removes the interest and the region. Part of that behavior is to remove > the regionName from the set of regions. > {noformat} > this.regions.remove(regionName) > {noformat} > Then ClientInterestList.clearClientInterestList is then invoked which is > supposed to clear the FilterProfile for each region, but the regions are > already cleared by the UnregisterInterest command, so this method doesn't do > anything. > Then, LocalRegion.cleanupForClient is invoked which invokes > FilterProfile.cleanupForClient. This method currently only closes CQs (which > also cleans up the cqMap which is also an IDMap like the clientMap). > At the end of this, the clientMap's realIDs and wireIDs still contain the > ClientProxyMembershipID. 
> The cleanupForClient method could be changed to also clean up the clientMap. > Note: If the client is killed abnormally, the UnregisterInterest command is > not invoked, so the interest and the region is not cleaned up normally. When > ClientInterestList.clearClientInterestList is called, the set of regions > still contains the region, and the IDMap is cleaned up properly. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
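The cleanup proposed above can be sketched with a simplified, self-contained model. The IDMap below is a hypothetical stand-in for FilterProfile's internal class (field and method names mirror the description, not the actual Geode source); the point is that removing the departing client's ClientProxyMembershipID from both realIDs and wireIDs closes the leak:

```java
import java.util.HashMap;
import java.util.Map;

// Simplified, hypothetical model of FilterProfile's IDMap; names follow the
// description above, not the real Geode implementation.
public class IdMapCleanupSketch {
  static class IDMap {
    final Map<Object, Long> realIDs = new HashMap<>();
    final Map<Long, Object> wireIDs = new HashMap<>();
    private long nextId = 1;

    // Registration stores the ClientProxyMembershipID in BOTH maps.
    long register(Object clientProxyMembershipId) {
      long wireId = nextId++;
      realIDs.put(clientProxyMembershipId, wireId);
      wireIDs.put(wireId, clientProxyMembershipId);
      return wireId;
    }

    // The proposed cleanupForClient change: remove the client from both
    // directions so a normal disconnect no longer leaks the ID.
    void removeID(Object clientProxyMembershipId) {
      Long wireId = realIDs.remove(clientProxyMembershipId);
      if (wireId != null) {
        wireIDs.remove(wireId);
      }
    }
  }

  public static void main(String[] args) {
    IDMap clientMap = new IDMap();
    String cpmId = "identity(192.168.2.12(client-register:52879:loner))";
    clientMap.register(cpmId);
    clientMap.removeID(cpmId);
    System.out.println(clientMap.realIDs.size() + "," + clientMap.wireIDs.size());
  }
}
```

With this shape of cleanup, the heap histograms in the comments below (15000 connect/registerInterest/disconnect cycles) no longer accumulate ClientProxyMembershipID instances.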
[jira] [Commented] (GEODE-6327) There needs to be a way to specify identity fields for JSON documents converted to PDX
[ https://issues.apache.org/jira/browse/GEODE-6327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16754280#comment-16754280 ] Barry Oglesby commented on GEODE-6327: -- The attached implementation was done for a POC. In this case, it reduced the number of fields used in hashCode and equals from 98 to 1 and reduced the query time from 30 ms to 12 ms. > There needs to be a way to specify identity fields for JSON documents > converted to PDX > -- > > Key: GEODE-6327 > URL: https://issues.apache.org/jira/browse/GEODE-6327 > Project: Geode > Issue Type: New Feature > Components: serialization >Reporter: Barry Oglesby >Priority: Major > Attachments: geode-6327-poc.patch > > > In the current implementation, there is no way to prevent all fields from > being used when executing hashCode and equals. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
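The hashCode/equals reduction the POC achieves (98 compared fields down to 1) can be illustrated with a plain, self-contained class. This is not the patch itself: the field names are hypothetical, and it only shows why restricting identity to one designated field makes equality checks cheap. (For programmatically built PdxInstances, Geode's PdxInstanceFactory.markIdentityField provides an analogous mechanism; the ticket's point is that JSON-converted documents have no such hook.)

```java
import java.util.Objects;

// Illustrative only: a document type where equals/hashCode consult a single
// designated identity field instead of every converted JSON field.
// Field names here are hypothetical.
public class IdentityFieldSketch {
  final String documentId; // the one identity field
  final String payload;    // stands in for the other ~97 JSON fields

  IdentityFieldSketch(String documentId, String payload) {
    this.documentId = documentId;
    this.payload = payload;
  }

  @Override
  public boolean equals(Object o) {
    if (this == o) return true;
    if (!(o instanceof IdentityFieldSketch)) return false;
    // Only the identity field participates in the comparison.
    return documentId.equals(((IdentityFieldSketch) o).documentId);
  }

  @Override
  public int hashCode() {
    return Objects.hash(documentId);
  }

  public static void main(String[] args) {
    IdentityFieldSketch a = new IdentityFieldSketch("doc-1", "large payload A");
    IdentityFieldSketch b = new IdentityFieldSketch("doc-1", "large payload B");
    System.out.println(a.equals(b)); // equal despite differing payloads
  }
}
```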
[jira] [Updated] (GEODE-6327) There needs to be a way to specify identity fields for JSON documents converted to PDX
[ https://issues.apache.org/jira/browse/GEODE-6327?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Barry Oglesby updated GEODE-6327: - Attachment: geode-6327-poc.patch > There needs to be a way to specify identity fields for JSON documents > converted to PDX > -- > > Key: GEODE-6327 > URL: https://issues.apache.org/jira/browse/GEODE-6327 > Project: Geode > Issue Type: New Feature > Components: serialization >Reporter: Barry Oglesby >Priority: Major > Attachments: geode-6327-poc.patch > > > In the current implementation, there is no way to prevent all fields from > being used when executing hashCode and equals. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (GEODE-6327) There needs to be a way to specify identity fields for JSON documents converted to PDX
Barry Oglesby created GEODE-6327: Summary: There needs to be a way to specify identity fields for JSON documents converted to PDX Key: GEODE-6327 URL: https://issues.apache.org/jira/browse/GEODE-6327 Project: Geode Issue Type: New Feature Components: serialization Reporter: Barry Oglesby In the current implementation, there is no way to prevent all fields from being used when executing hashCode and equals. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (GEODE-6267) Subjects are not logged out when a client departs causing a memory leak
[ https://issues.apache.org/jira/browse/GEODE-6267?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Barry Oglesby resolved GEODE-6267. -- Resolution: Fixed Fix Version/s: 1.9.0 > Subjects are not logged out when a client departs causing a memory leak > --- > > Key: GEODE-6267 > URL: https://issues.apache.org/jira/browse/GEODE-6267 > Project: Geode > Issue Type: Bug > Components: security >Reporter: Barry Oglesby >Assignee: Barry Oglesby >Priority: Major > Labels: pull-request-available > Fix For: 1.9.0 > > Time Spent: 1h 50m > Remaining Estimate: 0h > > When a client with security enabled connects to a server, the > IntegratedSecurityService logs in a Subject. This causes a SimpleSession to > be created. > The Subject is stored in ClientUserAuths.uniqueIdVsSubject. > Here is a stack showing the SimpleSession creation: > {noformat} > [warning 2019/01/08 18:02:42.993 PST server1 Thread 0> tid=0x4e] SimpleSession. invoked: > java.lang.Exception > at org.apache.shiro.session.mgt.SimpleSession.(SimpleSession.java:99) > at > org.apache.shiro.session.mgt.SimpleSessionFactory.createSession(SimpleSessionFactory.java:44) > at > org.apache.shiro.session.mgt.DefaultSessionManager.newSessionInstance(DefaultSessionManager.java:163) > at > org.apache.shiro.session.mgt.DefaultSessionManager.doCreateSession(DefaultSessionManager.java:154) > at > org.apache.shiro.session.mgt.AbstractValidatingSessionManager.createSession(AbstractValidatingSessionManager.java:136) > at > org.apache.shiro.session.mgt.AbstractNativeSessionManager.start(AbstractNativeSessionManager.java:99) > at > org.apache.shiro.mgt.SessionsSecurityManager.start(SessionsSecurityManager.java:152) > at > org.apache.shiro.subject.support.DelegatingSubject.getSession(DelegatingSubject.java:336) > at > org.apache.shiro.subject.support.DelegatingSubject.getSession(DelegatingSubject.java:312) > at > org.apache.shiro.mgt.DefaultSubjectDAO.mergePrincipals(DefaultSubjectDAO.java:204) > at > 
org.apache.shiro.mgt.DefaultSubjectDAO.saveToSession(DefaultSubjectDAO.java:166) > at org.apache.shiro.mgt.DefaultSubjectDAO.save(DefaultSubjectDAO.java:147) > at > org.apache.shiro.mgt.DefaultSecurityManager.save(DefaultSecurityManager.java:383) > at > org.apache.shiro.mgt.DefaultSecurityManager.createSubject(DefaultSecurityManager.java:350) > at > org.apache.shiro.mgt.DefaultSecurityManager.createSubject(DefaultSecurityManager.java:183) > at > org.apache.shiro.mgt.DefaultSecurityManager.login(DefaultSecurityManager.java:283) > at > org.apache.shiro.subject.support.DelegatingSubject.login(DelegatingSubject.java:256) > at > org.apache.geode.internal.security.IntegratedSecurityService.login(IntegratedSecurityService.java:139) > at > org.apache.geode.internal.cache.tier.sockets.HandShake.verifyCredentials(HandShake.java:1688) > at > org.apache.geode.internal.cache.tier.sockets.ServerConnection.setCredentials(ServerConnection.java:1044) > at > org.apache.geode.internal.cache.tier.sockets.command.PutUserCredentials.cmdExecute(PutUserCredentials.java:52) > at > org.apache.geode.internal.cache.tier.sockets.BaseCommand.execute(BaseCommand.java:163) > at > org.apache.geode.internal.cache.tier.sockets.ServerConnection.doNormalMsg(ServerConnection.java:797) > at > org.apache.geode.internal.cache.tier.sockets.LegacyServerConnection.doOneMessage(LegacyServerConnection.java:85) > at > org.apache.geode.internal.cache.tier.sockets.ServerConnection.run(ServerConnection.java:1179) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at > org.apache.geode.internal.cache.tier.sockets.AcceptorImpl$4$1.run(AcceptorImpl.java:641) > at java.lang.Thread.run(Thread.java:745) > {noformat} > When the client disconnects, the ClientUserAuths is cleaned up (in cleanup), > but the Subjects are not logged out. 
> With subscription-enabled=true, an additional Subject is created and stored > in the CacheClientProxy subject. This Subject is not logged out either. > Here is a stack showing the SimpleSession creation: > {noformat} > [warning 2019/01/08 18:02:43.023 PST server1 Thread 0> tid=0x52] SimpleSession. invoked: > java.lang.Exception > at org.apache.shiro.session.mgt.SimpleSession.(SimpleSession.java:99) > at > org.apache.shiro.session.mgt.SimpleSessionFactory.createSession(SimpleSessionFactory.java:44) > at > org.apache.shiro.session.mgt.DefaultSessionManager.newSessionInstance(DefaultSessionManager.java:163) > at > org.apache.shiro.session.mgt.DefaultSessionManager.doCreateSession(DefaultSessionManager.java:154) > at >
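The direction of the fix can be sketched with a simplified, self-contained model. ClientUserAuths and uniqueIdVsSubject come from the description above; the Subject interface here is a minimal stand-in for org.apache.shiro.subject.Subject, whose real logout() releases the backing session:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical model of the fix: log out every tracked Subject when the
// client departs, instead of only dropping the map entries.
public class SubjectLogoutSketch {
  interface Subject { // minimal stand-in for org.apache.shiro.subject.Subject
    void logout();
  }

  static class ClientUserAuths {
    final Map<Long, Subject> uniqueIdVsSubject = new ConcurrentHashMap<>();

    void putSubject(long uniqueId, Subject subject) {
      uniqueIdVsSubject.put(uniqueId, subject);
    }

    // The missing step described above: logging out each Subject lets the
    // backing SimpleSession be released before the entries are cleared.
    void cleanup() {
      uniqueIdVsSubject.values().forEach(Subject::logout);
      uniqueIdVsSubject.clear();
    }
  }

  public static void main(String[] args) {
    ClientUserAuths auths = new ClientUserAuths();
    auths.putSubject(1L, () -> System.out.println("subject logged out"));
    auths.cleanup();
  }
}
```

The same logout step would also apply to the extra Subject held by CacheClientProxy when subscription-enabled=true.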
[jira] [Created] (GEODE-6293) Gfsh execute function command expects the function to have a result
Barry Oglesby created GEODE-6293: Summary: Gfsh execute function command expects the function to have a result Key: GEODE-6293 URL: https://issues.apache.org/jira/browse/GEODE-6293 Project: Geode Issue Type: Bug Components: gfsh Reporter: Barry Oglesby Functions with hasResult returning false cause gfsh to log this exception message: {noformat} gfsh>execute function --id=TestNoResultFunction --region=/data Member | Status | Message | -- | server-1 | ERROR | Exception: Cannot return any result as the Function#hasResult() is false {noformat} That message is coming from `UserFunctionExecution.execute` which does: {noformat} List results = (List) execution.execute(function.getId()).getResult(); {noformat} Here is the stack where that happens: {noformat} java.lang.Exception: Stack trace at java.lang.Thread.dumpStack(Thread.java:1333) at org.apache.geode.internal.cache.execute.NoResult.getResult(NoResult.java:56) at org.apache.geode.management.internal.cli.functions.UserFunctionExecution.execute(UserFunctionExecution.java:156) at org.apache.geode.internal.cache.MemberFunctionStreamingMessage.process(MemberFunctionStreamingMessage.java:193) at org.apache.geode.distributed.internal.DistributionMessage.scheduleAction(DistributionMessage.java:367) at org.apache.geode.distributed.internal.DistributionMessage$1.run(DistributionMessage.java:433) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at org.apache.geode.distributed.internal.ClusterDistributionManager.runUntilShutdown(ClusterDistributionManager.java:956) at org.apache.geode.distributed.internal.ClusterDistributionManager.doFunctionExecutionThread(ClusterDistributionManager.java:810) at org.apache.geode.internal.logging.LoggingThreadFactory.lambda$newThread$0(LoggingThreadFactory.java:121) at java.lang.Thread.run(Thread.java:745) {noformat} Here is a potential fix that addresses the issue: {noformat} List 
results = null;
ResultCollector rc = execution.execute(function.getId());
if (function.hasResult()) {
  results = (List) rc.getResult();
}
{noformat}
This fix causes gfsh to report an OK result:
{noformat}
gfsh>execute function --id=TestNoResultFunction --region=/data
Member   | Status | Message
-------- | ------ | -------
server-1 | OK     | []
{noformat}
-- This message was sent by Atlassian JIRA (v7.6.3#76005)
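The guard in the potential fix can be demonstrated with a self-contained sketch. The ResultCollector below is a minimal mock of Geode's interface, and execute() mimics how NoResult.getResult() throws for a function whose hasResult() is false; the fix is simply to call getResult() only when the function has a result:

```java
import java.util.Collections;
import java.util.List;

// Simplified stand-ins for Geode's function-execution API (hypothetical
// mocks; the real change would live in UserFunctionExecution.execute).
public class HasResultGuardSketch {
  interface ResultCollector {
    List<Object> getResult();
  }

  // Mimics Execution.execute(): for a no-result function, the returned
  // collector throws on getResult(), as NoResult does.
  static ResultCollector execute(boolean hasResult) {
    if (!hasResult) {
      return () -> {
        throw new IllegalStateException(
            "Cannot return any result as the Function#hasResult() is false");
      };
    }
    return Collections::emptyList;
  }

  // The proposed fix: only call getResult() when Function#hasResult() is true.
  static List<Object> runFunction(boolean hasResult) {
    List<Object> results = null;
    ResultCollector rc = execute(hasResult);
    if (hasResult) {
      results = rc.getResult();
    }
    return results;
  }

  public static void main(String[] args) {
    System.out.println(runFunction(false)); // no exception for a no-result function
    System.out.println(runFunction(true));
  }
}
```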
[jira] [Commented] (GEODE-6287) When a client connects, registers interest and disconnects normally, its ClientProxyMembershipID is not cleaned up and a memory leak occurs
[ https://issues.apache.org/jira/browse/GEODE-6287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16745357#comment-16745357 ] Barry Oglesby commented on GEODE-6287: -- I made a change to FilterProfile.cleanupForClient to also clean up the clientMap. That addressed the leak. Now with 15000 client connects/registerInterests/disconnects: {noformat} num #instances #bytes class name -- 1: 29928 2706856 [C 2: 7472 840576 java.lang.Class 3: 29822 715728 java.lang.String 4: 1607 478240 [B 5: 5946 445008 [Ljava.lang.Object; 6: 773 410136 [J 7: 11820 378240 java.util.concurrent.ConcurrentHashMap$Node 8: 2980 262240 java.lang.reflect.Method 9: 7953 254496 java.util.HashMap$Node 10: 4911 208984 [I 11: 2138 198408 [Ljava.util.HashMap$Node; 12: 10827 173232 java.lang.Object 13: 2119 169520 java.lang.reflect.Constructor 14: 4120 164800 java.util.LinkedHashMap$Entry 15: 1726 96656 java.util.LinkedHashMap Total 175924 9348120 {noformat} > When a client connects, registers interest and disconnects normally, its > ClientProxyMembershipID is not cleaned up and a memory leak occurs > --- > > Key: GEODE-6287 > URL: https://issues.apache.org/jira/browse/GEODE-6287 > Project: Geode > Issue Type: Bug > Components: client queues, client/server >Reporter: Barry Oglesby >Assignee: Barry Oglesby >Priority: Major > > When a client connects to a distributed system and registers interest, the > Region's FilterProfile's clientMap (an IDMap) registers the > ClientProxyMembershipID in both the realIDs and wireIDs like: > {noformat} > realIDs={identity(192.168.2.12(client-register:52879:loner):63013:2327c553:client-register,connection=2=1}; > wireIDs={1=identity(192.168.2.12(client-register:52879:loner):63013:2327c553:client-register,connection=2} > {noformat} > When the client leaves normally, the UnregisterInterest command is invoked > which removes the interest and the region. Part of that behavior is to remove > the regionName from the set of regions. 
> {noformat} > this.regions.remove(regionName) > {noformat} > Then ClientInterestList.clearClientInterestList is then invoked which is > supposed to clear the FilterProfile for each region, but the regions are > already cleared by the UnregisterInterest command, so this method doesn't do > anything. > Then, LocalRegion.cleanupForClient is invoked which invokes > FilterProfile.cleanupForClient. This method currently only closes CQs (which > also cleans up the cqMap which is also an IDMap like the clientMap). > At the end of this, the clientMap's realIDs and wireIDs still contain the > ClientProxyMembershipID. > The cleanupForClient method could be changed to also clean up the clientMap. > Note: If the client is killed abnormally, the UnregisterInterest command is > not invoked, so the interest and the region is not cleaned up normally. When > ClientInterestList.clearClientInterestList is called, the set of regions > still contains the region, and the IDMap is cleaned up properly. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (GEODE-6287) When a client connects, registers interest and disconnects normally, its ClientProxyMembershipID is not cleaned up and a memory leak occurs
[ https://issues.apache.org/jira/browse/GEODE-6287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16745352#comment-16745352 ] Barry Oglesby commented on GEODE-6287: -- Startup: {noformat} num #instances #bytes class name -- 1: 28887 2606096 [C 2: 7205 810664 java.lang.Class 3: 28820 691680 java.lang.String 4: 2004 494408 [B 5: 5153 453464 java.lang.reflect.Method 6: 5538 425016 [Ljava.lang.Object; 7: 657 389224 [J 8: 11600 371200 java.util.concurrent.ConcurrentHashMap$Node 9: 8124 324960 java.util.LinkedHashMap$Entry 10: 3047 277584 [Ljava.util.HashMap$Node; 11: 8536 273152 java.util.HashMap$Node 12: 2533 202640 java.lang.reflect.Constructor 13: 4764 198616 [I 14: 10575 169200 java.lang.Object 15: 2749 153944 java.util.LinkedHashMap Total 181128 9653224 {noformat} With 15000 client connects/registerInterests/disconnects: {noformat} num #instances #bytes class name -- 1: 104958 9068952 [C 2: 104852 2516448 java.lang.String 3: 16604 2160248 [B 4: 41802 1337664 java.util.concurrent.ConcurrentHashMap$Node 5: 15005 1080360 org.apache.geode.distributed.internal.membership.gms.GMSMember 6: 7473 840680 java.lang.Class 7: 15005 720240 org.apache.geode.distributed.internal.membership.InternalDistributedMember 8: 15031 480992 java.net.InetAddress$InetAddressHolder 9: 14996 479872 org.apache.geode.distributed.DurableClientAttributes 10: 14995 479840 org.apache.geode.internal.cache.tier.sockets.ClientProxyMembershipID 11: 5949 445136 [Ljava.lang.Object; 12: 25820 413120 java.lang.Object 13: 756 407840 [J 14: 15225 365400 java.lang.Long 15: 15026 360624 java.net.Inet4Address Total 501811 24712960 {noformat} > When a client connects, registers interest and disconnects normally, its > ClientProxyMembershipID is not cleaned up and a memory leak occurs > --- > > Key: GEODE-6287 > URL: https://issues.apache.org/jira/browse/GEODE-6287 > Project: Geode > Issue Type: Bug > Components: client queues, client/server >Reporter: Barry Oglesby >Assignee: Barry Oglesby 
>Priority: Major > > When a client connects to a distributed system and registers interest, the > Region's FilterProfile's clientMap (an IDMap) registers the > ClientProxyMembershipID in both the realIDs and wireIDs like: > {noformat} > realIDs={identity(192.168.2.12(client-register:52879:loner):63013:2327c553:client-register,connection=2=1}; > wireIDs={1=identity(192.168.2.12(client-register:52879:loner):63013:2327c553:client-register,connection=2} > {noformat} > When the client leaves normally, the UnregisterInterest command is invoked > which removes the interest and the region. Part of that behavior is to remove > the regionName from the set of regions. > {noformat} > this.regions.remove(regionName) > {noformat} > Then ClientInterestList.clearClientInterestList is then invoked which is > supposed to clear the FilterProfile for each region, but the regions are > already cleared by the UnregisterInterest command, so this method doesn't do > anything. > Then, LocalRegion.cleanupForClient is invoked which invokes > FilterProfile.cleanupForClient. This method currently only closes CQs (which > also cleans up the cqMap which is also an IDMap like the clientMap). > At the end of this, the clientMap's realIDs and wireIDs still contain the > ClientProxyMembershipID. > The cleanupForClient method could be changed to also clean up the clientMap. > Note: If the client is killed abnormally, the UnregisterInterest command is > not invoked, so the interest and the region is not cleaned up normally. When > ClientInterestList.clearClientInterestList is called, the set of regions > still contains the region, and the IDMap is cleaned up properly. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (GEODE-6287) When a client connects, registers interest and disconnects normally, its ClientProxyMembershipID is not cleaned up and a memory leak occurs
[ https://issues.apache.org/jira/browse/GEODE-6287?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Barry Oglesby updated GEODE-6287: - Description: When a client connects to a distributed system and registers interest, the Region's FilterProfile's clientMap (an IDMap) registers the ClientProxyMembershipID in both the realIDs and wireIDs like: {noformat} realIDs=\{identity(192.168.2.12(client-register:52879:loner):63013:2327c553:client-register,connection=2=1}; wireIDs=\{1=identity(192.168.2.12(client-register:52879:loner):63013:2327c553:client-register,connection=2} {noformat} When the client leaves normally, the UnregisterInterest command is invoked which removes the interest and the region. Part of that behavior is to remove the regionName from the set of regions. {noformat} this.regions.remove(regionName) {noformat} Then ClientInterestList.clearClientInterestList is then invoked which is supposed to clear the FilterProfile for each region, but the regions are already cleared by the UnregisterInterest command, so this method doesn't do anything. Then, LocalRegion.cleanupForClient is invoked which invokes FilterProfile.cleanupForClient. This method currently only closes CQs (which also cleans up the cqMap which is also an IDMap like the clientMap). At the end of this, the clientMap's realIDs and wireIDs still contain the ClientProxyMembershipID. The cleanupForClient method could be changed to also clean up the clientMap. Note: If the client is killed abnormally, the UnregisterInterest command is not invoked, so the interest and the region is not cleaned up normally. When ClientInterestList.clearClientInterestList is called, the set of regions still contains the region, and the IDMap is cleaned up properly. 
was: When a client connects to a distributed system and registers interest, the Region's FilterProfile's clientMap (an IDMap) registers the ClientProxyMembershipID in both the realIDs and wireIDs like: ``` realIDs=\{identity(192.168.2.12(client-register:52879:loner):63013:2327c553:client-register,connection=2=1}; wireIDs=\{1=identity(192.168.2.12(client-register:52879:loner):63013:2327c553:client-register,connection=2} ``` When the client leaves normally, the UnregisterInterest command is invoked which removes the interest and the region. Part of that behavior is to remove the regionName from the set of regions. ``` this.regions.remove(regionName) ``` Then ClientInterestList.clearClientInterestList is then invoked which is supposed to clear the FilterProfile for each region, but the regions are already cleared by the UnregisterInterest command, so this method doesn't do anything. Then, LocalRegion.cleanupForClient is invoked which invokes FilterProfile.cleanupForClient. This method currently only closes CQs (which also cleans up the cqMap which is also an IDMap like the clientMap). At the end of this, the clientMap's realIDs and wireIDs still contain the ClientProxyMembershipID. The cleanupForClient method could be changed to also clean up the clientMap. Note: If the client is killed abnormally, the UnregisterInterest command is not invoked, so the interest and the region is not cleaned up normally. When ClientInterestList.clearClientInterestList is called, the set of regions still contains the region, and the IDMap is cleaned up properly. 
> When a client connects, registers interest and disconnects normally, its > ClientProxyMembershipID is not cleaned up and a memory leak occurs > --- > > Key: GEODE-6287 > URL: https://issues.apache.org/jira/browse/GEODE-6287 > Project: Geode > Issue Type: Bug > Components: client queues, client/server >Reporter: Barry Oglesby >Assignee: Barry Oglesby >Priority: Major > > When a client connects to a distributed system and registers interest, the > Region's FilterProfile's clientMap (an IDMap) registers the > ClientProxyMembershipID in both the realIDs and wireIDs like: > {noformat} > realIDs=\{identity(192.168.2.12(client-register:52879:loner):63013:2327c553:client-register,connection=2=1}; > wireIDs=\{1=identity(192.168.2.12(client-register:52879:loner):63013:2327c553:client-register,connection=2} > {noformat} > When the client leaves normally, the UnregisterInterest command is invoked > which removes the interest and the region. Part of that behavior is to remove > the regionName from the set of regions. > {noformat} > this.regions.remove(regionName) > {noformat} > Then ClientInterestList.clearClientInterestList is then invoked which is > supposed to clear the FilterProfile for each region, but the regions are > already cleared by the UnregisterInterest command, so this method doesn't do > anything. > Then,
[jira] [Commented] (GEODE-6287) When a client connects, registers interest and disconnects normally, its ClientProxyMembershipID is not cleaned up and a memory leak occurs
[ https://issues.apache.org/jira/browse/GEODE-6287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16745349#comment-16745349 ] Barry Oglesby commented on GEODE-6287: -- When a client connects to a distributed system and registers interest, the Region's FilterProfile's clientMap (an IDMap) registers the ClientProxyMembershipID in both the realIDs and wireIDs like: {noformat} realIDs=\{identity(192.168.2.12(client-register:52879:loner):63013:2327c553:client-register,connection=2=1}; wireIDs=\{1=identity(192.168.2.12(client-register:52879:loner):63013:2327c553:client-register,connection=2} {noformat} When the client leaves normally, the UnregisterInterest command is invoked which removes the interest and the region. Part of that behavior is to remove the regionName from the set of regions. {noformat} this.regions.remove(regionName) {noformat} Then ClientInterestList.clearClientInterestList is then invoked which is supposed to clear the FilterProfile for each region, but the regions are already cleared by the UnregisterInterest command, so this method doesn't do anything. Then, LocalRegion.cleanupForClient is invoked which invokes FilterProfile.cleanupForClient. This method currently only closes CQs (which also cleans up the cqMap which is also an IDMap like the clientMap). At the end of this, the clientMap's realIDs and wireIDs still contain the ClientProxyMembershipID. The cleanupForClient method could be changed to also clean up the clientMap. Note: If the client is killed abnormally, the UnregisterInterest command is not invoked, so the interest and the region is not cleaned up normally. When ClientInterestList.clearClientInterestList is called, the set of regions still contains the region, and the IDMap is cleaned up properly. 
Here is a stack trace showing the ClientProxyMembershipID being registered: {noformat} java.lang.Exception: Stack trace at java.lang.Thread.dumpStack(Thread.java:1333) at org.apache.geode.internal.cache.FilterProfile$IDMap.getWireID(FilterProfile.java:2032) at org.apache.geode.internal.cache.FilterProfile.getClientIDForMaps(FilterProfile.java:1615) at org.apache.geode.internal.cache.FilterProfile.registerClientInterest(FilterProfile.java:261) at org.apache.geode.internal.cache.tier.sockets.CacheClientProxy$ClientInterestList.registerClientInterest(CacheClientProxy.java:2052) at org.apache.geode.internal.cache.tier.sockets.CacheClientProxy.registerClientInterest(CacheClientProxy.java:1270) at org.apache.geode.internal.cache.tier.sockets.CacheClientNotifier.registerClientInterest(CacheClientNotifier.java:1194) at org.apache.geode.internal.cache.tier.sockets.command.RegisterInterest61.cmdExecute(RegisterInterest61.java:200) at org.apache.geode.internal.cache.tier.sockets.BaseCommand.execute(BaseCommand.java:178) at org.apache.geode.internal.cache.tier.sockets.ServerConnection.doNormalMessage(ServerConnection.java:844) at org.apache.geode.internal.cache.tier.sockets.OriginalServerConnection.doOneMessage(OriginalServerConnection.java:74) at org.apache.geode.internal.cache.tier.sockets.ServerConnection.run(ServerConnection.java:1218) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at org.apache.geode.internal.cache.tier.sockets.AcceptorImpl.lambda$initializeServerConnectionThreadPool$3(AcceptorImpl.java:613) at org.apache.geode.internal.logging.LoggingThreadFactory.lambda$newThread$0(LoggingThreadFactory.java:121) at java.lang.Thread.run(Thread.java:745) {noformat} Here is a stack trace showing the UnregisterInterest command unregistering the client interest: {noformat} java.lang.Exception: Stack trace at java.lang.Thread.dumpStack(Thread.java:1333) at 
org.apache.geode.internal.cache.tier.sockets.CacheClientProxy$ClientInterestList.unregisterClientInterest(CacheClientProxy.java:2085) at org.apache.geode.internal.cache.tier.sockets.CacheClientProxy.unregisterClientInterest(CacheClientProxy.java:1330) at org.apache.geode.internal.cache.tier.sockets.CacheClientNotifier.unregisterClientInterest(CacheClientNotifier.java:1245) at org.apache.geode.internal.cache.tier.sockets.command.UnregisterInterest.cmdExecute(UnregisterInterest.java:144) at org.apache.geode.internal.cache.tier.sockets.BaseCommand.execute(BaseCommand.java:178) at org.apache.geode.internal.cache.tier.sockets.ServerConnection.doNormalMessage(ServerConnection.java:844) at org.apache.geode.internal.cache.tier.sockets.OriginalServerConnection.doOneMessage(OriginalServerConnection.java:75) at org.apache.geode.internal.cache.tier.sockets.ServerConnection.run(ServerConnection.java:1215) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at
[jira] [Updated] (GEODE-6287) When a client connects, registers interest and disconnects normally, its ClientProxyMembershipID is not cleaned up and a memory leak occurs
[ https://issues.apache.org/jira/browse/GEODE-6287?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Barry Oglesby updated GEODE-6287: - Description: When a client connects to a distributed system and registers interest, the Region's FilterProfile's clientMap (an IDMap) registers the ClientProxyMembershipID in both the realIDs and wireIDs like: {noformat} realIDs={identity(192.168.2.12(client-register:52879:loner):63013:2327c553:client-register,connection=2=1}; wireIDs={1=identity(192.168.2.12(client-register:52879:loner):63013:2327c553:client-register,connection=2} {noformat} When the client leaves normally, the UnregisterInterest command is invoked, which removes the interest and the region. Part of that behavior is to remove the regionName from the set of regions. {noformat} this.regions.remove(regionName) {noformat} ClientInterestList.clearClientInterestList is then invoked, which is supposed to clear the FilterProfile for each region, but the regions are already cleared by the UnregisterInterest command, so this method doesn't do anything. Then, LocalRegion.cleanupForClient is invoked, which invokes FilterProfile.cleanupForClient. This method currently only closes CQs (which also cleans up the cqMap, another IDMap like the clientMap). At the end of this, the clientMap's realIDs and wireIDs still contain the ClientProxyMembershipID. The cleanupForClient method could be changed to also clean up the clientMap. Note: If the client is killed abnormally, the UnregisterInterest command is not invoked, so the interest and the region are not cleaned up normally. When ClientInterestList.clearClientInterestList is called, the set of regions still contains the region, and the IDMap is cleaned up properly. 
was: When a client connects to a distributed system and registers interest, the Region's FilterProfile's clientMap (an IDMap) registers the ClientProxyMembershipID in both the realIDs and wireIDs like: {noformat} realIDs=\{identity(192.168.2.12(client-register:52879:loner):63013:2327c553:client-register,connection=2=1}; wireIDs=\{1=identity(192.168.2.12(client-register:52879:loner):63013:2327c553:client-register,connection=2} {noformat} When the client leaves normally, the UnregisterInterest command is invoked which removes the interest and the region. Part of that behavior is to remove the regionName from the set of regions. {noformat} this.regions.remove(regionName) {noformat} Then ClientInterestList.clearClientInterestList is then invoked which is supposed to clear the FilterProfile for each region, but the regions are already cleared by the UnregisterInterest command, so this method doesn't do anything. Then, LocalRegion.cleanupForClient is invoked which invokes FilterProfile.cleanupForClient. This method currently only closes CQs (which also cleans up the cqMap which is also an IDMap like the clientMap). At the end of this, the clientMap's realIDs and wireIDs still contain the ClientProxyMembershipID. The cleanupForClient method could be changed to also clean up the clientMap. Note: If the client is killed abnormally, the UnregisterInterest command is not invoked, so the interest and the region is not cleaned up normally. When ClientInterestList.clearClientInterestList is called, the set of regions still contains the region, and the IDMap is cleaned up properly. 
> When a client connects, registers interest and disconnects normally, its > ClientProxyMembershipID is not cleaned up and a memory leak occurs > --- > > Key: GEODE-6287 > URL: https://issues.apache.org/jira/browse/GEODE-6287 > Project: Geode > Issue Type: Bug > Components: client queues, client/server >Reporter: Barry Oglesby >Assignee: Barry Oglesby >Priority: Major > > When a client connects to a distributed system and registers interest, the > Region's FilterProfile's clientMap (an IDMap) registers the > ClientProxyMembershipID in both the realIDs and wireIDs like: > {noformat} > realIDs={identity(192.168.2.12(client-register:52879:loner):63013:2327c553:client-register,connection=2=1}; > wireIDs={1=identity(192.168.2.12(client-register:52879:loner):63013:2327c553:client-register,connection=2} > {noformat} > When the client leaves normally, the UnregisterInterest command is invoked > which removes the interest and the region. Part of that behavior is to remove > the regionName from the set of regions. > {noformat} > this.regions.remove(regionName) > {noformat} > Then ClientInterestList.clearClientInterestList is then invoked which is > supposed to clear the FilterProfile for each region, but the regions are > already cleared by the UnregisterInterest command, so this method doesn't do > anything.
[jira] [Created] (GEODE-6287) When a client connects, registers interest and disconnects normally, its ClientProxyMembershipID is not cleaned up and a memory leak occurs
Barry Oglesby created GEODE-6287: Summary: When a client connects, registers interest and disconnects normally, its ClientProxyMembershipID is not cleaned up and a memory leak occurs Key: GEODE-6287 URL: https://issues.apache.org/jira/browse/GEODE-6287 Project: Geode Issue Type: Bug Components: client queues, client/server Reporter: Barry Oglesby When a client connects to a distributed system and registers interest, the Region's FilterProfile's clientMap (an IDMap) registers the ClientProxyMembershipID in both the realIDs and wireIDs like: ``` realIDs={identity(192.168.2.12(client-register:52879:loner):63013:2327c553:client-register,connection=2=1}; wireIDs={1=identity(192.168.2.12(client-register:52879:loner):63013:2327c553:client-register,connection=2} ``` When the client leaves normally, the UnregisterInterest command is invoked, which removes the interest and the region. Part of that behavior is to remove the regionName from the set of regions. ``` this.regions.remove(regionName) ``` ClientInterestList.clearClientInterestList is then invoked, which is supposed to clear the FilterProfile for each region, but the regions are already cleared by the UnregisterInterest command, so this method doesn't do anything. Then, LocalRegion.cleanupForClient is invoked, which invokes FilterProfile.cleanupForClient. This method currently only closes CQs (which also cleans up the cqMap, another IDMap like the clientMap). At the end of this, the clientMap's realIDs and wireIDs still contain the ClientProxyMembershipID. The cleanupForClient method could be changed to also clean up the clientMap. Note: If the client is killed abnormally, the UnregisterInterest command is not invoked, so the interest and the region are not cleaned up normally. When ClientInterestList.clearClientInterestList is called, the set of regions still contains the region, and the IDMap is cleaned up properly. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (GEODE-6287) When a client connects, registers interest and disconnects normally, its ClientProxyMembershipID is not cleaned up and a memory leak occurs
[ https://issues.apache.org/jira/browse/GEODE-6287?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Barry Oglesby reassigned GEODE-6287: Assignee: Barry Oglesby > When a client connects, registers interest and disconnects normally, its > ClientProxyMembershipID is not cleaned up and a memory leak occurs > --- > > Key: GEODE-6287 > URL: https://issues.apache.org/jira/browse/GEODE-6287 > Project: Geode > Issue Type: Bug > Components: client queues, client/server >Reporter: Barry Oglesby >Assignee: Barry Oglesby >Priority: Major > > When a client connects to a distributed system and registers interest, the > Region's FilterProfile's clientMap (an IDMap) registers the > ClientProxyMembershipID in both the realIDs and wireIDs like: > ``` > realIDs=\{identity(192.168.2.12(client-register:52879:loner):63013:2327c553:client-register,connection=2=1}; > wireIDs=\{1=identity(192.168.2.12(client-register:52879:loner):63013:2327c553:client-register,connection=2} > ``` > When the client leaves normally, the UnregisterInterest command is invoked > which removes the interest and the region. Part of that behavior is to remove > the regionName from the set of regions. > ``` > this.regions.remove(regionName) > ``` > Then ClientInterestList.clearClientInterestList is then invoked which is > supposed to clear the FilterProfile for each region, but the regions are > already cleared by the UnregisterInterest command, so this method doesn't do > anything. > Then, LocalRegion.cleanupForClient is invoked which invokes > FilterProfile.cleanupForClient. This method currently only closes CQs (which > also cleans up the cqMap which is also an IDMap like the clientMap). > At the end of this, the clientMap's realIDs and wireIDs still contain the > ClientProxyMembershipID. > The cleanupForClient method could be changed to also clean up the clientMap. 
> Note: If the client is killed abnormally, the UnregisterInterest command is > not invoked, so the interest and the region is not cleaned up normally. When > ClientInterestList.clearClientInterestList is called, the set of regions > still contains the region, and the IDMap is cleaned up properly. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (GEODE-670) The size of the GatewaySenderEvent is sometimes calculated by serializing its value rather than using the Sizeable interface
[ https://issues.apache.org/jira/browse/GEODE-670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Barry Oglesby resolved GEODE-670. - Resolution: Fixed > The size of the GatewaySenderEvent is sometimes calculated by serializing its > value rather than using the Sizeable interface > > > Key: GEODE-670 > URL: https://issues.apache.org/jira/browse/GEODE-670 > Project: Geode > Issue Type: Bug > Components: wan >Reporter: Barry Oglesby >Assignee: Barry Oglesby >Priority: Major > > {{BucketRegion calcMemSize}} special-cases {{GatewaySenderEventImpl}} to get > just its value. In most cases, the value is a byte[], so the size is just the > length of the byte[]. If the {{GatewayEventSubstitutionFilter}} is used, then > the event's value is null and its valueObject is a java object. In this case, > the valueObject is serialized and returned. {{BucketRegion calcMemSize}} then > just returns the length of that byte[] using {{CachedDeserializableFactory > calcMemSize}}. > {{GatewaySenderEventImpl}} shouldn't be special-cased. It can be sized using > {{CachedDeserializableFactory calcMemSize}} just like other values. This will > invoke {{GatewaySenderEventImpl getSizeInBytes}} which does the right thing > for the valueObject by invoking {{CachedDeserializableFactory calcMemSize}} > on it. This method uses the {{Sizeable}} interface if appropriate. The > resulting size will be a bit bigger but more accurate than what is currently > reported. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
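The difference between the two sizing paths in this issue can be illustrated with a small self-contained sketch. SizeableLike, ValueObject, and Sizer are invented stand-ins; Geode's actual Sizeable interface and CachedDeserializableFactory.calcMemSize differ:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Stand-in for a Sizeable-style interface: the object reports its own size.
interface SizeableLike {
    int getSizeInBytes();
}

// Stand-in for a substituted event value that knows how big it is.
class ValueObject implements SizeableLike, Serializable {
    final byte[] payload;
    ValueObject(byte[] payload) { this.payload = payload; }

    // Cheap, constant-time estimate; the +16 object overhead is an assumption.
    public int getSizeInBytes() { return payload.length + 16; }
}

class Sizer {
    // The expensive path described in the issue: serialize the value just to
    // learn its length.
    static int sizeBySerializing(Serializable value) {
        try {
            ByteArrayOutputStream baos = new ByteArrayOutputStream();
            try (ObjectOutputStream oos = new ObjectOutputStream(baos)) {
                oos.writeObject(value);
            }
            return baos.size();
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }

    // The preferred path: ask the object via the Sizeable-style interface
    // when available, falling back to serialization otherwise.
    static int sizeOf(Object value) {
        if (value instanceof SizeableLike) {
            return ((SizeableLike) value).getSizeInBytes();
        }
        return sizeBySerializing((Serializable) value);
    }
}
```

The interface path avoids a full serialization pass per event, which is the point of routing GatewaySenderEventImpl through the same sizing code as other values.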
[jira] [Resolved] (GEODE-6246) An EntryNotFoundException can be thrown by BucketRegionQueue.basicDestroy during GatewaySender queue initialization
[ https://issues.apache.org/jira/browse/GEODE-6246?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Barry Oglesby resolved GEODE-6246. -- Resolution: Fixed > An EntryNotFoundException can be thrown by BucketRegionQueue.basicDestroy > during GatewaySender queue initialization > --- > > Key: GEODE-6246 > URL: https://issues.apache.org/jira/browse/GEODE-6246 > Project: Geode > Issue Type: Bug > Components: wan >Reporter: Barry Oglesby >Assignee: Barry Oglesby >Priority: Major > Labels: pull-request-available > Time Spent: 20m > Remaining Estimate: 0h > > An EntryNotFoundException like below can be thrown by > BucketRegionQueue.basicDestroy during GatewaySender queue initialization: > {noformat} > [warn 2019/01/03 15:53:00.423 PST > tid=0x56] Task failed with exception > org.apache.geode.cache.EntryNotFoundException: 57546 > at > org.apache.geode.internal.cache.BucketRegionQueue.basicDestroy(BucketRegionQueue.java:368) > at > org.apache.geode.internal.cache.BucketRegionQueue.destroyKey(BucketRegionQueue.java:564) > at > org.apache.geode.internal.cache.BucketRegionQueue.destroyFailedBatchRemovalMessageKeys(BucketRegionQueue.java:181) > at > org.apache.geode.internal.cache.BucketRegionQueue.initializeEventSeqNumQueue(BucketRegionQueue.java:151) > at > org.apache.geode.internal.cache.BucketRegionQueue.cleanUpDestroyedTokensAndMarkGIIComplete(BucketRegionQueue.java:89) > at > org.apache.geode.internal.cache.DistributedRegion.getInitialImageAndRecovery(DistributedRegion.java:1220) > at > org.apache.geode.internal.cache.DistributedRegion.initialize(DistributedRegion.java:1071) > at > org.apache.geode.internal.cache.BucketRegion.initialize(BucketRegion.java:256) > at > org.apache.geode.internal.cache.LocalRegion.createSubregion(LocalRegion.java:1012) > at > org.apache.geode.internal.cache.PartitionedRegionDataStore.createBucketRegion(PartitionedRegionDataStore.java:776) > at > 
org.apache.geode.internal.cache.PartitionedRegionDataStore.grabFreeBucket(PartitionedRegionDataStore.java:451) > at > org.apache.geode.internal.cache.PartitionedRegionDataStore.grabFreeBucketRecursively(PartitionedRegionDataStore.java:310) > at > org.apache.geode.internal.cache.PartitionedRegionDataStore.grabBucket(PartitionedRegionDataStore.java:2881) > at > org.apache.geode.internal.cache.PRHARedundancyProvider.createBackupBucketOnMember(PRHARedundancyProvider.java:1122) > at > org.apache.geode.internal.cache.partitioned.PartitionedRegionRebalanceOp.createRedundantBucketForRegion(PartitionedRegionRebalanceOp.java:511) > at > org.apache.geode.internal.cache.partitioned.rebalance.BucketOperatorImpl.createRedundantBucket(BucketOperatorImpl.java:54) > at > org.apache.geode.internal.cache.partitioned.rebalance.BucketOperatorWrapper.createRedundantBucket(BucketOperatorWrapper.java:100) > at > org.apache.geode.internal.cache.partitioned.rebalance.ParallelBucketOperator$1.run(ParallelBucketOperator.java:91) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at > org.apache.geode.distributed.internal.ClusterDistributionManager.runUntilShutdown(ClusterDistributionManager.java:956) > at > org.apache.geode.distributed.internal.ClusterDistributionManager.doWaitingThread(ClusterDistributionManager.java:846) > at > org.apache.geode.internal.logging.LoggingThreadFactory.lambda$newThread$0(LoggingThreadFactory.java:121) > at java.lang.Thread.run(Thread.java:745) > {noformat} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (GEODE-6267) Subjects are not logged out when a client departs causing a memory leak
[ https://issues.apache.org/jira/browse/GEODE-6267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16743391#comment-16743391 ] Barry Oglesby commented on GEODE-6267: -- I found something else while debugging this leak. If the ClientHealthMonitor unregisters the CacheClientProxy, most of closeTransientFields is short-circuited. This is the normal code path. In this code path, terminateDispatching calls closeSocket before calling closeTransientFields. In the finally block, closeTransientFields is called. {noformat} java.lang.Exception: Stack trace at java.lang.Thread.dumpStack(Thread.java:1333) at org.apache.geode.internal.cache.tier.sockets.CacheClientProxy.closeTransientFields(CacheClientProxy.java:965) at org.apache.geode.internal.cache.tier.sockets.CacheClientProxy.terminateDispatching(CacheClientProxy.java:945) at org.apache.geode.internal.cache.tier.sockets.CacheClientProxy.close(CacheClientProxy.java:794) at org.apache.geode.internal.cache.tier.sockets.CacheClientNotifier.closeDeadProxies(CacheClientNotifier.java:1712) at org.apache.geode.internal.cache.tier.sockets.CacheClientNotifier.unregisterClient(CacheClientNotifier.java:724) at org.apache.geode.internal.cache.tier.sockets.ClientHealthMonitor.unregisterClient(ClientHealthMonitor.java:270) at org.apache.geode.internal.cache.tier.sockets.ServerConnection.handleTermination(ServerConnection.java:958) at org.apache.geode.internal.cache.tier.sockets.ServerConnection.handleTermination(ServerConnection.java:878) at org.apache.geode.internal.cache.tier.sockets.ServerConnection.run(ServerConnection.java:1229) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at org.apache.geode.internal.cache.tier.sockets.AcceptorImpl.lambda$initializeServerConnectionThreadPool$3(AcceptorImpl.java:613) at 
org.apache.geode.internal.logging.LoggingThreadFactory.lambda$newThread$0(LoggingThreadFactory.java:121) {noformat} Since closeSocket has already been called in terminateDispatching, closeTransientFields short-circuits the rest of the method: {noformat} if (!closeSocket()) { // The thread who closed the socket will be responsible to // releaseResourcesForAddress and clearClientInterestList return; } {noformat} This means that these methods aren't called: {noformat} releaseCommBuffer releaseResourcesForAddress {noformat} I addressed this issue in my changes as well. > Subjects are not logged out when a client departs causing a memory leak > --- > > Key: GEODE-6267 > URL: https://issues.apache.org/jira/browse/GEODE-6267 > Project: Geode > Issue Type: Bug > Components: security >Reporter: Barry Oglesby >Assignee: Barry Oglesby >Priority: Major > > When a client with security enabled connects to a server, the > IntegratedSecurityService logs in a Subject. This causes a SimpleSession to > be created. > The Subject is stored in ClientUserAuths.uniqueIdVsSubject. > Here is a stack showing the SimpleSession creation: > {noformat} > [warning 2019/01/08 18:02:42.993 PST server1 Thread 0> tid=0x4e] SimpleSession.<init> 
invoked: > java.lang.Exception > at org.apache.shiro.session.mgt.SimpleSession.<init>(SimpleSession.java:99) > at > org.apache.shiro.session.mgt.SimpleSessionFactory.createSession(SimpleSessionFactory.java:44) > at > org.apache.shiro.session.mgt.DefaultSessionManager.newSessionInstance(DefaultSessionManager.java:163) > at > org.apache.shiro.session.mgt.DefaultSessionManager.doCreateSession(DefaultSessionManager.java:154) > at > org.apache.shiro.session.mgt.AbstractValidatingSessionManager.createSession(AbstractValidatingSessionManager.java:136) > at > org.apache.shiro.session.mgt.AbstractNativeSessionManager.start(AbstractNativeSessionManager.java:99) > at > org.apache.shiro.mgt.SessionsSecurityManager.start(SessionsSecurityManager.java:152) > at > org.apache.shiro.subject.support.DelegatingSubject.getSession(DelegatingSubject.java:336) > at > org.apache.shiro.subject.support.DelegatingSubject.getSession(DelegatingSubject.java:312) > at > org.apache.shiro.mgt.DefaultSubjectDAO.mergePrincipals(DefaultSubjectDAO.java:204) > at > org.apache.shiro.mgt.DefaultSubjectDAO.saveToSession(DefaultSubjectDAO.java:166) > at org.apache.shiro.mgt.DefaultSubjectDAO.save(DefaultSubjectDAO.java:147) > at > org.apache.shiro.mgt.DefaultSecurityManager.save(DefaultSecurityManager.java:383) > at > org.apache.shiro.mgt.DefaultSecurityManager.createSubject(DefaultSecurityManager.java:350) > at > org.apache.shiro.mgt.DefaultSecurityManager.createSubject(DefaultSecurityManager.java:183) > at > org.apache.shiro.mgt.DefaultSecurityManager.login(DefaultSecurityManager.java:283) > at >
[jira] [Commented] (GEODE-6267) Subjects are not logged out when a client departs causing a memory leak
[ https://issues.apache.org/jira/browse/GEODE-6267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16743383#comment-16743383 ] Barry Oglesby commented on GEODE-6267: -- Here is a concise list of instances: Startup: Total 186115 9987688 After 1 client: Total 1403107 61797808 After subject logout: Total 391292 19641912 > Subjects are not logged out when a client departs causing a memory leak > --- > > Key: GEODE-6267 > URL: https://issues.apache.org/jira/browse/GEODE-6267 > Project: Geode > Issue Type: Bug > Components: security >Reporter: Barry Oglesby >Assignee: Barry Oglesby >Priority: Major > > When a client with security enabled connects to a server, the > IntegratedSecurityService logs in a Subject. This causes a SimpleSession to > be created. > The Subject is stored in ClientUserAuths.uniqueIdVsSubject. > Here is a stack showing the SimpleSession creation: > {noformat} > [warning 2019/01/08 18:02:42.993 PST server1 Thread 0> tid=0x4e] SimpleSession.<init> invoked: > java.lang.Exception > at org.apache.shiro.session.mgt.SimpleSession.<init>(SimpleSession.java:99) > at > org.apache.shiro.session.mgt.SimpleSessionFactory.createSession(SimpleSessionFactory.java:44) > at > org.apache.shiro.session.mgt.DefaultSessionManager.newSessionInstance(DefaultSessionManager.java:163) > at > org.apache.shiro.session.mgt.DefaultSessionManager.doCreateSession(DefaultSessionManager.java:154) > at > org.apache.shiro.session.mgt.AbstractValidatingSessionManager.createSession(AbstractValidatingSessionManager.java:136) > at > org.apache.shiro.session.mgt.AbstractNativeSessionManager.start(AbstractNativeSessionManager.java:99) > at > org.apache.shiro.mgt.SessionsSecurityManager.start(SessionsSecurityManager.java:152) > at > org.apache.shiro.subject.support.DelegatingSubject.getSession(DelegatingSubject.java:336) > at > org.apache.shiro.subject.support.DelegatingSubject.getSession(DelegatingSubject.java:312) > at > 
org.apache.shiro.mgt.DefaultSubjectDAO.mergePrincipals(DefaultSubjectDAO.java:204) > at > org.apache.shiro.mgt.DefaultSubjectDAO.saveToSession(DefaultSubjectDAO.java:166) > at org.apache.shiro.mgt.DefaultSubjectDAO.save(DefaultSubjectDAO.java:147) > at > org.apache.shiro.mgt.DefaultSecurityManager.save(DefaultSecurityManager.java:383) > at > org.apache.shiro.mgt.DefaultSecurityManager.createSubject(DefaultSecurityManager.java:350) > at > org.apache.shiro.mgt.DefaultSecurityManager.createSubject(DefaultSecurityManager.java:183) > at > org.apache.shiro.mgt.DefaultSecurityManager.login(DefaultSecurityManager.java:283) > at > org.apache.shiro.subject.support.DelegatingSubject.login(DelegatingSubject.java:256) > at > org.apache.geode.internal.security.IntegratedSecurityService.login(IntegratedSecurityService.java:139) > at > org.apache.geode.internal.cache.tier.sockets.HandShake.verifyCredentials(HandShake.java:1688) > at > org.apache.geode.internal.cache.tier.sockets.ServerConnection.setCredentials(ServerConnection.java:1044) > at > org.apache.geode.internal.cache.tier.sockets.command.PutUserCredentials.cmdExecute(PutUserCredentials.java:52) > at > org.apache.geode.internal.cache.tier.sockets.BaseCommand.execute(BaseCommand.java:163) > at > org.apache.geode.internal.cache.tier.sockets.ServerConnection.doNormalMsg(ServerConnection.java:797) > at > org.apache.geode.internal.cache.tier.sockets.LegacyServerConnection.doOneMessage(LegacyServerConnection.java:85) > at > org.apache.geode.internal.cache.tier.sockets.ServerConnection.run(ServerConnection.java:1179) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at > org.apache.geode.internal.cache.tier.sockets.AcceptorImpl$4$1.run(AcceptorImpl.java:641) > at java.lang.Thread.run(Thread.java:745) > {noformat} > When the client disconnects, the ClientUserAuths is cleaned up (in cleanup), > but the 
Subjects are not logged out. > With subscription-enabled=true, an additional Subject is created and stored > in the CacheClientProxy subject. This Subject is not logged out either. > Here is a stack showing the SimpleSession creation: > {noformat} > [warning 2019/01/08 18:02:43.023 PST server1 Thread 0> tid=0x52] SimpleSession.<init> invoked: > java.lang.Exception > at org.apache.shiro.session.mgt.SimpleSession.<init>(SimpleSession.java:99) > at > org.apache.shiro.session.mgt.SimpleSessionFactory.createSession(SimpleSessionFactory.java:44) > at > org.apache.shiro.session.mgt.DefaultSessionManager.newSessionInstance(DefaultSessionManager.java:163) > at > org.apache.shiro.session.mgt.DefaultSessionManager.doCreateSession(DefaultSessionManager.java:154) > at >
[jira] [Comment Edited] (GEODE-6267) Subjects are not logged out when a client departs causing a memory leak
[ https://issues.apache.org/jira/browse/GEODE-6267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16743383#comment-16743383 ] Barry Oglesby edited comment on GEODE-6267 at 1/15/19 8:41 PM: --- Here is a concise list of instances and memory used: Startup: Total 186115 9987688 After 1 client: Total 1403107 61797808 After subject logout: Total 391292 19641912 was (Author: barry.oglesby): Here is a concise list of instances: Startup: Total 186115 9987688 After 1 clients: Total 1403107 61797808 After subject logout: Total 391292 19641912 > Subjects are not logged out when a client departs causing a memory leak > --- > > Key: GEODE-6267 > URL: https://issues.apache.org/jira/browse/GEODE-6267 > Project: Geode > Issue Type: Bug > Components: security >Reporter: Barry Oglesby >Assignee: Barry Oglesby >Priority: Major > > When a client with security enabled connects to a server, the > IntegratedSecurityService logs in a Subject. This causes a SimpleSession to > be created. > The Subject is stored in ClientUserAuths.uniqueIdVsSubject. > Here is a stack showing the SimpleSession creation: > {noformat} > [warning 2019/01/08 18:02:42.993 PST server1 Thread 0> tid=0x4e] SimpleSession.<init> 
invoked: > java.lang.Exception > at org.apache.shiro.session.mgt.SimpleSession.<init>(SimpleSession.java:99) > at > org.apache.shiro.session.mgt.SimpleSessionFactory.createSession(SimpleSessionFactory.java:44) > at > org.apache.shiro.session.mgt.DefaultSessionManager.newSessionInstance(DefaultSessionManager.java:163) > at > org.apache.shiro.session.mgt.DefaultSessionManager.doCreateSession(DefaultSessionManager.java:154) > at > org.apache.shiro.session.mgt.AbstractValidatingSessionManager.createSession(AbstractValidatingSessionManager.java:136) > at > org.apache.shiro.session.mgt.AbstractNativeSessionManager.start(AbstractNativeSessionManager.java:99) > at > org.apache.shiro.mgt.SessionsSecurityManager.start(SessionsSecurityManager.java:152) > at > org.apache.shiro.subject.support.DelegatingSubject.getSession(DelegatingSubject.java:336) > at > org.apache.shiro.subject.support.DelegatingSubject.getSession(DelegatingSubject.java:312) > at > org.apache.shiro.mgt.DefaultSubjectDAO.mergePrincipals(DefaultSubjectDAO.java:204) > at > org.apache.shiro.mgt.DefaultSubjectDAO.saveToSession(DefaultSubjectDAO.java:166) > at org.apache.shiro.mgt.DefaultSubjectDAO.save(DefaultSubjectDAO.java:147) > at > org.apache.shiro.mgt.DefaultSecurityManager.save(DefaultSecurityManager.java:383) > at > org.apache.shiro.mgt.DefaultSecurityManager.createSubject(DefaultSecurityManager.java:350) > at > org.apache.shiro.mgt.DefaultSecurityManager.createSubject(DefaultSecurityManager.java:183) > at > org.apache.shiro.mgt.DefaultSecurityManager.login(DefaultSecurityManager.java:283) > at > org.apache.shiro.subject.support.DelegatingSubject.login(DelegatingSubject.java:256) > at > org.apache.geode.internal.security.IntegratedSecurityService.login(IntegratedSecurityService.java:139) > at > org.apache.geode.internal.cache.tier.sockets.HandShake.verifyCredentials(HandShake.java:1688) > at > org.apache.geode.internal.cache.tier.sockets.ServerConnection.setCredentials(ServerConnection.java:1044) > at > 
org.apache.geode.internal.cache.tier.sockets.command.PutUserCredentials.cmdExecute(PutUserCredentials.java:52) > at > org.apache.geode.internal.cache.tier.sockets.BaseCommand.execute(BaseCommand.java:163) > at > org.apache.geode.internal.cache.tier.sockets.ServerConnection.doNormalMsg(ServerConnection.java:797) > at > org.apache.geode.internal.cache.tier.sockets.LegacyServerConnection.doOneMessage(LegacyServerConnection.java:85) > at > org.apache.geode.internal.cache.tier.sockets.ServerConnection.run(ServerConnection.java:1179) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at > org.apache.geode.internal.cache.tier.sockets.AcceptorImpl$4$1.run(AcceptorImpl.java:641) > at java.lang.Thread.run(Thread.java:745) > {noformat} > When the client disconnects, the ClientUserAuths is cleaned up (in cleanup), > but the Subjects are not logged out. > With subscription-enabled=true, an additional Subject is created and stored > in the CacheClientProxy subject. This Subject is not logged out either. > Here is a stack showing the SimpleSession creation: > {noformat} > [warning 2019/01/08 18:02:43.023 PST server1 Thread 0> tid=0x52] SimpleSession. invoked: > java.lang.Exception > at org.apache.shiro.session.mgt.SimpleSession.(SimpleSession.java:99) > at >
[jira] [Commented] (GEODE-6267) Subjects are not logged out when a client departs causing a memory leak
[ https://issues.apache.org/jira/browse/GEODE-6267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16743380#comment-16743380 ] Barry Oglesby commented on GEODE-6267: -- After adding code to log out the Subjects in ClientUserAuths and CacheClientProxy: {noformat}
 num #instances #bytes class name
 --
 1: 79717 7093992 [C
 2: 79610 1910640 java.lang.String
 3: 11596 1599472 [B
 4: 31794 1017408 java.util.concurrent.ConcurrentHashMap$Node
 5: 7477 841096 java.lang.Class
 6: 9998 719856 org.apache.geode.distributed.internal.membership.gms.GMSMember
 7: 9998 479904 org.apache.geode.distributed.internal.membership.InternalDistributedMember
 8: 5956 445384 [Ljava.lang.Object;
 9: 753 408176 [J
 10: 20810 332960 java.lang.Object
 11: 10022 320704 java.net.InetAddress$InetAddressHolder
 12: 9989 319648 org.apache.geode.distributed.DurableClientAttributes
 13: 9988 319616 org.apache.geode.internal.cache.tier.sockets.ClientProxyMembershipID
 14: 2980 262240 java.lang.reflect.Method
 15: 7956 254592 java.util.HashMap$Node
 16: 10221 245304 java.lang.Long
 17: 10017 240408 java.net.Inet4Address
 18: 122 223824 [Ljava.util.concurrent.ConcurrentHashMap$Node;
 19: 4881 208488 [I
 20: 11264 202016 [Ljava.lang.String;
 21: 2139 198488 [Ljava.util.HashMap$Node;
 22: 2119 169520 java.lang.reflect.Constructor
 23: 4120 164800 java.util.LinkedHashMap$Entry
 24: 1726 96656 java.util.LinkedHashMap
 25: 1760 84480 java.util.HashMap
 Total 391292 19641912
{noformat} I think there is still a leak here in ClientProxyMembershipIDs. I'll take a look at that. > Subjects are not logged out when a client departs causing a memory leak > --- > > Key: GEODE-6267 > URL: https://issues.apache.org/jira/browse/GEODE-6267 > Project: Geode > Issue Type: Bug > Components: security >Reporter: Barry Oglesby >Assignee: Barry Oglesby >Priority: Major > > When a client with security enabled connects to a server, the > IntegratedSecurityService logs in a Subject. This causes a SimpleSession to > be created. 
> The Subject is stored in ClientUserAuths.uniqueIdVsSubject. > Here is a stack showing the SimpleSession creation: > {noformat} > [warning 2019/01/08 18:02:42.993 PST server1 Thread 0> tid=0x4e] SimpleSession. invoked: > java.lang.Exception > at org.apache.shiro.session.mgt.SimpleSession.(SimpleSession.java:99) > at > org.apache.shiro.session.mgt.SimpleSessionFactory.createSession(SimpleSessionFactory.java:44) > at > org.apache.shiro.session.mgt.DefaultSessionManager.newSessionInstance(DefaultSessionManager.java:163) > at > org.apache.shiro.session.mgt.DefaultSessionManager.doCreateSession(DefaultSessionManager.java:154) > at > org.apache.shiro.session.mgt.AbstractValidatingSessionManager.createSession(AbstractValidatingSessionManager.java:136) > at > org.apache.shiro.session.mgt.AbstractNativeSessionManager.start(AbstractNativeSessionManager.java:99) > at > org.apache.shiro.mgt.SessionsSecurityManager.start(SessionsSecurityManager.java:152) > at > org.apache.shiro.subject.support.DelegatingSubject.getSession(DelegatingSubject.java:336) > at > org.apache.shiro.subject.support.DelegatingSubject.getSession(DelegatingSubject.java:312) > at > org.apache.shiro.mgt.DefaultSubjectDAO.mergePrincipals(DefaultSubjectDAO.java:204) > at > org.apache.shiro.mgt.DefaultSubjectDAO.saveToSession(DefaultSubjectDAO.java:166) > at org.apache.shiro.mgt.DefaultSubjectDAO.save(DefaultSubjectDAO.java:147) > at > org.apache.shiro.mgt.DefaultSecurityManager.save(DefaultSecurityManager.java:383) > at > org.apache.shiro.mgt.DefaultSecurityManager.createSubject(DefaultSecurityManager.java:350) > at > org.apache.shiro.mgt.DefaultSecurityManager.createSubject(DefaultSecurityManager.java:183) > at > org.apache.shiro.mgt.DefaultSecurityManager.login(DefaultSecurityManager.java:283) > at > org.apache.shiro.subject.support.DelegatingSubject.login(DelegatingSubject.java:256) > at > org.apache.geode.internal.security.IntegratedSecurityService.login(IntegratedSecurityService.java:139) > at > 
org.apache.geode.internal.cache.tier.sockets.HandShake.verifyCredentials(HandShake.java:1688) > at > org.apache.geode.internal.cache.tier.sockets.ServerConnection.setCredentials(ServerConnection.java:1044) > at > org.apache.geode.internal.cache.tier.sockets.command.PutUserCredentials.cmdExecute(PutUserCredentials.java:52) > at > org.apache.geode.internal.cache.tier.sockets.BaseCommand.execute(BaseCommand.java:163) > at > org.apache.geode.internal.cache.tier.sockets.ServerConnection.doNormalMsg(ServerConnection.java:797) > at > org.apache.geode.internal.cache.tier.sockets.LegacyServerConnection.doOneMessage(LegacyServerConnection.java:85) > at >
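The fix tried in the comment above (logging each Subject out when the client's authorization state is cleaned up) can be sketched as follows. This is a minimal, self-contained stand-in, not Geode's actual code: the `Subject` interface here is a one-method stand-in for `org.apache.shiro.subject.Subject`, and the map mirrors the `ClientUserAuths.uniqueIdVsSubject` field mentioned in the issue; all other names are illustrative.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class ClientUserAuthsSketch {

  // Minimal stand-in for org.apache.shiro.subject.Subject.
  interface Subject {
    void logout();
  }

  // Mirrors ClientUserAuths.uniqueIdVsSubject.
  private final Map<Long, Subject> uniqueIdVsSubject = new ConcurrentHashMap<>();

  void put(long uniqueId, Subject subject) {
    uniqueIdVsSubject.put(uniqueId, subject);
  }

  // Cleanup that also logs each Subject out. Without the logout() call,
  // the SimpleSession backing each Subject is never released -- the leak
  // described in this issue.
  void cleanup() {
    uniqueIdVsSubject.values().forEach(Subject::logout);
    uniqueIdVsSubject.clear();
  }
}
```

The same pattern would apply to the Subject held in the CacheClientProxy when subscriptions are enabled: log it out when the proxy is closed.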
[jira] [Commented] (GEODE-6267) Subjects are not logged out when a client departs causing a memory leak
[ https://issues.apache.org/jira/browse/GEODE-6267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16743377#comment-16743377 ] Barry Oglesby commented on GEODE-6267: -- Here are some histograms. Startup:
{noformat}
 num     #instances         #bytes  class name
----------------------------------------------
   1:         66767        8069280  [C
   2:          4565        5288072  [B
   3:         36908        2952640  java.util.zip.ZipEntry
   4:         66702        1600848  java.lang.String
   5:          7002         789552  java.lang.Class
   6:          5084         567984  [Ljava.lang.Object;
   7:          5153         453464  java.lang.reflect.Method
   8:           656         388464  [J
   9:         11897         380704  java.util.concurrent.ConcurrentHashMap$Node
  10:          3954         336880  [Ljava.util.HashMap$Node;
  11:          8124         324960  java.util.LinkedHashMap$Entry
  12:         10121         323872  java.util.HashMap$Node
  13:          4751         198016  [I
  14:         10665         170640  java.lang.Object
  15:          2131         170480  java.lang.reflect.Constructor
  16:          2749         153944  java.util.LinkedHashMap
  17:          1619         116568  java.lang.reflect.Field
  18:          5503         115384  [Ljava.lang.Class;
  19:          2202         105696  java.util.HashMap
  20:           118          94864  [Ljava.util.concurrent.ConcurrentHashMap$Node;
  21:          2096          80416  [Ljava.lang.String;
  22:          1871          74840  java.lang.ref.Finalizer
  23:          1175          65800  java.lang.Class$ReflectionData
  24:          1409          56360  java.lang.ref.SoftReference
  25:           607          53160  [Ljava.lang.reflect.Method;
Total        186115        9987688
{noformat}
After connecting and disconnecting a client 1 time:
{noformat}
 num     #instances         #bytes  class name
----------------------------------------------
   1:        229739       15010256  [C
   2:         92139        7398488  [Ljava.util.HashMap$Node;
   3:        229634        5511216  java.lang.String
   4:         61726        3456656  java.util.LinkedHashMap
   5:         97956        3134592  java.util.HashMap$Node
   6:         64120        2564800  java.util.LinkedHashMap$Entry
   7:         35955        2125344  [Ljava.lang.Object;
   8:         61804        1977728  java.util.concurrent.ConcurrentHashMap$Node
   9:         30485        1954128  [Ljava.util.Hashtable$Entry;
  10:         60493        1935776  java.util.Hashtable$Entry
  11:         11601        1597640  [B
  12:         31760        1524480  java.util.HashMap
  13:         30072        1443456  java.util.Properties
  14:             3            144  org.apache.shiro.session.mgt.SimpleSession
  15:          7477         841096  java.lang.Class
  16:         30464         731136  java.util.ArrayList
  17:         30009         720216  java.util.Date
  18:         10003         720216  org.apache.geode.distributed.internal.membership.gms.GMSMember
  19:             3             72  TestPrincipal
  20:             3             72  org.apache.shiro.subject.SimplePrincipalCollection
  21:           122         485904  [Ljava.util.concurrent.ConcurrentHashMap$Node;
  22:         10003         480144  org.apache.geode.distributed.internal.membership.InternalDistributedMember
  23:         30007         480112  java.util.LinkedHashSet
  24:           732         405536  [J
  25:         20815         333040  java.lang.Object
Total       1403107       61797808
{noformat}
After connecting and disconnecting a client enough times to create ~182k SimpleSessions:
{noformat}
 num     #instances         #bytes  class name
----------------------------------------------
   1:       1252634       78052152  [C
   2:        552473       44224760  [Ljava.util.HashMap$Node;
   3:       1252511       30060264  java.lang.String
   4:        368616       20642496  java.util.LinkedHashMap
   5:        558197       17862304  java.util.HashMap$Node
   6:        371010       14840400  java.util.LinkedHashMap$Entry
   7:        183930       11774608  [Ljava.util.Hashtable$Entry;
   8:        367383       11756256  java.util.Hashtable$Entry
   9:        189397       10719072  [Ljava.lang.Object;
  10:        317462       10158784  java.util.concurrent.ConcurrentHashMap$Node
  11:        185203        8889744  java.util.HashMap
  12:        183517        8808816  java.util.Properties
  13:        183445        8805360  org.apache.shiro.session.mgt.SimpleSession
  14:         62707        7321352  [B
  15:        183909        4413816  java.util.ArrayList
  16:        183454        4402896  java.util.Date
  17:        183445        4402680  TestPrincipal
  18:        183445        4402680  org.apache.shiro.subject.SimplePrincipalCollection
  19:         61108        4399776  org.apache.geode.distributed.internal.membership.gms.GMSMember
  20:        183452        2935232  java.util.LinkedHashSet
Total       7692368      330322672
{noformat}
> Subjects are not logged out when a client departs causing a memory leak
> ---
>
> Key: GEODE-6267
> URL: https://issues.apache.org/jira/browse/GEODE-6267
> Project: Geode
> Issue Type: Bug
> Components: security
> Reporter: Barry Oglesby
> Assignee: Barry Oglesby
> Priority: Major
>
> When a client with security enabled connects to a server, the
> IntegratedSecurityService logs in a Subject. This causes a SimpleSession to
> be created.
> The Subject is stored in ClientUserAuths.uniqueIdVsSubject. > Here is a stack showing the SimpleSession creation: > {noformat} > [warning 2019/01/08 18:02:42.993 PST server1 Thread 0> tid=0x4e] SimpleSession. invoked: > java.lang.Exception > at org.apache.shiro.session.mgt.SimpleSession.(SimpleSession.java:99) > at > org.apache.shiro.session.mgt.SimpleSessionFactory.createSession(SimpleSessionFactory.java:44) > at > org.apache.shiro.session.mgt.DefaultSessionManager.newSessionInstance(DefaultSessionManager.java:163) > at >
[jira] [Assigned] (GEODE-6267) Subjects are not logged out when a client departs causing a memory leak
[ https://issues.apache.org/jira/browse/GEODE-6267?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Barry Oglesby reassigned GEODE-6267: Assignee: Barry Oglesby > Subjects are not logged out when a client departs causing a memory leak > --- > > Key: GEODE-6267 > URL: https://issues.apache.org/jira/browse/GEODE-6267 > Project: Geode > Issue Type: Bug > Components: security >Reporter: Barry Oglesby >Assignee: Barry Oglesby >Priority: Major > > When a client with security enabled connects to a server, the > IntegratedSecurityService logs in a Subject. This causes a SimpleSession to > be created. > The Subject is stored in ClientUserAuths.uniqueIdVsSubject. > Here is a stack showing the SimpleSession creation: > {noformat} > [warning 2019/01/08 18:02:42.993 PST server1 Thread 0> tid=0x4e] SimpleSession. invoked: > java.lang.Exception > at org.apache.shiro.session.mgt.SimpleSession.(SimpleSession.java:99) > at > org.apache.shiro.session.mgt.SimpleSessionFactory.createSession(SimpleSessionFactory.java:44) > at > org.apache.shiro.session.mgt.DefaultSessionManager.newSessionInstance(DefaultSessionManager.java:163) > at > org.apache.shiro.session.mgt.DefaultSessionManager.doCreateSession(DefaultSessionManager.java:154) > at > org.apache.shiro.session.mgt.AbstractValidatingSessionManager.createSession(AbstractValidatingSessionManager.java:136) > at > org.apache.shiro.session.mgt.AbstractNativeSessionManager.start(AbstractNativeSessionManager.java:99) > at > org.apache.shiro.mgt.SessionsSecurityManager.start(SessionsSecurityManager.java:152) > at > org.apache.shiro.subject.support.DelegatingSubject.getSession(DelegatingSubject.java:336) > at > org.apache.shiro.subject.support.DelegatingSubject.getSession(DelegatingSubject.java:312) > at > org.apache.shiro.mgt.DefaultSubjectDAO.mergePrincipals(DefaultSubjectDAO.java:204) > at > org.apache.shiro.mgt.DefaultSubjectDAO.saveToSession(DefaultSubjectDAO.java:166) > at 
org.apache.shiro.mgt.DefaultSubjectDAO.save(DefaultSubjectDAO.java:147) > at > org.apache.shiro.mgt.DefaultSecurityManager.save(DefaultSecurityManager.java:383) > at > org.apache.shiro.mgt.DefaultSecurityManager.createSubject(DefaultSecurityManager.java:350) > at > org.apache.shiro.mgt.DefaultSecurityManager.createSubject(DefaultSecurityManager.java:183) > at > org.apache.shiro.mgt.DefaultSecurityManager.login(DefaultSecurityManager.java:283) > at > org.apache.shiro.subject.support.DelegatingSubject.login(DelegatingSubject.java:256) > at > org.apache.geode.internal.security.IntegratedSecurityService.login(IntegratedSecurityService.java:139) > at > org.apache.geode.internal.cache.tier.sockets.HandShake.verifyCredentials(HandShake.java:1688) > at > org.apache.geode.internal.cache.tier.sockets.ServerConnection.setCredentials(ServerConnection.java:1044) > at > org.apache.geode.internal.cache.tier.sockets.command.PutUserCredentials.cmdExecute(PutUserCredentials.java:52) > at > org.apache.geode.internal.cache.tier.sockets.BaseCommand.execute(BaseCommand.java:163) > at > org.apache.geode.internal.cache.tier.sockets.ServerConnection.doNormalMsg(ServerConnection.java:797) > at > org.apache.geode.internal.cache.tier.sockets.LegacyServerConnection.doOneMessage(LegacyServerConnection.java:85) > at > org.apache.geode.internal.cache.tier.sockets.ServerConnection.run(ServerConnection.java:1179) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at > org.apache.geode.internal.cache.tier.sockets.AcceptorImpl$4$1.run(AcceptorImpl.java:641) > at java.lang.Thread.run(Thread.java:745) > {noformat} > When the client disconnects, the ClientUserAuths is cleaned up (in cleanup), > but the Subjects are not logged out. > With subscription-enabled=true, an additional Subject is created and stored > in the CacheClientProxy subject. This Subject is not logged out either. 
> Here is a stack showing the SimpleSession creation: > {noformat} > [warning 2019/01/08 18:02:43.023 PST server1 Thread 0> tid=0x52] SimpleSession. invoked: > java.lang.Exception > at org.apache.shiro.session.mgt.SimpleSession.(SimpleSession.java:99) > at > org.apache.shiro.session.mgt.SimpleSessionFactory.createSession(SimpleSessionFactory.java:44) > at > org.apache.shiro.session.mgt.DefaultSessionManager.newSessionInstance(DefaultSessionManager.java:163) > at > org.apache.shiro.session.mgt.DefaultSessionManager.doCreateSession(DefaultSessionManager.java:154) > at > org.apache.shiro.session.mgt.AbstractValidatingSessionManager.createSession(AbstractValidatingSessionManager.java:136) > at >
[jira] [Created] (GEODE-6267) Subjects are not logged out when a client departs causing a memory leak
Barry Oglesby created GEODE-6267: Summary: Subjects are not logged out when a client departs causing a memory leak Key: GEODE-6267 URL: https://issues.apache.org/jira/browse/GEODE-6267 Project: Geode Issue Type: Bug Components: security Reporter: Barry Oglesby
When a client with security enabled connects to a server, the IntegratedSecurityService logs in a Subject. This causes a SimpleSession to be created. The Subject is stored in ClientUserAuths.uniqueIdVsSubject. Here is a stack showing the SimpleSession creation:
{noformat}
[warning 2019/01/08 18:02:42.993 PST server1 tid=0x4e] SimpleSession.<init> invoked:
java.lang.Exception
	at org.apache.shiro.session.mgt.SimpleSession.<init>(SimpleSession.java:99)
	at org.apache.shiro.session.mgt.SimpleSessionFactory.createSession(SimpleSessionFactory.java:44)
	at org.apache.shiro.session.mgt.DefaultSessionManager.newSessionInstance(DefaultSessionManager.java:163)
	at org.apache.shiro.session.mgt.DefaultSessionManager.doCreateSession(DefaultSessionManager.java:154)
	at org.apache.shiro.session.mgt.AbstractValidatingSessionManager.createSession(AbstractValidatingSessionManager.java:136)
	at org.apache.shiro.session.mgt.AbstractNativeSessionManager.start(AbstractNativeSessionManager.java:99)
	at org.apache.shiro.mgt.SessionsSecurityManager.start(SessionsSecurityManager.java:152)
	at org.apache.shiro.subject.support.DelegatingSubject.getSession(DelegatingSubject.java:336)
	at org.apache.shiro.subject.support.DelegatingSubject.getSession(DelegatingSubject.java:312)
	at org.apache.shiro.mgt.DefaultSubjectDAO.mergePrincipals(DefaultSubjectDAO.java:204)
	at org.apache.shiro.mgt.DefaultSubjectDAO.saveToSession(DefaultSubjectDAO.java:166)
	at org.apache.shiro.mgt.DefaultSubjectDAO.save(DefaultSubjectDAO.java:147)
	at org.apache.shiro.mgt.DefaultSecurityManager.save(DefaultSecurityManager.java:383)
	at org.apache.shiro.mgt.DefaultSecurityManager.createSubject(DefaultSecurityManager.java:350)
	at org.apache.shiro.mgt.DefaultSecurityManager.createSubject(DefaultSecurityManager.java:183)
	at org.apache.shiro.mgt.DefaultSecurityManager.login(DefaultSecurityManager.java:283)
	at org.apache.shiro.subject.support.DelegatingSubject.login(DelegatingSubject.java:256)
	at org.apache.geode.internal.security.IntegratedSecurityService.login(IntegratedSecurityService.java:139)
	at org.apache.geode.internal.cache.tier.sockets.HandShake.verifyCredentials(HandShake.java:1688)
	at org.apache.geode.internal.cache.tier.sockets.ServerConnection.setCredentials(ServerConnection.java:1044)
	at org.apache.geode.internal.cache.tier.sockets.command.PutUserCredentials.cmdExecute(PutUserCredentials.java:52)
	at org.apache.geode.internal.cache.tier.sockets.BaseCommand.execute(BaseCommand.java:163)
	at org.apache.geode.internal.cache.tier.sockets.ServerConnection.doNormalMsg(ServerConnection.java:797)
	at org.apache.geode.internal.cache.tier.sockets.LegacyServerConnection.doOneMessage(LegacyServerConnection.java:85)
	at org.apache.geode.internal.cache.tier.sockets.ServerConnection.run(ServerConnection.java:1179)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at org.apache.geode.internal.cache.tier.sockets.AcceptorImpl$4$1.run(AcceptorImpl.java:641)
	at java.lang.Thread.run(Thread.java:745)
{noformat}
When the client disconnects, the ClientUserAuths is cleaned up (in cleanup), but the Subjects are not logged out. With subscription-enabled=true, an additional Subject is created and stored in the CacheClientProxy subject field. This Subject is not logged out either. Here is a stack showing that SimpleSession creation:
{noformat}
[warning 2019/01/08 18:02:43.023 PST server1 tid=0x52] SimpleSession.<init> invoked:
java.lang.Exception
	at org.apache.shiro.session.mgt.SimpleSession.<init>(SimpleSession.java:99)
	at org.apache.shiro.session.mgt.SimpleSessionFactory.createSession(SimpleSessionFactory.java:44)
	at org.apache.shiro.session.mgt.DefaultSessionManager.newSessionInstance(DefaultSessionManager.java:163)
	at org.apache.shiro.session.mgt.DefaultSessionManager.doCreateSession(DefaultSessionManager.java:154)
	at org.apache.shiro.session.mgt.AbstractValidatingSessionManager.createSession(AbstractValidatingSessionManager.java:136)
	at org.apache.shiro.session.mgt.AbstractNativeSessionManager.start(AbstractNativeSessionManager.java:99)
	at org.apache.shiro.mgt.SessionsSecurityManager.start(SessionsSecurityManager.java:152)
	at org.apache.shiro.subject.support.DelegatingSubject.getSession(DelegatingSubject.java:336)
	at org.apache.shiro.subject.support.DelegatingSubject.getSession(DelegatingSubject.java:312)
	at org.apache.shiro.mgt.DefaultSubjectDAO.mergePrincipals(DefaultSubjectDAO.java:204)
[jira] [Assigned] (GEODE-6246) An EntryNotFoundException can be thrown by BucketRegionQueue.basicDestroy during GatewaySender queue initialization
[ https://issues.apache.org/jira/browse/GEODE-6246?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Barry Oglesby reassigned GEODE-6246: Assignee: Barry Oglesby
> An EntryNotFoundException can be thrown by BucketRegionQueue.basicDestroy
> during GatewaySender queue initialization
> ---
>
> Key: GEODE-6246
> URL: https://issues.apache.org/jira/browse/GEODE-6246
> Project: Geode
> Issue Type: Bug
> Components: wan
> Reporter: Barry Oglesby
> Assignee: Barry Oglesby
> Priority: Major
>
> An EntryNotFoundException like below can be thrown by
> BucketRegionQueue.basicDestroy during GatewaySender queue initialization:
> {noformat}
> [warn 2019/01/03 15:53:00.423 PST tid=0x56] Task failed with exception
> org.apache.geode.cache.EntryNotFoundException: 57546
> 	at org.apache.geode.internal.cache.BucketRegionQueue.basicDestroy(BucketRegionQueue.java:368)
> 	at org.apache.geode.internal.cache.BucketRegionQueue.destroyKey(BucketRegionQueue.java:564)
> 	at org.apache.geode.internal.cache.BucketRegionQueue.destroyFailedBatchRemovalMessageKeys(BucketRegionQueue.java:181)
> 	at org.apache.geode.internal.cache.BucketRegionQueue.initializeEventSeqNumQueue(BucketRegionQueue.java:151)
> 	at org.apache.geode.internal.cache.BucketRegionQueue.cleanUpDestroyedTokensAndMarkGIIComplete(BucketRegionQueue.java:89)
> 	at org.apache.geode.internal.cache.DistributedRegion.getInitialImageAndRecovery(DistributedRegion.java:1220)
> 	at org.apache.geode.internal.cache.DistributedRegion.initialize(DistributedRegion.java:1071)
> 	at org.apache.geode.internal.cache.BucketRegion.initialize(BucketRegion.java:256)
> 	at org.apache.geode.internal.cache.LocalRegion.createSubregion(LocalRegion.java:1012)
> 	at org.apache.geode.internal.cache.PartitionedRegionDataStore.createBucketRegion(PartitionedRegionDataStore.java:776)
> 	at org.apache.geode.internal.cache.PartitionedRegionDataStore.grabFreeBucket(PartitionedRegionDataStore.java:451)
> 	at org.apache.geode.internal.cache.PartitionedRegionDataStore.grabFreeBucketRecursively(PartitionedRegionDataStore.java:310)
> 	at org.apache.geode.internal.cache.PartitionedRegionDataStore.grabBucket(PartitionedRegionDataStore.java:2881)
> 	at org.apache.geode.internal.cache.PRHARedundancyProvider.createBackupBucketOnMember(PRHARedundancyProvider.java:1122)
> 	at org.apache.geode.internal.cache.partitioned.PartitionedRegionRebalanceOp.createRedundantBucketForRegion(PartitionedRegionRebalanceOp.java:511)
> 	at org.apache.geode.internal.cache.partitioned.rebalance.BucketOperatorImpl.createRedundantBucket(BucketOperatorImpl.java:54)
> 	at org.apache.geode.internal.cache.partitioned.rebalance.BucketOperatorWrapper.createRedundantBucket(BucketOperatorWrapper.java:100)
> 	at org.apache.geode.internal.cache.partitioned.rebalance.ParallelBucketOperator$1.run(ParallelBucketOperator.java:91)
> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> 	at org.apache.geode.distributed.internal.ClusterDistributionManager.runUntilShutdown(ClusterDistributionManager.java:956)
> 	at org.apache.geode.distributed.internal.ClusterDistributionManager.doWaitingThread(ClusterDistributionManager.java:846)
> 	at org.apache.geode.internal.logging.LoggingThreadFactory.lambda$newThread$0(LoggingThreadFactory.java:121)
> 	at java.lang.Thread.run(Thread.java:745)
> {noformat}
-- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (GEODE-6205) The cluster configuration service create disk-store command saves the relative path name rather than the absolute one in the disk-dir
[ https://issues.apache.org/jira/browse/GEODE-6205?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Barry Oglesby resolved GEODE-6205. -- Resolution: Fixed Fix Version/s: 1.9.0 > The cluster configuration service create disk-store command saves the > relative path name rather than the absolute one in the disk-dir > - > > Key: GEODE-6205 > URL: https://issues.apache.org/jira/browse/GEODE-6205 > Project: Geode > Issue Type: Bug > Components: gfsh >Reporter: Barry Oglesby >Assignee: Barry Oglesby >Priority: Major > Labels: pull-request-available > Fix For: 1.9.0 > > Time Spent: 1h > Remaining Estimate: 0h > > This command: > {noformat} > create disk-store --name=data_store --dir=/path/to/geode_data` > {noformat} > Creates this disk-store configuration: > {noformat} > > >geode_data > > > {noformat} > Only the relative file name is saved. > The CreateDiskStoreCommand createDiskStoreType sets the relative path by > calling getName: > {noformat} > for (int i = 0; i < diskStoreAttributes.getDiskDirs().length; i++) { >DiskDirType diskDir = new DiskDirType(); > -> diskDir.setContent(diskStoreAttributes.getDiskDirs()[i].getName()); > > diskDir.setDirSize(Integer.toString(diskStoreAttributes.getDiskDirSizes()[i])); >diskDirs.add(diskDir); > } > {noformat} > Instead if it called getAbsolutePath, the disk-store configuration would be: > {noformat} > > >/path/to/geode_data > > > {noformat} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (GEODE-6205) The cluster configuration service create disk-store command saves the relative path name rather than the absolute one in the disk-dir
[ https://issues.apache.org/jira/browse/GEODE-6205?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Barry Oglesby reassigned GEODE-6205: Assignee: Barry Oglesby > The cluster configuration service create disk-store command saves the > relative path name rather than the absolute one in the disk-dir > - > > Key: GEODE-6205 > URL: https://issues.apache.org/jira/browse/GEODE-6205 > Project: Geode > Issue Type: Bug > Components: gfsh >Reporter: Barry Oglesby >Assignee: Barry Oglesby >Priority: Major > > This command: > {noformat} > create disk-store --name=data_store --dir=/path/to/geode_data` > {noformat} > Creates this disk-store configuration: > {noformat} > > >geode_data > > > {noformat} > Only the relative file name is saved. > The CreateDiskStoreCommand createDiskStoreType sets the relative path by > calling getName: > {noformat} > for (int i = 0; i < diskStoreAttributes.getDiskDirs().length; i++) { >DiskDirType diskDir = new DiskDirType(); > -> diskDir.setContent(diskStoreAttributes.getDiskDirs()[i].getName()); > > diskDir.setDirSize(Integer.toString(diskStoreAttributes.getDiskDirSizes()[i])); >diskDirs.add(diskDir); > } > {noformat} > Instead if it called getAbsolutePath, the disk-store configuration would be: > {noformat} > > >/path/to/geode_data > > > {noformat} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (GEODE-6205) The cluster configuration service create disk-store command saves the relative path name rather than the absolute one in the disk-dir
[ https://issues.apache.org/jira/browse/GEODE-6205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16721954#comment-16721954 ] Barry Oglesby commented on GEODE-6205: -- Using the absolute path doesn't allow multiple servers to start on the same host. Setting the disk-dir should depend on the input file name. If it is relative, the disk-dir should be relative; if it is absolute, the disk-dir should be absolute. > The cluster configuration service create disk-store command saves the > relative path name rather than the absolute one in the disk-dir > - > > Key: GEODE-6205 > URL: https://issues.apache.org/jira/browse/GEODE-6205 > Project: Geode > Issue Type: Bug > Components: gfsh >Reporter: Barry Oglesby >Priority: Major > > This command: > {noformat} > create disk-store --name=data_store --dir=/path/to/geode_data` > {noformat} > Creates this disk-store configuration: > {noformat} > > >geode_data > > > {noformat} > Only the relative file name is saved. > The CreateDiskStoreCommand createDiskStoreType sets the relative path by > calling getName: > {noformat} > for (int i = 0; i < diskStoreAttributes.getDiskDirs().length; i++) { >DiskDirType diskDir = new DiskDirType(); > -> diskDir.setContent(diskStoreAttributes.getDiskDirs()[i].getName()); > > diskDir.setDirSize(Integer.toString(diskStoreAttributes.getDiskDirSizes()[i])); >diskDirs.add(diskDir); > } > {noformat} > Instead if it called getAbsolutePath, the disk-store configuration would be: > {noformat} > > >/path/to/geode_data > > > {noformat} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
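The relative-vs-absolute behavior proposed in this comment can be sketched with plain java.io.File. This is only an illustration of the suggested rule, not Geode's actual code; the helper name diskDirContent is hypothetical:

```java
import java.io.File;

public class DiskDirContent {

  // Hypothetical helper: preserve the path the way the user typed it.
  // A relative --dir stays relative; an absolute --dir stays absolute.
  static String diskDirContent(File dir) {
    return dir.isAbsolute() ? dir.getAbsolutePath() : dir.getPath();
  }

  public static void main(String[] args) {
    // Absolute input is kept in full, relative input is kept relative,
    // so each server can still resolve it against its own working directory.
    System.out.println(diskDirContent(new File("/path/to/geode_data")));
    System.out.println(diskDirContent(new File("geode_data")));
  }
}
```

With this rule, `--dir=geode_data` would let each server create its own `geode_data` directory under its working directory, while `--dir=/path/to/geode_data` would pin every server to the same absolute location.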
[jira] [Updated] (GEODE-6205) The cluster configuration service create disk-store command saves the relative path name rather than the absolute one in the disk-dir
[ https://issues.apache.org/jira/browse/GEODE-6205?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Barry Oglesby updated GEODE-6205: - Description: This command: {noformat} create disk-store --name=data_store --dir=/path/to/geode_data` {noformat} Creates this disk-store configuration: {noformat} geode_data {noformat} Only the relative file name is saved. The CreateDiskStoreCommand createDiskStoreType sets the relative path by calling getName: {noformat} for (int i = 0; i < diskStoreAttributes.getDiskDirs().length; i++) { DiskDirType diskDir = new DiskDirType(); -> diskDir.setContent(diskStoreAttributes.getDiskDirs()[i].getName()); diskDir.setDirSize(Integer.toString(diskStoreAttributes.getDiskDirSizes()[i])); diskDirs.add(diskDir); } {noformat} Instead if it called getAbsolutePath, the disk-store configuration would be: {noformat} /path/to/geode_data {noformat} was: This command: {noformat} create disk-store --name=data_store --dir=/path/to/geode_data` {noformat} Creates this disk-store configuration: {noformat} geode_data {noformat} Only the relative file name is saved. 
The CreateDiskStoreCommand createDiskStoreType sets the relative path by calling getName: {noformat} for (int i = 0; i < diskStoreAttributes.getDiskDirs().length; i++) { DiskDirType diskDir = new DiskDirType(); -> diskDir.setContent(diskStoreAttributes.getDiskDirs()[i].getName()); diskDir.setDirSize(Integer.toString(diskStoreAttributes.getDiskDirSizes()[i])); diskDirs.add(diskDir); } {noformat} Instead if it called getAbsolutePath, the disk-store configuration would be: {noformat} /path/to/geode_data {noformat} > The cluster configuration service create disk-store command saves the > relative path name rather than the absolute one in the disk-dir > - > > Key: GEODE-6205 > URL: https://issues.apache.org/jira/browse/GEODE-6205 > Project: Geode > Issue Type: Bug > Components: gfsh >Reporter: Barry Oglesby >Priority: Major > > This command: > {noformat} > create disk-store --name=data_store --dir=/path/to/geode_data` > {noformat} > Creates this disk-store configuration: > {noformat} > > >geode_data > > > {noformat} > Only the relative file name is saved. > The CreateDiskStoreCommand createDiskStoreType sets the relative path by > calling getName: > {noformat} > for (int i = 0; i < diskStoreAttributes.getDiskDirs().length; i++) { >DiskDirType diskDir = new DiskDirType(); > -> diskDir.setContent(diskStoreAttributes.getDiskDirs()[i].getName()); > > diskDir.setDirSize(Integer.toString(diskStoreAttributes.getDiskDirSizes()[i])); >diskDirs.add(diskDir); > } > {noformat} > Instead if it called getAbsolutePath, the disk-store configuration would be: > {noformat} > > >/path/to/geode_data > > > {noformat} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (GEODE-6205) The cluster configuration service create disk-store command saves the relative path name rather than the absolute one in the disk-dir
[ https://issues.apache.org/jira/browse/GEODE-6205?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Barry Oglesby updated GEODE-6205: - Description: This command: {noformat} create disk-store --name=data_store --dir=/path/to/geode_data` {noformat} Creates this disk-store configuration: {noformat} geode_data {noformat} Only the relative file name is saved. The CreateDiskStoreCommand createDiskStoreType sets the relative path by calling getName: {noformat} for (int i = 0; i < diskStoreAttributes.getDiskDirs().length; i++) { DiskDirType diskDir = new DiskDirType(); -> diskDir.setContent(diskStoreAttributes.getDiskDirs()[i].getName()); diskDir.setDirSize(Integer.toString(diskStoreAttributes.getDiskDirSizes()[i])); diskDirs.add(diskDir); } {noformat} Instead if it called getAbsolutePath, the disk-store configuration would be: {noformat} /path/to/geode_data {noformat} was: This command: {noformat} create disk-store --name=data_store --dir=/path/to/geode_data` {noformat} Creates this disk-store configuration: {noformat} geode_data {noformat} Only the relative file name is saved. 
The CreateDiskStoreCommand createDiskStoreType sets the relative path by calling getName: {noformat} for (int i = 0; i < diskStoreAttributes.getDiskDirs().length; i++) { DiskDirType diskDir = new DiskDirType(); -> diskDir.setContent(diskStoreAttributes.getDiskDirs()[i].getName()); diskDir.setDirSize(Integer.toString(diskStoreAttributes.getDiskDirSizes()[i])); diskDirs.add(diskDir); } {noformat} Instead if it called getAbsolutePath, the disk-store configuration would be: {noformat} /path/to/gemfire_data {noformat} > The cluster configuration service create disk-store command saves the > relative path name rather than the absolute one in the disk-dir > - > > Key: GEODE-6205 > URL: https://issues.apache.org/jira/browse/GEODE-6205 > Project: Geode > Issue Type: Bug > Components: gfsh >Reporter: Barry Oglesby >Priority: Major > > This command: > {noformat} > create disk-store --name=data_store --dir=/path/to/geode_data` > {noformat} > Creates this disk-store configuration: > {noformat} > > > geode_data > > > {noformat} > Only the relative file name is saved. > The CreateDiskStoreCommand createDiskStoreType sets the relative path by > calling getName: > {noformat} > for (int i = 0; i < diskStoreAttributes.getDiskDirs().length; i++) { > DiskDirType diskDir = new DiskDirType(); > -> diskDir.setContent(diskStoreAttributes.getDiskDirs()[i].getName()); > > diskDir.setDirSize(Integer.toString(diskStoreAttributes.getDiskDirSizes()[i])); > diskDirs.add(diskDir); > } > {noformat} > Instead if it called getAbsolutePath, the disk-store configuration would be: > {noformat} > > > /path/to/geode_data > > > {noformat} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (GEODE-6205) The cluster configuration service create disk-store command saves the relative path name rather than the absolute one in the disk-dir
Barry Oglesby created GEODE-6205: Summary: The cluster configuration service create disk-store command saves the relative path name rather than the absolute one in the disk-dir Key: GEODE-6205 URL: https://issues.apache.org/jira/browse/GEODE-6205 Project: Geode Issue Type: Bug Components: gfsh Reporter: Barry Oglesby
This command:
{noformat}
create disk-store --name=data_store --dir=/path/to/geode_data
{noformat}
Creates this disk-store configuration:
{noformat}
<disk-dirs>
   <disk-dir>geode_data</disk-dir>
</disk-dirs>
{noformat}
Only the relative file name is saved. The CreateDiskStoreCommand createDiskStoreType method sets the relative path by calling getName:
{noformat}
for (int i = 0; i < diskStoreAttributes.getDiskDirs().length; i++) {
  DiskDirType diskDir = new DiskDirType();
-> diskDir.setContent(diskStoreAttributes.getDiskDirs()[i].getName());
  diskDir.setDirSize(Integer.toString(diskStoreAttributes.getDiskDirSizes()[i]));
  diskDirs.add(diskDir);
}
{noformat}
If it instead called getAbsolutePath, the disk-store configuration would be:
{noformat}
<disk-dirs>
   <disk-dir>/path/to/gemfire_data</disk-dir>
</disk-dirs>
{noformat}
-- This message was sent by Atlassian JIRA (v7.6.3#76005)
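The getName-versus-getAbsolutePath difference behind this bug is easy to reproduce with plain java.io.File (a standalone demo, not Geode code):

```java
import java.io.File;

public class DiskDirNameDemo {
  public static void main(String[] args) {
    File dir = new File("/path/to/geode_data");
    // getName() drops all parent directories -- this is what
    // CreateDiskStoreCommand currently stores in the configuration.
    System.out.println(dir.getName());         // geode_data
    // getAbsolutePath() keeps the full path the user supplied.
    System.out.println(dir.getAbsolutePath()); // /path/to/geode_data
  }
}
```

So any `--dir` with parent components is silently collapsed to its last segment before being written to the cluster configuration.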
[jira] [Updated] (GEODE-6205) The cluster configuration service create disk-store command saves the relative path name rather than the absolute one in the disk-dir
[ https://issues.apache.org/jira/browse/GEODE-6205?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Barry Oglesby updated GEODE-6205:
Description:
This command:
{noformat}
create disk-store --name=data_store --dir=/path/to/geode_data
{noformat}
Creates this disk-store configuration:
{noformat}
<disk-dirs>
<disk-dir>geode_data</disk-dir>
</disk-dirs>
{noformat}
Only the relative file name is saved.
The CreateDiskStoreCommand createDiskStoreType sets the relative path by calling getName:
{noformat}
for (int i = 0; i < diskStoreAttributes.getDiskDirs().length; i++) {
  DiskDirType diskDir = new DiskDirType();
-> diskDir.setContent(diskStoreAttributes.getDiskDirs()[i].getName());
  diskDir.setDirSize(Integer.toString(diskStoreAttributes.getDiskDirSizes()[i]));
  diskDirs.add(diskDir);
}
{noformat}
If it instead called getAbsolutePath, the disk-store configuration would be:
{noformat}
<disk-dirs>
<disk-dir>/path/to/geode_data</disk-dir>
</disk-dirs>
{noformat}

was:
This command:
{noformat}
create disk-store --name=data_store --dir=/path/to/geode_data
{noformat}
Creates this disk-store configuration:
{noformat}
<disk-dirs>
<disk-dir>geode_data</disk-dir>
</disk-dirs>
{noformat}
Only the relative file name is saved.
The CreateDiskStoreCommand createDiskStoreType sets the relative path by calling getName:
{noformat}
for (int i = 0; i < diskStoreAttributes.getDiskDirs().length; i++) {
  DiskDirType diskDir = new DiskDirType();
-> diskDir.setContent(diskStoreAttributes.getDiskDirs()[i].getName());
  diskDir.setDirSize(Integer.toString(diskStoreAttributes.getDiskDirSizes()[i]));
  diskDirs.add(diskDir);
}
{noformat}
If it instead called getAbsolutePath, the disk-store configuration would be:
{noformat}
<disk-dirs>
<disk-dir>/path/to/geode_data</disk-dir>
</disk-dirs>
{noformat}

> The cluster configuration service create disk-store command saves the
> relative path name rather than the absolute one in the disk-dir
> -
>
> Key: GEODE-6205
> URL: https://issues.apache.org/jira/browse/GEODE-6205
> Project: Geode
> Issue Type: Bug
> Components: gfsh
> Reporter: Barry Oglesby
> Priority: Major
>
> This command:
> {noformat}
> create disk-store --name=data_store --dir=/path/to/geode_data
> {noformat}
> Creates this disk-store configuration:
> {noformat}
> <disk-dirs>
> <disk-dir>geode_data</disk-dir>
> </disk-dirs>
> {noformat}
> Only the relative file name is saved.
> The CreateDiskStoreCommand createDiskStoreType sets the relative path by calling getName:
> {noformat}
> for (int i = 0; i < diskStoreAttributes.getDiskDirs().length; i++) {
>   DiskDirType diskDir = new DiskDirType();
> -> diskDir.setContent(diskStoreAttributes.getDiskDirs()[i].getName());
>   diskDir.setDirSize(Integer.toString(diskStoreAttributes.getDiskDirSizes()[i]));
>   diskDirs.add(diskDir);
> }
> {noformat}
> If it instead called getAbsolutePath, the disk-store configuration would be:
> {noformat}
> <disk-dirs>
> <disk-dir>/path/to/geode_data</disk-dir>
> </disk-dirs>
> {noformat}

-- This message was sent by Atlassian JIRA (v7.6.3#76005)
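The proposed fix swaps File.getName() for File.getAbsolutePath(). A minimal standalone sketch of the difference between the two calls (the class and method names here are illustrative, not Geode code):

```java
import java.io.File;

public class DiskDirPaths {
    // getName() keeps only the last path segment, which is why the cluster
    // configuration ends up with "geode_data" instead of the full directory.
    static String savedByGetName(String dir) {
        return new File(dir).getName();
    }

    // getAbsolutePath() preserves the full path for an absolute input.
    static String savedByGetAbsolutePath(String dir) {
        return new File(dir).getAbsolutePath();
    }

    public static void main(String[] args) {
        System.out.println(savedByGetName("/path/to/geode_data"));         // geode_data
        System.out.println(savedByGetAbsolutePath("/path/to/geode_data")); // /path/to/geode_data
    }
}
```

Note that for a relative --dir argument, getAbsolutePath() would resolve against the member's working directory, which is the behavior the ticket asks for.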
[jira] [Created] (GEODE-6186) Reduce the number of EntryNotFoundExceptions during AsyncEventQueue batch processing with conflation enabled
Barry Oglesby created GEODE-6186:
Summary: Reduce the number of EntryNotFoundExceptions during AsyncEventQueue batch processing with conflation enabled
Key: GEODE-6186
URL: https://issues.apache.org/jira/browse/GEODE-6186
Project: Geode
Issue Type: Bug
Components: wan
Reporter: Barry Oglesby

Reduce the number of EntryNotFoundExceptions during AsyncEventQueue batch processing with conflation enabled.
This test (3000 iterations of putAlls with the same 1500 keys into a partitioned region attached to an async-event-queue) produces these numbers in the current code (4 different runs):
{noformat}
numBatches=645; numENFEs=8622196; totalPeekTime=178517; averagePeekTime=276; totalProcessBatchTime=38936; averageProcessBatchTime=60
numBatches=660; numENFEs=8467986; totalPeekTime=182985; averagePeekTime=277; totalProcessBatchTime=34335; averageProcessBatchTime=52
numBatches=646; numENFEs=8563364; totalPeekTime=179624; averagePeekTime=278; totalProcessBatchTime=37342; averageProcessBatchTime=57
numBatches=632; numENFEs=8716942; totalPeekTime=175570; averagePeekTime=277; totalProcessBatchTime=39732; averageProcessBatchTime=62
{noformat}
After some changes mainly in BucketRegionQueue:
{noformat}
numBatches=782; numENFEs=3621039; totalPeekTime=195760; averagePeekTime=250; totalProcessBatchTime=18724; averageProcessBatchTime=23
numBatches=791; numENFEs=3604933; totalPeekTime=197980; averagePeekTime=250; totalProcessBatchTime=18587; averageProcessBatchTime=23
numBatches=790; numENFEs=3600038; totalPeekTime=197774; averagePeekTime=250; totalProcessBatchTime=18611; averageProcessBatchTime=23
numBatches=795; numENFEs=3584490; totalPeekTime=199060; averagePeekTime=250; totalProcessBatchTime=18063; averageProcessBatchTime=22
{noformat}
numBatches is the number of batches peeked.
numENFEs is the number of EntryNotFoundExceptions thrown.
totalPeekTime is the total time to peek all batches.
averagePeekTime is the average time to peek a batch.
totalProcessBatchTime is the total time to process all batches.
averageProcessBatchTime is the average time to process a batch (includes listener callback and remove from queue).

-- This message was sent by Atlassian JIRA (v7.6.3#76005)
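For context, these exceptions arise when conflation drops a queued event for a key after a batch containing that event has already been peeked, so the later batch removal misses it. A toy sketch of that interaction, with hypothetical names (this is not Geode's BucketRegionQueue):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model of conflation vs. batch removal. Each queued event has a
// unique id; a newer event for the same key conflates (removes) the older
// id. A batch peeked before conflation can therefore try to remove ids
// that are already gone -- the case Geode reports as an
// EntryNotFoundException.
public class ConflationSketch {
    private final Map<Long, String> events = new LinkedHashMap<>(); // id -> key
    private final Map<String, Long> latestIdForKey = new HashMap<>();
    private long nextId;
    int entryNotFoundCount;

    void put(String key) {
        Long oldId = latestIdForKey.put(key, nextId);
        if (oldId != null) {
            events.remove(oldId); // conflation removes the older event
        }
        events.put(nextId++, key);
    }

    List<Long> peekBatch() {
        return new ArrayList<>(events.keySet()); // snapshot of queued ids
    }

    void removeBatch(List<Long> batch) {
        for (Long id : batch) {
            if (events.remove(id) == null) {
                entryNotFoundCount++; // entry was conflated away after the peek
            }
        }
    }
}
```

With heavy key reuse (1500 keys written 3000 times each, as in the test above), most peeked events are conflated before removal, which is why the ENFE counts dwarf the batch counts.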
[jira] [Resolved] (GEODE-5959) Nested function executions can cause a performance issue
[ https://issues.apache.org/jira/browse/GEODE-5959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Barry Oglesby resolved GEODE-5959.
Resolution: Fixed
Fix Version/s: 1.9.0

> Nested function executions can cause a performance issue
>
> Key: GEODE-5959
> URL: https://issues.apache.org/jira/browse/GEODE-5959
> Project: Geode
> Issue Type: Bug
> Components: functions
> Reporter: Barry Oglesby
> Assignee: Barry Oglesby
> Priority: Major
> Labels: pull-request-available
> Fix For: 1.9.0
>
> Time Spent: 2h 10m
> Remaining Estimate: 0h
>
> When a client executes a function, the server does:
> 1. The ServerConnection receives the function request, creates a runnable task and executes it on the thread pool.
> 2a. If there are available threads in the pool, one is used.
> 2b. If there are no available threads in the pool and the maximum number of threads has not yet been created, a thread is created and used.
> 2c. If there are no available threads in the pool and all the threads are in use, then:
> - the task is put into a queue (a BlockingQueue)
> - a thread called Function Execution Processor1 takes the task from that queue and offers it to another queue. This other queue is a SynchronousQueue (an insert waits for a removal), so a thread has to be available for the offer to succeed.
> - after 5 seconds by default (controlled by gemfire.RETRY_INTERVAL), the offer fails and the rejectedExecutionHandler is invoked. This handler spins off a thread to process that task.
> Once the thread pool is in the state where no threads are available, every new function execution will take at least 5 seconds plus the time it takes to execute the function.
> If MAX_FE_THREADS is 32 and I run a test like:
> - launch 50 ParentFunctions onRegion with a replicated region, each of which executes a ChildFunction on the same region
> - launch 1000 (or some number) of other functions that execute quickly
> All 32 threads in the pool will be in use immediately. These threads will be processing ParentFunctions which have invoked the ChildFunction and are waiting for the result. The next 18 (making 50) will cause threads to be spun off after a 5-second wait for each. These will also get stuck waiting for the ChildFunctions to execute. The next 1000 will each take 5 seconds to offer, then spin off a thread that executes quickly. These are all processed sequentially. If the function processes quickly enough, it won't show up in thread dumps.
> When the thread pool is in this state, the server will contain threads like below.
> For each client request, there will be a ServerConnection thread waiting for the function execution request to complete here:
> {noformat}
> "ServerConnection on port 62483 Thread 42" #155 daemon prio=5 os_prio=31 tid=0x7fdf072b6800 nid=0x15703 waiting on condition [0x700018bb7000]
> java.lang.Thread.State: WAITING (parking)
> at sun.misc.Unsafe.park(Native Method)
> - parking to wait for <0x0006c01c1378> (a java.util.concurrent.CountDownLatch$Sync)
> at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
> at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
> at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:997)
> at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304)
> at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:231)
> at org.apache.geode.internal.cache.execute.LocalResultCollectorImpl.getResult(LocalResultCollectorImpl.java:110)
> at org.apache.geode.internal.cache.tier.sockets.command.ExecuteRegionFunction66.cmdExecute(ExecuteRegionFunction66.java:255)
> at org.apache.geode.internal.cache.tier.sockets.BaseCommand.execute(BaseCommand.java:178)
> at org.apache.geode.internal.cache.tier.sockets.ServerConnection.doNormalMessage(ServerConnection.java:844)
> at org.apache.geode.internal.cache.tier.sockets.OriginalServerConnection.doOneMessage(OriginalServerConnection.java:74)
> at org.apache.geode.internal.cache.tier.sockets.ServerConnection.run(ServerConnection.java:1214)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at org.apache.geode.internal.cache.tier.sockets.AcceptorImpl.lambda$initializeServerConnectionThreadPool$3(AcceptorImpl.java:593)
> at org.apache.geode.internal.cache.tier.sockets.AcceptorImpl$$Lambda$107/881662115.invoke(Unknown Source)
> at
[jira] [Assigned] (GEODE-5959) Nested function executions can cause a performance issue
[ https://issues.apache.org/jira/browse/GEODE-5959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Barry Oglesby reassigned GEODE-5959:
Assignee: Barry Oglesby

> Nested function executions can cause a performance issue
>
> Key: GEODE-5959
> URL: https://issues.apache.org/jira/browse/GEODE-5959
> Project: Geode
> Issue Type: Bug
> Components: functions
> Reporter: Barry Oglesby
> Assignee: Barry Oglesby
> Priority: Major
>
> When a client executes a function, the server does:
> 1. The ServerConnection receives the function request, creates a runnable task and executes it on the thread pool.
> 2a. If there are available threads in the pool, one is used.
> 2b. If there are no available threads in the pool and the maximum number of threads has not yet been created, a thread is created and used.
> 2c. If there are no available threads in the pool and all the threads are in use, then:
> - the task is put into a queue (a BlockingQueue)
> - a thread called Function Execution Processor1 takes the task from that queue and offers it to another queue. This other queue is a SynchronousQueue (an insert waits for a removal), so a thread has to be available for the offer to succeed.
> - after 5 seconds by default (controlled by gemfire.RETRY_INTERVAL), the offer fails and the rejectedExecutionHandler is invoked. This handler spins off a thread to process that task.
> Once the thread pool is in the state where no threads are available, every new function execution will take at least 5 seconds plus the time it takes to execute the function.
> If MAX_FE_THREADS is 32 and I run a test like:
> - launch 50 ParentFunctions onRegion with a replicated region, each of which executes a ChildFunction on the same region
> - launch 1000 (or some number) of other functions that execute quickly
> All 32 threads in the pool will be in use immediately. These threads will be processing ParentFunctions which have invoked the ChildFunction and are waiting for the result. The next 18 (making 50) will cause threads to be spun off after a 5-second wait for each. These will also get stuck waiting for the ChildFunctions to execute. The next 1000 will each take 5 seconds to offer, then spin off a thread that executes quickly. These are all processed sequentially. If the function processes quickly enough, it won't show up in thread dumps.
> When the thread pool is in this state, the server will contain threads like below.
> For each client request, there will be a ServerConnection thread waiting for the function execution request to complete here:
> {noformat}
> "ServerConnection on port 62483 Thread 42" #155 daemon prio=5 os_prio=31 tid=0x7fdf072b6800 nid=0x15703 waiting on condition [0x700018bb7000]
> java.lang.Thread.State: WAITING (parking)
> at sun.misc.Unsafe.park(Native Method)
> - parking to wait for <0x0006c01c1378> (a java.util.concurrent.CountDownLatch$Sync)
> at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
> at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
> at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:997)
> at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304)
> at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:231)
> at org.apache.geode.internal.cache.execute.LocalResultCollectorImpl.getResult(LocalResultCollectorImpl.java:110)
> at org.apache.geode.internal.cache.tier.sockets.command.ExecuteRegionFunction66.cmdExecute(ExecuteRegionFunction66.java:255)
> at org.apache.geode.internal.cache.tier.sockets.BaseCommand.execute(BaseCommand.java:178)
> at org.apache.geode.internal.cache.tier.sockets.ServerConnection.doNormalMessage(ServerConnection.java:844)
> at org.apache.geode.internal.cache.tier.sockets.OriginalServerConnection.doOneMessage(OriginalServerConnection.java:74)
> at org.apache.geode.internal.cache.tier.sockets.ServerConnection.run(ServerConnection.java:1214)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at org.apache.geode.internal.cache.tier.sockets.AcceptorImpl.lambda$initializeServerConnectionThreadPool$3(AcceptorImpl.java:593)
> at org.apache.geode.internal.cache.tier.sockets.AcceptorImpl$$Lambda$107/881662115.invoke(Unknown Source)
> at org.apache.geode.internal.logging.LoggingThreadFactory.lambda$newThread$0(LoggingThreadFactory.java:121)
> at
[jira] [Created] (GEODE-5959) Nested function executions can cause a performance issue
Barry Oglesby created GEODE-5959:
Summary: Nested function executions can cause a performance issue
Key: GEODE-5959
URL: https://issues.apache.org/jira/browse/GEODE-5959
Project: Geode
Issue Type: Bug
Components: functions
Reporter: Barry Oglesby

When a client executes a function, the server does:
1. The ServerConnection receives the function request, creates a runnable task and executes it on the thread pool.
2a. If there are available threads in the pool, one is used.
2b. If there are no available threads in the pool and the maximum number of threads has not yet been created, a thread is created and used.
2c. If there are no available threads in the pool and all the threads are in use, then:
- the task is put into a queue (a BlockingQueue)
- a thread called Function Execution Processor1 takes the task from that queue and offers it to another queue. This other queue is a SynchronousQueue (an insert waits for a removal), so a thread has to be available for the offer to succeed.
- after 5 seconds by default (controlled by gemfire.RETRY_INTERVAL), the offer fails and the rejectedExecutionHandler is invoked. This handler spins off a thread to process that task.
Once the thread pool is in the state where no threads are available, every new function execution will take at least 5 seconds plus the time it takes to execute the function.
If MAX_FE_THREADS is 32 and I run a test like:
- launch 50 ParentFunctions onRegion with a replicated region, each of which executes a ChildFunction on the same region
- launch 1000 (or some number) of other functions that execute quickly
All 32 threads in the pool will be in use immediately. These threads will be processing ParentFunctions which have invoked the ChildFunction and are waiting for the result. The next 18 (making 50) will cause threads to be spun off after a 5-second wait for each. These will also get stuck waiting for the ChildFunctions to execute. The next 1000 will each take 5 seconds to offer, then spin off a thread that executes quickly. These are all processed sequentially. If the function processes quickly enough, it won't show up in thread dumps.
When the thread pool is in this state, the server will contain threads like below.
For each client request, there will be a ServerConnection thread waiting for the function execution request to complete here:
{noformat}
"ServerConnection on port 62483 Thread 42" #155 daemon prio=5 os_prio=31 tid=0x7fdf072b6800 nid=0x15703 waiting on condition [0x700018bb7000]
java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for <0x0006c01c1378> (a java.util.concurrent.CountDownLatch$Sync)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:997)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304)
at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:231)
at org.apache.geode.internal.cache.execute.LocalResultCollectorImpl.getResult(LocalResultCollectorImpl.java:110)
at org.apache.geode.internal.cache.tier.sockets.command.ExecuteRegionFunction66.cmdExecute(ExecuteRegionFunction66.java:255)
at org.apache.geode.internal.cache.tier.sockets.BaseCommand.execute(BaseCommand.java:178)
at org.apache.geode.internal.cache.tier.sockets.ServerConnection.doNormalMessage(ServerConnection.java:844)
at org.apache.geode.internal.cache.tier.sockets.OriginalServerConnection.doOneMessage(OriginalServerConnection.java:74)
at org.apache.geode.internal.cache.tier.sockets.ServerConnection.run(ServerConnection.java:1214)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at org.apache.geode.internal.cache.tier.sockets.AcceptorImpl.lambda$initializeServerConnectionThreadPool$3(AcceptorImpl.java:593)
at org.apache.geode.internal.cache.tier.sockets.AcceptorImpl$$Lambda$107/881662115.invoke(Unknown Source)
at org.apache.geode.internal.logging.LoggingThreadFactory.lambda$newThread$0(LoggingThreadFactory.java:121)
at org.apache.geode.internal.logging.LoggingThreadFactory$$Lambda$18/49222910.run(Unknown Source)
at java.lang.Thread.run(Thread.java:745)
{noformat}
There will be a corresponding Function Execution Processor thread in the middle of executing the parent function and waiting for child function execution:
{noformat}
"Function Execution Processor12" #158 daemon prio=5 os_prio=31 tid=0x7fdf072a6000 nid=0xd707 waiting on condition [0x700014af7000]
{noformat}
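The queueing behavior in steps 2a-2c can be approximated with a plain ThreadPoolExecutor over a SynchronousQueue whose rejection handler spins off a thread. This is a simplified sketch of the mechanism only, not Geode's actual implementation (which interposes the Function Execution Processor thread and the 5-second offer timeout):

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class SpinOffPoolSketch {
    // A SynchronousQueue has no capacity, so execute() hands the task to a
    // free worker or creates one up to maxThreads; otherwise the rejection
    // handler runs the task on a freshly spawned thread -- the same
    // last-resort behavior the ticket describes.
    static ThreadPoolExecutor newPool(int maxThreads) {
        return new ThreadPoolExecutor(0, maxThreads, 60, TimeUnit.SECONDS,
                new SynchronousQueue<Runnable>(),
                (task, pool) -> new Thread(task, "spun-off").start());
    }

    public static void main(String[] args) throws InterruptedException {
        ThreadPoolExecutor pool = newPool(2);
        AtomicInteger completed = new AtomicInteger();
        CountDownLatch allDone = new CountDownLatch(4);
        // Submit more tasks than the pool can hold: some run on pooled
        // workers, the rest are rejected and run on spun-off threads.
        for (int i = 0; i < 4; i++) {
            pool.execute(() -> {
                try {
                    Thread.sleep(200);
                } catch (InterruptedException ignored) {
                    Thread.currentThread().interrupt();
                }
                completed.incrementAndGet();
                allDone.countDown();
            });
        }
        allDone.await();
        System.out.println(completed.get()); // all 4 tasks complete
        pool.shutdown();
    }
}
```

In this simplified form rejection is immediate; the slow path the ticket describes comes from the extra hand-off thread that blocks for gemfire.RETRY_INTERVAL before the handler is invoked.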
[jira] [Resolved] (GEODE-5917) Gfsh query results show a mix of PdxInstances and PreferBytesCachedDeserializables with read-serialized=true
[ https://issues.apache.org/jira/browse/GEODE-5917?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Barry Oglesby resolved GEODE-5917.
Resolution: Fixed
Fix Version/s: 1.8.0

> Gfsh query results show a mix of PdxInstances and
> PreferBytesCachedDeserializables with read-serialized=true
>
> Key: GEODE-5917
> URL: https://issues.apache.org/jira/browse/GEODE-5917
> Project: Geode
> Issue Type: Bug
> Components: querying
> Reporter: Barry Oglesby
> Priority: Major
> Labels: pull-request-available
> Fix For: 1.8.0
>
> Time Spent: 20m
> Remaining Estimate: 0h
>
> Gfsh query results show a mix of PdxInstances and PreferBytesCachedDeserializables with read-serialized=true.
> A gfsh query on a partitioned region with pdx read-serialized=true shows results like:
> {noformat}
> shares | price | id | cusip | serializedValue | sizeInBytes | stringForm | valueSizeInBytes | DSFID | deserializedForReading | value | serialized
> ------ | ----- | -- | ----- | --------------- | ----------- | ---------- | ---------------- | ----- | ---------------------- | ----- | ----------
> 70 | 590.923583984375 | 0 | MCD | | | | | | | |
> 77 | 740.6094970703125 | 3 | MGM | | | | | | | |
> | | | | org.json.JSONArray | 56 | PDX[4456129,TradePdx]{cusip=GGB, id=1, price=26.52454376220703, shares=49} | 44 | -65 | org.apache.geode.pdx.internal.PdxInstanceImpl | org.json.JSONArray | true
> | | | | org.json.JSONArray | 56 | PDX[4456129,TradePdx]{cusip=STO, id=2, price=643.344482421875, shares=85} | 44 | -65 | org.apache.geode.pdx.internal.PdxInstanceImpl | org.json.JSONArray | true
> | | | | org.json.JSONArray | 56 | PDX[4456129,TradePdx]{cusip=MGM, id=4, price=724.223388671875, shares=0} | 44 | -65 | org.apache.geode.pdx.internal.PdxInstanceImpl | org.json.JSONArray | true
> {noformat}
> In this case, there are 2 servers and no redundant copies.
> The DataCommandFunction.select query returns:
> {noformat}
> DataCommandFunction.select results=CumulativeNonDistinctResults::[
> PDX[4456129,TradePdx]{cusip=MCD, id=0, price=590.923583984375, shares=70},
> PDX[4456129,TradePdx]{cusip=MGM, id=3, price=740.6094970703125, shares=77},
> PreferBytesCachedDeserializable@1599752189,
> PreferBytesCachedDeserializable@1120782877,
> PreferBytesCachedDeserializable@1023583807
> ]
> {noformat}
> The local query returns the 2 PdxInstances, and the remote query returns the 3 PreferBytesCachedDeserializables:
> {noformat}
> [info 2018/10/23 13:53:18.046 PDT tid=0x4f]
> Trace Info for Query: SELECT * FROM /data limit 100
> Local 192.168.2.6(76490):1026 took 6.887ms and returned 2 results;
> Remote 192.168.2.6(76479):1025 took 45.164ms and returned 3 results;
> indexesUsed(0)
> {noformat}
> The 3 PreferBytesCachedDeserializables are not converted to PdxInstances before they are returned.
> PartitionedRegionQueryEvaluator.addResultsToResultSet adds the results to the CumulativeNonDistinctResults result set.
> The CumulativeCollectionIterator iterates the CumulativeNonDistinctResultsCollection and converts the objects to PDX here:
> {noformat}
> java.lang.Exception: Stack trace
> at java.lang.Thread.dumpStack(Thread.java:1333)
> at org.apache.geode.cache.query.internal.utils.PDXUtils.convertPDX(PDXUtils.java:83)
> at org.apache.geode.cache.query.internal.CumulativeNonDistinctResults$CumulativeNonDistinctResultsCollection$CumulativeCollectionIterator.next(CumulativeNonDistinctResults.java:259)
> at org.apache.geode.cache.query.internal.utils.LimitIterator.next(LimitIterator.java:49)
> at org.apache.geode.management.internal.cli.functions.DataCommandFunction.select_SelectResults(DataCommandFunction.java:271)
> at org.apache.geode.management.internal.cli.functions.DataCommandFunction.select(DataCommandFunction.java:226)
> at