[jira] [Created] (GEODE-7216) The ExportStackTraceCommand should include a timestamp similar to jstack

2019-09-18 Thread Barry Oglesby (Jira)
Barry Oglesby created GEODE-7216:


 Summary: The ExportStackTraceCommand should include a timestamp 
similar to jstack
 Key: GEODE-7216
 URL: https://issues.apache.org/jira/browse/GEODE-7216
 Project: Geode
  Issue Type: Bug
  Components: gfsh
Reporter: Barry Oglesby


Currently, the ExportStackTraceCommand dumps stack traces with a header for each 
member like:
{noformat}
*** Stack-trace for member server3 ***
{noformat}
It would be nice for support purposes if it included a timestamp like:
{noformat}
*** Stack-trace for member server3 at 2019-09-16 10:39:57 ***
{noformat}
That'll help correlate stack traces with logs and stats.

Something like:
{noformat}
ps.append(STACK_TRACE_FOR_MEMBER).append(entry.getKey()).append(" at ")
    .append(new SimpleDateFormat("yyyy-MM-dd HH:mm:ss").format(new Date()))
    .append(" ***")
    .append(System.lineSeparator());
{noformat}
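
One caveat: SimpleDateFormat is not thread-safe, so if the header is ever 
formatted concurrently, an immutable java.time formatter may be the safer 
choice. A minimal sketch of that variant, reusing the STACK_TRACE_FOR_MEMBER 
constant from the snippet above (everything else here is illustrative):
{noformat}
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

// DateTimeFormatter is immutable and thread-safe, unlike SimpleDateFormat,
// so it can be shared as a constant.
private static final DateTimeFormatter TIMESTAMP_FORMAT =
    DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss");

ps.append(STACK_TRACE_FOR_MEMBER).append(entry.getKey()).append(" at ")
    .append(TIMESTAMP_FORMAT.format(LocalDateTime.now())).append(" ***")
    .append(System.lineSeparator());
{noformat}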



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (GEODE-6586) ClientServerTransactionFailoverDistributedTest txCommitGetsAppliedOnAllTheReplicasAfterHostIsShutDownAndIfOneOfTheNodeHasCommitted failed

2019-09-10 Thread Barry Oglesby (Jira)


[ 
https://issues.apache.org/jira/browse/GEODE-6586?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16927135#comment-16927135
 ] 

Barry Oglesby commented on GEODE-6586:
--

This reoccurred:

https://concourse.gemfire-ci.info/teams/main/pipelines/gemfire-develop-main/jobs/DistributedTestOpenJDK8/builds/959

=-=-=-=-=-=-=-=-=-=-=-=-=-=-=  Test Results URI 
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
gs://gemfire-test-artifacts/builds/gemfire-develop-main/9.10.0-build.0108/test-results/distributedTest/1568159476/
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=

Test report artifacts from this job are available at:

gs://gemfire-test-artifacts/builds/gemfire-develop-main/9.10.0-build.0108/test-artifacts/1568159476/distributedtestfiles-OpenJDK8-9.10.0-build.0108.tgz


> ClientServerTransactionFailoverDistributedTest 
> txCommitGetsAppliedOnAllTheReplicasAfterHostIsShutDownAndIfOneOfTheNodeHasCommitted
>  failed
> -
>
> Key: GEODE-6586
> URL: https://issues.apache.org/jira/browse/GEODE-6586
> Project: Geode
>  Issue Type: Bug
>  Components: transactions
>Reporter: xiaojian zhou
>Assignee: Eric Shu
>Priority: Major
>
> {noformat}
> It's found in 
> https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-develop-main/jobs/DistributedTestOpenJDK8/builds/559
> org.apache.geode.internal.cache.ClientServerTransactionFailoverDistributedTest
>  > 
> txCommitGetsAppliedOnAllTheReplicasAfterHostIsShutDownAndIfOneOfTheNodeHasCommitted
>  FAILED
> org.apache.geode.test.dunit.RMIException: While invoking 
> org.apache.geode.internal.cache.ClientServerTransactionFailoverDistributedTest$$Lambda$177/577249945.run
>  in VM 1 running on Host 09628b632eb3 with 4 VMs
> at org.apache.geode.test.dunit.VM.executeMethodOnObject(VM.java:579)
> at org.apache.geode.test.dunit.VM.invoke(VM.java:406)
> at 
> org.apache.geode.internal.cache.ClientServerTransactionFailoverDistributedTest.txCommitGetsAppliedOnAllTheReplicasAfterHostIsShutDownAndIfOneOfTheNodeHasCommitted(ClientServerTransactionFailoverDistributedTest.java:437)
> Caused by:
> org.junit.ComparisonFailure: expected:<"TxValue-1"> but was:
> at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native 
> Method)
> at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
> at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
> at 
> org.apache.geode.internal.cache.ClientServerTransactionFailoverDistributedTest.lambda$txCommitGetsAppliedOnAllTheReplicasAfterHostIsShutDownAndIfOneOfTheNodeHasCommitted$bb17a952$7(ClientServerTransactionFailoverDistributedTest.java:439)
> {noformat}



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Created] (GEODE-7189) CI Failure: ServerLauncherTest > startWaitsForStartupTasksToComplete failed

2019-09-10 Thread Barry Oglesby (Jira)
Barry Oglesby created GEODE-7189:


 Summary: CI Failure: ServerLauncherTest > 
startWaitsForStartupTasksToComplete failed
 Key: GEODE-7189
 URL: https://issues.apache.org/jira/browse/GEODE-7189
 Project: Geode
  Issue Type: Bug
  Components: gfsh
Reporter: Barry Oglesby


{noformat}
org.apache.geode.distributed.ServerLauncherTest > 
startWaitsForStartupTasksToComplete FAILED
org.awaitility.core.ConditionTimeoutException: Assertion condition defined 
as a lambda expression in org.apache.geode.distributed.ServerLauncherTest that 
uses java.util.concurrent.CompletableFuture 
Wanted but not invoked:
completableFuture.thenRun();
-> at 
org.apache.geode.distributed.ServerLauncherTest.lambda$startWaitsForStartupTasksToComplete$14(ServerLauncherTest.java:428)
Actually, there were zero interactions with this mock.
 within 300 seconds.
at org.awaitility.core.ConditionAwaiter.await(ConditionAwaiter.java:145)
at 
org.awaitility.core.AssertionCondition.await(AssertionCondition.java:122)
at 
org.awaitility.core.AssertionCondition.await(AssertionCondition.java:32)
at org.awaitility.core.ConditionFactory.until(ConditionFactory.java:902)
at 
org.awaitility.core.ConditionFactory.untilAsserted(ConditionFactory.java:723)
at 
org.apache.geode.distributed.ServerLauncherTest.startWaitsForStartupTasksToComplete(ServerLauncherTest.java:428)

Caused by:
Wanted but not invoked:
completableFuture.thenRun();
-> at 
org.apache.geode.distributed.ServerLauncherTest.lambda$startWaitsForStartupTasksToComplete$14(ServerLauncherTest.java:428)
Actually, there were zero interactions with this mock.
at 
org.apache.geode.distributed.ServerLauncherTest.lambda$startWaitsForStartupTasksToComplete$14(ServerLauncherTest.java:428)
{noformat}
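For context, the Mockito message above ("Wanted but not invoked: 
completableFuture.thenRun()") implies a test of roughly this shape: an 
Awaitility untilAsserted wrapping a verify on a mocked CompletableFuture, which 
times out after 300 seconds if a completion action is never registered. A 
hypothetical reconstruction, not the actual test code:
{noformat}
import static org.awaitility.Awaitility.await;
import static org.mockito.ArgumentMatchers.any;
import static org.mockito.Mockito.mock;
import static org.mockito.Mockito.verify;

import java.util.concurrent.CompletableFuture;
import org.junit.Test;

public class StartWaitsSketch {
  @Test
  public void startWaitsForStartupTasksToComplete() {
    CompletableFuture<Void> completableFuture = mock(CompletableFuture.class);

    // ... the launcher's start() would be expected to chain onto the future ...

    // Re-runs the verify until it passes; if thenRun() is never invoked on
    // the mock, this fails with the ConditionTimeoutException shown above.
    await().untilAsserted(
        () -> verify(completableFuture).thenRun(any(Runnable.class)));
  }
}
{noformat}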
UnitTestOpenJDK11 #943:

https://concourse.gemfire-ci.info/teams/main/pipelines/gemfire-develop-main/jobs/UnitTestOpenJDK11/builds/943

=-=-=-=-=-=-=-=-=-=-=-=-=-=-=  Test Results URI 
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
gs://gemfire-test-artifacts/builds/gemfire-develop-main/9.10.0-build.0108/test-results/test/1568154432/
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=

Test report artifacts from this job are available at:

gs://gemfire-test-artifacts/builds/gemfire-develop-main/9.10.0-build.0108/test-artifacts/1568154432/unittestfiles-OpenJDK11-9.10.0-build.0108.tgz




--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Created] (GEODE-7187) CI Failure: RollingUpgradeQueryReturnsCorrectResultsAfterClientAndServersAreRolledOverAllBucketsCreated.test[from_v120, with reindex=false] hung

2019-09-10 Thread Barry Oglesby (Jira)
Barry Oglesby created GEODE-7187:


 Summary: CI Failure: 
RollingUpgradeQueryReturnsCorrectResultsAfterClientAndServersAreRolledOverAllBucketsCreated.test[from_v120,
 with reindex=false] hung
 Key: GEODE-7187
 URL: https://issues.apache.org/jira/browse/GEODE-7187
 Project: Geode
  Issue Type: Bug
  Components: lucene
Reporter: Barry Oglesby


UpgradeTestOpenJDK8 #1052:

https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-develop-main/jobs/UpgradeTestOpenJDK8/builds/1052

All three stack traces contain this thread closing the cache and waiting for 
replies:
{noformat}
"RMI TCP Connection(2)-172.17.0.43" #33 daemon prio=5 os_prio=0 
tid=0x7f75f4001800 nid=0x867 waiting on condition [0x7f762a17f000]
   java.lang.Thread.State: TIMED_WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for  <0xe120d5b0> (a 
java.util.concurrent.CountDownLatch$Sync)
at 
java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
at 
java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedNanos(AbstractQueuedSynchronizer.java:1037)
at 
java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(AbstractQueuedSynchronizer.java:1328)
at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:277)
at 
org.apache.geode.internal.util.concurrent.StoppableCountDownLatch.await(StoppableCountDownLatch.java:64)
at 
org.apache.geode.distributed.internal.ReplyProcessor21.basicWait(ReplyProcessor21.java:718)
at 
org.apache.geode.distributed.internal.ReplyProcessor21.waitForReplies(ReplyProcessor21.java:644)
at 
org.apache.geode.distributed.internal.ReplyProcessor21.waitForReplies(ReplyProcessor21.java:624)
at 
org.apache.geode.distributed.internal.ReplyProcessor21.waitForReplies(ReplyProcessor21.java:519)
at 
org.apache.geode.internal.cache.GemFireCacheImpl.close(GemFireCacheImpl.java:2314)
- locked <0xe001b568> (a java.lang.Class for 
org.apache.geode.internal.cache.GemFireCacheImpl)
at 
org.apache.geode.internal.cache.GemFireCacheImpl.close(GemFireCacheImpl.java:1937)
at 
org.apache.geode.internal.cache.GemFireCacheImpl.close(GemFireCacheImpl.java:1927)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.apache.geode.cache.lucene.LuceneSearchWithRollingUpgradeDUnit.closeCache(LuceneSearchWithRollingUpgradeDUnit.java:859)
at 
org.apache.geode.cache.lucene.LuceneSearchWithRollingUpgradeDUnit.access$1100(LuceneSearchWithRollingUpgradeDUnit.java:67)
at 
org.apache.geode.cache.lucene.LuceneSearchWithRollingUpgradeDUnit$12.run2(LuceneSearchWithRollingUpgradeDUnit.java:672)
at 
org.apache.geode.cache30.CacheSerializableRunnable.run(CacheSerializableRunnable.java:53)
{noformat}
I don't see anything processing that CloseCacheMessage.
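
For reference, the blocked frames boil down to a countdown-latch wait: 
GemFireCacheImpl.close() sends the CloseCacheMessage and then ReplyProcessor21 
parks on a StoppableCountDownLatch until every recipient replies. A simplified 
sketch of that wait pattern (illustrative, not the actual ReplyProcessor21 
code):
{noformat}
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

class ReplyWaitSketch {
  // One count per expected reply; each incoming reply counts the latch down.
  void waitForReplies(CountDownLatch repliesReceived) throws InterruptedException {
    // The closing thread parks here (the TIMED_WAITING frames above) and
    // re-checks in a loop, so if no member ever processes the
    // CloseCacheMessage and replies, close() hangs indefinitely.
    while (!repliesReceived.await(1, TimeUnit.SECONDS)) {
      // check for cancellation or departed members, then keep waiting
    }
  }
}
{noformat}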




--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Commented] (GEODE-7016) CI failure: ServerStartupRedundancyRecoveryNotificationTest > startupReportsOnlineOnlyAfterRedundancyRestored FAILED

2019-09-10 Thread Barry Oglesby (Jira)


[ 
https://issues.apache.org/jira/browse/GEODE-7016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16926847#comment-16926847
 ] 

Barry Oglesby commented on GEODE-7016:
--

This reoccurred:

https://concourse.gemfire-ci.info/teams/main/pipelines/gemfire-develop-main/jobs/AcceptanceTestOpenJDK8/builds/940

=-=-=-=-=-=-=-=-=-=-=-=-=-=-=  Test Results URI 
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
gs://gemfire-test-artifacts/builds/gemfire-develop-main/9.10.0-build.0104/test-results/acceptanceTest/1568065877/
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=

Test report artifacts from this job are available at:

gs://gemfire-test-artifacts/builds/gemfire-develop-main/9.10.0-build.0104/test-artifacts/1568065877/acceptancetestfiles-OpenJDK8-9.10.0-build.0104.tgz


> CI failure: ServerStartupRedundancyRecoveryNotificationTest > 
> startupReportsOnlineOnlyAfterRedundancyRestored FAILED
> 
>
> Key: GEODE-7016
> URL: https://issues.apache.org/jira/browse/GEODE-7016
> Project: Geode
>  Issue Type: Bug
>  Components: gfsh
>Affects Versions: 1.10.0
>Reporter: Anilkumar Gingade
>Priority: Major
>
> {noformat}
> org.apache.geode.launchers.ServerStartupRedundancyRecoveryNotificationTest > 
> startupReportsOnlineOnlyAfterRedundancyRestored FAILED
> org.junit.ComparisonFailure: expected:<[0]> but was:<[1]>
> at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native 
> Method)
> at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
> at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
> at 
> org.apache.geode.test.junit.rules.gfsh.GfshExecution.awaitTermination(GfshExecution.java:125)
> at 
> org.apache.geode.test.junit.rules.gfsh.GfshRule.execute(GfshRule.java:125)
> at 
> org.apache.geode.test.junit.rules.gfsh.GfshRule.execute(GfshRule.java:112)
> at 
> org.apache.geode.launchers.ServerStartupRedundancyRecoveryNotificationTest.startupReportsOnlineOnlyAfterRedundancyRestored(ServerStartupRedundancyRecoveryNotificationTest.java:142)
> org.junit.ComparisonFailure: expected:<[0]> but was:<[1]>
> at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native 
> Method)
> at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
> at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
> at 
> org.apache.geode.test.junit.rules.gfsh.GfshExecution.awaitTermination(GfshExecution.java:125)
> at 
> org.apache.geode.test.junit.rules.gfsh.GfshRule.execute(GfshRule.java:125)
> at 
> org.apache.geode.test.junit.rules.gfsh.GfshRule.execute(GfshRule.java:112)
> at 
> org.apache.geode.launchers.ServerStartupRedundancyRecoveryNotificationTest.stopAllMembers(ServerStartupRedundancyRecoveryNotificationTest.java:128)
> {noformat}
> https://concourse.gemfire-ci.info/teams/main/pipelines/gemfire-develop-main/jobs/AcceptanceTestOpenJDK8/builds/797
> Test report artifacts from this job are available at:
> gs://gemfire-test-artifacts/builds/gemfire-develop-main/9.9.0-build.0258/test-artifacts/1564078711/acceptancetestfiles-OpenJDK8-9.9.0-build.0258.tgz



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Created] (GEODE-7183) CI Failure: ClientServerFunctionExecutionDUnitTest > testServerExecution_SocketTimeOut_WithoutRegister[ExecuteFunctionById] failed with AssertionError

2019-09-10 Thread Barry Oglesby (Jira)
Barry Oglesby created GEODE-7183:


 Summary: CI Failure: ClientServerFunctionExecutionDUnitTest > 
testServerExecution_SocketTimeOut_WithoutRegister[ExecuteFunctionById] failed 
with AssertionError
 Key: GEODE-7183
 URL: https://issues.apache.org/jira/browse/GEODE-7183
 Project: Geode
  Issue Type: Bug
  Components: functions
Reporter: Barry Oglesby


{noformat}
org.apache.geode.internal.cache.execute.ClientServerFunctionExecutionDUnitTest 
> testServerExecution_SocketTimeOut_WithoutRegister[ExecuteFunctionById] FAILED
org.apache.geode.test.dunit.RMIException: While invoking 
org.apache.geode.internal.cache.execute.ClientServerFunctionExecutionDUnitTest$$Lambda$68/1900027546.run
 in VM 3 running on Host 6c6dc0c2627c with 4 VMs
at org.apache.geode.test.dunit.VM.executeMethodOnObject(VM.java:579)
at org.apache.geode.test.dunit.VM.invoke(VM.java:406)
at 
org.apache.geode.internal.cache.execute.ClientServerFunctionExecutionDUnitTest.testServerExecution_SocketTimeOut_WithoutRegister(ClientServerFunctionExecutionDUnitTest.java:339)

Caused by:
java.lang.AssertionError: Test failed after the execute operation
at org.junit.Assert.fail(Assert.java:88)
at 
org.apache.geode.internal.cache.execute.ClientServerFunctionExecutionDUnitTest.allServerExecution(ClientServerFunctionExecutionDUnitTest.java:891)
at 
org.apache.geode.internal.cache.execute.ClientServerFunctionExecutionDUnitTest.lambda$testServerExecution_SocketTimeOut_WithoutRegister$bb17a952$2(ClientServerFunctionExecutionDUnitTest.java:339)
{noformat}
The test logs this exception right before the failure:
{noformat}
[vm3] [info 2019/09/09 18:00:10.793 GMT RMI TCP 
Connection(26)-172.17.0.19 tid=0xb1] Exception : 
[vm3] org.apache.geode.cache.client.ServerConnectivityException: Pool 
unexpected SocketException connection=Pooled Connection to 6c6dc0c2627c:25980: 
Connection[DESTROYED]). Server unreachable: could not connect after 1 attempts
[vm3]   at 
org.apache.geode.cache.client.internal.OpExecutorImpl.handleException(OpExecutorImpl.java:659)
[vm3]   at 
org.apache.geode.cache.client.internal.OpExecutorImpl.handleException(OpExecutorImpl.java:501)
[vm3]   at 
org.apache.geode.cache.client.internal.OpExecutorImpl.executeOnServer(OpExecutorImpl.java:331)
[vm3]   at 
org.apache.geode.cache.client.internal.OpExecutorImpl.executeOn(OpExecutorImpl.java:300)
[vm3]   at 
org.apache.geode.cache.client.internal.PoolImpl.executeOn(PoolImpl.java:814)
[vm3]   at 
org.apache.geode.cache.client.internal.SingleHopOperationCallable.call(SingleHopOperationCallable.java:52)
[vm3]   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
[vm3]   at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
[vm3]   at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
[vm3]   at java.lang.Thread.run(Thread.java:748)
[vm3] Caused by: java.net.SocketException: Socket is closed
[vm3]   at java.net.Socket.setSoTimeout(Socket.java:1137)
[vm3]   at 
org.apache.geode.cache.client.internal.AbstractOpWithTimeout.attempt(AbstractOpWithTimeout.java:48)
[vm3]   at 
org.apache.geode.cache.client.internal.ConnectionImpl.execute(ConnectionImpl.java:263)
[vm3]   at 
org.apache.geode.cache.client.internal.pooling.PooledConnection.execute(PooledConnection.java:353)
[vm3]   at 
org.apache.geode.cache.client.internal.OpExecutorImpl.executeWithPossibleReAuthentication(OpExecutorImpl.java:750)
[vm3]   at 
org.apache.geode.cache.client.internal.OpExecutorImpl.executeOnServer(OpExecutorImpl.java:329)
[vm3]   ... 7 more
{noformat}

https://concourse.gemfire-ci.info/teams/main/pipelines/gemfire-develop-main/jobs/DistributedTestOpenJDK8/builds/952

=-=-=-=-=-=-=-=-=-=-=-=-=-=-=  Test Results URI 
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
gs://gemfire-test-artifacts/builds/gemfire-develop-main/9.10.0-build.0101/test-results/distributedTest/1568054303/
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=

Test report artifacts from this job are available at:

gs://gemfire-test-artifacts/builds/gemfire-develop-main/9.10.0-build.0101/test-artifacts/1568054303/distributedtestfiles-OpenJDK8-9.10.0-build.0101.tgz




--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Created] (GEODE-7181) CI Failure: WANRollingUpgradeEventProcessingMixedSiteOneCurrentSiteTwo > EventProcessingMixedSiteOneCurrentSiteTwo[from_v140] failed with BindException

2019-09-10 Thread Barry Oglesby (Jira)
Barry Oglesby created GEODE-7181:


 Summary: CI Failure: 
WANRollingUpgradeEventProcessingMixedSiteOneCurrentSiteTwo > 
EventProcessingMixedSiteOneCurrentSiteTwo[from_v140] failed with BindException
 Key: GEODE-7181
 URL: https://issues.apache.org/jira/browse/GEODE-7181
 Project: Geode
  Issue Type: Bug
  Components: wan
Reporter: Barry Oglesby


{noformat}
org.apache.geode.cache.wan.WANRollingUpgradeEventProcessingMixedSiteOneCurrentSiteTwo
 > EventProcessingMixedSiteOneCurrentSiteTwo[from_v140] FAILED
org.apache.geode.test.dunit.RMIException: While invoking 
org.apache.geode.cache.wan.WANRollingUpgradeEventProcessingMixedSiteOneCurrentSiteTwo$$Lambda$49/702999041.run
 in VM 4 running on Host 25462cccf035 with 7 VMs
at org.apache.geode.test.dunit.VM.executeMethodOnObject(VM.java:579)
at org.apache.geode.test.dunit.VM.invoke(VM.java:406)
at 
org.apache.geode.cache.wan.WANRollingUpgradeEventProcessingMixedSiteOneCurrentSiteTwo.EventProcessingMixedSiteOneCurrentSiteTwo(WANRollingUpgradeEventProcessingMixedSiteOneCurrentSiteTwo.java:76)

Caused by:
java.net.BindException: Failed to create server socket on 
25462cccf035/172.17.0.27[24363]
at 
org.apache.geode.internal.net.SocketCreator.createServerSocket(SocketCreator.java:722)
at 
org.apache.geode.internal.net.SocketCreator.createServerSocket(SocketCreator.java:680)
at 
org.apache.geode.internal.net.SocketCreator.createServerSocket(SocketCreator.java:647)
at 
org.apache.geode.distributed.internal.tcpserver.TcpServer.initializeServerSocket(TcpServer.java:226)
at 
org.apache.geode.distributed.internal.tcpserver.TcpServer.startServerThread(TcpServer.java:216)
at 
org.apache.geode.distributed.internal.tcpserver.TcpServer.start(TcpServer.java:211)
at 
org.apache.geode.distributed.internal.InternalLocator.startTcpServer(InternalLocator.java:560)
at 
org.apache.geode.distributed.internal.InternalLocator.startPeerLocation(InternalLocator.java:617)
at 
org.apache.geode.distributed.internal.InternalLocator.startLocator(InternalLocator.java:373)
at 
org.apache.geode.distributed.internal.InternalLocator.startLocator(InternalLocator.java:328)
at 
org.apache.geode.distributed.Locator.startLocator(Locator.java:252)
at 
org.apache.geode.distributed.Locator.startLocatorAndDS(Locator.java:139)
at 
org.apache.geode.cache.wan.WANRollingUpgradeDUnitTest.startLocator(WANRollingUpgradeDUnitTest.java:105)
at 
org.apache.geode.cache.wan.WANRollingUpgradeDUnitTest.startLocator(WANRollingUpgradeDUnitTest.java:97)
at 
org.apache.geode.cache.wan.WANRollingUpgradeEventProcessingMixedSiteOneCurrentSiteTwo.lambda$EventProcessingMixedSiteOneCurrentSiteTwo$67afc7f8$1(WANRollingUpgradeEventProcessingMixedSiteOneCurrentSiteTwo.java:78)

Caused by:
java.net.BindException: Address already in use (Bind failed)
at java.net.PlainSocketImpl.socketBind(Native Method)
at 
java.net.AbstractPlainSocketImpl.bind(AbstractPlainSocketImpl.java:387)
at java.net.ServerSocket.bind(ServerSocket.java:375)
at 
org.apache.geode.internal.net.SocketCreator.createServerSocket(SocketCreator.java:719)
... 14 more
{noformat}
UpgradeTestOpenJDK8 #1054:

https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-develop-main/jobs/UpgradeTestOpenJDK8/builds/1054

=-=-=-=-=-=-=-=-=-=-=-=-=-=-=  Test Results URI 
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
http://files.apachegeode-ci.info/builds/apache-develop-main/1.11.0-SNAPSHOT.0108/test-results/upgradeTest/1568110268/
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=

Test report artifacts from this job are available at:

http://files.apachegeode-ci.info/builds/apache-develop-main/1.11.0-SNAPSHOT.0108/test-artifacts/1568110268/upgradetestfiles-OpenJDK8-1.11.0-SNAPSHOT.0108.tgz

This same exception has been reported two other times in GEODE-6454, but the 
original exception in that JIRA was different.
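
For reference, the root failure is simply a second bind to a port that is 
already in use on the host; a minimal reproduction (the port number is taken 
from the stack above, purely for illustration):
{noformat}
import java.io.IOException;
import java.net.ServerSocket;

public class BindCollision {
  public static void main(String[] args) throws IOException {
    try (ServerSocket first = new ServerSocket(24363)) {
      // A second bind to the same port fails exactly as in the stack above:
      // java.net.BindException: Address already in use (Bind failed)
      new ServerSocket(24363);
    }
  }
}
{noformat}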



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Updated] (GEODE-7066) Events can be lost in a gateway batch containing duplicate non-conflatable events with conflation enabled

2019-08-26 Thread Barry Oglesby (Jira)


 [ 
https://issues.apache.org/jira/browse/GEODE-7066?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Barry Oglesby updated GEODE-7066:
-
Fix Version/s: (was: 1.10.0)
   1.11.0

> Events can be lost in a gateway batch containing duplicate non-conflatable 
> events with conflation enabled
> -
>
> Key: GEODE-7066
> URL: https://issues.apache.org/jira/browse/GEODE-7066
> Project: Geode
>  Issue Type: Bug
>  Components: wan
>Affects Versions: 1.9.0
>Reporter: Barry Oglesby
>Assignee: Barry Oglesby
>Priority: Major
> Fix For: 1.11.0
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> If a batch contains duplicate CREATE and DESTROY events on key 1736 like 
> below and conflation is enabled, the earlier events will be overwritten by 
> the later events.
> {noformat}
> GatewaySenderEventImpl[id=EventID[id=31bytes;threadID=0x30004|6;sequenceID=6072];operation=CREATE;region=/SESSIONS;key=6079],
> GatewaySenderEventImpl[id=EventID[id=31bytes;threadID=0x30004|6;sequenceID=6073];operation=UPDATE;region=/SESSIONS;key=6079],
> GatewaySenderEventImpl[id=EventID[id=31bytes;threadID=0x30004|5;sequenceID=6009];operation=CREATE;region=/SESSIONS;key=1736],
> GatewaySenderEventImpl[id=EventID[id=31bytes;threadID=0x30004|6;sequenceID=6074];operation=DESTROY;region=/SESSIONS;key=6079],
> GatewaySenderEventImpl[id=EventID[id=31bytes;threadID=0x30004|5;sequenceID=6011];operation=DESTROY;region=/SESSIONS;key=1736],
> GatewaySenderEventImpl[id=EventID[id=31bytes;threadID=0x30004|6;sequenceID=6087];operation=CREATE;region=/SESSIONS;key=1736],
> GatewaySenderEventImpl[id=EventID[id=31bytes;threadID=0x30004|6;sequenceID=6089];operation=DESTROY;region=/SESSIONS;key=1736]
> {noformat}
> The batch will look like this after conflation:
> {noformat}
> GatewaySenderEventImpl[id=EventID[id=31 
> bytes;threadID=0x30004|6;sequenceID=6072];operation=CREATE;region=/SESSIONS;key=6079],
>  
> GatewaySenderEventImpl[id=EventID[id=31 
> bytes;threadID=0x30004|6;sequenceID=6073];operation=UPDATE;region=/SESSIONS;key=6079],
>  
> GatewaySenderEventImpl[id=EventID[id=31 
> bytes;threadID=0x30004|6;sequenceID=6087];operation=CREATE;region=/SESSIONS;key=1736],
>  
> GatewaySenderEventImpl[id=EventID[id=31 
> bytes;threadID=0x30004|6;sequenceID=6074];operation=DESTROY;region=/SESSIONS;key=6079],
>  
> GatewaySenderEventImpl[id=EventID[id=31 
> bytes;threadID=0x30004|6;sequenceID=6089];operation=DESTROY;region=/SESSIONS;key=1736]
> {noformat}
> All the events from threadID=0x30004|5 are gone.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Resolved] (GEODE-7066) Events can be lost in a gateway batch containing duplicate non-conflatable events with conflation enabled

2019-08-26 Thread Barry Oglesby (Jira)


 [ 
https://issues.apache.org/jira/browse/GEODE-7066?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Barry Oglesby resolved GEODE-7066.
--
Fix Version/s: 1.10.0
   Resolution: Fixed

> Events can be lost in a gateway batch containing duplicate non-conflatable 
> events with conflation enabled
> -
>
> Key: GEODE-7066
> URL: https://issues.apache.org/jira/browse/GEODE-7066
> Project: Geode
>  Issue Type: Bug
>  Components: wan
>Affects Versions: 1.9.0
>Reporter: Barry Oglesby
>Assignee: Barry Oglesby
>Priority: Major
> Fix For: 1.10.0
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> If a batch contains duplicate CREATE and DESTROY events on key 1736 like 
> below and conflation is enabled, the earlier events will be overwritten by 
> the later events.
> {noformat}
> GatewaySenderEventImpl[id=EventID[id=31bytes;threadID=0x30004|6;sequenceID=6072];operation=CREATE;region=/SESSIONS;key=6079],
> GatewaySenderEventImpl[id=EventID[id=31bytes;threadID=0x30004|6;sequenceID=6073];operation=UPDATE;region=/SESSIONS;key=6079],
> GatewaySenderEventImpl[id=EventID[id=31bytes;threadID=0x30004|5;sequenceID=6009];operation=CREATE;region=/SESSIONS;key=1736],
> GatewaySenderEventImpl[id=EventID[id=31bytes;threadID=0x30004|6;sequenceID=6074];operation=DESTROY;region=/SESSIONS;key=6079],
> GatewaySenderEventImpl[id=EventID[id=31bytes;threadID=0x30004|5;sequenceID=6011];operation=DESTROY;region=/SESSIONS;key=1736],
> GatewaySenderEventImpl[id=EventID[id=31bytes;threadID=0x30004|6;sequenceID=6087];operation=CREATE;region=/SESSIONS;key=1736],
> GatewaySenderEventImpl[id=EventID[id=31bytes;threadID=0x30004|6;sequenceID=6089];operation=DESTROY;region=/SESSIONS;key=1736]
> {noformat}
> The batch will look like this after conflation:
> {noformat}
> GatewaySenderEventImpl[id=EventID[id=31 
> bytes;threadID=0x30004|6;sequenceID=6072];operation=CREATE;region=/SESSIONS;key=6079],
>  
> GatewaySenderEventImpl[id=EventID[id=31 
> bytes;threadID=0x30004|6;sequenceID=6073];operation=UPDATE;region=/SESSIONS;key=6079],
>  
> GatewaySenderEventImpl[id=EventID[id=31 
> bytes;threadID=0x30004|6;sequenceID=6087];operation=CREATE;region=/SESSIONS;key=1736],
>  
> GatewaySenderEventImpl[id=EventID[id=31 
> bytes;threadID=0x30004|6;sequenceID=6074];operation=DESTROY;region=/SESSIONS;key=6079],
>  
> GatewaySenderEventImpl[id=EventID[id=31 
> bytes;threadID=0x30004|6;sequenceID=6089];operation=DESTROY;region=/SESSIONS;key=1736]
> {noformat}
> All the events from threadID=0x30004|5 are gone.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Updated] (GEODE-7066) Events can be lost in a gateway batch containing duplicate non-conflatable events with conflation enabled

2019-08-08 Thread Barry Oglesby (JIRA)


 [ 
https://issues.apache.org/jira/browse/GEODE-7066?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Barry Oglesby updated GEODE-7066:
-
Affects Version/s: 1.9.0

> Events can be lost in a gateway batch containing duplicate non-conflatable 
> events with conflation enabled
> -
>
> Key: GEODE-7066
> URL: https://issues.apache.org/jira/browse/GEODE-7066
> Project: Geode
>  Issue Type: Bug
>  Components: wan
>Affects Versions: 1.9.0
>Reporter: Barry Oglesby
>Priority: Major
>
> If a batch contains duplicate CREATE and DESTROY events on key 1736 like 
> below and conflation is enabled, the earlier events will be overwritten by 
> the later events.
> {noformat}
> GatewaySenderEventImpl[id=EventID[id=31bytes;threadID=0x30004|6;sequenceID=6072];operation=CREATE;region=/SESSIONS;key=6079],
> GatewaySenderEventImpl[id=EventID[id=31bytes;threadID=0x30004|6;sequenceID=6073];operation=UPDATE;region=/SESSIONS;key=6079],
> GatewaySenderEventImpl[id=EventID[id=31bytes;threadID=0x30004|5;sequenceID=6009];operation=CREATE;region=/SESSIONS;key=1736],
> GatewaySenderEventImpl[id=EventID[id=31bytes;threadID=0x30004|6;sequenceID=6074];operation=DESTROY;region=/SESSIONS;key=6079],
> GatewaySenderEventImpl[id=EventID[id=31bytes;threadID=0x30004|5;sequenceID=6011];operation=DESTROY;region=/SESSIONS;key=1736],
> GatewaySenderEventImpl[id=EventID[id=31bytes;threadID=0x30004|6;sequenceID=6087];operation=CREATE;region=/SESSIONS;key=1736],
> GatewaySenderEventImpl[id=EventID[id=31bytes;threadID=0x30004|6;sequenceID=6089];operation=DESTROY;region=/SESSIONS;key=1736]
> {noformat}
> The batch will look like this after conflation:
> {noformat}
> GatewaySenderEventImpl[id=EventID[id=31 
> bytes;threadID=0x30004|6;sequenceID=6072];operation=CREATE;region=/SESSIONS;key=6079],
>  
> GatewaySenderEventImpl[id=EventID[id=31 
> bytes;threadID=0x30004|6;sequenceID=6073];operation=UPDATE;region=/SESSIONS;key=6079],
>  
> GatewaySenderEventImpl[id=EventID[id=31 
> bytes;threadID=0x30004|6;sequenceID=6087];operation=CREATE;region=/SESSIONS;key=1736],
>  
> GatewaySenderEventImpl[id=EventID[id=31 
> bytes;threadID=0x30004|6;sequenceID=6074];operation=DESTROY;region=/SESSIONS;key=6079],
>  
> GatewaySenderEventImpl[id=EventID[id=31 
> bytes;threadID=0x30004|6;sequenceID=6089];operation=DESTROY;region=/SESSIONS;key=1736]
> {noformat}
> All the events from threadID=0x30004|5 are gone.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Assigned] (GEODE-7066) Events can be lost in a gateway batch containing duplicate non-conflatable events with conflation enabled

2019-08-08 Thread Barry Oglesby (JIRA)


 [ 
https://issues.apache.org/jira/browse/GEODE-7066?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Barry Oglesby reassigned GEODE-7066:


Assignee: Barry Oglesby

> Events can be lost in a gateway batch containing duplicate non-conflatable 
> events with conflation enabled
> -
>
> Key: GEODE-7066
> URL: https://issues.apache.org/jira/browse/GEODE-7066
> Project: Geode
>  Issue Type: Bug
>  Components: wan
>Affects Versions: 1.9.0
>Reporter: Barry Oglesby
>Assignee: Barry Oglesby
>Priority: Major
>
> If a batch contains duplicate CREATE and DESTROY events on key 1736 like 
> below and conflation is enabled, the earlier events will be overwritten by 
> the later events.
> {noformat}
> GatewaySenderEventImpl[id=EventID[id=31bytes;threadID=0x30004|6;sequenceID=6072];operation=CREATE;region=/SESSIONS;key=6079],
> GatewaySenderEventImpl[id=EventID[id=31bytes;threadID=0x30004|6;sequenceID=6073];operation=UPDATE;region=/SESSIONS;key=6079],
> GatewaySenderEventImpl[id=EventID[id=31bytes;threadID=0x30004|5;sequenceID=6009];operation=CREATE;region=/SESSIONS;key=1736],
> GatewaySenderEventImpl[id=EventID[id=31bytes;threadID=0x30004|6;sequenceID=6074];operation=DESTROY;region=/SESSIONS;key=6079],
> GatewaySenderEventImpl[id=EventID[id=31bytes;threadID=0x30004|5;sequenceID=6011];operation=DESTROY;region=/SESSIONS;key=1736],
> GatewaySenderEventImpl[id=EventID[id=31bytes;threadID=0x30004|6;sequenceID=6087];operation=CREATE;region=/SESSIONS;key=1736],
> GatewaySenderEventImpl[id=EventID[id=31bytes;threadID=0x30004|6;sequenceID=6089];operation=DESTROY;region=/SESSIONS;key=1736]
> {noformat}
> The batch will look like this after conflation:
> {noformat}
> GatewaySenderEventImpl[id=EventID[id=31 
> bytes;threadID=0x30004|6;sequenceID=6072];operation=CREATE;region=/SESSIONS;key=6079],
>  
> GatewaySenderEventImpl[id=EventID[id=31 
> bytes;threadID=0x30004|6;sequenceID=6073];operation=UPDATE;region=/SESSIONS;key=6079],
>  
> GatewaySenderEventImpl[id=EventID[id=31 
> bytes;threadID=0x30004|6;sequenceID=6087];operation=CREATE;region=/SESSIONS;key=1736],
>  
> GatewaySenderEventImpl[id=EventID[id=31 
> bytes;threadID=0x30004|6;sequenceID=6074];operation=DESTROY;region=/SESSIONS;key=6079],
>  
> GatewaySenderEventImpl[id=EventID[id=31 
> bytes;threadID=0x30004|6;sequenceID=6089];operation=DESTROY;region=/SESSIONS;key=1736]
> {noformat}
> All the events from threadID=0x30004|5 are gone.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Created] (GEODE-7066) Events can be lost in a gateway batch containing duplicate non-conflatable events with conflation enabled

2019-08-08 Thread Barry Oglesby (JIRA)
Barry Oglesby created GEODE-7066:


 Summary: Events can be lost in a gateway batch containing 
duplicate non-conflatable events with conflation enabled
 Key: GEODE-7066
 URL: https://issues.apache.org/jira/browse/GEODE-7066
 Project: Geode
  Issue Type: Bug
  Components: wan
Reporter: Barry Oglesby


If a batch contains duplicate CREATE and DESTROY events on key 1736 like below 
and conflation is enabled, the earlier events will be overwritten by the later 
events.
{noformat}
GatewaySenderEventImpl[id=EventID[id=31bytes;threadID=0x30004|6;sequenceID=6072];operation=CREATE;region=/SESSIONS;key=6079],
GatewaySenderEventImpl[id=EventID[id=31bytes;threadID=0x30004|6;sequenceID=6073];operation=UPDATE;region=/SESSIONS;key=6079],
GatewaySenderEventImpl[id=EventID[id=31bytes;threadID=0x30004|5;sequenceID=6009];operation=CREATE;region=/SESSIONS;key=1736],
GatewaySenderEventImpl[id=EventID[id=31bytes;threadID=0x30004|6;sequenceID=6074];operation=DESTROY;region=/SESSIONS;key=6079],
GatewaySenderEventImpl[id=EventID[id=31bytes;threadID=0x30004|5;sequenceID=6011];operation=DESTROY;region=/SESSIONS;key=1736],
GatewaySenderEventImpl[id=EventID[id=31bytes;threadID=0x30004|6;sequenceID=6087];operation=CREATE;region=/SESSIONS;key=1736],
GatewaySenderEventImpl[id=EventID[id=31bytes;threadID=0x30004|6;sequenceID=6089];operation=DESTROY;region=/SESSIONS;key=1736]
{noformat}
The batch will look like this after conflation:
{noformat}
GatewaySenderEventImpl[id=EventID[id=31 
bytes;threadID=0x30004|6;sequenceID=6072];operation=CREATE;region=/SESSIONS;key=6079],
 
GatewaySenderEventImpl[id=EventID[id=31 
bytes;threadID=0x30004|6;sequenceID=6073];operation=UPDATE;region=/SESSIONS;key=6079],
 
GatewaySenderEventImpl[id=EventID[id=31 
bytes;threadID=0x30004|6;sequenceID=6087];operation=CREATE;region=/SESSIONS;key=1736],
 
GatewaySenderEventImpl[id=EventID[id=31 
bytes;threadID=0x30004|6;sequenceID=6074];operation=DESTROY;region=/SESSIONS;key=6079],
 
GatewaySenderEventImpl[id=EventID[id=31 
bytes;threadID=0x30004|6;sequenceID=6089];operation=DESTROY;region=/SESSIONS;key=1736]
{noformat}
All the events from threadID=0x30004|5 are gone.
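
The loss pattern matches conflation that keys batch entries only by (key, 
operation) with last-writer-wins: the thread |6 CREATE (seq 6087) and DESTROY 
(seq 6089) on key 1736 replace the thread |5 CREATE (seq 6009) and DESTROY 
(seq 6011). A toy model of that failure mode, with a stand-in Event class 
rather than the real GatewaySenderEventImpl:
{noformat}
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class ConflationSketch {
  static class Event {
    final String threadId;
    final long sequenceId;
    final String operation; // CREATE, UPDATE, DESTROY
    final String key;

    Event(String threadId, long sequenceId, String operation, String key) {
      this.threadId = threadId;
      this.sequenceId = sequenceId;
      this.operation = operation;
      this.key = key;
    }
  }

  static List<Event> conflate(List<Event> batch) {
    // Last-writer-wins keyed by (key, operation): the thread |6 CREATE on
    // key 1736 replaces the thread |5 CREATE, and likewise for DESTROY,
    // which is how every threadID=0x30004|5 event disappears.
    Map<String, Event> conflated = new LinkedHashMap<>();
    for (Event event : batch) {
      conflated.put(event.key + "|" + event.operation, event);
    }
    return new ArrayList<>(conflated.values());
  }
}
{noformat}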




--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Commented] (GEODE-6933) Gateway sender alert-threshold not working

2019-07-11 Thread Barry Oglesby (JIRA)


[ 
https://issues.apache.org/jira/browse/GEODE-6933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16883376#comment-16883376
 ] 

Barry Oglesby commented on GEODE-6933:
--

The AbstractGatewaySenderEventProcessor handleSuccessfulBatchDispatch method is 
the only place the warning is logged. That method is only called in the case of 
a GatewaySenderEventCallbackDispatcher (which is the AsyncEventQueue dispatcher).

It's not called in AbstractGatewaySenderEventProcessor handleSuccessBatchAck, 
which is the GatewaySenderEventRemoteDispatcher case. The 
GatewaySenderEventRemoteDispatcher is used in the GatewaySender (WAN) case.

Further, the alert-threshold is only supported:

- on the gateway-sender element in the cache XSD
- in the create gateway-sender gfsh command

So, it's not even configurable on an AsyncEventQueue. If I hack 
AbstractGatewaySender and set alertThreshold > 0, I can see a warning in the 
AsyncEventQueue case:
{noformat}
[warn 2019/07/11 15:20:38.531 PDT  tid=0x36] CREATE event for region=/data 
key=TradeKey[id=0] value=Trade[id=0; ...] was in the queue for 10124 
milliseconds
{noformat}
AbstractGatewaySenderEventProcessor handleSuccessBatchAck should be changed to 
log the warning just like AbstractGatewaySenderEventProcessor 
handleSuccessfulBatchDispatch does.

And either AsyncEventQueue should be modified to support alert-threshold or the 
warning code should be removed from AbstractGatewaySenderEventProcessor 
handleSuccessfulBatchDispatch.
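
As a rough illustration of the first change, the ack path would need a check 
along these lines; the accessor and variable names here are assumptions for 
the sketch, not the actual Geode code:
{noformat}
// Hypothetical sketch of the alert-threshold check on the
// remote-dispatch (WAN) ack path; alertThreshold is in milliseconds.
long alertThresholdMillis = sender.getAlertThreshold();
if (alertThresholdMillis > 0) {
  long millisInQueue = System.currentTimeMillis() - event.getCreationTime();
  if (millisInQueue > alertThresholdMillis) {
    logger.warn("{} event for region={} key={} was in the queue for {} milliseconds",
        event.getOperation(), event.getRegionPath(), event.getKey(),
        millisInQueue);
  }
}
{noformat}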


> Gateway sender alert-threshold not working
> --
>
> Key: GEODE-6933
> URL: https://issues.apache.org/jira/browse/GEODE-6933
> Project: Geode
>  Issue Type: Bug
>  Components: wan
>Reporter: Mario Ivanac
>Assignee: Mario Ivanac
>Priority: Major
>
> When the alert-threshold function is activated in gateway senders 
> (alert-threshold set to a non-zero value), the alert is never raised for 
> entries that stay in the queue longer than the alert-threshold value.
>  
> Printout from logs:
> Monitor = GatewaySenderMXBeanMonitor descriptor = 
> eventsExceedingAlertThreshold And value = 0
>  
> It seems that reporting of events which exceed the alert threshold (class 
> AbstractGatewaySenderEventProcessor) only works if the dispatcher is an 
> instance of GatewaySenderEventCallbackDispatcher.
> With deeper analysis, I came to the conclusion that for a GatewaySender the 
> dispatcher is an instance of GatewaySenderEventRemoteDispatcher.
> So this function only works for an AsyncEventQueue, for which the dispatcher 
> is an instance of GatewaySenderEventCallbackDispatcher.
>  
> The other problem is that the getEventsExceedingAlertThreshold() method of 
> GatewaySenderMBean is always returning a hardcoded value (0).



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Commented] (GEODE-6953) CI failure: RedundancyLevelPart1DUnitTest testRedundancySpecifiedNonPrimaryEPFailsDetectionByPut failed with ComparisonFailure

2019-07-09 Thread Barry Oglesby (JIRA)


[ 
https://issues.apache.org/jira/browse/GEODE-6953?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16881644#comment-16881644
 ] 

Barry Oglesby commented on GEODE-6953:
--

Both the primary and secondary connections are established.

Secondary:
{noformat}
[vm1] [info 2019/07/09 23:28:28.287 GMT  
tid=0x106] :Cache server: Initializing secondary server-to-client communication 
socket: Socket[addr=/172.17.0.2,port=59586,localport=29482]

[info 2019/07/09 23:28:28.303 GMT :41002 port 29482> tid=0x84] Cache Client Updater Thread  
on e01ec2d18901(155):41002 port 29482 (e01ec2d18901:29482) : ready to 
process messages.
{noformat}

Primary:
{noformat}
[vm0] [info 2019/07/09 23:28:28.305 GMT  
tid=0x108] :Cache server: Initializing primary server-to-client communication 
socket: Socket[addr=/172.17.0.2,port=47704,localport=22651]

[info 2019/07/09 23:28:28.327 GMT :41001 port 22651> tid=0x85] Cache Client Updater Thread  
on e01ec2d18901(151):41001 port 22651 (e01ec2d18901:22651) : ready to 
process messages.
{noformat}
Immediately after that, there is an AsynchronousCloseException in vm0 (the 
primary):
{noformat}
[vm0] [info 2019/07/09 23:28:28.338 GMT  tid=0x105] Connection: shared=true ordered=true handshake failed to connect 
to peer 172.17.0.2(155):41002 because: 
java.nio.channels.AsynchronousCloseException
{noformat}
Then, a timeout occurs. I'm not sure if this is the registerInterest call:
{noformat}
[warn 2019/07/09 23:28:28.585 GMT  tid=0x1b] Pool unexpected 
socket timed out on client connection=Pooled Connection to e01ec2d18901:22651: 
Connection[e01ec2d18901:22651]@789257760)

[warn 2019/07/09 23:28:28.621 GMT  tid=0x1b] Usage of 
registerInterest(List) has been deprecated. Please use 
registerInterestForKeys(Iterable)
{noformat}
Then, the secondary connection crashes.
{noformat}
[info 2019/07/09 23:28:28.873 GMT  tid=0x1b] Redundant 
subscription endpoint e01ec2d18901:29482 crashed. Scheduling recovery.

[info 2019/07/09 23:28:28.876 GMT 
 tid=0x86] SubscriptionManager 
redundancy satisfier - redundant endpoint has been lost. Attempting to recover.

[warn 2019/07/09 23:28:28.876 GMT  tid=0x1b] Pool unexpected 
socket timed out on client 
connection=SubscriptionConnectionImpl[e01ec2d18901:29482:closed])

[info 2019/07/09 23:28:28.876 GMT :41002 port 29482> tid=0x84] Cache client updater for 
Queue on endpoint e01ec2d18901:29482 exiting. Scheduling recovery.
{noformat}
And the cache is closed:
{noformat}
[info 2019/07/09 23:28:29.000 GMT  tid=0x1b] GemFireCache[id = 
1192776621; isClosing = true; isShutDownAll = false; created = Tue Jul 09 
23:28:28 GMT 2019; server = false; copyOnRead = false; lockLease = 120; 
lockTimeout = 60]: Now closing.
{noformat}

> CI failure: RedundancyLevelPart1DUnitTest 
> testRedundancySpecifiedNonPrimaryEPFailsDetectionByPut failed with 
> ComparisonFailure
> --
>
> Key: GEODE-6953
> URL: https://issues.apache.org/jira/browse/GEODE-6953
> Project: Geode
>  Issue Type: Bug
>  Components: client/server
>Reporter: Barry Oglesby
>Priority: Major
>
> RedundancyLevelPart1DUnitTest.testRedundancySpecifiedNonPrimaryEPFailsDetectionByPut
>  failed in DistributedTestOpenJDK11 build 762:
> https://concourse.gemfire-ci.info/teams/main/pipelines/gemfire-develop-main/jobs/DistributedTestOpenJDK11/builds/762
> {noformat}
> org.junit.ComparisonFailure: expected:<[1]> but was:<[0]>
>   at 
> jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>   at 
> jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
>   at 
> jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>   at 
> org.apache.geode.internal.cache.tier.sockets.RedundancyLevelPart1DUnitTest.testRedundancySpecifiedNonPrimaryEPFailsDetectionByPut(RedundancyLevelPart1DUnitTest.java:304)
>   at jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:566)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>   at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
>   at 
> 

[jira] [Created] (GEODE-6953) CI failure: RedundancyLevelPart1DUnitTest testRedundancySpecifiedNonPrimaryEPFailsDetectionByPut failed with ComparisonFailure

2019-07-09 Thread Barry Oglesby (JIRA)
Barry Oglesby created GEODE-6953:


 Summary: CI failure: RedundancyLevelPart1DUnitTest 
testRedundancySpecifiedNonPrimaryEPFailsDetectionByPut failed with 
ComparisonFailure
 Key: GEODE-6953
 URL: https://issues.apache.org/jira/browse/GEODE-6953
 Project: Geode
  Issue Type: Bug
  Components: client/server
Reporter: Barry Oglesby


RedundancyLevelPart1DUnitTest.testRedundancySpecifiedNonPrimaryEPFailsDetectionByPut
 failed in DistributedTestOpenJDK11 build 762:

https://concourse.gemfire-ci.info/teams/main/pipelines/gemfire-develop-main/jobs/DistributedTestOpenJDK11/builds/762
{noformat}
org.junit.ComparisonFailure: expected:<[1]> but was:<[0]>
at 
jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at 
jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at 
jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at 
org.apache.geode.internal.cache.tier.sockets.RedundancyLevelPart1DUnitTest.testRedundancySpecifiedNonPrimaryEPFailsDetectionByPut(RedundancyLevelPart1DUnitTest.java:304)
at jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:566)
at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
at 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
at 
org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
at 
org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
at 
org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:55)
at org.junit.rules.RunRules.evaluate(RunRules.java:20)
at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
at 
org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
at 
org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.runTestClass(JUnitTestClassExecutor.java:110)
at 
org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.execute(JUnitTestClassExecutor.java:58)
at 
org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.execute(JUnitTestClassExecutor.java:38)
at 
org.gradle.api.internal.tasks.testing.junit.AbstractJUnitTestClassProcessor.processTestClass(AbstractJUnitTestClassProcessor.java:62)
at 
org.gradle.api.internal.tasks.testing.SuiteTestClassProcessor.processTestClass(SuiteTestClassProcessor.java:51)
at jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:566)
at 
org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:35)
at 
org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:24)
at 
org.gradle.internal.dispatch.ContextClassLoaderDispatch.dispatch(ContextClassLoaderDispatch.java:32)
at 
org.gradle.internal.dispatch.ProxyDispatchAdapter$DispatchingInvocationHandler.invoke(ProxyDispatchAdapter.java:93)
at com.sun.proxy.$Proxy2.processTestClass(Unknown Source)
at 
org.gradle.api.internal.tasks.testing.worker.TestWorker.processTestClass(TestWorker.java:118)
at jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)

[jira] [Resolved] (GEODE-6929) In the case of a ConcurrentCacheModificationException that occurs while processing a RemotePutMessage, the reply is attempted to be sent twice causing an InternalGemFire

2019-07-02 Thread Barry Oglesby (JIRA)


 [ 
https://issues.apache.org/jira/browse/GEODE-6929?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Barry Oglesby resolved GEODE-6929.
--
   Resolution: Fixed
Fix Version/s: 1.10.0

> In the case of a ConcurrentCacheModificationException that occurs while 
> processing a RemotePutMessage, the reply is attempted to be sent twice 
> causing an InternalGemFireError
> --
>
> Key: GEODE-6929
> URL: https://issues.apache.org/jira/browse/GEODE-6929
> Project: Geode
>  Issue Type: Bug
>  Components: messaging
>Reporter: Barry Oglesby
>Assignee: Barry Oglesby
>Priority: Major
> Fix For: 1.10.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> The RemotePutMessage operateOnRegion method attempts to sendReply twice if 
> result is false:
> {noformat}
> try {
>   result = r.getDataView().putEntry(event, this.ifNew, this.ifOld, 
> this.expectedOldValue,
>   this.requireOldValue, this.lastModified, true);
>   if (!this.result) { // make sure the region hasn't gone away
> r.checkReadiness();
> if (!this.ifNew && !this.ifOld) {
>   // no reason to be throwing an exception, so let's retry
>   RemoteOperationException ex = new RemoteOperationException(
>   "unable to perform put, but operation should not fail");
> 1 ->  sendReply(getSender(), getProcessorId(), dm, new ReplyException(ex), r, 
> startTime);
> }
>   }
> ...
> if (sendReply) {
> 2->  sendReply(getSender(), getProcessorId(), dm, null, r, startTime, event);
> }
> {noformat}
> This causes this fatal InternalGemFireError:
> {noformat}
> [fatal 2019/06/28 15:33:01.005 PDT  192.168.1.2(gateway-ny-proxy-1:77395):41003 unshared ordered uid=12 dom 
> #1 port=58836> tid=0x4c] Uncaught exception processing 
> tx.RemotePutMessage(regionPath=/TradeDateCalendar; 
> sender=192.168.1.2(gateway-ny-proxy-1:77395):41003; recipients=[null]; 
> processorId=0; key=87; value=(5 bytes); 
> callback=GatewaySenderEventCallbackArgument 
> [originalCallbackArg=null;originatingSenderId=2;recipientGatewayReceivers={1}];
>  op=UPDATE; 
> bridgeContext=identity(192.168.1.2(gateway-ln-data-1:77388):41002,connection=1;
>  eventId=EventID[id=31 bytes;threadID=0x30001|1;sequenceID=731]; ifOld=false; 
> ifNew=false; op=UPDATE; hadOldValue=false; deserializationPolicy=LAZY; 
> hasDelta=false; sendDelta=false; isDeltaApplied=false ,distTx=false)
> org.apache.geode.InternalGemFireError: Trying to reply twice to a message
>   at org.apache.geode.internal.Assert.throwError(Assert.java:89)
>   at org.apache.geode.internal.Assert.assertTrue(Assert.java:107)
>   at 
> org.apache.geode.internal.tcp.DirectReplySender.putOutgoing(DirectReplySender.java:55)
>   at 
> org.apache.geode.internal.cache.tx.RemotePutMessage$PutReplyMessage.send(RemotePutMessage.java:791)
>   at 
> org.apache.geode.internal.cache.tx.RemotePutMessage.sendReply(RemotePutMessage.java:675)
>   at 
> org.apache.geode.internal.cache.tx.RemoteOperationMessage.process(RemoteOperationMessage.java:266)
>   at 
> org.apache.geode.distributed.internal.DistributionMessage.scheduleAction(DistributionMessage.java:369)
>   at 
> org.apache.geode.distributed.internal.DistributionMessage.schedule(DistributionMessage.java:425)
>   at 
> org.apache.geode.distributed.internal.ClusterDistributionManager.scheduleIncomingMessage(ClusterDistributionManager.java:2891)
>   at 
> org.apache.geode.distributed.internal.ClusterDistributionManager.handleIncomingDMsg(ClusterDistributionManager.java:2571)
>   at 
> org.apache.geode.distributed.internal.ClusterDistributionManager.access$1400(ClusterDistributionManager.java:110)
>   at 
> org.apache.geode.distributed.internal.ClusterDistributionManager$DMListener.messageReceived(ClusterDistributionManager.java:3430)
>   at 
> org.apache.geode.distributed.internal.membership.gms.mgr.GMSMembershipManager.dispatchMessage(GMSMembershipManager.java:1108)
>   at 
> org.apache.geode.distributed.internal.membership.gms.mgr.GMSMembershipManager.handleOrDeferMessage(GMSMembershipManager.java:1027)
>   at 
> org.apache.geode.distributed.internal.membership.gms.mgr.GMSMembershipManager$MyDCReceiver.messageReceived(GMSMembershipManager.java:407)
>   at 
> org.apache.geode.distributed.internal.direct.DirectChannel.receive(DirectChannel.java:701)
>   at 
> org.apache.geode.internal.tcp.TCPConduit.messageReceived(TCPConduit.java:702)
>   at 
> org.apache.geode.internal.tcp.Connection.dispatchMessage(Connection.java:3427)
>   at 
> org.apache.geode.internal.tcp.Connection.readMessage(Connection.java:3164)
>   at 
> 

[jira] [Created] (GEODE-6931) A failed RemotePutMessage can cause a PersistentReplicatesOfflineException to be thrown when no persistent members are offline

2019-07-01 Thread Barry Oglesby (JIRA)
Barry Oglesby created GEODE-6931:


 Summary: A failed RemotePutMessage can cause a 
PersistentReplicatesOfflineException to be thrown when no persistent members 
are offline
 Key: GEODE-6931
 URL: https://issues.apache.org/jira/browse/GEODE-6931
 Project: Geode
  Issue Type: Bug
  Components: messaging
Reporter: Barry Oglesby


One of the places that RemotePutMessage is sent is DistributedRegion virtualPut.

It's sent from this method in this case:

- 2 WAN sites
- the member in the receiving site that processes the batch defines the region 
as replicate proxy
- the other receiving site members define the region as replicate persistent
DistributedRegion virtualPut is invoked by the GatewayReceiverCommand here:
{noformat}
java.lang.Exception: Stack trace
at java.lang.Thread.dumpStack(Thread.java:1333)
at 
org.apache.geode.internal.cache.DistributedRegion.virtualPut(DistributedRegion.java:341)
at 
org.apache.geode.internal.cache.LocalRegionDataView.putEntry(LocalRegionDataView.java:162)
at 
org.apache.geode.internal.cache.LocalRegion.basicUpdate(LocalRegion.java:5549)
at 
org.apache.geode.internal.cache.LocalRegion.basicBridgePut(LocalRegion.java:5200)
at 
org.apache.geode.internal.cache.tier.sockets.command.GatewayReceiverCommand.cmdExecute(GatewayReceiverCommand.java:429)
{noformat}
In this case, requiresOneHopForMissingEntry called by virtualPut returns true 
since a proxy region with other persistent replicates can't generate a version 
tag. This causes RemotePutMessage.distribute to be called.

If didDistribute returns false from RemotePutMessage.distribute (meaning the 
distribution failed), a PersistentReplicatesOfflineException is thrown 
regardless of the actual exception on the remote member:
{noformat}
if (!generateVersionTag && !didDistribute) {
  throw new PersistentReplicatesOfflineException();
}
{noformat}
One of the ways that didDistribute can be false is if both the remote WAN site 
and the local WAN site are updating the same key at the same time. In that case a 
ConcurrentCacheModificationException can occur in the replicate persistent 
member (the one processing the RemotePutMessage).

This exception is not logged anywhere, and RemotePutMessage operateOnRegion 
doesn't know anything about it.

RemotePutMessage operateOnRegion running in the replicate persistent member 
calls:
{noformat}
result = r.getDataView().putEntry(event, this.ifNew, this.ifOld, 
this.expectedOldValue,
this.requireOldValue, this.lastModified, true);
{noformat}
If putEntry returns false, it throws a RemoteOperationException which is sent 
back to the caller and causes didDistribute to be false. 
 
The result can be false in the RemotePutMessage operateOnRegion method because 
of a ConcurrentCacheModificationException:
{noformat}
org.apache.geode.internal.cache.versions.ConcurrentCacheModificationException: 
conflicting WAN event detected
at 
org.apache.geode.internal.cache.entries.AbstractRegionEntry.processGatewayTag(AbstractRegionEntry.java:1924)
at 
org.apache.geode.internal.cache.entries.AbstractRegionEntry.processVersionTag(AbstractRegionEntry.java:1443)
at 
org.apache.geode.internal.cache.entries.AbstractOplogDiskRegionEntry.processVersionTag(AbstractOplogDiskRegionEntry.java:165)
at 
org.apache.geode.internal.cache.entries.VersionedThinDiskLRURegionEntryHeapStringKey1.processVersionTag(VersionedThinDiskLRURegionEntryHeapStringKey1.java:378)
at 
org.apache.geode.internal.cache.AbstractRegionMap.processVersionTag(AbstractRegionMap.java:527)
at 
org.apache.geode.internal.cache.map.RegionMapPut.updateEntry(RegionMapPut.java:484)
at 
org.apache.geode.internal.cache.map.RegionMapPut.createOrUpdateEntry(RegionMapPut.java:256)
at 
org.apache.geode.internal.cache.map.AbstractRegionMapPut.doPutAndDeliverEvent(AbstractRegionMapPut.java:300)
at 
org.apache.geode.internal.cache.map.AbstractRegionMapPut.runWithIndexUpdatingInProgress(AbstractRegionMapPut.java:308)
at 
org.apache.geode.internal.cache.map.AbstractRegionMapPut.doPutIfPreconditionsSatisified(AbstractRegionMapPut.java:296)
at 
org.apache.geode.internal.cache.map.AbstractRegionMapPut.doPutOnSynchronizedRegionEntry(AbstractRegionMapPut.java:282)
at 
org.apache.geode.internal.cache.map.AbstractRegionMapPut.doPutOnRegionEntryInMap(AbstractRegionMapPut.java:273)
at 
org.apache.geode.internal.cache.map.AbstractRegionMapPut.addRegionEntryToMapAndDoPut(AbstractRegionMapPut.java:251)
at 
org.apache.geode.internal.cache.map.AbstractRegionMapPut.doPutRetryingIfNeeded(AbstractRegionMapPut.java:216)
at 
org.apache.geode.internal.cache.map.AbstractRegionMapPut.doWithIndexInUpdateMode(AbstractRegionMapPut.java:198)
at 

[jira] [Updated] (GEODE-6929) In the case of a ConcurrentCacheModificationException that occurs while processing a RemotePutMessage, the reply is attempted to be sent twice causing an InternalGemFireE

2019-06-28 Thread Barry Oglesby (JIRA)


 [ 
https://issues.apache.org/jira/browse/GEODE-6929?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Barry Oglesby updated GEODE-6929:
-
Description: 
The RemotePutMessage operateOnRegion method attempts to sendReply twice if 
result is false:
{noformat}
try {
  result = r.getDataView().putEntry(event, this.ifNew, this.ifOld, 
this.expectedOldValue,
  this.requireOldValue, this.lastModified, true);

  if (!this.result) { // make sure the region hasn't gone away
r.checkReadiness();
if (!this.ifNew && !this.ifOld) {
  // no reason to be throwing an exception, so let's retry
  RemoteOperationException ex = new RemoteOperationException(
  "unable to perform put, but operation should not fail");
1 ->  sendReply(getSender(), getProcessorId(), dm, new ReplyException(ex), r, 
startTime);
}
  }
...
if (sendReply) {
2->  sendReply(getSender(), getProcessorId(), dm, null, r, startTime, event);
}
{noformat}
This causes this fatal InternalGemFireError:
{noformat}
[fatal 2019/06/28 15:33:01.005 PDT <P2P message reader for 
192.168.1.2(gateway-ny-proxy-1:77395):41003 unshared ordered uid=12 dom #1 
port=58836> tid=0x4c] Uncaught exception processing 
tx.RemotePutMessage(regionPath=/TradeDateCalendar; 
sender=192.168.1.2(gateway-ny-proxy-1:77395):41003; recipients=[null]; 
processorId=0; key=87; value=(5 bytes); 
callback=GatewaySenderEventCallbackArgument 
[originalCallbackArg=null;originatingSenderId=2;recipientGatewayReceivers={1}]; 
op=UPDATE; 
bridgeContext=identity(192.168.1.2(gateway-ln-data-1:77388):41002,connection=1;
 eventId=EventID[id=31 bytes;threadID=0x30001|1;sequenceID=731]; ifOld=false; 
ifNew=false; op=UPDATE; hadOldValue=false; deserializationPolicy=LAZY; 
hasDelta=false; sendDelta=false; isDeltaApplied=false ,distTx=false)
org.apache.geode.InternalGemFireError: Trying to reply twice to a message
at org.apache.geode.internal.Assert.throwError(Assert.java:89)
at org.apache.geode.internal.Assert.assertTrue(Assert.java:107)
at 
org.apache.geode.internal.tcp.DirectReplySender.putOutgoing(DirectReplySender.java:55)
at 
org.apache.geode.internal.cache.tx.RemotePutMessage$PutReplyMessage.send(RemotePutMessage.java:791)
at 
org.apache.geode.internal.cache.tx.RemotePutMessage.sendReply(RemotePutMessage.java:675)
at 
org.apache.geode.internal.cache.tx.RemoteOperationMessage.process(RemoteOperationMessage.java:266)
at 
org.apache.geode.distributed.internal.DistributionMessage.scheduleAction(DistributionMessage.java:369)
at 
org.apache.geode.distributed.internal.DistributionMessage.schedule(DistributionMessage.java:425)
at 
org.apache.geode.distributed.internal.ClusterDistributionManager.scheduleIncomingMessage(ClusterDistributionManager.java:2891)
at 
org.apache.geode.distributed.internal.ClusterDistributionManager.handleIncomingDMsg(ClusterDistributionManager.java:2571)
at 
org.apache.geode.distributed.internal.ClusterDistributionManager.access$1400(ClusterDistributionManager.java:110)
at 
org.apache.geode.distributed.internal.ClusterDistributionManager$DMListener.messageReceived(ClusterDistributionManager.java:3430)
at 
org.apache.geode.distributed.internal.membership.gms.mgr.GMSMembershipManager.dispatchMessage(GMSMembershipManager.java:1108)
at 
org.apache.geode.distributed.internal.membership.gms.mgr.GMSMembershipManager.handleOrDeferMessage(GMSMembershipManager.java:1027)
at 
org.apache.geode.distributed.internal.membership.gms.mgr.GMSMembershipManager$MyDCReceiver.messageReceived(GMSMembershipManager.java:407)
at 
org.apache.geode.distributed.internal.direct.DirectChannel.receive(DirectChannel.java:701)
at 
org.apache.geode.internal.tcp.TCPConduit.messageReceived(TCPConduit.java:702)
at 
org.apache.geode.internal.tcp.Connection.dispatchMessage(Connection.java:3427)
at 
org.apache.geode.internal.tcp.Connection.readMessage(Connection.java:3164)
at 
org.apache.geode.internal.tcp.Connection.processInputBuffer(Connection.java:2959)
at 
org.apache.geode.internal.tcp.Connection.readMessages(Connection.java:1743)
at org.apache.geode.internal.tcp.Connection.run(Connection.java:1579)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
{noformat}
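A minimal sketch of one way to avoid the double reply, assuming the flow in 
the snippet above (hypothetical names, not the committed Geode patch): track 
whether a reply has already been sent and skip the second one:
{noformat}
// Self-contained model of the double-reply bug and a guard against it.
public class ReplyOnceSketch {
  private boolean replied;

  void sendReply(String payload) {
    if (replied) {
      throw new IllegalStateException("Trying to reply twice to a message");
    }
    replied = true;
    System.out.println("reply: " + payload);
  }

  void operateOnRegion(boolean result) {
    boolean sendNormalReply = true;
    if (!result) {
      sendReply("error: unable to perform put"); // reply 1 in the snippet above
      sendNormalReply = false; // guard: suppress reply 2 at the end of the method
    }
    if (sendNormalReply) {
      sendReply("ok"); // reply 2 in the snippet above
    }
  }

  public static void main(String[] args) {
    new ReplyOnceSketch().operateOnRegion(false); // one reply, no assertion failure
  }
}
{noformat}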

[jira] [Assigned] (GEODE-6929) In the case of a ConcurrentCacheModificationException that occurs while processing a RemotePutMessage, the reply is attempted to be sent twice causing an InternalGemFire

2019-06-28 Thread Barry Oglesby (JIRA)


 [ 
https://issues.apache.org/jira/browse/GEODE-6929?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Barry Oglesby reassigned GEODE-6929:


Assignee: Barry Oglesby

> In the case of a ConcurrentCacheModificationException that occurs while 
> processing a RemotePutMessage, the reply is attempted to be sent twice 
> causing an InternalGemFireError
> --
>
> Key: GEODE-6929
> URL: https://issues.apache.org/jira/browse/GEODE-6929
> Project: Geode
>  Issue Type: Bug
>  Components: messaging
>Reporter: Barry Oglesby
>Assignee: Barry Oglesby
>Priority: Major
>
> The RemotePutMessage operateOnRegion method attempts to sendReply twice if 
> result is false:
> {noformat}
> try {
>   result = r.getDataView().putEntry(event, this.ifNew, this.ifOld, 
> this.expectedOldValue,
>   this.requireOldValue, this.lastModified, true);
>   if (!this.result) { // make sure the region hasn't gone away
> r.checkReadiness();
> if (!this.ifNew && !this.ifOld) {
>   // no reason to be throwing an exception, so let's retry
>   RemoteOperationException ex = new RemoteOperationException(
>   "unable to perform put, but operation should not fail");
> 1 ->  sendReply(getSender(), getProcessorId(), dm, new ReplyException(ex), r, 
> startTime);
> }
>   }
> ...
> if (sendReply) {
> 2->  sendReply(getSender(), getProcessorId(), dm, null, r, startTime, event);
> }
> {noformat}
> This causes this fatal InternalGemFireError:
> {noformat}
> [fatal 2019/06/28 15:33:01.005 PDT <P2P message reader for 
> 192.168.1.2(gateway-ny-proxy-1:77395):41003 unshared ordered uid=12 dom 
> #1 port=58836> tid=0x4c] Uncaught exception processing 
> tx.RemotePutMessage(regionPath=/TradeDateCalendar; 
> sender=192.168.1.2(gateway-ny-proxy-1:77395):41003; recipients=[null]; 
> processorId=0; key=87; value=(5 bytes); 
> callback=GatewaySenderEventCallbackArgument 
> [originalCallbackArg=null;originatingSenderId=2;recipientGatewayReceivers={1}];
>  op=UPDATE; 
> bridgeContext=identity(192.168.1.2(gateway-ln-data-1:77388):41002,connection=1;
>  eventId=EventID[id=31 bytes;threadID=0x30001|1;sequenceID=731]; ifOld=false; 
> ifNew=false; op=UPDATE; hadOldValue=false; deserializationPolicy=LAZY; 
> hasDelta=false; sendDelta=false; isDeltaApplied=false ,distTx=false)
> org.apache.geode.InternalGemFireError: Trying to reply twice to a message
>   at org.apache.geode.internal.Assert.throwError(Assert.java:89)
>   at org.apache.geode.internal.Assert.assertTrue(Assert.java:107)
>   at 
> org.apache.geode.internal.tcp.DirectReplySender.putOutgoing(DirectReplySender.java:55)
>   at 
> org.apache.geode.internal.cache.tx.RemotePutMessage$PutReplyMessage.send(RemotePutMessage.java:791)
>   at 
> org.apache.geode.internal.cache.tx.RemotePutMessage.sendReply(RemotePutMessage.java:675)
>   at 
> org.apache.geode.internal.cache.tx.RemoteOperationMessage.process(RemoteOperationMessage.java:266)
>   at 
> org.apache.geode.distributed.internal.DistributionMessage.scheduleAction(DistributionMessage.java:369)
>   at 
> org.apache.geode.distributed.internal.DistributionMessage.schedule(DistributionMessage.java:425)
>   at 
> org.apache.geode.distributed.internal.ClusterDistributionManager.scheduleIncomingMessage(ClusterDistributionManager.java:2891)
>   at 
> org.apache.geode.distributed.internal.ClusterDistributionManager.handleIncomingDMsg(ClusterDistributionManager.java:2571)
>   at 
> org.apache.geode.distributed.internal.ClusterDistributionManager.access$1400(ClusterDistributionManager.java:110)
>   at 
> org.apache.geode.distributed.internal.ClusterDistributionManager$DMListener.messageReceived(ClusterDistributionManager.java:3430)
>   at 
> org.apache.geode.distributed.internal.membership.gms.mgr.GMSMembershipManager.dispatchMessage(GMSMembershipManager.java:1108)
>   at 
> org.apache.geode.distributed.internal.membership.gms.mgr.GMSMembershipManager.handleOrDeferMessage(GMSMembershipManager.java:1027)
>   at 
> org.apache.geode.distributed.internal.membership.gms.mgr.GMSMembershipManager$MyDCReceiver.messageReceived(GMSMembershipManager.java:407)
>   at 
> org.apache.geode.distributed.internal.direct.DirectChannel.receive(DirectChannel.java:701)
>   at 
> org.apache.geode.internal.tcp.TCPConduit.messageReceived(TCPConduit.java:702)
>   at 
> org.apache.geode.internal.tcp.Connection.dispatchMessage(Connection.java:3427)
>   at 
> org.apache.geode.internal.tcp.Connection.readMessage(Connection.java:3164)
>   at 
> org.apache.geode.internal.tcp.Connection.processInputBuffer(Connection.java:2959)
>   at 
> 

[jira] [Created] (GEODE-6929) In the case of a ConcurrentCacheModificationException that occurs while processing a RemotePutMessage, the reply is attempted to be sent twice causing an InternalGemFireE

2019-06-28 Thread Barry Oglesby (JIRA)
Barry Oglesby created GEODE-6929:


 Summary: In the case of a ConcurrentCacheModificationException 
that occurs while processing a RemotePutMessage, the reply is attempted to be 
sent twice causing an InternalGemFireError
 Key: GEODE-6929
 URL: https://issues.apache.org/jira/browse/GEODE-6929
 Project: Geode
  Issue Type: Bug
  Components: messaging
Reporter: Barry Oglesby


The RemotePutMessage operateOnRegion method attempts to sendReply twice if 
result is false:
{noformat}
try {
  result = r.getDataView().putEntry(event, this.ifNew, this.ifOld, 
this.expectedOldValue,
  this.requireOldValue, this.lastModified, true);

  if (!this.result) { // make sure the region hasn't gone away
r.checkReadiness();
if (!this.ifNew && !this.ifOld) {
  // no reason to be throwing an exception, so let's retry
  RemoteOperationException ex = new RemoteOperationException(
  "unable to perform put, but operation should not fail");
1 ->  sendReply(getSender(), getProcessorId(), dm, new ReplyException(ex), r, 
startTime);
}
  }
...
if (sendReply) {
2->  sendReply(getSender(), getProcessorId(), dm, null, r, startTime, event);
}
{noformat}
This causes this fatal InternalGemFireError:
{noformat}
[fatal 2019/06/28 15:33:01.005 PDT <P2P message reader for 
192.168.1.2(gateway-ny-proxy-1:77395):41003 unshared ordered uid=12 dom #1 
port=58836> tid=0x4c] Uncaught exception processing 
tx.RemotePutMessage(regionPath=/TradeDateCalendar; 
sender=192.168.1.2(gateway-ny-proxy-1:77395):41003; recipients=[null]; 
processorId=0; key=87; value=(5 bytes); 
callback=GatewaySenderEventCallbackArgument 
[originalCallbackArg=null;originatingSenderId=2;recipientGatewayReceivers={1}]; 
op=UPDATE; 
bridgeContext=identity(192.168.1.2(gateway-ln-data-1:77388):41002,connection=1;
 eventId=EventID[id=31 bytes;threadID=0x30001|1;sequenceID=731]; ifOld=false; 
ifNew=false; op=UPDATE; hadOldValue=false; deserializationPolicy=LAZY; 
hasDelta=false; sendDelta=false; isDeltaApplied=false ,distTx=false)
org.apache.geode.InternalGemFireError: Trying to reply twice to a message
at org.apache.geode.internal.Assert.throwError(Assert.java:89)
at org.apache.geode.internal.Assert.assertTrue(Assert.java:107)
at 
org.apache.geode.internal.tcp.DirectReplySender.putOutgoing(DirectReplySender.java:55)
at 
org.apache.geode.internal.cache.tx.RemotePutMessage$PutReplyMessage.send(RemotePutMessage.java:791)
at 
org.apache.geode.internal.cache.tx.RemotePutMessage.sendReply(RemotePutMessage.java:675)
at 
org.apache.geode.internal.cache.tx.RemoteOperationMessage.process(RemoteOperationMessage.java:266)
at 
org.apache.geode.distributed.internal.DistributionMessage.scheduleAction(DistributionMessage.java:369)
at 
org.apache.geode.distributed.internal.DistributionMessage.schedule(DistributionMessage.java:425)
at 
org.apache.geode.distributed.internal.ClusterDistributionManager.scheduleIncomingMessage(ClusterDistributionManager.java:2891)
at 
org.apache.geode.distributed.internal.ClusterDistributionManager.handleIncomingDMsg(ClusterDistributionManager.java:2571)
at 
org.apache.geode.distributed.internal.ClusterDistributionManager.access$1400(ClusterDistributionManager.java:110)
at 
org.apache.geode.distributed.internal.ClusterDistributionManager$DMListener.messageReceived(ClusterDistributionManager.java:3430)
at 
org.apache.geode.distributed.internal.membership.gms.mgr.GMSMembershipManager.dispatchMessage(GMSMembershipManager.java:1108)
at 
org.apache.geode.distributed.internal.membership.gms.mgr.GMSMembershipManager.handleOrDeferMessage(GMSMembershipManager.java:1027)
at 
org.apache.geode.distributed.internal.membership.gms.mgr.GMSMembershipManager$MyDCReceiver.messageReceived(GMSMembershipManager.java:407)
at 
org.apache.geode.distributed.internal.direct.DirectChannel.receive(DirectChannel.java:701)
at 
org.apache.geode.internal.tcp.TCPConduit.messageReceived(TCPConduit.java:702)
at 
org.apache.geode.internal.tcp.Connection.dispatchMessage(Connection.java:3427)
at 
org.apache.geode.internal.tcp.Connection.readMessage(Connection.java:3164)
at 
org.apache.geode.internal.tcp.Connection.processInputBuffer(Connection.java:2959)
at 
org.apache.geode.internal.tcp.Connection.readMessages(Connection.java:1743)
at org.apache.geode.internal.tcp.Connection.run(Connection.java:1579)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
{noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (GEODE-3718) The InternalResourceManager fails to shutdown if a redundancy recovery task is scheduled but hasn't fired yet

2019-06-24 Thread Barry Oglesby (JIRA)


[ 
https://issues.apache.org/jira/browse/GEODE-3718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16871818#comment-16871818
 ] 

Barry Oglesby edited comment on GEODE-3718 at 6/24/19 9:52 PM:
---

The issue is that when a member is stopped, the shutdown hook attempts to stop 
the InternalResourceManager's scheduledExecutor. If that executor has any 
pending tasks (including ones in the future), the shutdown hook blocks waiting 
for them to fire.

If I run this test:

1. Start 3 servers defining a partitioned region with recovery-delay > 0
2. Load some data into the partitioned region
3. Kill one server using kill -9
4. The remaining servers schedule the recovery task
5. Stop the remaining servers normally (the JVMs do not stop)

The shutdown hook thread is waiting here for the InternalResourceManager's 
scheduledExecutor to terminate:
{noformat}
"Distributed system shutdown hook" #12 prio=5 os_prio=31 tid=0x7ff0f8c33000 
nid=0x12307 waiting on condition [0x7bc11000]
   java.lang.Thread.State: TIMED_WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for  <0x000766c00038> (a 
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
at 
java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
at 
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2078)
at 
java.util.concurrent.ThreadPoolExecutor.awaitTermination(ThreadPoolExecutor.java:1465)
at 
org.apache.geode.internal.cache.control.InternalResourceManager.stopExecutor(InternalResourceManager.java:343)
at 
org.apache.geode.internal.cache.control.InternalResourceManager.close(InternalResourceManager.java:156)
at 
org.apache.geode.internal.cache.GemFireCacheImpl.close(GemFireCacheImpl.java:2079)
- locked <0x00075b58> (a java.lang.Class for 
org.apache.geode.internal.cache.GemFireCacheImpl)
at 
org.apache.geode.distributed.internal.InternalDistributedSystem.disconnect(InternalDistributedSystem.java:1516)
- locked <0x00075b58> (a java.lang.Class for 
org.apache.geode.internal.cache.GemFireCacheImpl)
at 
org.apache.geode.distributed.internal.InternalDistributedSystem.lambda$static$6(InternalDistributedSystem.java:2181)
at 
org.apache.geode.distributed.internal.InternalDistributedSystem$$Lambda$8/111900554.run(Unknown
 Source)
at java.lang.Thread.run(Thread.java:745)
{noformat}
InternalResourceManager.stopExecutor waits up to 120 seconds for the executor 
to terminate:
{noformat}
void stopExecutor(ExecutorService executor) {
  if (executor == null) {
return;
  }
  executor.shutdown();
  final int secToWait = Integer
  .getInteger(DistributionConfig.GEMFIRE_PREFIX + 
"prrecovery-close-timeout", 120).intValue();
  try {
executor.awaitTermination(secToWait, TimeUnit.SECONDS);
  } catch (InterruptedException x) {
Thread.currentThread().interrupt();
logger.debug("Failed in interrupting the Resource Manager Thread due to 
interrupt");
  }
  if (!executor.isTerminated()) {
logger.warn("Failed to stop resource manager threads in {} seconds",
secToWait);
  }
}
{noformat}
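One possible way to avoid blocking on tasks scheduled far in the future, 
assuming the executor is a ScheduledThreadPoolExecutor (a sketch, not the 
actual fix): tell the pool not to run delayed tasks after shutdown, so 
awaitTermination doesn't wait for them:
{noformat}
import java.util.concurrent.ScheduledThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class StopExecutorSketch {
  public static void main(String[] args) throws InterruptedException {
    ScheduledThreadPoolExecutor executor = new ScheduledThreadPoolExecutor(1);
    // Delayed tasks are cancelled by shutdown() instead of being waited on.
    executor.setExecuteExistingDelayedTasksAfterShutdownPolicy(false);

    // Stand-in for the redundancy recovery task scheduled 30 seconds out.
    executor.schedule(() -> System.out.println("recover redundancy"),
        30, TimeUnit.SECONDS);

    executor.shutdown();
    boolean stopped = executor.awaitTermination(120, TimeUnit.SECONDS);
    System.out.println("terminated without waiting: " + stopped); // true, immediately
  }
}
{noformat}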
I added some logging that shows the sequence of events.

The logging shows the redundancy recovery task being scheduled when the first 
server is killed (steps 3 and 4 above):
{noformat}
[warn 2019/06/24 14:09:21.014 PDT  tid=0x15] 
PRHARedundancyProvider.scheduleRedundancyRecovery about to schedule task with 
delay=3
{noformat}
Then, the shutdown hook is invoked when the server is stopped, which causes 
the InternalResourceManager to wait. The 1 task below is the redundancy 
recovery task:
{noformat}
[info 2019/06/24 14:09:25.161 PDT  tid=0xc] 
VM is exiting - shutting down distributed system

[info 2019/06/24 14:09:25.173 PDT  tid=0xc] 
GemFireCache[id = 1850680894; isClosing = true; isShutDownAll = false; created 
= Mon Jun 24 14:08:40 PDT 2019; server = false; copyOnRead = false; lockLease = 
120; lockTimeout = 60]: Now closing.

[warn 2019/06/24 14:09:25.174 PDT  tid=0xc] 
InternalResourceManager.stopExecutor waiting up to 120 seconds to terminate 
ScheduledThreadPoolExecutor containing 1 task
{noformat}
Then, the redundancy recovery task attempts to execute, but a 
CacheClosedException is thrown, so it doesn't do anything:
{noformat}
[warn 2019/06/24 14:09:51.016 PDT  tid=0xb2] 
PRHARedundancyProvider.run2 about to start recovery

[warn 2019/06/24 14:09:51.016 PDT  tid=0xb2] 
PRHARedundancyProvider.run2 caught:
org.apache.geode.cache.CacheClosedException: The cache is closed.
at 
org.apache.geode.internal.cache.GemFireCacheImpl$Stopper.generateCancelledException(GemFireCacheImpl.java:1482)
at 
org.apache.geode.CancelCriterion.checkCancelInProgress(CancelCriterion.java:83)
at 

[jira] [Commented] (GEODE-3718) The InternalResourceManager fails to shutdown if a redundancy recovery task is scheduled but hasn't fired yet

2019-06-24 Thread Barry Oglesby (JIRA)


[ 
https://issues.apache.org/jira/browse/GEODE-3718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16871818#comment-16871818
 ] 

Barry Oglesby commented on GEODE-3718:
--

This code change takes effect after the issue has already happened. We can 
make this change, but it doesn't address the underlying issue.

The issue is that when a member is stopped, the shutdown hook attempts to stop 
the InternalResourceManager's scheduledExecutor. If that executor has any 
pending tasks (including ones in the future), the shutdown hook blocks waiting 
for them to fire.

If I run this test:

1. Start 3 servers defining a partitioned region with recovery-delay > 0
2. Load some data into the partitioned region
3. Kill one server using kill -9
4. The remaining servers schedule the recovery task
5. Stop the remaining servers normally (the JVMs do not stop)

The shutdown hook thread is waiting here for the InternalResourceManager's 
scheduledExecutor to terminate:
{noformat}
"Distributed system shutdown hook" #12 prio=5 os_prio=31 tid=0x7ff0f8c33000 
nid=0x12307 waiting on condition [0x7bc11000]
   java.lang.Thread.State: TIMED_WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for  <0x000766c00038> (a 
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
at 
java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
at 
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2078)
at 
java.util.concurrent.ThreadPoolExecutor.awaitTermination(ThreadPoolExecutor.java:1465)
at 
org.apache.geode.internal.cache.control.InternalResourceManager.stopExecutor(InternalResourceManager.java:343)
at 
org.apache.geode.internal.cache.control.InternalResourceManager.close(InternalResourceManager.java:156)
at 
org.apache.geode.internal.cache.GemFireCacheImpl.close(GemFireCacheImpl.java:2079)
- locked <0x00075b58> (a java.lang.Class for 
org.apache.geode.internal.cache.GemFireCacheImpl)
at 
org.apache.geode.distributed.internal.InternalDistributedSystem.disconnect(InternalDistributedSystem.java:1516)
- locked <0x00075b58> (a java.lang.Class for 
org.apache.geode.internal.cache.GemFireCacheImpl)
at 
org.apache.geode.distributed.internal.InternalDistributedSystem.lambda$static$6(InternalDistributedSystem.java:2181)
at 
org.apache.geode.distributed.internal.InternalDistributedSystem$$Lambda$8/111900554.run(Unknown
 Source)
at java.lang.Thread.run(Thread.java:745)
{noformat}
InternalResourceManager.stopExecutor waits up to 120 seconds for the executor 
to terminate:
{noformat}
void stopExecutor(ExecutorService executor) {
  if (executor == null) {
return;
  }
  executor.shutdown();
  final int secToWait = Integer
  .getInteger(DistributionConfig.GEMFIRE_PREFIX + 
"prrecovery-close-timeout", 120).intValue();
  try {
executor.awaitTermination(secToWait, TimeUnit.SECONDS);
  } catch (InterruptedException x) {
Thread.currentThread().interrupt();
logger.debug("Failed in interrupting the Resource Manager Thread due to 
interrupt");
  }
  if (!executor.isTerminated()) {
logger.warn("Failed to stop resource manager threads in {} seconds",
secToWait);
  }
}
{noformat}
I added some logging that shows the sequence of events.

The logging shows the redundancy recovery task being scheduled when the first 
server is killed (steps 3 and 4 above):
{noformat}
[warn 2019/06/24 14:09:21.014 PDT  tid=0x15] 
PRHARedundancyProvider.scheduleRedundancyRecovery about to schedule task with 
delay=3
{noformat}
Then, the shutdown hook is invoked when the server is stopped, which causes 
the InternalResourceManager to wait. The 1 task below is the redundancy 
recovery task:
{noformat}
[info 2019/06/24 14:09:25.161 PDT  tid=0xc] 
VM is exiting - shutting down distributed system

[info 2019/06/24 14:09:25.173 PDT  tid=0xc] 
GemFireCache[id = 1850680894; isClosing = true; isShutDownAll = false; created 
= Mon Jun 24 14:08:40 PDT 2019; server = false; copyOnRead = false; lockLease = 
120; lockTimeout = 60]: Now closing.

[warn 2019/06/24 14:09:25.174 PDT  tid=0xc] 
InternalResourceManager.stopExecutor waiting up to 120 seconds to terminate 
ScheduledThreadPoolExecutor containing 1 task
{noformat}
Then, the redundancy recovery task attempts to execute, but a 
CacheClosedException is thrown, so it doesn't do anything:
{noformat}
[warn 2019/06/24 14:09:51.016 PDT  tid=0xb2] 
PRHARedundancyProvider.run2 about to start recovery

[warn 2019/06/24 14:09:51.016 PDT  tid=0xb2] 
PRHARedundancyProvider.run2 caught:
org.apache.geode.cache.CacheClosedException: The cache is closed.
at 
org.apache.geode.internal.cache.GemFireCacheImpl$Stopper.generateCancelledException(GemFireCacheImpl.java:1482)
at 

[jira] [Issue Comment Deleted] (GEODE-3718) The InternalResourceManager fails to shutdown if a redundancy recovery task is scheduled but hasn't fired yet

2019-06-24 Thread Barry Oglesby (JIRA)


 [ 
https://issues.apache.org/jira/browse/GEODE-3718?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Barry Oglesby updated GEODE-3718:
-
Comment: was deleted

(was: Lynn and I have a test that reproduces this issue.

3 servers configured like:

server 1 - replicate region
server 2 - replicate persistent region
server 3 - replicate persistent region

Server 1 is configured like:
{noformat}
<region name="UTLatest" refid="REPLICATE"/>
{noformat}
Servers 2 and 3 are configured like:
{noformat}
<region name="UTLatest" refid="REPLICATE_PERSISTENT"/>
{noformat}
This matches the customer's proxy and data group members. Server1 is in the 
proxy group, and servers 2 and 3 are in the data group.

Note: You must start the persistent servers first.

kill -9 one of the servers with a replicate persistent region.

When synchronization occurs (after maximumTimeBetweenPings - 6ms), a 
message like this will be logged in each member:

[info 2019/06/24 09:46:13.133 PDT  tid=0x2b] Region UTLatest is 
requesting synchronization with 192.168.1.2(server3:51729):41002 for 
192.168.1.2(server2:51722):41001

The member with the replicate region will also throw the ToDataException.

A couple work-arounds are:

- Instead of using a replicate region in server1, use a replicate proxy region
- Use replicate persistent regions in all members)

> The InternalResourceManager fails to shutdown if a redundancy recovery task 
> is scheduled but hasn't fired yet
> -
>
> Key: GEODE-3718
> URL: https://issues.apache.org/jira/browse/GEODE-3718
> Project: Geode
>  Issue Type: Bug
>  Components: core
>Reporter: Barry Oglesby
>Assignee: Mario Ivanac
>Priority: Major
>  Labels: needs-review, pull-request-available, recovery
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> This happens with recovery-delay or startup-recovery-delay > 0.
> The thread gets stuck here:
> {noformat}
> "Thread-20" #133 prio=10 os_prio=31 tid=0x7fa85b886000 nid=0x890b waiting 
> on condition [0x70001269e000]
>java.lang.Thread.State: TIMED_WAITING (parking)
>   at sun.misc.Unsafe.park(Native Method)
>   - parking to wait for  <0x0007bc408900> (a 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
>   at 
> java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
>   at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2078)
>   at 
> java.util.concurrent.ThreadPoolExecutor.awaitTermination(ThreadPoolExecutor.java:1465)
>   at 
> org.apache.geode.internal.cache.control.InternalResourceManager.stopExecutor(InternalResourceManager.java:375)
>   at 
> org.apache.geode.internal.cache.control.InternalResourceManager.close(InternalResourceManager.java:187)
>   at 
> org.apache.geode.internal.cache.GemFireCacheImpl.close(GemFireCacheImpl.java:2161)
>   - locked <0x0007bc0bc520> (a java.lang.Class for 
> org.apache.geode.internal.cache.GemFireCacheImpl)
>   at 
> org.apache.geode.distributed.internal.InternalDistributedSystem.disconnect(InternalDistributedSystem.java:1367)
>   - locked <0x0007bc0bc520> (a java.lang.Class for 
> org.apache.geode.internal.cache.GemFireCacheImpl)
>   at 
> org.apache.geode.distributed.internal.InternalDistributedSystem.disconnect(InternalDistributedSystem.java:1017)
>   at 
> org.apache.geode.management.internal.beans.MemberMBeanBridge$1.run(MemberMBeanBridge.java:986)
>   at java.lang.Thread.run(Thread.java:745)
> {noformat}
> The InternalResourceManager is waiting for the termination of its 
> scheduledExecutor.
> The PRHARedundancyProvider initializes its recoveryExecutor using the 
> InternalResourceManager's scheduledExecutor:
> {noformat}
> recoveryExecutor = new OneTaskOnlyExecutor(resourceManager.getExecutor(),
>   new OneTaskOnlyExecutor.ConflatedTaskListener() {
> public void taskDropped() {
>   InternalResourceManager.getResourceObserver().recoveryConflated(region);
> }
>   });
> {noformat}
> The scheduleRedundancyRecovery method schedules a RecoveryRunnable if 
> necessary.
> If that task hasn't fired yet, the InternalResourceManager doesn't close, and 
> the JVM stays up.
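> A rough sketch of the conflation behavior OneTaskOnlyExecutor provides 
> (hypothetical code, not Geode's implementation): if an equivalent task is 
> already pending, the new submission is dropped and the listener's 
> taskDropped() is invoked:
> {noformat}
> import java.util.concurrent.Executors;
> import java.util.concurrent.ScheduledExecutorService;
> import java.util.concurrent.TimeUnit;
> import java.util.concurrent.atomic.AtomicBoolean;
>
> public class OneTaskOnlySketch {
>   private final ScheduledExecutorService delegate;
>   private final AtomicBoolean pending = new AtomicBoolean(false);
>
>   OneTaskOnlySketch(ScheduledExecutorService delegate) {
>     this.delegate = delegate;
>   }
>
>   void schedule(Runnable task, long delayMs, Runnable taskDropped) {
>     if (!pending.compareAndSet(false, true)) {
>       taskDropped.run(); // conflated: an equivalent task is already queued
>       return;
>     }
>     delegate.schedule(() -> {
>       pending.set(false);
>       task.run();
>     }, delayMs, TimeUnit.MILLISECONDS);
>   }
>
>   public static void main(String[] args) throws InterruptedException {
>     ScheduledExecutorService pool = Executors.newSingleThreadScheduledExecutor();
>     OneTaskOnlySketch recovery = new OneTaskOnlySketch(pool);
>     recovery.schedule(() -> System.out.println("recover"), 100, () -> {});
>     recovery.schedule(() -> System.out.println("recover"), 100,
>         () -> System.out.println("conflated")); // second submission is dropped
>     pool.shutdown();
>     pool.awaitTermination(1, TimeUnit.SECONDS);
>   }
> }
> {noformat}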



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (GEODE-6901) If a region is replicate and replicate persistent in different members and a replicate persistent member crashes, the replicate members throw a ToDataException attempti

2019-06-24 Thread Barry Oglesby (JIRA)


[ 
https://issues.apache.org/jira/browse/GEODE-6901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16871517#comment-16871517
 ] 

Barry Oglesby commented on GEODE-6901:
--

Lynn and I have a test that reproduces this issue.

3 servers configured like:

server 1 - replicate region
server 2 - replicate persistent region
server 3 - replicate persistent region

Server 1 is configured like:
{noformat}
<region name="UTLatest" refid="REPLICATE"/>
{noformat}
Servers 2 and 3 are configured like:
{noformat}
<region name="UTLatest" refid="REPLICATE_PERSISTENT"/>
{noformat}
This matches the customer's proxy and data group members. Server1 is in the 
proxy group, and servers 2 and 3 are in the data group.

Note: You must start the persistent servers first.

kill -9 one of the servers with a replicate persistent region.

When synchronization occurs (after maximumTimeBetweenPings - 6ms), a 
message like this will be logged in each member:

[info 2019/06/24 09:46:13.133 PDT  tid=0x2b] Region UTLatest is 
requesting synchronization with 192.168.1.2(server3:51729):41002 for 
192.168.1.2(server2:51722):41001

The member with the replicate region will also throw the ToDataException.

A couple work-arounds are:

- Instead of using a replicate region in server1, use a replicate proxy region
- Use replicate persistent regions in all members

> If a region is replicate and replicate persistent in different members and a 
> replicate persistent member crashes, the replicate members throw a 
> ToDataException attempting to synchronize the region
> 
>
> Key: GEODE-6901
> URL: https://issues.apache.org/jira/browse/GEODE-6901
> Project: Geode
>  Issue Type: Bug
>  Components: persistence, regions
>Reporter: Barry Oglesby
>Priority: Major
>
> If a region is replicate and replicate persistent in different members and a 
> replicate persistent member crashes, the replicate members throw a 
> ToDataException attempting to synchronize the region
> In this case, an exception like this is thrown in the replicate member:
> {noformat}
> [warn 2019/06/21 17:06:33.516 PDT  tid=0x2b] Timer task 
>  encountered 
> exception
> org.apache.geode.ToDataException: class 
> org.apache.geode.internal.cache.versions.VMRegionVersionVector
>  at 
> org.apache.geode.internal.InternalDataSerializer.invokeToData(InternalDataSerializer.java:2331)
>  at 
> org.apache.geode.internal.InternalDataSerializer.writeDSFID(InternalDataSerializer.java:1492)
>  at 
> org.apache.geode.internal.InternalDataSerializer.basicWriteObject(InternalDataSerializer.java:2067)
>  at org.apache.geode.DataSerializer.writeObject(DataSerializer.java:2943)
>  at 
> org.apache.geode.internal.cache.InitialImageOperation$RequestImageMessage.toData(InitialImageOperation.java:2135)
>  at 
> org.apache.geode.internal.InternalDataSerializer.invokeToData(InternalDataSerializer.java:2300)
>  at 
> org.apache.geode.internal.InternalDataSerializer.writeDSFID(InternalDataSerializer.java:1492)
>  at 
> org.apache.geode.internal.tcp.MsgStreamer.writeMessage(MsgStreamer.java:242)
>  at 
> org.apache.geode.distributed.internal.direct.DirectChannel.sendToMany(DirectChannel.java:385)
>  at 
> org.apache.geode.distributed.internal.direct.DirectChannel.sendToOne(DirectChannel.java:241)
>  at 
> org.apache.geode.distributed.internal.direct.DirectChannel.send(DirectChannel.java:596)
>  at 
> org.apache.geode.distributed.internal.membership.gms.mgr.GMSMembershipManager.directChannelSend(GMSMembershipManager.java:1711)
>  at 
> org.apache.geode.distributed.internal.membership.gms.mgr.GMSMembershipManager.send(GMSMembershipManager.java:1892)
>  at 
> org.apache.geode.distributed.internal.ClusterDistributionManager.sendViaMembershipManager(ClusterDistributionManager.java:2852)
>  at 
> org.apache.geode.distributed.internal.ClusterDistributionManager.sendOutgoing(ClusterDistributionManager.java:2779)
>  at 
> org.apache.geode.distributed.internal.ClusterDistributionManager.sendMessage(ClusterDistributionManager.java:2816)
>  at 
> org.apache.geode.distributed.internal.ClusterDistributionManager.putOutgoing(ClusterDistributionManager.java:1526)
>  at 
> org.apache.geode.internal.cache.InitialImageOperation.synchronizeWith(InitialImageOperation.java:649)
>  at 
> org.apache.geode.internal.cache.DistributedRegion.synchronizeWith(DistributedRegion.java:1321)
>  at 
> org.apache.geode.internal.cache.DistributedRegion.synchronizeForLostMember(DistributedRegion.java:1310)
>  at 
> org.apache.geode.internal.cache.DistributedRegion.performSynchronizeForLostMemberTask(DistributedRegion.java:1295)
>  at 
> org.apache.geode.internal.cache.DistributedRegion$1.run2(DistributedRegion.java:1285)
>  at 
> 

[jira] [Commented] (GEODE-3718) The InternalResourceManager fails to shutdown if a redundancy recovery task is scheduled but hasn't fired yet

2019-06-24 Thread Barry Oglesby (JIRA)


[ 
https://issues.apache.org/jira/browse/GEODE-3718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16871516#comment-16871516
 ] 

Barry Oglesby commented on GEODE-3718:
--

Lynn and I have a test that reproduces this issue.

3 servers configured like:

server 1 - replicate region
server 2 - replicate persistent region
server 3 - replicate persistent region

Server 1 is configured like:
{noformat}
<region name="UTLatest" refid="REPLICATE"/>
{noformat}
Servers 2 and 3 are configured like:
{noformat}
<region name="UTLatest" refid="REPLICATE_PERSISTENT"/>
{noformat}
This matches the customer's proxy and data group members. Server1 is in the 
proxy group, and servers 2 and 3 are in the data group.

Note: You must start the persistent servers first.

kill -9 one of the servers with a replicate persistent region.

When synchronization occurs (after maximumTimeBetweenPings - 6ms), a 
message like this will be logged in each member:

[info 2019/06/24 09:46:13.133 PDT  tid=0x2b] Region UTLatest is 
requesting synchronization with 192.168.1.2(server3:51729):41002 for 
192.168.1.2(server2:51722):41001

The member with the replicate region will also throw the ToDataException.

A couple work-arounds are:

- Instead of using a replicate region in server1, use a replicate proxy region
- Use replicate persistent regions in all members

> The InternalResourceManager fails to shutdown if a redundancy recovery task 
> is scheduled but hasn't fired yet
> -
>
> Key: GEODE-3718
> URL: https://issues.apache.org/jira/browse/GEODE-3718
> Project: Geode
>  Issue Type: Bug
>  Components: core
>Reporter: Barry Oglesby
>Assignee: Mario Ivanac
>Priority: Major
>  Labels: needs-review, pull-request-available, recovery
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> This happens with recovery-delay or startup-recovery-delay > 0.
> The thread gets stuck here:
> {noformat}
> "Thread-20" #133 prio=10 os_prio=31 tid=0x7fa85b886000 nid=0x890b waiting 
> on condition [0x70001269e000]
>java.lang.Thread.State: TIMED_WAITING (parking)
>   at sun.misc.Unsafe.park(Native Method)
>   - parking to wait for  <0x0007bc408900> (a 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
>   at 
> java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
>   at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2078)
>   at 
> java.util.concurrent.ThreadPoolExecutor.awaitTermination(ThreadPoolExecutor.java:1465)
>   at 
> org.apache.geode.internal.cache.control.InternalResourceManager.stopExecutor(InternalResourceManager.java:375)
>   at 
> org.apache.geode.internal.cache.control.InternalResourceManager.close(InternalResourceManager.java:187)
>   at 
> org.apache.geode.internal.cache.GemFireCacheImpl.close(GemFireCacheImpl.java:2161)
>   - locked <0x0007bc0bc520> (a java.lang.Class for 
> org.apache.geode.internal.cache.GemFireCacheImpl)
>   at 
> org.apache.geode.distributed.internal.InternalDistributedSystem.disconnect(InternalDistributedSystem.java:1367)
>   - locked <0x0007bc0bc520> (a java.lang.Class for 
> org.apache.geode.internal.cache.GemFireCacheImpl)
>   at 
> org.apache.geode.distributed.internal.InternalDistributedSystem.disconnect(InternalDistributedSystem.java:1017)
>   at 
> org.apache.geode.management.internal.beans.MemberMBeanBridge$1.run(MemberMBeanBridge.java:986)
>   at java.lang.Thread.run(Thread.java:745)
> {noformat}
> The InternalResourceManager is waiting for the termination of its 
> scheduledExecutor.
> The PRHARedundancyProvider initializes its recoveryExecutor using the 
> InternalResourceManager's scheduledExecutor:
> {noformat}
> recoveryExecutor = new OneTaskOnlyExecutor(resourceManager.getExecutor(),
>   new OneTaskOnlyExecutor.ConflatedTaskListener() {
> public void taskDropped() {
>   InternalResourceManager.getResourceObserver().recoveryConflated(region);
> }
>   });
> {noformat}
> The scheduleRedundancyRecovery method schedules a RecoveryRunnable if 
> necessary.
> If that task hasn't fired yet, the InternalResourceManager doesn't close, and 
> the JVM stays up.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (GEODE-6901) If a region is replicate and replicate persistent in different members and a replicate persistent member crashes, the replicate members throw a ToDataException attempting

2019-06-21 Thread Barry Oglesby (JIRA)
Barry Oglesby created GEODE-6901:


 Summary: If a region is replicate and replicate persistent in 
different members and a replicate persistent member crashes, the replicate 
members throw a ToDataException attempting to synchronize the region
 Key: GEODE-6901
 URL: https://issues.apache.org/jira/browse/GEODE-6901
 Project: Geode
  Issue Type: Bug
  Components: persistence, regions
Reporter: Barry Oglesby


If a region is replicate and replicate persistent in different members and a 
replicate persistent member crashes, the replicate members throw a 
ToDataException attempting to synchronize the region

In this case, an exception like this is thrown in the replicate member:
{noformat}
[warn 2019/06/21 17:06:33.516 PDT  tid=0x2b] Timer task 
 encountered 
exception
org.apache.geode.ToDataException: class 
org.apache.geode.internal.cache.versions.VMRegionVersionVector
 at 
org.apache.geode.internal.InternalDataSerializer.invokeToData(InternalDataSerializer.java:2331)
 at 
org.apache.geode.internal.InternalDataSerializer.writeDSFID(InternalDataSerializer.java:1492)
 at 
org.apache.geode.internal.InternalDataSerializer.basicWriteObject(InternalDataSerializer.java:2067)
 at org.apache.geode.DataSerializer.writeObject(DataSerializer.java:2943)
 at 
org.apache.geode.internal.cache.InitialImageOperation$RequestImageMessage.toData(InitialImageOperation.java:2135)
 at 
org.apache.geode.internal.InternalDataSerializer.invokeToData(InternalDataSerializer.java:2300)
 at 
org.apache.geode.internal.InternalDataSerializer.writeDSFID(InternalDataSerializer.java:1492)
 at org.apache.geode.internal.tcp.MsgStreamer.writeMessage(MsgStreamer.java:242)
 at 
org.apache.geode.distributed.internal.direct.DirectChannel.sendToMany(DirectChannel.java:385)
 at 
org.apache.geode.distributed.internal.direct.DirectChannel.sendToOne(DirectChannel.java:241)
 at 
org.apache.geode.distributed.internal.direct.DirectChannel.send(DirectChannel.java:596)
 at 
org.apache.geode.distributed.internal.membership.gms.mgr.GMSMembershipManager.directChannelSend(GMSMembershipManager.java:1711)
 at 
org.apache.geode.distributed.internal.membership.gms.mgr.GMSMembershipManager.send(GMSMembershipManager.java:1892)
 at 
org.apache.geode.distributed.internal.ClusterDistributionManager.sendViaMembershipManager(ClusterDistributionManager.java:2852)
 at 
org.apache.geode.distributed.internal.ClusterDistributionManager.sendOutgoing(ClusterDistributionManager.java:2779)
 at 
org.apache.geode.distributed.internal.ClusterDistributionManager.sendMessage(ClusterDistributionManager.java:2816)
 at 
org.apache.geode.distributed.internal.ClusterDistributionManager.putOutgoing(ClusterDistributionManager.java:1526)
 at 
org.apache.geode.internal.cache.InitialImageOperation.synchronizeWith(InitialImageOperation.java:649)
 at 
org.apache.geode.internal.cache.DistributedRegion.synchronizeWith(DistributedRegion.java:1321)
 at 
org.apache.geode.internal.cache.DistributedRegion.synchronizeForLostMember(DistributedRegion.java:1310)
 at 
org.apache.geode.internal.cache.DistributedRegion.performSynchronizeForLostMemberTask(DistributedRegion.java:1295)
 at 
org.apache.geode.internal.cache.DistributedRegion$1.run2(DistributedRegion.java:1285)
 at 
org.apache.geode.internal.SystemTimer$SystemTimerTask.run(SystemTimer.java:445)
 at java.util.TimerThread.mainLoop(Timer.java:555)
 at java.util.TimerThread.run(Timer.java:505)
Caused by: java.lang.ClassCastException: 
org.apache.geode.internal.cache.persistence.DiskStoreID cannot be cast to 
org.apache.geode.distributed.internal.membership.InternalDistributedMember
 at 
org.apache.geode.internal.cache.versions.VMRegionVersionVector.writeMember(VMRegionVersionVector.java:31)
 at 
org.apache.geode.internal.cache.versions.RegionVersionVector.toData(RegionVersionVector.java:1204)
 at 
org.apache.geode.internal.InternalDataSerializer.invokeToData(InternalDataSerializer.java:2300)
 ... 24 more
{noformat}
RegionVersionVector.java:1204 is here:
{noformat}
for (Map.Entry<T, RegionVersionHolder<T>> entry : this.memberToVersion.entrySet()) {
->  writeMember(entry.getKey(), out);
    InternalDataSerializer.invokeToData(entry.getValue(), out);
}
{noformat}
VMRegionVersionVector expects the entries of the memberToVersion map to be 
keyed by InternalDistributedMember instances:
{noformat}
protected void writeMember(InternalDistributedMember member, DataOutput out) 
throws IOException {
{noformat}
Logging in RegionVersionVector.toData shows the RegionVersionVector in this 
member is a VMRegionVersionVector and its memberToVersion map contains 
DiskStoreIDs. This causes the ClassCastException.
{noformat}
This RegionVersionVector's (class=VMRegionVersionVector) memberToVersion map 
contains the following 1 entries:
 member=402d383b29fa4c31-8597a3b72674bf5d; class=DiskStoreID
{noformat}
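A self-contained model of that mismatch (hypothetical stand-in types, not the 
Geode classes) reproduces the same ClassCastException:
{noformat}
import java.util.HashMap;
import java.util.Map;

public class VersionVectorCastSketch {
  static class InternalDistributedMember {}
  static class DiskStoreID {}

  // Stand-in for VMRegionVersionVector.writeMember: it only knows how to
  // serialize InternalDistributedMember keys.
  static void writeMember(InternalDistributedMember member) {
    System.out.println("wrote " + member);
  }

  public static void main(String[] args) {
    // Mirrors memberToVersion after disk recovery: keyed by DiskStoreID even
    // though the vector is a VMRegionVersionVector.
    Map<Object, String> memberToVersion = new HashMap<>();
    memberToVersion.put(new DiskStoreID(), "versionHolder");

    for (Map.Entry<Object, String> entry : memberToVersion.entrySet()) {
      // Throws ClassCastException, matching the stack trace above.
      writeMember((InternalDistributedMember) entry.getKey());
    }
  }
}
{noformat}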
The documentation 

[jira] [Resolved] (GEODE-6854) GatewaySender batch conflation can incorrectly conflate events causing out of order processing

2019-06-17 Thread Barry Oglesby (JIRA)


 [ 
https://issues.apache.org/jira/browse/GEODE-6854?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Barry Oglesby resolved GEODE-6854.
--
   Resolution: Fixed
Fix Version/s: 1.10.0

> GatewaySender batch conflation can incorrectly conflate events causing out of 
> order processing
> --
>
> Key: GEODE-6854
> URL: https://issues.apache.org/jira/browse/GEODE-6854
> Project: Geode
>  Issue Type: Bug
>  Components: wan
>Reporter: Barry Oglesby
>Assignee: Barry Oglesby
>Priority: Major
> Fix For: 1.10.0
>
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> If a batch contains 2 equal update events, 
> {{AbstractGatewaySenderEventProcessor conflate}} will remove the original 
> event and add the later event at the end of the list. Depending on the other 
> events in the list, this could cause the batch to contain events that are out 
> of order.
> For example, in this batch containing 6 events before conflation, the last 
> two events are duplicates of earlier events:
> {noformat}
> SenderEventImpl[id=EventID[threadID=0x10059|104;sequenceID=2;bucketId=89];action=1;operation=UPDATE;region=/dataStoreRegion;key=Object_6079;shadowKey=16587]
> SenderEventImpl[id=EventID[threadID=0x10059|104;sequenceID=3;bucketId=89];action=2;operation=DESTROY;region=/dataStoreRegion;key=Object_6079;shadowKey=16700]
> SenderEventImpl[id=EventID[threadID=0x10059|112;sequenceID=9;bucketId=89];action=1;operation=PUTALL_UPDATE;region=/dataStoreRegion;key=Object_7731;shadowKey=16813]
> SenderEventImpl[id=EventID[threadID=0x10059|112;sequenceID=12;bucketId=89];action=1;operation=PUTALL_UPDATE;region=/dataStoreRegion;key=Object_6591;shadowKey=16926]
> SenderEventImpl[id=EventID[threadID=0x10059|104;sequenceID=3;bucketId=89];action=2;operation=DESTROY;region=/dataStoreRegion;key=Object_6079;shadowKey=16700]
> SenderEventImpl[id=EventID[threadID=0x10059|112;sequenceID=9;bucketId=89];action=1;operation=PUTALL_UPDATE;region=/dataStoreRegion;key=Object_7731;shadowKey=16813]
> {noformat}
> Conflating this batch results in these 4 events:
> {noformat}
> SenderEventImpl[id=EventID[threadID=0x10059|104;sequenceID=2;bucketId=89];action=1;operation=UPDATE;region=/dataStoreRegion;key=Object_6079;shadowKey=16587]
> SenderEventImpl[id=EventID[threadID=0x10059|104;sequenceID=3;bucketId=89];action=2;operation=DESTROY;region=/dataStoreRegion;key=Object_6079;shadowKey=16700]
> SenderEventImpl[id=EventID[threadID=0x10059|112;sequenceID=12;bucketId=89];action=1;operation=PUTALL_UPDATE;region=/dataStoreRegion;key=Object_6591;shadowKey=16926]
> SenderEventImpl[id=EventID[threadID=0x10059|112;sequenceID=9;bucketId=89];action=1;operation=PUTALL_UPDATE;region=/dataStoreRegion;key=Object_7731;shadowKey=16813]
> {noformat}
> Notice the shadowKeys and sequenceIds are out of order after the conflation.
> Conflation should produce this batch:
> {noformat}
> SenderEventImpl[id=EventID[threadID=0x10059|104;sequenceID=2;bucketId=89];action=1;operation=UPDATE;region=/dataStoreRegion;key=Object_6079;shadowKey=16587]
> SenderEventImpl[id=EventID[threadID=0x10059|104;sequenceID=3;bucketId=89];action=2;operation=DESTROY;region=/dataStoreRegion;key=Object_6079;shadowKey=16700]
> SenderEventImpl[id=EventID[threadID=0x10059|112;sequenceID=9;bucketId=89];action=1;operation=PUTALL_UPDATE;region=/dataStoreRegion;key=Object_7731;shadowKey=16813]
> SenderEventImpl[id=EventID[threadID=0x10059|112;sequenceID=12;bucketId=89];action=1;operation=PUTALL_UPDATE;region=/dataStoreRegion;key=Object_6591;shadowKey=16926]
> {noformat}
> This is similar to GEODE-4704, but not exactly the same.
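> A minimal sketch of order-preserving conflation, assuming each event carries 
> an id and a monotonically increasing shadowKey (hypothetical Event type, not 
> the Geode implementation): drop the earlier duplicate in place instead of 
> removing it and re-appending, so the surviving events keep their order:
> {noformat}
> import java.util.*;
>
> public class ConflationOrderSketch {
>   record Event(long shadowKey, String id) {}
>
>   static List<Event> conflate(List<Event> batch) {
>     // LinkedHashMap keeps the position of the first occurrence while
>     // merge() replaces its payload with the latest duplicate.
>     Map<String, Event> byId = new LinkedHashMap<>();
>     for (Event e : batch) {
>       byId.merge(e.id(), e, (older, newer) -> newer);
>     }
>     return new ArrayList<>(byId.values());
>   }
>
>   public static void main(String[] args) {
>     List<Event> batch = List.of(
>         new Event(16587, "seq2"), new Event(16700, "seq3"),
>         new Event(16813, "seq9"), new Event(16926, "seq12"),
>         new Event(16700, "seq3"), new Event(16813, "seq9"));
>     // Prints 16587, 16700, 16813, 16926 -- the expected batch above.
>     conflate(batch).forEach(e -> System.out.println(e.shadowKey()));
>   }
> }
> {noformat}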



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (GEODE-6859) Destroying a parallel gateway sender attached to a region causes other senders attached to that same region to no longer queue events

2019-06-13 Thread Barry Oglesby (JIRA)


[ 
https://issues.apache.org/jira/browse/GEODE-6859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16863350#comment-16863350
 ] 

Barry Oglesby commented on GEODE-6859:
--

Here is some additional logging showing the behavior:

The shadow PR for GatewaySender mysender is created:
{noformat}
[warn 2019/06/13 10:24:36.546 PDT  tid=0x3a] XXX 
ParallelGatewaySenderQueue.addShadowPartitionedRegionForUserPR 
senderId=mysender; userPR=/test
[warn 2019/06/13 10:24:36.546 PDT  tid=0x3a] XXX 
ParallelGatewaySenderQueue.addShadowPartitionedRegionForUserPR 
senderId=mysender; prQName=mysender_PARALLEL_GATEWAY_SENDER_QUEUE; prQ=null
[warn 2019/06/13 10:24:36.597 PDT  tid=0x3a] XXX 
ParallelGatewaySenderQueue.addShadowPartitionedRegionForUserPR created queue 
senderId=mysender; prQName=mysender_PARALLEL_GATEWAY_SENDER_QUEUE; 
prQ=Partitioned Region @7951061f 
[path='/mysender_PARALLEL_GATEWAY_SENDER_QUEUE'; dataPolicy=PARTITION; prId=2; 
isDestroyed=false; isClosed=false; retryTimeout=360; serialNumber=125; 
partition 
attributes=PartitionAttributes@639507262[redundantCopies=0;localMaxMemory=100;totalMaxMemory=2147483647;totalNumBuckets=113;partitionResolver=null;colocatedWith=/test;recoveryDelay=-1;startupRecoveryDelay=0;FixedPartitionAttributes=null;partitionListeners=null];
 on VM 192.168.1.2(server:4637):41001]
{noformat}
The shadow PR for GatewaySender mysender2 is created:
{noformat}
[warn 2019/06/13 10:24:43.064 PDT  tid=0x3a] XXX 
ParallelGatewaySenderQueue.addShadowPartitionedRegionForUserPR 
senderId=mysender2; userPR=/test
[warn 2019/06/13 10:24:43.064 PDT  tid=0x3a] XXX 
ParallelGatewaySenderQueue.addShadowPartitionedRegionForUserPR 
senderId=mysender2; prQName=mysender2_PARALLEL_GATEWAY_SENDER_QUEUE; prQ=null
[warn 2019/06/13 10:24:43.069 PDT  tid=0x3a] XXX 
ParallelGatewaySenderQueue.addShadowPartitionedRegionForUserPR created queue 
senderId=mysender2; prQName=mysender2_PARALLEL_GATEWAY_SENDER_QUEUE; 
prQ=Partitioned Region @1c5b3979 
[path='/mysender2_PARALLEL_GATEWAY_SENDER_QUEUE'; dataPolicy=PARTITION; prId=3; 
isDestroyed=false; isClosed=false; retryTimeout=360; serialNumber=466; 
partition 
attributes=PartitionAttributes@635010394[redundantCopies=0;localMaxMemory=100;totalMaxMemory=2147483647;totalNumBuckets=113;partitionResolver=null;colocatedWith=/test;recoveryDelay=-1;startupRecoveryDelay=0;FixedPartitionAttributes=null;partitionListeners=null];
 on VM 192.168.1.2(server:4637):41001]
{noformat}
GatewaySender mysender is destroyed:
{noformat}
[warn 2019/06/13 10:24:43.889 PDT  tid=0x3a] XXX 
AbstractGatewaySender.destroy region=/mysender_PARALLEL_GATEWAY_SENDER_QUEUE
[warn 2019/06/13 10:24:43.889 PDT  tid=0x3a] XXX 
PartitionedRegion.destroyRegion region=/mysender_PARALLEL_GATEWAY_SENDER_QUEUE
{noformat}
That causes PartitionedRegionDataStore.cleanUp to set shadowBucketDestroyed to 
true for all the buckets of the test region:
{noformat}
[warn 2019/06/13 10:24:43.890 PDT  tid=0x3a] XXX 
PartitionedRegionDataStore.cleanUp 
region=/mysender_PARALLEL_GATEWAY_SENDER_QUEUE
[warn 2019/06/13 10:24:43.895 PDT  tid=0x3a] XXX 
BucketAdvisor.setShadowBucketDestroyed region=/test; bucket=0; destroyed=true
[warn 2019/06/13 10:24:43.896 PDT  tid=0x3a] XXX 
BucketAdvisor.setShadowBucketDestroyed region=/test; bucket=1; destroyed=true
[warn 2019/06/13 10:24:43.897 PDT  tid=0x3a] XXX 
BucketAdvisor.setShadowBucketDestroyed region=/test; bucket=2; destroyed=true
[warn 2019/06/13 10:24:43.898 PDT  tid=0x3a] XXX 
BucketAdvisor.setShadowBucketDestroyed region=/test; bucket=3; destroyed=true
[warn 2019/06/13 10:24:43.899 PDT  tid=0x3a] XXX 
BucketAdvisor.setShadowBucketDestroyed region=/test; bucket=4; destroyed=true
[warn 2019/06/13 10:24:43.899 PDT  tid=0x3a] ...
[warn 2019/06/13 10:24:43.942 PDT  tid=0x3a] XXX 
BucketAdvisor.setShadowBucketDestroyed region=/test; bucket=51; destroyed=true
[warn 2019/06/13 10:24:43.942 PDT  tid=0x3a] ...
[warn 2019/06/13 10:24:43.959 PDT  tid=0x3a] XXX 
PartitionedRegionDataStore.cleanUp complete 
region=/mysender_PARALLEL_GATEWAY_SENDER_QUEUE
{noformat}
The put is delivered to the ParallelGatewaySenderQueue, but 
shadowBucketDestroyed is true from the cleanUp above, so the put is dropped:
{noformat}
[warn 2019/06/13 10:24:44.011 PDT  tid=0x3a] XXX 
ParallelGatewaySenderQueue.put 
brq=/__PR/_B__mysender2__PARALLEL__GATEWAY__SENDER__QUEUE_51
[warn 2019/06/13 10:24:44.012 PDT  tid=0x3a] XXX 
ParallelGatewaySenderQueue.put 
brq=/__PR/_B__mysender2__PARALLEL__GATEWAY__SENDER__QUEUE_51; 
shadowBucketDestroyed=true
[warn 2019/06/13 10:24:44.012 PDT  tid=0x3a] XXX 
ParallelGatewaySenderQueue.put not putting entry into queue as shadowPR bucket 
is destroyed: key=164; value=GatewaySenderEventImpl[id=EventID[id=24 
bytes;threadID=0x1010033|1;sequenceID=122;bucketId=51];action=0;operation=CREATE;region=/test;key=3;value=3;valueIsObject=1;numberOfParts=9;callbackArgument=GatewaySenderEventCallbackArgument
 

[jira] [Commented] (GEODE-6859) Destroying a parallel gateway sender attached to a region causes other senders attached to that same region to no longer queue events

2019-06-12 Thread Barry Oglesby (JIRA)


[ 
https://issues.apache.org/jira/browse/GEODE-6859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16862571#comment-16862571
 ] 

Barry Oglesby commented on GEODE-6859:
--

As a work-around, the region can be altered to remove the original sender 
before it is destroyed:
{noformat}
 create gateway-sender --id=mysender --remote-distributed-system-id=2 
--enable-persistence --parallel
 create region --name=test --gateway-sender-id=mysender 
--type=PARTITION_PERSISTENT
 create gateway-sender --id=mysender2 --remote-distributed-system-id=4 
--enable-persistence --parallel
-> alter region --name=test --gateway-sender-id=''
 destroy gateway-sender --id=mysender
 alter region --name=test --gateway-sender-id=mysender2
 put --region=test --key="3" --value="3"
 list gateways
{noformat}
list gateways shows a queued event with this work-around:
{noformat}
GatewaySender Id |             Member              | Remote Cluster Id |   Type   | Status  | Queued Events | Receiver Location
---------------- | ------------------------------- | ----------------- | -------- | ------- | ------------- | -----------------
mysender2        | 192.168.1.2(server:34608):41001 | 4                 | Parallel | Running | 1             |
{noformat}

> Destroying a parallel gateway sender attached to a region causes other 
> senders attached to that same region to no longer queue events
> -
>
> Key: GEODE-6859
> URL: https://issues.apache.org/jira/browse/GEODE-6859
> Project: Geode
>  Issue Type: Bug
>  Components: wan
>Reporter: Barry Oglesby
>Priority: Major
>
> This scenario causes the event to not be put into the queue:
>  - create gateway sender sender1
>  - create region attached to sender1
>  - create gateway sender sender2
>  - alter region to be attached to sender2
>  - destroy sender1
>  - put an entry into region
> Here are the steps using gfsh:
> {noformat}
> gfsh>create gateway-sender --id=mysender --remote-distributed-system-id=2 
> --enable-persistence --parallel
> gfsh>create region --name=test --gateway-sender-id=mysender 
> --type=PARTITION_PERSISTENT
> gfsh>create gateway-sender --id=mysender2 --remote-distributed-system-id=4 
> --enable-persistence --parallel
> gfsh>alter region --name=test --gateway-sender-id=mysender2
> gfsh>destroy gateway-sender --id=mysender
> gfsh>put --region=test --key="3" --value="3"
> {noformat}
> Debug logging shows:
> {noformat}
> [debug 2019/06/11 17:45:03.678 PDT  tid=0x3a] 
> ParallelGatewaySenderOrderedQueue not putting key 164 : Value : 
> GatewaySenderEventImpl[id=EventID[192.168.1.2(server):41001;threadID=0x1010033|2;sequenceID=125;bucketID=51];action=0;operation=CREATE;region=/test;key=3;value=3;...]
>  as shadowPR bucket is destroyed.
> {noformat}
> It comes down to this call in ParallelGatewaySenderQueue.put:
> {noformat}
> thisbucketDestroyed =
>  ((PartitionedRegion) prQ.getColocatedWithRegion()).getRegionAdvisor()
>  .getBucketAdvisor(bucketId).getShadowBucketDestroyed() || brq.isDestroyed();
> {noformat}
> The first condition is true.
> Here is a stack that shows where shadowBucketDestroyed is set to true:
> {noformat}
> [warn 2019/06/12 16:32:47.066 PDT  tid=0x3a] 
> XXX BucketAdvisor.setShadowBucketDestroyed destroyed=true
> java.lang.Exception
>  at 
> org.apache.geode.internal.cache.BucketAdvisor.setShadowBucketDestroyed(BucketAdvisor.java:2820)
>  at 
> org.apache.geode.internal.cache.PartitionedRegionDataStore.cleanUp(PartitionedRegionDataStore.java:1417)
>  at 
> org.apache.geode.internal.cache.PartitionedRegion.destroyPartitionedRegionLocally(PartitionedRegion.java:7520)
>  at 
> org.apache.geode.internal.cache.PartitionedRegion.destroyPartitionedRegionGlobally(PartitionedRegion.java:7376)
>  at 
> org.apache.geode.internal.cache.PartitionedRegion.destroyPartitionedRegion(PartitionedRegion.java:7301)
>  at 
> org.apache.geode.internal.cache.PartitionedRegion.postDestroyRegion(PartitionedRegion.java:7630)
>  at 
> org.apache.geode.internal.cache.LocalRegion.recursiveDestroyRegion(LocalRegion.java:2732)
>  at 
> org.apache.geode.internal.cache.LocalRegion.basicDestroyRegion(LocalRegion.java:6299)
>  at 
> org.apache.geode.internal.cache.LocalRegion.basicDestroyRegion(LocalRegion.java:6251)
>  at 
> org.apache.geode.internal.cache.PartitionedRegion.destroyRegion(PartitionedRegion.java:7077)
>  at 
> org.apache.geode.internal.cache.AbstractRegion.destroyRegion(AbstractRegion.java:453)
>  at 
> org.apache.geode.internal.cache.wan.AbstractGatewaySender.destroy(AbstractGatewaySender.java:599)
>  at 
> org.apache.geode.internal.cache.wan.AbstractGatewaySender.destroy(AbstractGatewaySender.java:555)
>  at 
> org.apache.geode.management.internal.cli.functions.GatewaySenderDestroyFunction.execute(GatewaySenderDestroyFunction.java:60)
>  at 
> 

[jira] [Updated] (GEODE-6859) Destroying a parallel gateway sender attached to a region causes other senders attached to that same region to no longer queue events

2019-06-12 Thread Barry Oglesby (JIRA)


 [ 
https://issues.apache.org/jira/browse/GEODE-6859?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Barry Oglesby updated GEODE-6859:
-
Description: 
This scenario causes the event to not be put into the queue:
 - create gateway sender sender1
 - create region attached to sender1
 - create gateway sender sender2
 - alter region to be attached to sender2
 - destroy sender1
 - put an entry into region

Here are the steps using gfsh:
{noformat}
gfsh>create gateway-sender --id=mysender --remote-distributed-system-id=2 
--enable-persistence --parallel
gfsh>create region --name=test --gateway-sender-id=mysender 
--type=PARTITION_PERSISTENT
gfsh>create gateway-sender --id=mysender2 --remote-distributed-system-id=4 
--enable-persistence --parallel
gfsh>alter region --name=test --gateway-sender-id=mysender2
gfsh>destroy gateway-sender --id=mysender
gfsh>put --region=test --key="3" --value="3"
{noformat}
Debug logging shows:
{noformat}
[debug 2019/06/11 17:45:03.678 PDT  tid=0x3a] 
ParallelGatewaySenderOrderedQueue not putting key 164 : Value : 
GatewaySenderEventImpl[id=EventID[192.168.1.2(server):41001;threadID=0x1010033|2;sequenceID=125;bucketID=51];action=0;operation=CREATE;region=/test;key=3;value=3;...]
 as shadowPR bucket is destroyed.
{noformat}
It comes down to this call in ParallelGatewaySenderQueue.put:
{noformat}
thisbucketDestroyed =
 ((PartitionedRegion) prQ.getColocatedWithRegion()).getRegionAdvisor()
 .getBucketAdvisor(bucketId).getShadowBucketDestroyed() || brq.isDestroyed();
{noformat}
The first condition is true.

Here is a stack that shows where shadowBucketDestroyed is set to true:
{noformat}
[warn 2019/06/12 16:32:47.066 PDT  tid=0x3a] XXX 
BucketAdvisor.setShadowBucketDestroyed destroyed=true
java.lang.Exception
 at 
org.apache.geode.internal.cache.BucketAdvisor.setShadowBucketDestroyed(BucketAdvisor.java:2820)
 at 
org.apache.geode.internal.cache.PartitionedRegionDataStore.cleanUp(PartitionedRegionDataStore.java:1417)
 at 
org.apache.geode.internal.cache.PartitionedRegion.destroyPartitionedRegionLocally(PartitionedRegion.java:7520)
 at 
org.apache.geode.internal.cache.PartitionedRegion.destroyPartitionedRegionGlobally(PartitionedRegion.java:7376)
 at 
org.apache.geode.internal.cache.PartitionedRegion.destroyPartitionedRegion(PartitionedRegion.java:7301)
 at 
org.apache.geode.internal.cache.PartitionedRegion.postDestroyRegion(PartitionedRegion.java:7630)
 at 
org.apache.geode.internal.cache.LocalRegion.recursiveDestroyRegion(LocalRegion.java:2732)
 at 
org.apache.geode.internal.cache.LocalRegion.basicDestroyRegion(LocalRegion.java:6299)
 at 
org.apache.geode.internal.cache.LocalRegion.basicDestroyRegion(LocalRegion.java:6251)
 at 
org.apache.geode.internal.cache.PartitionedRegion.destroyRegion(PartitionedRegion.java:7077)
 at 
org.apache.geode.internal.cache.AbstractRegion.destroyRegion(AbstractRegion.java:453)
 at 
org.apache.geode.internal.cache.wan.AbstractGatewaySender.destroy(AbstractGatewaySender.java:599)
 at 
org.apache.geode.internal.cache.wan.AbstractGatewaySender.destroy(AbstractGatewaySender.java:555)
 at 
org.apache.geode.management.internal.cli.functions.GatewaySenderDestroyFunction.execute(GatewaySenderDestroyFunction.java:60)
 at 
org.apache.geode.internal.cache.MemberFunctionStreamingMessage.process(MemberFunctionStreamingMessage.java:193)
 at 
org.apache.geode.distributed.internal.DistributionMessage.scheduleAction(DistributionMessage.java:369)
 at 
org.apache.geode.distributed.internal.DistributionMessage$1.run(DistributionMessage.java:435)
 at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
 at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
 at 
org.apache.geode.distributed.internal.ClusterDistributionManager.runUntilShutdown(ClusterDistributionManager.java:960)
 at 
org.apache.geode.distributed.internal.ClusterDistributionManager.doFunctionExecutionThread(ClusterDistributionManager.java:814)
 at 
org.apache.geode.internal.logging.LoggingThreadFactory.lambda$newThread$0(LoggingThreadFactory.java:121)
 at java.lang.Thread.run(Thread.java:745)
{noformat}
PartitionedRegionDataStore.cleanUp is doing this:
{noformat}
// Fix for defect #49012
if (buk instanceof AbstractBucketRegionQueue
 && buk.getPartitionedRegion().isShadowPR()) {
 if (buk.getPartitionedRegion().getColocatedWithRegion() != null) {
 buk.getPartitionedRegion().getColocatedWithRegion().getRegionAdvisor()
 .getBucketAdvisor(bucketId).setShadowBucketDestroyed(true);
 }
}
{noformat}
The {{buk.getPartitionedRegion().getColocatedWithRegion()}} is the data 
region. It can have more than one shadow region.

So, either this code has to check whether there are other shadow regions before 
making the call to setShadowBucketDestroyed or the BucketAdvisor 
shadowBucketDestroyed has to be maintained per shadow region rather than be a 
single boolean.
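
A minimal sketch of the first option, where {{hasOtherLiveShadowRegions}} is a 
hypothetical helper (none of this is from the Geode source; it only illustrates 
the shape of the check):
{noformat}
// Sketch only: mark the data-region bucket's shadow flag destroyed only
// when no other sender's shadow region still depends on that bucket.
// hasOtherLiveShadowRegions is a hypothetical helper, not a Geode API.
if (buk instanceof AbstractBucketRegionQueue
    && buk.getPartitionedRegion().isShadowPR()) {
  PartitionedRegion dataRegion = buk.getPartitionedRegion().getColocatedWithRegion();
  if (dataRegion != null
      && !hasOtherLiveShadowRegions(dataRegion, buk.getPartitionedRegion())) {
    dataRegion.getRegionAdvisor().getBucketAdvisor(bucketId)
        .setShadowBucketDestroyed(true);
  }
}
{noformat}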

  was:
This scenario causes the event to not 

[jira] [Updated] (GEODE-6859) Destroying a parallel gateway sender attached to a region causes other senders attached to that same region to no longer queue events

2019-06-12 Thread Barry Oglesby (JIRA)


 [ 
https://issues.apache.org/jira/browse/GEODE-6859?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Barry Oglesby updated GEODE-6859:
-
Description: 
This scenario causes the event to not be put into the queue:
 - create gateway sender sender1
 - create region attached to sender1
 - create gateway sender sender2
 - alter region to be attached to sender2
 - destroy sender1
 - put an entry into region

Here are the steps using gfsh:
{noformat}
gfsh>create gateway-sender --id=mysender --remote-distributed-system-id=2 
--enable-persistence --parallel
gfsh>create region --name=test --gateway-sender-id=mysender 
--type=PARTITION_PERSISTENT
gfsh>create gateway-sender --id=mysender2 --remote-distributed-system-id=4 
--enable-persistence --parallel
gfsh>alter region --name=test --gateway-sender-id=mysender2
gfsh>destroy gateway-sender --id=mysender
gfsh>put --region=test --key="3" --value="3"
{noformat}
Debug logging shows:
{noformat}
[debug 2019/06/11 17:45:03.678 PDT  tid=0x3a] 
ParallelGatewaySenderOrderedQueue not putting key 164 : Value : 
GatewaySenderEventImpl[id=EventID[192.168.1.2(server):41001;threadID=0x1010033|2;sequenceID=125;bucketID=51];action=0;operation=CREATE;region=/test;key=3;value=3;...]
 as shadowPR bucket is destroyed.
{noformat}
It comes down to this call in ParallelGatewaySenderQueue.put:
{noformat}
thisbucketDestroyed =
 ((PartitionedRegion) prQ.getColocatedWithRegion()).getRegionAdvisor()
 .getBucketAdvisor(bucketId).getShadowBucketDestroyed() || brq.isDestroyed();
{noformat}
The first condition is true.

Here is a stack that shows where shadowBucketDestroyed is set to true:
{noformat}
[warn 2019/06/12 16:32:47.066 PDT  tid=0x3a] XXX 
BucketAdvisor.setShadowBucketDestroyed destroyed=true
java.lang.Exception
 at 
org.apache.geode.internal.cache.BucketAdvisor.setShadowBucketDestroyed(BucketAdvisor.java:2820)
 at 
org.apache.geode.internal.cache.PartitionedRegionDataStore.cleanUp(PartitionedRegionDataStore.java:1417)
 at 
org.apache.geode.internal.cache.PartitionedRegion.destroyPartitionedRegionLocally(PartitionedRegion.java:7520)
 at 
org.apache.geode.internal.cache.PartitionedRegion.destroyPartitionedRegionGlobally(PartitionedRegion.java:7376)
 at 
org.apache.geode.internal.cache.PartitionedRegion.destroyPartitionedRegion(PartitionedRegion.java:7301)
 at 
org.apache.geode.internal.cache.PartitionedRegion.postDestroyRegion(PartitionedRegion.java:7630)
 at 
org.apache.geode.internal.cache.LocalRegion.recursiveDestroyRegion(LocalRegion.java:2732)
 at 
org.apache.geode.internal.cache.LocalRegion.basicDestroyRegion(LocalRegion.java:6299)
 at 
org.apache.geode.internal.cache.LocalRegion.basicDestroyRegion(LocalRegion.java:6251)
 at 
org.apache.geode.internal.cache.PartitionedRegion.destroyRegion(PartitionedRegion.java:7077)
 at 
org.apache.geode.internal.cache.AbstractRegion.destroyRegion(AbstractRegion.java:453)
 at 
org.apache.geode.internal.cache.wan.AbstractGatewaySender.destroy(AbstractGatewaySender.java:599)
 at 
org.apache.geode.internal.cache.wan.AbstractGatewaySender.destroy(AbstractGatewaySender.java:555)
 at 
org.apache.geode.management.internal.cli.functions.GatewaySenderDestroyFunction.execute(GatewaySenderDestroyFunction.java:60)
 at 
org.apache.geode.internal.cache.MemberFunctionStreamingMessage.process(MemberFunctionStreamingMessage.java:193)
 at 
org.apache.geode.distributed.internal.DistributionMessage.scheduleAction(DistributionMessage.java:369)
 at 
org.apache.geode.distributed.internal.DistributionMessage$1.run(DistributionMessage.java:435)
 at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
 at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
 at 
org.apache.geode.distributed.internal.ClusterDistributionManager.runUntilShutdown(ClusterDistributionManager.java:960)
 at 
org.apache.geode.distributed.internal.ClusterDistributionManager.doFunctionExecutionThread(ClusterDistributionManager.java:814)
 at 
org.apache.geode.internal.logging.LoggingThreadFactory.lambda$newThread$0(LoggingThreadFactory.java:121)
 at java.lang.Thread.run(Thread.java:745)
{noformat}
PartitionedRegionDataStore.cleanUp is doing this:
{noformat}
// Fix for defect #49012
if (buk instanceof AbstractBucketRegionQueue
 && buk.getPartitionedRegion().isShadowPR()) {
 if (buk.getPartitionedRegion().getColocatedWithRegion() != null) {
 buk.getPartitionedRegion().getColocatedWithRegion().getRegionAdvisor()
 .getBucketAdvisor(bucketId).setShadowBucketDestroyed(true);
 }
}
{noformat}
The {{buk.getPartitionedRegion().getColocatedWithRegion()}} is the data region. 
It can have more than one shadow region.

So, either this code has to check whether there are other shadow regions before 
making the call to setShadowBucketDestroyed or the BucketAdvisor 
shadowBucketDestroyed has to be maintained per shadow region rather than be a 
single boolean.

  was:
This scenario causes the event to not 

[jira] [Created] (GEODE-6859) Destroying a parallel gateway sender attached to a region causes other senders attached to that same region to no longer queue events

2019-06-12 Thread Barry Oglesby (JIRA)
Barry Oglesby created GEODE-6859:


 Summary: Destroying a parallel gateway sender attached to a region 
causes other senders attached to that same region to no longer queue events
 Key: GEODE-6859
 URL: https://issues.apache.org/jira/browse/GEODE-6859
 Project: Geode
  Issue Type: Bug
  Components: wan
Reporter: Barry Oglesby


This scenario causes the event to not be put into the queue:

- create gateway sender sender1
- create region attached to sender1
- create gateway sender sender2
- alter region to be attached to sender2
- destroy sender1
- put an entry into region

Here are the steps using gfsh:
{noformat}
gfsh>create gateway-sender --id=mysender --remote-distributed-system-id=2 
--enable-persistence --parallel
gfsh>create region --name=test --gateway-sender-id=mysender 
--type=PARTITION_PERSISTENT
gfsh>create gateway-sender --id=mysender2 --remote-distributed-system-id=4 
--enable-persistence --parallel
gfsh>alter region --name=test --gateway-sender-id=mysender2
gfsh>destroy gateway-sender --id=mysender
gfsh>put --region=test --key="3" --value="3"
{noformat}
Debug logging shows:
{noformat}
[debug 2019/06/11 17:45:03.678 PDT  tid=0x3a] 
ParallelGatewaySenderOrderedQueue not putting key 164 : Value : 
GatewaySenderEventImpl[id=EventID[192.168.1.2(server):41001;threadID=0x1010033|2;sequenceID=125;bucketID=51];action=0;operation=CREATE;region=/test;key=3;value=3;...]
 as shadowPR bucket is destroyed.
{noformat}
It comes down to this call in ParallelGatewaySenderQueue.put:
{noformat}
thisbucketDestroyed =
 ((PartitionedRegion) prQ.getColocatedWithRegion()).getRegionAdvisor()
 .getBucketAdvisor(bucketId).getShadowBucketDestroyed() || brq.isDestroyed();
{noformat}
The first condition is true.

Here is a stack that shows where shadowBucketDestroyed is set to true:
{noformat}
[warn 2019/06/12 16:32:47.066 PDT  tid=0x3a] XXX 
BucketAdvisor.setShadowBucketDestroyed destroyed=true
java.lang.Exception
 at 
org.apache.geode.internal.cache.BucketAdvisor.setShadowBucketDestroyed(BucketAdvisor.java:2820)
 at 
org.apache.geode.internal.cache.PartitionedRegionDataStore.cleanUp(PartitionedRegionDataStore.java:1417)
 at 
org.apache.geode.internal.cache.PartitionedRegion.destroyPartitionedRegionLocally(PartitionedRegion.java:7520)
 at 
org.apache.geode.internal.cache.PartitionedRegion.destroyPartitionedRegionGlobally(PartitionedRegion.java:7376)
 at 
org.apache.geode.internal.cache.PartitionedRegion.destroyPartitionedRegion(PartitionedRegion.java:7301)
 at 
org.apache.geode.internal.cache.PartitionedRegion.postDestroyRegion(PartitionedRegion.java:7630)
 at 
org.apache.geode.internal.cache.LocalRegion.recursiveDestroyRegion(LocalRegion.java:2732)
 at 
org.apache.geode.internal.cache.LocalRegion.basicDestroyRegion(LocalRegion.java:6299)
 at 
org.apache.geode.internal.cache.LocalRegion.basicDestroyRegion(LocalRegion.java:6251)
 at 
org.apache.geode.internal.cache.PartitionedRegion.destroyRegion(PartitionedRegion.java:7077)
 at 
org.apache.geode.internal.cache.AbstractRegion.destroyRegion(AbstractRegion.java:453)
 at 
org.apache.geode.internal.cache.wan.AbstractGatewaySender.destroy(AbstractGatewaySender.java:599)
 at 
org.apache.geode.internal.cache.wan.AbstractGatewaySender.destroy(AbstractGatewaySender.java:555)
 at 
org.apache.geode.management.internal.cli.functions.GatewaySenderDestroyFunction.execute(GatewaySenderDestroyFunction.java:60)
 at 
org.apache.geode.internal.cache.MemberFunctionStreamingMessage.process(MemberFunctionStreamingMessage.java:193)
 at 
org.apache.geode.distributed.internal.DistributionMessage.scheduleAction(DistributionMessage.java:369)
 at 
org.apache.geode.distributed.internal.DistributionMessage$1.run(DistributionMessage.java:435)
 at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
 at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
 at 
org.apache.geode.distributed.internal.ClusterDistributionManager.runUntilShutdown(ClusterDistributionManager.java:960)
 at 
org.apache.geode.distributed.internal.ClusterDistributionManager.doFunctionExecutionThread(ClusterDistributionManager.java:814)
 at 
org.apache.geode.internal.logging.LoggingThreadFactory.lambda$newThread$0(LoggingThreadFactory.java:121)
 at java.lang.Thread.run(Thread.java:745)
{noformat}
PartitionedRegionDataStore.cleanUp is doing this:
{noformat}
// Fix for defect #49012
if (buk instanceof AbstractBucketRegionQueue
 && buk.getPartitionedRegion().isShadowPR()) {
 if (buk.getPartitionedRegion().getColocatedWithRegion() != null) {
 buk.getPartitionedRegion().getColocatedWithRegion().getRegionAdvisor()
 .getBucketAdvisor(bucketId).setShadowBucketDestroyed(true);
 }
}
{noformat}
The {{buk.getPartitionedRegion().getColocatedWithRegion()}} is the data region. 
It can have more than one shadow region.

So, either this code has to check whether there are other 

[jira] [Assigned] (GEODE-6854) GatewaySender batch conflation can incorrectly conflate events causing out of order processing

2019-06-10 Thread Barry Oglesby (JIRA)


 [ 
https://issues.apache.org/jira/browse/GEODE-6854?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Barry Oglesby reassigned GEODE-6854:


Assignee: Barry Oglesby

> GatewaySender batch conflation can incorrectly conflate events causing out of 
> order processing
> --
>
> Key: GEODE-6854
> URL: https://issues.apache.org/jira/browse/GEODE-6854
> Project: Geode
>  Issue Type: Bug
>  Components: wan
>Reporter: Barry Oglesby
>Assignee: Barry Oglesby
>Priority: Major
>
> If a batch contains 2 equal update events, 
> {{AbstractGatewaySenderEventProcessor.conflate}} will remove the original 
> event and add the later event at the end of the list. Depending on the other 
> events in the list, this could cause the batch to contain events that are out 
> of order.
> For example, in this batch containing 6 events before conflation, the last 
> two events are duplicates of earlier events:
> {noformat}
> SenderEventImpl[id=EventID[threadID=0x10059|104;sequenceID=2;bucketId=89];action=1;operation=UPDATE;region=/dataStoreRegion;key=Object_6079;shadowKey=16587]
> SenderEventImpl[id=EventID[threadID=0x10059|104;sequenceID=3;bucketId=89];action=2;operation=DESTROY;region=/dataStoreRegion;key=Object_6079;shadowKey=16700]
> SenderEventImpl[id=EventID[threadID=0x10059|112;sequenceID=9;bucketId=89];action=1;operation=PUTALL_UPDATE;region=/dataStoreRegion;key=Object_7731;shadowKey=16813]
> SenderEventImpl[id=EventID[threadID=0x10059|112;sequenceID=12;bucketId=89];action=1;operation=PUTALL_UPDATE;region=/dataStoreRegion;key=Object_6591;shadowKey=16926]
> SenderEventImpl[id=EventID[threadID=0x10059|104;sequenceID=3;bucketId=89];action=2;operation=DESTROY;region=/dataStoreRegion;key=Object_6079;shadowKey=16700]
> SenderEventImpl[id=EventID[threadID=0x10059|112;sequenceID=9;bucketId=89];action=1;operation=PUTALL_UPDATE;region=/dataStoreRegion;key=Object_7731;shadowKey=16813]
> {noformat}
> Conflating this batch results in these 4 events:
> {noformat}
> SenderEventImpl[id=EventID[threadID=0x10059|104;sequenceID=2;bucketId=89];action=1;operation=UPDATE;region=/dataStoreRegion;key=Object_6079;shadowKey=16587]
> SenderEventImpl[id=EventID[threadID=0x10059|104;sequenceID=3;bucketId=89];action=2;operation=DESTROY;region=/dataStoreRegion;key=Object_6079;shadowKey=16700]
> SenderEventImpl[id=EventID[threadID=0x10059|112;sequenceID=12;bucketId=89];action=1;operation=PUTALL_UPDATE;region=/dataStoreRegion;key=Object_6591;shadowKey=16926]
> SenderEventImpl[id=EventID[threadID=0x10059|112;sequenceID=9;bucketId=89];action=1;operation=PUTALL_UPDATE;region=/dataStoreRegion;key=Object_7731;shadowKey=16813]
> {noformat}
> Notice the shadowKeys and sequenceIds are out of order after the conflation.
> Conflation should produce this batch:
> {noformat}
> SenderEventImpl[id=EventID[threadID=0x10059|104;sequenceID=2;bucketId=89];action=1;operation=UPDATE;region=/dataStoreRegion;key=Object_6079;shadowKey=16587]
> SenderEventImpl[id=EventID[threadID=0x10059|104;sequenceID=3;bucketId=89];action=2;operation=DESTROY;region=/dataStoreRegion;key=Object_6079;shadowKey=16700]
> SenderEventImpl[id=EventID[threadID=0x10059|112;sequenceID=9;bucketId=89];action=1;operation=PUTALL_UPDATE;region=/dataStoreRegion;key=Object_7731;shadowKey=16813]
> SenderEventImpl[id=EventID[threadID=0x10059|112;sequenceID=12;bucketId=89];action=1;operation=PUTALL_UPDATE;region=/dataStoreRegion;key=Object_6591;shadowKey=16926]
> {noformat}
> This is similar to GEODE-4704, but not exactly the same.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (GEODE-6854) GatewaySender batch conflation can incorrectly conflate events causing out of order processing

2019-06-10 Thread Barry Oglesby (JIRA)
Barry Oglesby created GEODE-6854:


 Summary: GatewaySender batch conflation can incorrectly conflate 
events causing out of order processing
 Key: GEODE-6854
 URL: https://issues.apache.org/jira/browse/GEODE-6854
 Project: Geode
  Issue Type: Bug
  Components: wan
Reporter: Barry Oglesby


If a batch contains 2 equal update events, 
{{AbstractGatewaySenderEventProcessor.conflate}} will remove the original event 
and add the later event at the end of the list. Depending on the other events 
in the list, this could cause the batch to contain events that are out of order.

For example, in this batch containing 6 events before conflation, the last two 
events are duplicates of earlier events:
{noformat}
SenderEventImpl[id=EventID[threadID=0x10059|104;sequenceID=2;bucketId=89];action=1;operation=UPDATE;region=/dataStoreRegion;key=Object_6079;shadowKey=16587]
SenderEventImpl[id=EventID[threadID=0x10059|104;sequenceID=3;bucketId=89];action=2;operation=DESTROY;region=/dataStoreRegion;key=Object_6079;shadowKey=16700]
SenderEventImpl[id=EventID[threadID=0x10059|112;sequenceID=9;bucketId=89];action=1;operation=PUTALL_UPDATE;region=/dataStoreRegion;key=Object_7731;shadowKey=16813]
SenderEventImpl[id=EventID[threadID=0x10059|112;sequenceID=12;bucketId=89];action=1;operation=PUTALL_UPDATE;region=/dataStoreRegion;key=Object_6591;shadowKey=16926]
SenderEventImpl[id=EventID[threadID=0x10059|104;sequenceID=3;bucketId=89];action=2;operation=DESTROY;region=/dataStoreRegion;key=Object_6079;shadowKey=16700]
SenderEventImpl[id=EventID[threadID=0x10059|112;sequenceID=9;bucketId=89];action=1;operation=PUTALL_UPDATE;region=/dataStoreRegion;key=Object_7731;shadowKey=16813]
{noformat}
Conflating this batch results in these 4 events:
{noformat}
SenderEventImpl[id=EventID[threadID=0x10059|104;sequenceID=2;bucketId=89];action=1;operation=UPDATE;region=/dataStoreRegion;key=Object_6079;shadowKey=16587]
SenderEventImpl[id=EventID[threadID=0x10059|104;sequenceID=3;bucketId=89];action=2;operation=DESTROY;region=/dataStoreRegion;key=Object_6079;shadowKey=16700]
SenderEventImpl[id=EventID[threadID=0x10059|112;sequenceID=12;bucketId=89];action=1;operation=PUTALL_UPDATE;region=/dataStoreRegion;key=Object_6591;shadowKey=16926]
SenderEventImpl[id=EventID[threadID=0x10059|112;sequenceID=9;bucketId=89];action=1;operation=PUTALL_UPDATE;region=/dataStoreRegion;key=Object_7731;shadowKey=16813]
{noformat}
Notice the shadowKeys and sequenceIds are out of order after the conflation.

Conflation should produce this batch:
{noformat}
SenderEventImpl[id=EventID[threadID=0x10059|104;sequenceID=2;bucketId=89];action=1;operation=UPDATE;region=/dataStoreRegion;key=Object_6079;shadowKey=16587]
SenderEventImpl[id=EventID[threadID=0x10059|104;sequenceID=3;bucketId=89];action=2;operation=DESTROY;region=/dataStoreRegion;key=Object_6079;shadowKey=16700]
SenderEventImpl[id=EventID[threadID=0x10059|112;sequenceID=9;bucketId=89];action=1;operation=PUTALL_UPDATE;region=/dataStoreRegion;key=Object_7731;shadowKey=16813]
SenderEventImpl[id=EventID[threadID=0x10059|112;sequenceID=12;bucketId=89];action=1;operation=PUTALL_UPDATE;region=/dataStoreRegion;key=Object_6591;shadowKey=16926]
{noformat}
This is similar to GEODE-4704, but not exactly the same.
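
A minimal sketch of an order-preserving alternative, assuming equal events 
share an EventID and that EventID works correctly as a map key (illustration 
only, not the AbstractGatewaySenderEventProcessor code; uses 
java.util.{ArrayList, LinkedHashMap, List, Map}):
{noformat}
// Sketch only: keep the first occurrence of each EventID and drop later
// duplicates in place, which preserves shadowKey/sequenceID order. The
// current remove-then-append approach moves the surviving event to the
// end of the batch instead.
List<GatewaySenderEventImpl> conflate(List<GatewaySenderEventImpl> batch) {
  Map<Object, GatewaySenderEventImpl> firstOccurrence = new LinkedHashMap<>();
  for (GatewaySenderEventImpl event : batch) {
    firstOccurrence.putIfAbsent(event.getEventId(), event); // later duplicates dropped
  }
  return new ArrayList<>(firstOccurrence.values());
}
{noformat}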



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (GEODE-6616) Flaky: AutoConnectionSourceDUnitTest > testClientDynamicallyDropsStoppedLocator FAILED

2019-06-06 Thread Barry Oglesby (JIRA)


[ 
https://issues.apache.org/jira/browse/GEODE-6616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16858170#comment-16858170
 ] 

Barry Oglesby commented on GEODE-6616:
--

This issue happened again:

[https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-develop-main/jobs/DistributedTestOpenJDK8/builds/781]

> Flaky: AutoConnectionSourceDUnitTest > 
> testClientDynamicallyDropsStoppedLocator FAILED
> --
>
> Key: GEODE-6616
> URL: https://issues.apache.org/jira/browse/GEODE-6616
> Project: Geode
>  Issue Type: Bug
>  Components: tests
>Reporter: Mark Hanson
>Priority: Minor
>
> Failed connection..
> {noformat}
> [vm3] [info 2019/04/09 06:48:44.919 UTC  
> tid=0x20] Got result: EXCEPTION_OCCURRED
> [vm3] org.apache.geode.cache.client.ServerOperationException: remote server 
> on 16f27a14ad79(255:loner):52816:5f2bdb00: : While performing a remote put
> [vm3] at 
> org.apache.geode.cache.client.internal.PutOp$PutOpImpl.processAck(PutOp.java:389)
> [vm3] at 
> org.apache.geode.cache.client.internal.PutOp$PutOpImpl.processResponse(PutOp.java:313)
> [vm3] at 
> org.apache.geode.cache.client.internal.PutOp$PutOpImpl.attemptReadResponse(PutOp.java:454)
> [vm3] at 
> org.apache.geode.cache.client.internal.AbstractOp.attempt(AbstractOp.java:387)
> [vm3] at 
> org.apache.geode.cache.client.internal.ConnectionImpl.execute(ConnectionImpl.java:289)
> [vm3] at 
> org.apache.geode.cache.client.internal.pooling.PooledConnection.execute(PooledConnection.java:351)
> [vm3] at 
> org.apache.geode.cache.client.internal.OpExecutorImpl.executeWithPossibleReAuthentication(OpExecutorImpl.java:908)
> [vm3] at 
> org.apache.geode.cache.client.internal.OpExecutorImpl.execute(OpExecutorImpl.java:172)
> [vm3] at 
> org.apache.geode.cache.client.internal.OpExecutorImpl.execute(OpExecutorImpl.java:130)
> [vm3] at 
> org.apache.geode.cache.client.internal.PoolImpl.execute(PoolImpl.java:792)
> [vm3] at 
> org.apache.geode.cache.client.internal.PutOp.execute(PutOp.java:90)
> [vm3] at 
> org.apache.geode.cache.client.internal.ServerRegionProxy.put(ServerRegionProxy.java:155)
> [vm3] at 
> org.apache.geode.internal.cache.LocalRegion.serverPut(LocalRegion.java:3070)
> [vm3] at 
> org.apache.geode.internal.cache.LocalRegion.cacheWriteBeforePut(LocalRegion.java:3222)
> [vm3] at 
> org.apache.geode.internal.cache.map.RegionMapPut.invokeCacheWriter(RegionMapPut.java:230)
> [vm3] at 
> org.apache.geode.internal.cache.map.AbstractRegionMapPut.doPutIfPreconditionsSatisified(AbstractRegionMapPut.java:295)
> [vm3] at 
> org.apache.geode.internal.cache.map.AbstractRegionMapPut.doPutOnSynchronizedRegionEntry(AbstractRegionMapPut.java:282)
> [vm3] at 
> org.apache.geode.internal.cache.map.AbstractRegionMapPut.doPutOnRegionEntryInMap(AbstractRegionMapPut.java:273)
> [vm3] at 
> org.apache.geode.internal.cache.map.AbstractRegionMapPut.addRegionEntryToMapAndDoPut(AbstractRegionMapPut.java:251)
> [vm3] at 
> org.apache.geode.internal.cache.map.AbstractRegionMapPut.doPutRetryingIfNeeded(AbstractRegionMapPut.java:216)
> [vm3] at 
> org.apache.geode.internal.cache.map.AbstractRegionMapPut.doWithIndexInUpdateMode(AbstractRegionMapPut.java:198)
> [vm3] at 
> org.apache.geode.internal.cache.map.AbstractRegionMapPut.doPut(AbstractRegionMapPut.java:180)
> [vm3] at 
> org.apache.geode.internal.cache.map.AbstractRegionMapPut.runWhileLockedForCacheModification(AbstractRegionMapPut.java:119)
> [vm3] at 
> org.apache.geode.internal.cache.map.RegionMapPut.runWhileLockedForCacheModification(RegionMapPut.java:150)
> [vm3] at 
> org.apache.geode.internal.cache.map.AbstractRegionMapPut.put(AbstractRegionMapPut.java:169)
> [vm3] at 
> org.apache.geode.internal.cache.AbstractRegionMap.basicPut(AbstractRegionMap.java:2044)
> [vm3] at 
> org.apache.geode.internal.cache.LocalRegion.virtualPut(LocalRegion.java:5695)
> [vm3] at 
> org.apache.geode.internal.cache.LocalRegionDataView.putEntry(LocalRegionDataView.java:162)
> [vm3] at 
> org.apache.geode.internal.cache.LocalRegion.basicPut(LocalRegion.java:5123)
> [vm3] at 
> org.apache.geode.internal.cache.LocalRegion.validatedPut(LocalRegion.java:1652)
> [vm3] at 
> org.apache.geode.internal.cache.LocalRegion.lambda$put$3(LocalRegion.java:1638)
> [vm3] at 
> io.micrometer.core.instrument.composite.CompositeTimer.record(CompositeTimer.java:57)
> [vm3] at 
> org.apache.geode.internal.cache.LocalRegion.put(LocalRegion.java:1634)
> [vm3] at 
> org.apache.geode.internal.cache.AbstractRegion.put(AbstractRegion.java:425)
> [vm3]

[jira] [Resolved] (GEODE-6821) Multiple Serial GatewaySenders that are primary in different members can cause a distributed deadlock

2019-06-06 Thread Barry Oglesby (JIRA)


 [ 
https://issues.apache.org/jira/browse/GEODE-6821?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Barry Oglesby resolved GEODE-6821.
--
   Resolution: Fixed
Fix Version/s: 1.10.0

> Multiple Serial GatewaySenders that are primary in different members can 
> cause a distributed deadlock
> -
>
> Key: GEODE-6821
> URL: https://issues.apache.org/jira/browse/GEODE-6821
> Project: Geode
>  Issue Type: Bug
>  Components: messaging, wan
>Reporter: Barry Oglesby
>Assignee: Barry Oglesby
>Priority: Major
> Fix For: 1.10.0
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> A test with this scenario causes a distributed deadlock.
> 3 servers each with:
> - a function that performs a random region operation on the input region
> - a replicated region on which the function is executed
> - two regions each with a serial AEQ (the type of region could be either 
> replicate or partitioned)
> 1 multi-threaded client that repeatedly executes the function with random 
> region names and operations.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (GEODE-6836) CI Failure: ReconnectDUnitTest.testReconnectWithRoleLoss fails with GemFireConfigException

2019-06-05 Thread Barry Oglesby (JIRA)
Barry Oglesby created GEODE-6836:


 Summary: CI Failure: ReconnectDUnitTest.testReconnectWithRoleLoss 
fails with GemFireConfigException
 Key: GEODE-6836
 URL: https://issues.apache.org/jira/browse/GEODE-6836
 Project: Geode
  Issue Type: Bug
  Components: membership
Reporter: Barry Oglesby


DistributedTestOpenJDK8 build 775:
https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-develop-main/jobs/DistributedTestOpenJDK8/builds/775

Failure:
{noformat}
org.apache.geode.cache30.ReconnectDUnitTest > testReconnectWithRoleLoss FAILED
 java.lang.RuntimeException: org.apache.geode.GemFireConfigException: Unable to 
join the distributed system. Operation either timed out, was stopped or Locator 
does not exist.
 at 
org.apache.geode.test.dunit.cache.internal.JUnit4CacheTestCase.finishCacheXml(JUnit4CacheTestCase.java:176)
 at 
org.apache.geode.cache30.ReconnectDUnitTest.postSetUp(ReconnectDUnitTest.java:174)

Caused by:
 org.apache.geode.GemFireConfigException: Unable to join the distributed 
system. Operation either timed out, was stopped or Locator does not exist.
 at 
org.apache.geode.distributed.internal.membership.gms.mgr.GMSMembershipManager.join(GMSMembershipManager.java:663)
 at 
org.apache.geode.distributed.internal.membership.gms.mgr.GMSMembershipManager.joinDistributedSystem(GMSMembershipManager.java:743)
 at 
org.apache.geode.distributed.internal.membership.gms.Services.start(Services.java:176)
 at 
org.apache.geode.distributed.internal.membership.gms.GMSMemberFactory.newMembershipManager(GMSMemberFactory.java:106)
 at 
org.apache.geode.distributed.internal.membership.MemberFactory.newMembershipManager(MemberFactory.java:93)
 at 
org.apache.geode.distributed.internal.ClusterDistributionManager.<init>(ClusterDistributionManager.java:781)
 at 
org.apache.geode.distributed.internal.ClusterDistributionManager.<init>(ClusterDistributionManager.java:899)
 at 
org.apache.geode.distributed.internal.ClusterDistributionManager.create(ClusterDistributionManager.java:541)
 at 
org.apache.geode.distributed.internal.InternalDistributedSystem.initialize(InternalDistributedSystem.java:756)
 at 
org.apache.geode.distributed.internal.InternalDistributedSystem.access$200(InternalDistributedSystem.java:135)
 at 
org.apache.geode.distributed.internal.InternalDistributedSystem$Builder.build(InternalDistributedSystem.java:2997)
 at 
org.apache.geode.distributed.internal.InternalDistributedSystem.connectInternal(InternalDistributedSystem.java:251)
 at 
org.apache.geode.distributed.DistributedSystem.connect(DistributedSystem.java:158)
 at 
org.apache.geode.test.dunit.internal.JUnit4DistributedTestCase.getSystem(JUnit4DistributedTestCase.java:181)
 at 
org.apache.geode.test.dunit.internal.JUnit4DistributedTestCase.getSystem(JUnit4DistributedTestCase.java:257)
 at 
org.apache.geode.test.dunit.cache.internal.JUnit4CacheTestCase.createCache(JUnit4CacheTestCase.java:118)
 at 
org.apache.geode.test.dunit.cache.internal.JUnit4CacheTestCase.createCache(JUnit4CacheTestCase.java:104)
 at 
org.apache.geode.test.dunit.cache.internal.JUnit4CacheTestCase.createCache(JUnit4CacheTestCase.java:100)
 at 
org.apache.geode.test.dunit.cache.internal.JUnit4CacheTestCase.finishCacheXml(JUnit4CacheTestCase.java:174)
{noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (GEODE-6832) CI Failure: geode-assembly:test task failed with an EXCEPTION_ACCESS_VIOLATION

2019-06-05 Thread Barry Oglesby (JIRA)
Barry Oglesby created GEODE-6832:


 Summary: CI Failure: geode-assembly:test task failed with an 
EXCEPTION_ACCESS_VIOLATION
 Key: GEODE-6832
 URL: https://issues.apache.org/jira/browse/GEODE-6832
 Project: Geode
  Issue Type: Bug
Reporter: Barry Oglesby


In WindowsUnitTestOpenJDK11 build 552, the :geode-assembly:test task failed 
with an EXCEPTION_ACCESS_VIOLATION.

https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-develop-main/jobs/WindowsUnitTestOpenJDK11/builds/552

Failure:
{noformat}
> Task :geode-assembly:test
#
# A fatal error has been detected by the Java Runtime Environment:
#
# EXCEPTION_ACCESS_VIOLATION (0xc0000005) at pc=0x022a2862e835, pid=4916, 
tid=5752
#
# JRE version: OpenJDK Runtime Environment (11.0.3+7) (build 11.0.3+7)
# Java VM: OpenJDK 64-Bit Server VM (11.0.3+7, mixed mode, tiered, compressed 
oops, g1 gc, windows-amd64)
# Problematic frame:
# j 
net.bytebuddy.description.type.TypeList$Generic$ForDetachedTypes$OfTypeVariables.size()I+4
#
# Core dump will be written. Default location: 
C:\Users\geode\geode\geode-assembly\build\test\hs_err_pid4916.mdmp
#
# An error report file with more information is saved as:
# C:\Users\geode\geode\geode-assembly\build\test\hs_err_pid4916.log
Compiled method (c1) 11957 2157 3 
net.bytebuddy.description.type.TypeList$Generic$AbstractBase::<init> (5 bytes)
 total in heap [0x022a29176810,0x022a29176da8] = 1432
 relocation [0x022a29176988,0x022a291769d0] = 72
 main code [0x022a291769e0,0x022a29176c20] = 576
 stub code [0x022a29176c20,0x022a29176cb8] = 152
 oops [0x022a29176cb8,0x022a29176cc0] = 8
 metadata [0x022a29176cc0,0x022a29176cf0] = 48
 scopes data [0x022a29176cf0,0x022a29176d30] = 64
 scopes pcs [0x022a29176d30,0x022a29176da0] = 112
 dependencies [0x022a29176da0,0x022a29176da8] = 8
Could not load hsdis-amd64.dll; library not loadable; PrintAssembly is disabled
#
# If you would like to submit a bug report, please visit:
# https://github.com/AdoptOpenJDK/openjdk-build/issues
#
WARNING: An illegal reflective access operation has occurred
WARNING: Illegal reflective access by 
org.mockito.internal.util.reflection.AccessibilityChanger 
(file:/C:/Users/geode/.gradle/caches/modules-2/files-2.1/org.mockito/mockito-core/2.23.0/497ddb32fd5d01f9dbe99a2ec790aeb931dff1b1/mockito-core-2.23.0.jar)
 to field java.io.File.path
WARNING: Please consider reporting this to the maintainers of 
org.mockito.internal.util.reflection.AccessibilityChanger
WARNING: Use --illegal-access=warn to enable warnings of further illegal 
reflective access operations
WARNING: All illegal access operations will be denied in a future release

Unexpected exception thrown.
org.gradle.internal.remote.internal.MessageIOException: Could not write 
'/127.0.0.1:50173'.
 at 
org.gradle.internal.remote.internal.inet.SocketConnection.flush(SocketConnection.java:135)
 at 
org.gradle.internal.remote.internal.hub.MessageHub$ConnectionDispatch.run(MessageHub.java:325)
 at 
org.gradle.internal.concurrent.ExecutorPolicy$CatchAndRecordFailures.onExecute(ExecutorPolicy.java:63)
 at 
org.gradle.internal.concurrent.ManagedExecutorImpl$1.run(ManagedExecutorImpl.java:46)
 at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
 at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
 at 
org.gradle.internal.concurrent.ThreadFactoryImpl$ManagedThreadRunnable.run(ThreadFactoryImpl.java:55)
 at java.lang.Thread.run(Thread.java:748)
Caused by: java.io.IOException: An existing connection was forcibly closed by 
the remote host
 at sun.nio.ch.SocketDispatcher.write0(Native Method)
 at sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:51)
 at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:93)
 at sun.nio.ch.IOUtil.write(IOUtil.java:51)
 at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:471)
 at 
org.gradle.internal.remote.internal.inet.SocketConnection$SocketOutputStream.writeWithNonBlockingRetry(SocketConnection.java:273)
 at 
org.gradle.internal.remote.internal.inet.SocketConnection$SocketOutputStream.writeBufferToChannel(SocketConnection.java:261)
 at 
org.gradle.internal.remote.internal.inet.SocketConnection$SocketOutputStream.flush(SocketConnection.java:255)
 at 
org.gradle.internal.remote.internal.inet.SocketConnection.flush(SocketConnection.java:133)
 ... 7 more

> Task :geode-assembly:test FAILED
{noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (GEODE-6821) Multiple Serial GatewaySenders that are primary in different members can cause a distributed deadlock

2019-05-30 Thread Barry Oglesby (JIRA)


[ 
https://issues.apache.org/jira/browse/GEODE-6821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16852457#comment-16852457
 ] 

Barry Oglesby commented on GEODE-6821:
--

The key to this deadlock is a shared P2P message reader waiting in 
SerialGatewaySenderQueue.put for a WriteLock like:
{noformat}
"P2P message reader for 192.168.1.2(server-3:54808):41005 shared ordered 
uid=6 port=62566" tid=0x4e owned by "Function Execution Processor7" tid=0x66
 java.lang.Thread.State: WAITING
 at sun.misc.Unsafe.park(Native Method)
 - waiting on 
java.util.concurrent.locks.ReentrantReadWriteLock$FairSync@42117a4e
 at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
 at 
java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
 at 
java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:870)
 at 
java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1199)
 at 
java.util.concurrent.locks.ReentrantReadWriteLock$WriteLock.lock(ReentrantReadWriteLock.java:943)
 at 
org.apache.geode.internal.cache.wan.serial.SerialGatewaySenderQueue.put(SerialGatewaySenderQueue.java:220)
 at 
org.apache.geode.internal.cache.wan.serial.SerialGatewaySenderEventProcessor.queuePrimaryEvent(SerialGatewaySenderEventProcessor.java:477)
 at 
org.apache.geode.internal.cache.wan.serial.SerialGatewaySenderEventProcessor.enqueueEvent(SerialGatewaySenderEventProcessor.java:445)
 at 
org.apache.geode.internal.cache.wan.AbstractGatewaySender.distribute(AbstractGatewaySender.java:1033)
 at 
org.apache.geode.internal.cache.LocalRegion.notifyGatewaySender(LocalRegion.java:6138)
 at 
org.apache.geode.internal.cache.LocalRegion.basicPutPart2(LocalRegion.java:5768)
 at 
org.apache.geode.internal.cache.map.RegionMapPut.doBeforeCompletionActions(RegionMapPut.java:282)
 at 
org.apache.geode.internal.cache.map.AbstractRegionMapPut.doPutAndDeliverEvent(AbstractRegionMapPut.java:301)
 at 
org.apache.geode.internal.cache.map.AbstractRegionMapPut$$Lambda$163/1504099933.run(Unknown
 Source)
 at 
org.apache.geode.internal.cache.map.AbstractRegionMapPut.runWithIndexUpdatingInProgress(AbstractRegionMapPut.java:308)
 at 
org.apache.geode.internal.cache.map.AbstractRegionMapPut.doPutIfPreconditionsSatisified(AbstractRegionMapPut.java:296)
 at 
org.apache.geode.internal.cache.map.AbstractRegionMapPut.doPutOnSynchronizedRegionEntry(AbstractRegionMapPut.java:282)
 at 
org.apache.geode.internal.cache.map.AbstractRegionMapPut.doPutOnRegionEntryInMap(AbstractRegionMapPut.java:273)
 - locked 
org.apache.geode.internal.cache.entries.VersionedThinRegionEntryHeapIntKey@7bbbc992
 at 
org.apache.geode.internal.cache.map.AbstractRegionMapPut.addRegionEntryToMapAndDoPut(AbstractRegionMapPut.java:251)
 - locked 
org.apache.geode.internal.cache.entries.VersionedThinRegionEntryHeapIntKey@7bbbc992
 at 
org.apache.geode.internal.cache.map.AbstractRegionMapPut.doPutRetryingIfNeeded(AbstractRegionMapPut.java:216)
 at 
org.apache.geode.internal.cache.map.AbstractRegionMapPut$$Lambda$162/754294637.run(Unknown
 Source)
 at 
org.apache.geode.internal.cache.map.AbstractRegionMapPut.doWithIndexInUpdateMode(AbstractRegionMapPut.java:198)
 at 
org.apache.geode.internal.cache.map.AbstractRegionMapPut.doPut(AbstractRegionMapPut.java:180)
 at 
org.apache.geode.internal.cache.map.AbstractRegionMapPut$$Lambda$161/453331027.run(Unknown
 Source)
 at 
org.apache.geode.internal.cache.map.AbstractRegionMapPut.runWhileLockedForCacheModification(AbstractRegionMapPut.java:119)
 at 
org.apache.geode.internal.cache.map.RegionMapPut.runWhileLockedForCacheModification(RegionMapPut.java:161)
 at 
org.apache.geode.internal.cache.map.AbstractRegionMapPut.put(AbstractRegionMapPut.java:169)
 at 
org.apache.geode.internal.cache.AbstractRegionMap.basicPut(AbstractRegionMap.java:2044)
 at 
org.apache.geode.internal.cache.LocalRegion.virtualPut(LocalRegion.java:5599)
 at 
org.apache.geode.internal.cache.DistributedRegion.virtualPut(DistributedRegion.java:377)
 at 
org.apache.geode.internal.cache.LocalRegionDataView.putEntry(LocalRegionDataView.java:162)
 at 
org.apache.geode.internal.cache.LocalRegion.basicUpdate(LocalRegion.java:5579)
 at 
org.apache.geode.internal.cache.AbstractUpdateOperation.doPutOrCreate(AbstractUpdateOperation.java:150)
 at 
org.apache.geode.internal.cache.AbstractUpdateOperation$AbstractUpdateMessage.basicOperateOnRegion(AbstractUpdateOperation.java:285)
 at 
org.apache.geode.internal.cache.AbstractUpdateOperation$AbstractUpdateMessage.operateOnRegion(AbstractUpdateOperation.java:256)
 at 
org.apache.geode.internal.cache.DistributedCacheOperation$CacheOperationMessage.basicProcess(DistributedCacheOperation.java:1200)
 at 
org.apache.geode.internal.cache.DistributedCacheOperation$CacheOperationMessage.process(DistributedCacheOperation.java:1100)
 at 

[jira] [Assigned] (GEODE-6821) Multiple Serial GatewaySenders that are primary in different members can cause a distributed deadlock

2019-05-30 Thread Barry Oglesby (JIRA)


 [ 
https://issues.apache.org/jira/browse/GEODE-6821?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Barry Oglesby reassigned GEODE-6821:


Assignee: Barry Oglesby

> Multiple Serial GatewaySenders that are primary in different members can 
> cause a distributed deadlock
> -
>
> Key: GEODE-6821
> URL: https://issues.apache.org/jira/browse/GEODE-6821
> Project: Geode
>  Issue Type: Bug
>  Components: messaging, wan
>Reporter: Barry Oglesby
>Assignee: Barry Oglesby
>Priority: Major
>
> A test with this scenario causes a distributed deadlock.
> 3 servers each with:
> - a function that performs a random region operation on the input region
> - a replicated region on which the function is executed
> - two regions each with a serial AEQ (the type of region could be either 
> replicate or partitioned)
> 1 multi-threaded client that repeatedly executes the function with random 
> region names and operations.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (GEODE-6821) Multiple Serial GatewaySenders that are primary in different members can cause a distributed deadlock

2019-05-30 Thread Barry Oglesby (JIRA)
Barry Oglesby created GEODE-6821:


 Summary: Multiple Serial GatewaySenders that are primary in 
different members can cause a distributed deadlock
 Key: GEODE-6821
 URL: https://issues.apache.org/jira/browse/GEODE-6821
 Project: Geode
  Issue Type: Bug
  Components: messaging, wan
Reporter: Barry Oglesby


A test with this scenario causes a distributed deadlock.

3 servers each with:
- a function that performs a random region operation on the input region
- a replicated region on which the function is executed
- two regions each with a serial AEQ (the type of region could be either 
replicate or partitioned)

1 multi-threaded client that repeatedly executes the function with random 
region names and operations.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (GEODE-6748) Client invalidate operations never use single hop

2019-05-08 Thread Barry Oglesby (JIRA)


 [ 
https://issues.apache.org/jira/browse/GEODE-6748?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Barry Oglesby updated GEODE-6748:
-
Affects Version/s: 1.9.0

> Client invalidate operations never use single hop
> -
>
> Key: GEODE-6748
> URL: https://issues.apache.org/jira/browse/GEODE-6748
> Project: Geode
>  Issue Type: Bug
>  Components: client/server
>Affects Versions: 1.9.0
>Reporter: Barry Oglesby
>Priority: Major
>
> InvalidateOp.execute does:
> {noformat}
> public static void execute(ExecutablePool pool, String region, EntryEventImpl 
> event) {
>   AbstractOp op = new InvalidateOpImpl(region, event);
>   pool.execute(op);
> }{noformat}
> That is the non-single-hop way of executing an operation.
> It should use the single-hop way of executing an operation if 
> pr-single-hop-enabled=true.
> The execute methods in PutOp and GetOp show examples of that.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (GEODE-6748) Client invalidate operations never use single hop

2019-05-07 Thread Barry Oglesby (JIRA)
Barry Oglesby created GEODE-6748:


 Summary: Client invalidate operations never use single hop
 Key: GEODE-6748
 URL: https://issues.apache.org/jira/browse/GEODE-6748
 Project: Geode
  Issue Type: Bug
  Components: client/server
Reporter: Barry Oglesby


InvalidateOp.execute does:
{noformat}
public static void execute(ExecutablePool pool, String region, EntryEventImpl 
event) {
  AbstractOp op = new InvalidateOpImpl(region, event);
  pool.execute(op);
}{noformat}
That is the non-single-hop way of executing an operation.

It should use the single-hop way of executing an operation if 
pr-single-hop-enabled=true.

The execute methods in PutOp and GetOp show examples of that.
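
For comparison, a rough sketch of a single-hop variant modeled on the 
PutOp/GetOp pattern. The ClientMetadataService lookup and the executeOn call 
are assumptions about those internals (as is widening the second parameter 
from String to LocalRegion); treat the exact signatures as unverified:
{noformat}
// Sketch only -- signatures below are assumptions modeled on the
// PutOp/GetOp pattern, not verified against the Geode source.
public static void execute(ExecutablePool pool, LocalRegion region, EntryEventImpl event) {
  AbstractOp op = new InvalidateOpImpl(region.getFullPath(), event);
  if (((PoolImpl) pool).getPRSingleHopEnabled()) {
    ServerLocation server = region.getCache().getClientMetadataService()
        .getBucketServerLocation(region, Operation.INVALIDATE, event.getKey(), null, null);
    if (server != null) {
      pool.executeOn(server, op, true, false); // go straight to the bucket's primary
      return;
    }
  }
  pool.execute(op); // fall back to the current non-single-hop path
}
{noformat}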



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (GEODE-6186) Reduce the number of EntryNotFoundExceptions during AsyncEventQueue batch processing with conflation enabled

2019-04-30 Thread Barry Oglesby (JIRA)


 [ 
https://issues.apache.org/jira/browse/GEODE-6186?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Barry Oglesby resolved GEODE-6186.
--
   Resolution: Fixed
Fix Version/s: 1.9.0

> Reduce the number of EntryNotFoundExceptions during AsyncEventQueue batch 
> processing with conflation enabled
> 
>
> Key: GEODE-6186
> URL: https://issues.apache.org/jira/browse/GEODE-6186
> Project: Geode
>  Issue Type: Bug
>  Components: wan
>Reporter: Barry Oglesby
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.9.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Reduce the number of EntryNotFoundExceptions during AsyncEventQueue batch 
> processing with conflation enabled
> This test:
> 3000 iterations of putAlls with the same 1500 keys into a partitioned region 
> attached to async-event-queue:
> <async-event-queue dispatcher-threads="1" parallel="true" enable-batch-conflation="true">
> Produces these numbers in the current code (4 different runs):
> {noformat}
> numBatches=645; numENFEs=8622196; totalPeekTime=178517; averagePeekTime=276; 
> totalProcessBatchTime=38936; averageProcessBatchTime=60
> numBatches=660; numENFEs=8467986; totalPeekTime=182985; averagePeekTime=277; 
> totalProcessBatchTime=34335; averageProcessBatchTime=52
> numBatches=646; numENFEs=8563364; totalPeekTime=179624; averagePeekTime=278; 
> totalProcessBatchTime=37342; averageProcessBatchTime=57
> numBatches=632; numENFEs=8716942; totalPeekTime=175570; averagePeekTime=277; 
> totalProcessBatchTime=39732; averageProcessBatchTime=62
> {noformat}
> After some changes mainly in BucketRegionQueue:
> {noformat}
> numBatches=782; numENFEs=3621039; totalPeekTime=195760; averagePeekTime=250; 
> totalProcessBatchTime=18724; averageProcessBatchTime=23
> numBatches=791; numENFEs=3604933; totalPeekTime=197980; averagePeekTime=250; 
> totalProcessBatchTime=18587; averageProcessBatchTime=23
> numBatches=790; numENFEs=3600038; totalPeekTime=197774; averagePeekTime=250; 
> totalProcessBatchTime=18611; averageProcessBatchTime=23
> numBatches=795; numENFEs=3584490; totalPeekTime=199060; averagePeekTime=250; 
> totalProcessBatchTime=18063; averageProcessBatchTime=22
> {noformat}
> numBatches is the number of batches peeked
> numENFEs is the number of EntryNotFoundExceptions thrown
> totalPeekTime is the total time to peek all batches
> averagePeekTime is the average time to peek a batch
> totalProcessBatchTime is the total time to process all batches
> averageProcessBatchTime is the average time to process a batch (includes 
> listener callback and remove from queue)



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (GEODE-6572) MemberMBeanAttributesDUnitTest testConfigAttributes failed with SIGSEGV

2019-03-28 Thread Barry Oglesby (JIRA)
Barry Oglesby created GEODE-6572:


 Summary: MemberMBeanAttributesDUnitTest testConfigAttributes 
failed with SIGSEGV
 Key: GEODE-6572
 URL: https://issues.apache.org/jira/browse/GEODE-6572
 Project: Geode
  Issue Type: Bug
  Components: jmx
Reporter: Barry Oglesby


The MemberMBeanAttributesDUnitTest testConfigAttributes test failed in 
DistributedTestOpenJDK11 CI run 547:

https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-develop-main/jobs/DistributedTestOpenJDK11/builds/547

With this exception:
{noformat}
org.apache.geode.management.MemberMBeanAttributesDUnitTest > 
testConfigAttributes FAILED
 org.apache.geode.test.dunit.RMIException: While invoking 
org.apache.geode.test.dunit.rules.DistributedRestoreSystemProperties$$Lambda$161/0x000840231c40.run
 in VM 0 running on Host f32ed46bb850 with 4 VMs

Caused by:
 java.rmi.ConnectException: Connection refused to host: 172.17.0.13; nested 
exception is: 
 java.net.ConnectException: Connection refused (Connection refused)

Caused by:
 java.net.ConnectException: Connection refused (Connection refused)
{noformat}
The failure is actually a SIGSEGV, which is logged in the 
MemberMBeanAttributesDUnitTest.html file:
{noformat}
[vm0] #
[vm0] # A fatal error has been detected by the Java Runtime Environment:
[vm0] #
[vm0] # SIGSEGV (0xb) at pc=0x7f38b7abe1a0, pid=153, tid=222
[vm0] #
[vm0] # JRE version: OpenJDK Runtime Environment (11.0.3+1) (build 
11.0.3+1-Debian-1)
[vm0] # Java VM: OpenJDK 64-Bit Server VM (11.0.3+1-Debian-1, mixed mode, 
sharing, tiered, compressed oops, g1 gc, linux-amd64)
[vm0] # Problematic frame:
[vm0] # V [libjvm.so+0x5271a0]
[vm0] #
[vm0] # Core dump will be written. Default location: 
/home/geode/geode/geode-core/build/distributedTest421/dunit/vm0/core
[vm0] #
[vm0] # An error report file with more information is saved as:
[vm0] # 
/home/geode/geode/geode-core/build/distributedTest421/dunit/vm0/hs_err_pid153.log
[vm0] #
[vm0] # Compiler replay data is saved as:
[vm0] # 
/home/geode/geode/geode-core/build/distributedTest421/dunit/vm0/replay_pid153.log
[vm0] #
[vm0] # If you would like to submit a bug report, please visit:
[vm0] # http://bugreport.java.com/bugreport/crash.jsp
[vm0] #
{noformat}
Unfortunately, I don't see the hs_err_pid153.log file in the tar.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (GEODE-6566) CI failure: MemberMBeanAttributesDUnitTest testReplRegionAttributes failed with suspect string

2019-03-27 Thread Barry Oglesby (JIRA)


[ 
https://issues.apache.org/jira/browse/GEODE-6566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16803131#comment-16803131
 ] 

Barry Oglesby commented on GEODE-6566:
--

Normally, SampleCollector.sample is only invoked by the StatSampler thread. It 
is the only one that iterates the resourceInstMap.

In this test, that thread is running normally. In addition, the test calls 
SampleCollector.sample. This now means two threads are executing that method 
and potentially iterating the resourceInstMap simultaneously.

This is the normal StatSampler thread:
{noformat}
[vm2] [warning 2019/03/27 10:35:14.150 PDT <StatSampler> tid=45] 
SampleCollector.sample invoked
[vm2] java.lang.Exception
[vm2] at 
org.apache.geode.internal.statistics.SampleCollector.sample(SampleCollector.java:219)
[vm2] at 
org.apache.geode.internal.statistics.HostStatSampler.run(HostStatSampler.java:232)
[vm2] at java.lang.Thread.run(Thread.java:745)
{noformat}
This is the test method thread:
{noformat}
[vm2] [warning 2019/03/27 10:35:15.285 PDT  
tid=19] SampleCollector.sample invoked
[vm2] java.lang.Exception
[vm2] at 
org.apache.geode.internal.statistics.SampleCollector.sample(SampleCollector.java:219)
[vm2] at 
org.apache.geode.management.MemberMBeanAttributesDUnitTest.lambda$sampleStatistics$b6506259$1(MemberMBeanAttributesDUnitTest.java:89)
{noformat}
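
As a standalone illustration of that failure mode using only JDK classes 
(nothing Geode-specific), two threads that each iterate and mutate the same 
HashMap will usually, though not deterministically, fail fast:
{noformat}
import java.util.HashMap;
import java.util.Map;

// Two threads iterating and mutating the same HashMap, like the
// StatSampler thread plus the test thread both running sample().
// One of them typically dies with a ConcurrentModificationException.
public class CmeDemo {
  public static void main(String[] args) {
    Map<Integer, Integer> map = new HashMap<>();
    for (int i = 0; i < 1000; i++) {
      map.put(i, i);
    }
    Runnable sampler = () -> {
      for (int n = 0; n < 100_000; n++) {
        int sum = 0;
        for (int v : map.values()) { // iterate...
          sum += v;
        }
        map.put(n % 1000, sum); // ...and mutate
      }
    };
    new Thread(sampler).start();
    new Thread(sampler).start();
  }
}
{noformat}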

> CI failure: MemberMBeanAttributesDUnitTest testReplRegionAttributes failed 
> with suspect string
> --
>
> Key: GEODE-6566
> URL: https://issues.apache.org/jira/browse/GEODE-6566
> Project: Geode
>  Issue Type: Bug
>  Components: tests
>Reporter: Barry Oglesby
>Priority: Major
>
> CI run:
> https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-develop-main/jobs/DistributedTestOpenJDK8/builds/543
> Suspect strings:
> {noformat}
> org.apache.geode.management.MemberMBeanAttributesDUnitTest > 
> testReplRegionAttributes FAILED
>  java.lang.AssertionError: Suspicious strings were written to the log during 
> this run.
>  Fix the strings or use IgnoredException.addIgnoredException to ignore.
>  ---
>  Found suspect string in log4j at line 2383
> [fatal 2019/03/27 01:09:04.965 UTC <StatSampler> tid=195] null
>  java.util.ConcurrentModificationException
>  at java.util.HashMap$HashIterator.nextNode(HashMap.java:1442)
>  at java.util.HashMap$ValueIterator.next(HashMap.java:1471)
>  at 
> org.apache.geode.internal.statistics.SampleCollector.sample(SampleCollector.java:231)
>  at 
> org.apache.geode.internal.statistics.HostStatSampler.run(HostStatSampler.java:232)
>  at java.lang.Thread.run(Thread.java:748)
> ---
>  Found suspect string in log4j at line 2396
> [fatal 2019/03/27 01:09:04.972 UTC <StatSampler> tid=195] Uncaught exception 
> in thread Thread[StatSampler,10,RMI Runtime]
>  java.util.ConcurrentModificationException
>  at java.util.HashMap$HashIterator.nextNode(HashMap.java:1442)
>  at java.util.HashMap$ValueIterator.next(HashMap.java:1471)
>  at 
> org.apache.geode.internal.statistics.SampleCollector.sample(SampleCollector.java:231)
>  at 
> org.apache.geode.internal.statistics.HostStatSampler.run(HostStatSampler.java:232)
>  at java.lang.Thread.run(Thread.java:748)
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (GEODE-6566) CI failure: MemberMBeanAttributesDUnitTest testReplRegionAttributes failed with suspect string

2019-03-27 Thread Barry Oglesby (JIRA)
Barry Oglesby created GEODE-6566:


 Summary: CI failure: MemberMBeanAttributesDUnitTest 
testReplRegionAttributes failed with suspect string
 Key: GEODE-6566
 URL: https://issues.apache.org/jira/browse/GEODE-6566
 Project: Geode
  Issue Type: Bug
  Components: tests
Reporter: Barry Oglesby


CI run:

https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-develop-main/jobs/DistributedTestOpenJDK8/builds/543

Suspect strings:
{noformat}
org.apache.geode.management.MemberMBeanAttributesDUnitTest > 
testReplRegionAttributes FAILED
 java.lang.AssertionError: Suspicious strings were written to the log during 
this run.
 Fix the strings or use IgnoredException.addIgnoredException to ignore.
 ---
 Found suspect string in log4j at line 2383

[fatal 2019/03/27 01:09:04.965 UTC <StatSampler> tid=195] null
 java.util.ConcurrentModificationException
 at java.util.HashMap$HashIterator.nextNode(HashMap.java:1442)
 at java.util.HashMap$ValueIterator.next(HashMap.java:1471)
 at 
org.apache.geode.internal.statistics.SampleCollector.sample(SampleCollector.java:231)
 at 
org.apache.geode.internal.statistics.HostStatSampler.run(HostStatSampler.java:232)
 at java.lang.Thread.run(Thread.java:748)

---
 Found suspect string in log4j at line 2396

[fatal 2019/03/27 01:09:04.972 UTC <StatSampler> tid=195] Uncaught exception in 
thread Thread[StatSampler,10,RMI Runtime]
 java.util.ConcurrentModificationException
 at java.util.HashMap$HashIterator.nextNode(HashMap.java:1442)
 at java.util.HashMap$ValueIterator.next(HashMap.java:1471)
 at 
org.apache.geode.internal.statistics.SampleCollector.sample(SampleCollector.java:231)
 at 
org.apache.geode.internal.statistics.HostStatSampler.run(HostStatSampler.java:232)
 at java.lang.Thread.run(Thread.java:748)
{noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (GEODE-6564) Clearing a replicated region with expiration causes a memory leak

2019-03-26 Thread Barry Oglesby (JIRA)
Barry Oglesby created GEODE-6564:


 Summary: Clearing a replicated region with expiration causes a 
memory leak
 Key: GEODE-6564
 URL: https://issues.apache.org/jira/browse/GEODE-6564
 Project: Geode
  Issue Type: Bug
  Components: regions
Reporter: Barry Oglesby


Clearing a replicated region with expiration causes a memory leak

Both the RegionEntries and EntryExpiryTasks are still live after loading 
entries into the region and then clearing it.

Server Startup:
{noformat}
 num #instances #bytes class name
--
 1: 29856 2797840 [C
 4: 2038 520600 [B
Total 187711 10089624
{noformat}
Load 100 entries with 600k payload (representing a session):
{noformat}
 num #instances #bytes class name
--
 1: 2496 60666440 [B
 2: 30157 2828496 [C
 73: 100 7200 
org.apache.geode.internal.cache.entries.VersionedStatsRegionEntryHeapStringKey1
 93: 100 4800 org.apache.geode.internal.cache.EntryExpiryTask
Total 190737 70240472
{noformat}
Clear region:
{noformat}
 num #instances #bytes class name
--
 1: 2398 60505944 [B
 2: 30448 2849456 [C
 74: 100 7200 
org.apache.geode.internal.cache.entries.VersionedStatsRegionEntryHeapStringKey1
 100: 100 4800 org.apache.geode.internal.cache.EntryExpiryTask
Total 192199 70373048
{noformat}
Load and clear another 100 entries:
{noformat}
 num #instances #bytes class name
--
 1: 2503 120511688 [B
 2: 30506 2854384 [C
 46: 200 14400 
org.apache.geode.internal.cache.entries.VersionedStatsRegionEntryHeapStringKey1
 61: 200 9600 org.apache.geode.internal.cache.EntryExpiryTask
Total 193272 130421432
{noformat}
Load and clear another 100 entries:
{noformat}
 num #instances #bytes class name
--
 1: 2600 180517240 [B
 2: 30562 2859584 [C
 33: 300 21600 
org.apache.geode.internal.cache.entries.VersionedStatsRegionEntryHeapStringKey1
 47: 300 14400 org.apache.geode.internal.cache.EntryExpiryTask
Total 194310 190468176
{noformat}
A heap dump shows the VersionedStatsRegionEntryHeapStringKey1 instances are 
referenced by the DistributedRegion entryExpiryTasks:
{noformat}
--> org.apache.geode.internal.cache.DistributedRegion@0x76adbbb88 (816 bytes) 
(field entryExpiryTasks:)
--> java.util.concurrent.ConcurrentHashMap@0x76adbc028 (100 bytes) (field 
table:)
--> [Ljava.util.concurrent.ConcurrentHashMap$Node;@0x76ee85358 (4112 bytes) 
(Element 276 of [Ljava.util.concurrent.ConcurrentHashMap$Node;@0x76ee85358:)
--> java.util.concurrent.ConcurrentHashMap$Node@0x76edc4e20 (44 bytes) (field 
next:)
--> java.util.concurrent.ConcurrentHashMap$Node@0x76edc32f0 (44 bytes) (field 
key:)
--> 
org.apache.geode.internal.cache.entries.VersionedStatsRegionEntryHeapStringKey1@0x76edc3210
 (86 bytes) 
{noformat}
LocalRegion.cancelAllEntryExpiryTasks is called when the region is cleared:
{noformat}
java.lang.Exception: Stack trace
 at java.lang.Thread.dumpStack(Thread.java:1333)
 at 
org.apache.geode.internal.cache.LocalRegion.cancelAllEntryExpiryTasks(LocalRegion.java:8202)
 at 
org.apache.geode.internal.cache.LocalRegion.clearRegionLocally(LocalRegion.java:9094)
 at 
org.apache.geode.internal.cache.DistributedRegion.cmnClearRegion(DistributedRegion.java:1962)
 at 
org.apache.geode.internal.cache.LocalRegion.basicClear(LocalRegion.java:8998)
 at 
org.apache.geode.internal.cache.DistributedRegion.basicClear(DistributedRegion.java:1939)
 at 
org.apache.geode.internal.cache.LocalRegion.basicBridgeClear(LocalRegion.java:8988)
 at 
org.apache.geode.internal.cache.tier.sockets.command.ClearRegion.cmdExecute(ClearRegion.java:123)
{noformat}
But it doesn't clear the entryExpiryTasks map:
{noformat}
LocalRegion.clearRegionLocally before cancelAllEntryExpiryTasks 
entryExpiryTasks=100
LocalRegion.clearRegionLocally after cancelAllEntryExpiryTasks 
entryExpiryTasks=100
{noformat}
As a test, I added this call to the bottom of the cancelAllEntryExpiryTasks 
method:
{noformat}
this.entryExpiryTasks.clear();
{noformat}
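For context, a hedged sketch of where that call sits (the method shape is 
abridged and illustrative, not the actual LocalRegion source):
{noformat}
void cancelAllEntryExpiryTasks() {
  // Cancel each pending expiry task...
  for (EntryExpiryTask task : entryExpiryTasks.values()) {
    task.cancel();
  }
  // ...then drop the map entries so the RegionEntries they reference
  // (and the tasks themselves) become collectible.
  this.entryExpiryTasks.clear(); // the added call
}
{noformat}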
This addressed the leak:
{noformat}
Server Startup: Total 182414 9855616
Load/Clear 1: Total 191049 10315832
Load/Clear 2: Total 191978 10329664
Load/Clear 3: Total 192638 10360360
{noformat}
As a work-around, a Function that clears the region by using removeAll on 
batches of keys also addresses the leak:
{noformat}
Server Startup: Total 182297 9849312
Load/Clear 1: Total 185932 10019248
Load/Clear 2: Total 191855 10278816
Load/Clear 3: Total 192511 10313168
Load/Clear 4: Total 193424 10352008
{noformat}
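A minimal sketch of such a work-around Function (the class name, batch size, 
and snapshot-then-removeAll approach are illustrative, not the actual code 
used):
{noformat}
import java.util.ArrayList;
import java.util.List;
import org.apache.geode.cache.Region;
import org.apache.geode.cache.execute.Function;
import org.apache.geode.cache.execute.FunctionContext;
import org.apache.geode.cache.execute.RegionFunctionContext;

public class ClearRegionFunction implements Function<Void> {

  @Override
  public void execute(FunctionContext<Void> context) {
    Region<Object, Object> region = ((RegionFunctionContext) context).getDataSet();
    // Snapshot the keys, then removeAll in batches; each remove cancels and
    // unregisters that entry's EntryExpiryTask, so nothing is left behind.
    List<Object> keys = new ArrayList<>(region.keySet());
    int batchSize = 1000;
    for (int i = 0; i < keys.size(); i += batchSize) {
      region.removeAll(keys.subList(i, Math.min(i + batchSize, keys.size())));
    }
  }

  @Override
  public boolean hasResult() {
    return false;
  }

  @Override
  public String getId() {
    return ClearRegionFunction.class.getSimpleName();
  }
}
{noformat}
Executed onRegion from a client, this empties the region without going through 
clearRegionLocally at all.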



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (GEODE-6376) PersistentRecoveryOrderDUnitTest > testCrashDuringPreparePersistentId FAILED

2019-03-08 Thread Barry Oglesby (JIRA)


[ 
https://issues.apache.org/jira/browse/GEODE-6376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16788317#comment-16788317
 ] 

Barry Oglesby commented on GEODE-6376:
--

Reoccurred in CI:

[https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-develop-main/jobs/DistributedTestOpenJDK11/builds/484]

{noformat}
org.apache.geode.internal.cache.persistence.PersistentRecoveryOrderDUnitTest > 
testCrashDuringPreparePersistentId FAILED
 java.lang.RuntimeException: java.lang.IllegalStateException: Disk store 
PersistentRecoveryOrderDUnitTest_testCrashDuringPreparePersistentIdRegion not 
found
 at 
org.apache.geode.internal.cache.persistence.PersistentReplicatedTestBase._createPersistentRegion(PersistentReplicatedTestBase.java:194)
 at 
org.apache.geode.internal.cache.persistence.PersistentReplicatedTestBase.createPersistentRegion(PersistentReplicatedTestBase.java:180)
 at 
org.apache.geode.internal.cache.persistence.PersistentRecoveryOrderDUnitTest.testCrashDuringPreparePersistentId(PersistentRecoveryOrderDUnitTest.java:1325)

Caused by:
 java.lang.IllegalStateException: Disk store 
PersistentRecoveryOrderDUnitTest_testCrashDuringPreparePersistentIdRegion not 
found

8346 tests completed, 1 failed, 495 skipped
{noformat}
=-=-=-=-=-=-=-=-=-=-=-=-=-=-= Test Results URI =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
http://files.apachegeode-ci.info/builds/apache-develop-main/1.10.0-SNAPSHOT.0034/test-results/distributedTest/1552078010/
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=

Test report artifacts from this job are available at:

http://files.apachegeode-ci.info/builds/apache-develop-main/1.10.0-SNAPSHOT.0034/test-artifacts/1552078010/distributedtestfiles-OpenJDK11-1.10.0-SNAPSHOT.0034.tgz

> PersistentRecoveryOrderDUnitTest > testCrashDuringPreparePersistentId FAILED
> 
>
> Key: GEODE-6376
> URL: https://issues.apache.org/jira/browse/GEODE-6376
> Project: Geode
>  Issue Type: Bug
>  Components: tests
>Reporter: Mark Hanson
>Priority: Major
>
> Failure Link
> https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-develop-main/jobs/DistributedTestOpenJDK11/builds/373
> Log Archives:
> =-=-=-=-=-=-=-=-=-=-=-=-=-=-=  Test Results URI 
> =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
> http://files.apachegeode-ci.info/builds/apache-develop-main/1.9.0-SNAPSHOT.0412/test-results/distributedTest/1549403523/
> =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
> Test report artifacts from this job are available at:
> http://files.apachegeode-ci.info/builds/apache-develop-main/1.9.0-SNAPSHOT.0412/test-artifacts/1549403523/distributedtestfiles-OpenJDK11-1.9.0-SNAPSHOT.0412.tgz
> Stack Trace:
> {code}
> java.lang.RuntimeException: java.lang.IllegalStateException: Disk store 
> PersistentRecoveryOrderDUnitTest_testCrashDuringPreparePersistentIdRegion not 
> found
>   at 
> org.apache.geode.internal.cache.persistence.PersistentReplicatedTestBase._createPersistentRegion(PersistentReplicatedTestBase.java:194)
>   at 
> org.apache.geode.internal.cache.persistence.PersistentReplicatedTestBase.createPersistentRegion(PersistentReplicatedTestBase.java:180)
>   at 
> org.apache.geode.internal.cache.persistence.PersistentRecoveryOrderDUnitTest.testCrashDuringPreparePersistentId(PersistentRecoveryOrderDUnitTest.java:1325)
>   at jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:566)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>   at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
>   at 
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
>   at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:55)
>   at org.junit.rules.RunRules.evaluate(RunRules.java:20)
>   at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
>   at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
>   at 

[jira] [Commented] (GEODE-4263) GEODE-4263 : [CI Failure] ResourceManagerWithQueryMonitorDUnitTest. testRMAndTimeoutSet

2019-03-08 Thread Barry Oglesby (JIRA)


[ 
https://issues.apache.org/jira/browse/GEODE-4263?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16788271#comment-16788271
 ] 

Barry Oglesby commented on GEODE-4263:
--

ResourceManagerWithQueryMonitorDUnitTest.doTestCriticalHeapAndQueryTimeout:

- creates a DefaultQuery test hook
- asynchronously invokes the query (which calls doTestHook to start the test 
hook waiting)
- sleeps 1 second
- simulates a CRITICAL_HEAP_USED event
- sleeps another 4 seconds
- releases the test hook

Meanwhile, the test hook waits at most 8 seconds to be released. The steps 
above didn't complete within those 8 seconds, so the hook timed out and the 
test failed with the 'query was never unlatched' message. Maybe 8 seconds 
isn't long enough to wait; a more forgiving hook is sketched below.
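
A hedged sketch of such a hook (the names and the 30-second timeout are 
illustrative; the test's actual PauseTestHook implements DefaultQuery's 
test-hook interface):
{noformat}
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Illustrative pause hook: same shape as the test's PauseTestHook, but with
// a timeout comfortably larger than the ~5+ seconds the steps above take.
class PauseHook {
  private final CountDownLatch latch = new CountDownLatch(1);

  void doTestHook() {
    try {
      if (!latch.await(30, TimeUnit.SECONDS)) { // the current wait is 8 seconds
        throw new AssertionError("query was never unlatched");
      }
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
  }

  void release() {
    latch.countDown();
  }
}
{noformat}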

> GEODE-4263 : [CI Failure] ResourceManagerWithQueryMonitorDUnitTest. 
> testRMAndTimeoutSet
> ---
>
> Key: GEODE-4263
> URL: https://issues.apache.org/jira/browse/GEODE-4263
> Project: Geode
>  Issue Type: Bug
>  Components: querying
>Reporter: nabarun
>Priority: Major
>
> {noformat}
> java.lang.AssertionError: queryExecution.getResult() threw Exception 
> java.lang.AssertionError: An exception occurred during asynchronous 
> invocation.
>   at org.junit.Assert.fail(Assert.java:88)
>   at 
> org.apache.geode.cache.query.dunit.ResourceManagerWithQueryMonitorDUnitTest.doTestCriticalHeapAndQueryTimeout(ResourceManagerWithQueryMonitorDUnitTest.java:738)
>   at 
> org.apache.geode.cache.query.dunit.ResourceManagerWithQueryMonitorDUnitTest.doCriticalMemoryHitTest(ResourceManagerWithQueryMonitorDUnitTest.java:321)
>   at 
> org.apache.geode.cache.query.dunit.ResourceManagerWithQueryMonitorDUnitTest.testRMAndTimeoutSet(ResourceManagerWithQueryMonitorDUnitTest.java:157)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>   at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
>   at 
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
>   at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:55)
>   at org.junit.rules.RunRules.evaluate(RunRules.java:20)
>   at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
>   at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
>   at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
>   at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
>   at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
>   at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
>   at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
>   at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
>   at 
> org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecuter.runTestClass(JUnitTestClassExecuter.java:114)
>   at 
> org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecuter.execute(JUnitTestClassExecuter.java:57)
>   at 
> org.gradle.api.internal.tasks.testing.junit.JUnitTestClassProcessor.processTestClass(JUnitTestClassProcessor.java:66)
>   at 
> org.gradle.api.internal.tasks.testing.SuiteTestClassProcessor.processTestClass(SuiteTestClassProcessor.java:51)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:35)
>   at 
> org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:24)
>   at 
> org.gradle.internal.dispatch.ContextClassLoaderDispatch.dispatch(ContextClassLoaderDispatch.java:32)
>   at 
> 

[jira] [Reopened] (GEODE-4263) GEODE-4263 : [CI Failure] ResourceManagerWithQueryMonitorDUnitTest. testRMAndTimeoutSet

2019-03-08 Thread Barry Oglesby (JIRA)


 [ 
https://issues.apache.org/jira/browse/GEODE-4263?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Barry Oglesby reopened GEODE-4263:
--

Reopening this JIRA since the 
ResourceManagerWithQueryMonitorDUnitTest.doTestCriticalHeapAndQueryTimeout 
method failed in CI run:

https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-develop-main/jobs/DistributedTestOpenJDK8/builds/483
{noformat}
java.lang.AssertionError: queryExecution.getResult() threw Exception 
java.lang.AssertionError: An exception occurred during asynchronous invocation.
 at org.junit.Assert.fail(Assert.java:88)
 at 
org.apache.geode.cache.query.dunit.ResourceManagerWithQueryMonitorDUnitTest.doTestCriticalHeapAndQueryTimeout(ResourceManagerWithQueryMonitorDUnitTest.java:847)
 at 
org.apache.geode.cache.query.dunit.ResourceManagerWithQueryMonitorDUnitTest.doCriticalMemoryHitTestWithMultipleServers(ResourceManagerWithQueryMonitorDUnitTest.java:628)
 at 
org.apache.geode.cache.query.dunit.ResourceManagerWithQueryMonitorDUnitTest.testPRGatherCancellation(ResourceManagerWithQueryMonitorDUnitTest.java:249)

java.lang.AssertionError: Suspicious strings were written to the log during 
this run.
Fix the strings or use IgnoredException.addIgnoredException to ignore.
---
Found suspect string in log4j at line 2187

[fatal 2019/03/08 18:40:17.385 UTC  
tid=162] Server connection from 
[identity(172.17.0.5(283:loner):43932:82f99a5e,connection=1; port=43932] : 
Unexpected Error on server
java.lang.AssertionError: query was never unlatched
 at org.junit.Assert.fail(Assert.java:88)
 at 
org.apache.geode.cache.query.dunit.ResourceManagerWithQueryMonitorDUnitTest$PauseTestHook.doTestHook(ResourceManagerWithQueryMonitorDUnitTest.java:1304)
 at 
org.apache.geode.cache.query.internal.DefaultQuery.execute(DefaultQuery.java:257)
 at 
org.apache.geode.cache.query.internal.DefaultQuery.execute(DefaultQuery.java:217)
 at 
org.apache.geode.internal.cache.tier.sockets.BaseCommandQuery.processQueryUsingParams(BaseCommandQuery.java:105)
 at 
org.apache.geode.internal.cache.tier.sockets.BaseCommandQuery.processQuery(BaseCommandQuery.java:58)
 at 
org.apache.geode.internal.cache.tier.sockets.command.Query.cmdExecute(Query.java:94)
 at 
org.apache.geode.internal.cache.tier.sockets.BaseCommand.execute(BaseCommand.java:183)
 at 
org.apache.geode.internal.cache.tier.sockets.ServerConnection.doNormalMessage(ServerConnection.java:851)
 at 
org.apache.geode.internal.cache.tier.sockets.OriginalServerConnection.doOneMessage(OriginalServerConnection.java:75)
 at 
org.apache.geode.internal.cache.tier.sockets.ServerConnection.run(ServerConnection.java:1227)
 at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
 at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
 at 
org.apache.geode.internal.cache.tier.sockets.AcceptorImpl.lambda$initializeServerConnectionThreadPool$3(AcceptorImpl.java:616)
 at 
org.apache.geode.internal.logging.LoggingThreadFactory.lambda$newThread$0(LoggingThreadFactory.java:121)
 at java.lang.Thread.run(Thread.java:748)

8379 tests completed, 1 failed, 499 skipped
{noformat}
=-=-=-=-=-=-=-=-=-=-=-=-=-=-= Test Results URI =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
http://files.apachegeode-ci.info/builds/apache-develop-main/1.10.0-SNAPSHOT.0033/test-results/distributedTest/1552074270/
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=

Test report artifacts from this job are available at:

http://files.apachegeode-ci.info/builds/apache-develop-main/1.10.0-SNAPSHOT.0033/test-artifacts/1552074270/distributedtestfiles-OpenJDK8-1.10.0-SNAPSHOT.0033.tgz

 

> GEODE-4263 : [CI Failure] ResourceManagerWithQueryMonitorDUnitTest. 
> testRMAndTimeoutSet
> ---
>
> Key: GEODE-4263
> URL: https://issues.apache.org/jira/browse/GEODE-4263
> Project: Geode
>  Issue Type: Bug
>  Components: querying
>Reporter: nabarun
>Priority: Major
>
> {noformat}
> java.lang.AssertionError: queryExecution.getResult() threw Exception 
> java.lang.AssertionError: An exception occurred during asynchronous 
> invocation.
>   at org.junit.Assert.fail(Assert.java:88)
>   at 
> org.apache.geode.cache.query.dunit.ResourceManagerWithQueryMonitorDUnitTest.doTestCriticalHeapAndQueryTimeout(ResourceManagerWithQueryMonitorDUnitTest.java:738)
>   at 
> org.apache.geode.cache.query.dunit.ResourceManagerWithQueryMonitorDUnitTest.doCriticalMemoryHitTest(ResourceManagerWithQueryMonitorDUnitTest.java:321)
>   at 
> org.apache.geode.cache.query.dunit.ResourceManagerWithQueryMonitorDUnitTest.testRMAndTimeoutSet(ResourceManagerWithQueryMonitorDUnitTest.java:157)
>   at 

[jira] [Commented] (GEODE-6498) CI failure: DescribeClientCommandDUnitTest describeClient failed with ComparisonFailure

2019-03-08 Thread Barry Oglesby (JIRA)


[ 
https://issues.apache.org/jira/browse/GEODE-6498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16788073#comment-16788073
 ] 

Barry Oglesby commented on GEODE-6498:
--

This reoccurred in:
[https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-develop-main/jobs/WindowsGfshDistributedTestOpenJDK11/builds/322]

{noformat}
org.apache.geode.management.internal.cli.commands.DescribeClientCommandDUnitTest
 > describeClient FAILED
 org.junit.ComparisonFailure: 
expected:<"10.0.0.93([locator-0:1868:locator):41002 [Coordinator]]"> 
but was:<"10.0.0.93([server-1:7776):41004]">
 at jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(Native 
Method)
 at 
jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
 at 
jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
 at 
org.apache.geode.management.internal.cli.commands.DescribeClientCommandDUnitTest.validateResults(DescribeClientCommandDUnitTest.java:152)
 at 
org.apache.geode.management.internal.cli.commands.DescribeClientCommandDUnitTest.describeClient(DescribeClientCommandDUnitTest.java:101)

10 tests completed, 1 failed
{noformat}
=-=-=-=-=-=-=-=-=-=-=-=-=-=-= Test Results URI =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
http://files.apachegeode-ci.info/builds/apache-develop-main/1.10.0-SNAPSHOT.0027/test-results/distributedTest/1552027081/
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=

Test report artifacts from this job are available at:

http://files.apachegeode-ci.info/builds/apache-develop-main/1.10.0-SNAPSHOT.0027/test-artifacts/1552027081/windows-gfshdistributedtest-OpenJDK11-1.10.0-SNAPSHOT.0027.tgz

> CI failure: DescribeClientCommandDUnitTest describeClient failed with 
> ComparisonFailure
> ---
>
> Key: GEODE-6498
> URL: https://issues.apache.org/jira/browse/GEODE-6498
> Project: Geode
>  Issue Type: Bug
>  Components: gfsh
>Reporter: Barry Oglesby
>Assignee: Jinmei Liao
>Priority: Major
>
> https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-develop-main/jobs/WindowsGfshDistributedTestOpenJDK8/builds/318
> {noformat}
> org.apache.geode.management.internal.cli.commands.DescribeClientCommandDUnitTest
>  > describeClient FAILED
>  org.junit.ComparisonFailure: 
> expected:<"10.0.0.148(server-[2:8068):41004]"> but 
> was:<"10.0.0.148(server-[1:1656):41006]">
>  at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>  at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
>  at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>  at 
> org.apache.geode.management.internal.cli.commands.DescribeClientCommandDUnitTest.validateResults(DescribeClientCommandDUnitTest.java:152)
>  at 
> org.apache.geode.management.internal.cli.commands.DescribeClientCommandDUnitTest.describeClient(DescribeClientCommandDUnitTest.java:101)
> 10 tests completed, 1 failed
> {noformat}
> =-=-=-=-=-=-=-=-=-=-=-=-=-=-= Test Results URI 
> =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
> http://files.apachegeode-ci.info/builds/apache-develop-main/1.10.0-SNAPSHOT.0024/test-results/distributedTest/1551991519/
> =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
> Test report artifacts from this job are available at:
> http://files.apachegeode-ci.info/builds/apache-develop-main/1.10.0-SNAPSHOT.0024/test-artifacts/1551991519/windows-gfshdistributedtest-OpenJDK8-1.10.0-SNAPSHOT.0024.tgz



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (GEODE-6498) CI failure: DescribeClientCommandDUnitTest describeClient failed with ComparisonFailure

2019-03-08 Thread Barry Oglesby (JIRA)


 [ 
https://issues.apache.org/jira/browse/GEODE-6498?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Barry Oglesby reassigned GEODE-6498:


Assignee: Jinmei Liao

> CI failure: DescribeClientCommandDUnitTest describeClient failed with 
> ComparisonFailure
> ---
>
> Key: GEODE-6498
> URL: https://issues.apache.org/jira/browse/GEODE-6498
> Project: Geode
>  Issue Type: Bug
>  Components: gfsh
>Reporter: Barry Oglesby
>Assignee: Jinmei Liao
>Priority: Major
>
> https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-develop-main/jobs/WindowsGfshDistributedTestOpenJDK8/builds/318
> {noformat}
> org.apache.geode.management.internal.cli.commands.DescribeClientCommandDUnitTest
>  > describeClient FAILED
>  org.junit.ComparisonFailure: 
> expected:<"10.0.0.148(server-[2:8068):41004]"> but 
> was:<"10.0.0.148(server-[1:1656):41006]">
>  at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>  at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
>  at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>  at 
> org.apache.geode.management.internal.cli.commands.DescribeClientCommandDUnitTest.validateResults(DescribeClientCommandDUnitTest.java:152)
>  at 
> org.apache.geode.management.internal.cli.commands.DescribeClientCommandDUnitTest.describeClient(DescribeClientCommandDUnitTest.java:101)
> 10 tests completed, 1 failed
> {noformat}
> =-=-=-=-=-=-=-=-=-=-=-=-=-=-= Test Results URI 
> =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
> http://files.apachegeode-ci.info/builds/apache-develop-main/1.10.0-SNAPSHOT.0024/test-results/distributedTest/1551991519/
> =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
> Test report artifacts from this job are available at:
> http://files.apachegeode-ci.info/builds/apache-develop-main/1.10.0-SNAPSHOT.0024/test-artifacts/1551991519/windows-gfshdistributedtest-OpenJDK8-1.10.0-SNAPSHOT.0024.tgz



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (GEODE-6498) CI failure: DescribeClientCommandDUnitTest describeClient failed with ComparisonFailure

2019-03-08 Thread Barry Oglesby (JIRA)
Barry Oglesby created GEODE-6498:


 Summary: CI failure: DescribeClientCommandDUnitTest describeClient 
failed with ComparisonFailure
 Key: GEODE-6498
 URL: https://issues.apache.org/jira/browse/GEODE-6498
 Project: Geode
  Issue Type: Bug
  Components: gfsh
Reporter: Barry Oglesby


https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-develop-main/jobs/WindowsGfshDistributedTestOpenJDK8/builds/318

{noformat}

org.apache.geode.management.internal.cli.commands.DescribeClientCommandDUnitTest
 > describeClient FAILED
 org.junit.ComparisonFailure: 
expected:<"10.0.0.148(server-[2:8068):41004]"> but 
was:<"10.0.0.148(server-[1:1656):41006]">
 at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
 at 
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
 at 
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
 at 
org.apache.geode.management.internal.cli.commands.DescribeClientCommandDUnitTest.validateResults(DescribeClientCommandDUnitTest.java:152)
 at 
org.apache.geode.management.internal.cli.commands.DescribeClientCommandDUnitTest.describeClient(DescribeClientCommandDUnitTest.java:101)

10 tests completed, 1 failed

{noformat}

=-=-=-=-=-=-=-=-=-=-=-=-=-=-= Test Results URI =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
http://files.apachegeode-ci.info/builds/apache-develop-main/1.10.0-SNAPSHOT.0024/test-results/distributedTest/1551991519/
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=

Test report artifacts from this job are available at:

http://files.apachegeode-ci.info/builds/apache-develop-main/1.10.0-SNAPSHOT.0024/test-artifacts/1551991519/windows-gfshdistributedtest-OpenJDK8-1.10.0-SNAPSHOT.0024.tgz



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (GEODE-6419) CI Failure: ClusterConfigurationDUnitTest.testStartServerAndExecuteCommands fails with BindException

2019-03-07 Thread Barry Oglesby (JIRA)


[ 
https://issues.apache.org/jira/browse/GEODE-6419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16787224#comment-16787224
 ] 

Barry Oglesby commented on GEODE-6419:
--

This failure reproduced during CI: 
https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-develop-main/jobs/WindowsGfshDistributedTestOpenJDK11/builds/319
{noformat}
org.apache.geode.management.internal.cli.commands.ClusterConfigurationDUnitTest 
> testStartServerAndExecuteCommands[0] FAILED
 java.lang.AssertionError: Suspicious strings were written to the log during 
this run.
 Fix the strings or use IgnoredException.addIgnoredException to ignore.
 ---
 Found suspect string in log4j at line 1141

[error 2019/03/07 19:32:24.218 GMT  tid=64] 
Jmx manager could not be started because HTTP service failed to start
 org.apache.geode.management.ManagementException: HTTP service failed to start
 at 
org.apache.geode.management.internal.ManagementAgent.loadWebApplications(ManagementAgent.java:240)
 at 
org.apache.geode.management.internal.ManagementAgent.startAgent(ManagementAgent.java:127)
 at 
org.apache.geode.management.internal.SystemManagementService.startManager(SystemManagementService.java:432)
 at 
org.apache.geode.management.internal.beans.ManagementAdapter.handleCacheCreation(ManagementAdapter.java:181)
 at 
org.apache.geode.management.internal.beans.ManagementListener.handleEvent(ManagementListener.java:127)
 at 
org.apache.geode.distributed.internal.InternalDistributedSystem.notifyResourceEventListeners(InternalDistributedSystem.java:2162)
 at 
org.apache.geode.distributed.internal.InternalDistributedSystem.handleResourceEvent(InternalDistributedSystem.java:704)
 at 
org.apache.geode.internal.cache.GemFireCacheImpl.initialize(GemFireCacheImpl.java:1182)
 at 
org.apache.geode.internal.cache.InternalCacheBuilder.create(InternalCacheBuilder.java:181)
 at 
org.apache.geode.internal.cache.InternalCacheBuilder.create(InternalCacheBuilder.java:147)
 at org.apache.geode.cache.CacheFactory.create(CacheFactory.java:138)
 at 
org.apache.geode.distributed.internal.InternalLocator.startCache(InternalLocator.java:672)
 at 
org.apache.geode.distributed.internal.InternalLocator.startDistributedSystem(InternalLocator.java:659)
 at 
org.apache.geode.distributed.internal.InternalLocator.startLocator(InternalLocator.java:343)
 at org.apache.geode.distributed.Locator.startLocator(Locator.java:252)
 at org.apache.geode.distributed.Locator.startLocatorAndDS(Locator.java:139)
 at 
org.apache.geode.test.junit.rules.LocatorStarterRule.startLocator(LocatorStarterRule.java:85)
 at 
org.apache.geode.test.junit.rules.LocatorStarterRule.before(LocatorStarterRule.java:66)
 at 
org.apache.geode.test.dunit.rules.ClusterStartupRule.lambda$startLocatorVM$22d9b8a8$1(ClusterStartupRule.java:239)
 at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native 
Method)
 at 
java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
 at 
java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.base/java.lang.reflect.Method.invoke(Method.java:566)
 at 
org.apache.geode.test.dunit.internal.MethodInvoker.executeObject(MethodInvoker.java:123)
 at 
org.apache.geode.test.dunit.internal.RemoteDUnitVM.executeMethodOnObject(RemoteDUnitVM.java:69)
 at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native 
Method)
 at 
java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
 at 
java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.base/java.lang.reflect.Method.invoke(Method.java:566)
 at java.rmi/sun.rmi.server.UnicastServerRef.dispatch(UnicastServerRef.java:359)
 at java.rmi/sun.rmi.transport.Transport$1.run(Transport.java:200)
 at java.rmi/sun.rmi.transport.Transport$1.run(Transport.java:197)
 at java.base/java.security.AccessController.doPrivileged(Native Method)
 at java.rmi/sun.rmi.transport.Transport.serviceCall(Transport.java:196)
 at 
java.rmi/sun.rmi.transport.tcp.TCPTransport.handleMessages(TCPTransport.java:562)
 at 
java.rmi/sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run0(TCPTransport.java:796)
 at 
java.rmi/sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.lambda$run$0(TCPTransport.java:677)
 at java.base/java.security.AccessController.doPrivileged(Native Method)
 at 
java.rmi/sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run(TCPTransport.java:676)
 at 
java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
 at 
java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
 at java.base/java.lang.Thread.run(Thread.java:834)
 Caused by: java.net.BindException: Address already in use: bind
 at 

[jira] [Resolved] (GEODE-2968) Provide an API to set identity field(s) on JSON objects

2019-03-07 Thread Barry Oglesby (JIRA)


 [ 
https://issues.apache.org/jira/browse/GEODE-2968?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Barry Oglesby resolved GEODE-2968.
--
Resolution: Duplicate

> Provide an API to set identity field(s) on JSON objects
> ---
>
> Key: GEODE-2968
> URL: https://issues.apache.org/jira/browse/GEODE-2968
> Project: Geode
>  Issue Type: Improvement
>  Components: rest (dev), serialization
>Reporter: Barry Oglesby
>Priority: Major
>
> I have a JSON object with 53 fields. The identity of that object is one 
> specific field (the {{Unique_Key}} field in this case), but I can't specify 
> that when loading the object. This causes {{PdxInstanceImpl equals}} and 
> {{hashCode}} to use all 53 fields in their determinations and is especially 
> bad for OQL queries.
> I hacked {{PdxInstanceHelper addIntField}} to set an identity field like:
> {noformat}
> if (fieldName.equals("Unique_Key")) {
>   m_pdxInstanceFactory.markIdentityField(fieldName);
> }
> {noformat}
> Here are some queries before and after this change:
> Before:
> {noformat}
> Totals query=SELECT * FROM /data WHERE Agency = 'NYPD'; resultSize=1890; 
> iterations=1000; totalTime=30529 ms; averagePerQuery=30.529 ms
> Totals query=SELECT * FROM /data WHERE Incident_Address LIKE '%AVENUE%'; 
> resultSize=2930; iterations=1000; totalTime=62723 ms; averagePerQuery=62.723 
> ms
> Totals query=SELECT * FROM /data; resultSize=1; iterations=1000; 
> totalTime=87673 ms; averagePerQuery=87.673 ms
> {noformat}
> After:
> {noformat}
> Totals query=SELECT * FROM /data WHERE Agency = 'NYPD'; resultSize=1890; 
> iterations=1000; totalTime=12417 ms; averagePerQuery=12.417 ms
> Totals query=SELECT * FROM /data WHERE Incident_Address LIKE '%AVENUE%'; 
> resultSize=2930; iterations=1000; totalTime=29517 ms; averagePerQuery=29.517 
> ms
> Totals query=SELECT * FROM /data; resultSize=1; iterations=1000; 
> totalTime=44127 ms; averagePerQuery=44.127 ms
> {noformat}
> Here is an example of the JSON object:
> {noformat}
>  {
>"Unique_Key": 25419013,
>"Created_Date": "04/24/2013 12:00:00 AM",
>"Closed_Date": "04/25/2013 12:00:00 AM",
>"Agency": "HPD",
>"Agency_Name": "Department of Housing Preservation and Development",
>"Complaint_Type": "PLUMBING",
>"Descriptor": "WATER-SUPPLY",
>"Location_Type": "RESIDENTIAL BUILDING",
>"Incident_Zip": "11372",
>"Incident_Address": "37-37 88 STREET",
>"Street_Name": "88 STREET",
>"Cross_Street_1": "37 AVENUE",
>"Cross_Street_2": "ROOSEVELT AVENUE",
>"Intersection_Street_1": "",
>"Intersection_Street_2": "",
>"Address_Type": "ADDRESS",
>"City": "Jackson Heights",
>"Landmark": "",
>"Facility_Type": "N/A",
>"Status": "Closed",
>"Due_Date": "",
>"Resolution_Description": "The Department of Housing Preservation and 
> Development inspected the following conditions. No violations were issued. 
> The complaint has been closed.",
>"Resolution_Action_Updated_Date": "04/25/2013 12:00:00 AM",
>"Community_Board": "03 QUEENS",
>"Borough": "QUEENS",
>"X_Coordinate_State_Plane": 1017897,
>"Y_Coordinate_State_Plane": 212354,
>"Park_Facility_Name": "Unspecified",
>"Park_Borough": "QUEENS",
>"School_Name": "Unspecified",
>"School_Number": "Unspecified",
>"School_Region": "Unspecified",
>"School_Code": "Unspecified",
>"School_Phone_Number": "Unspecified",
>"School_Address": "Unspecified",
>"School_City": "Unspecified",
>"School_State": "Unspecified",
>"School_Zip": "Unspecified",
>"School_Not_Found": "",
>"School_or_Citywide_Complaint": "",
>"Vehicle_Type": "",
>"Taxi_Company_Borough": "",
>"Taxi_Pick_Up_Location": "",
>"Bridge_Highway_Name": "",
>"Bridge_Highway_Direction": "",
>"Road_Ramp": "",
>"Bridge_Highway_Segment": "",
>"Garage_Lot_Name": "",
>"Ferry_Direction": "",
>"Ferry_Terminal_Name": "",
>"Latitude": 40.74947521870806,
>"Longitude": -73.87856355000383,
>"Location": "(40.74947521870806, -73.87856355000383)"
>  }
>  {noformat}
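
For reference, the public API already exposes the hook the hacked 
PdxInstanceHelper reaches for: when building instances manually (rather than 
via JSONFormatter, which is the gap this ticket describes), 
PdxInstanceFactory.markIdentityField can restrict identity to one field. A 
minimal sketch (class name and field values borrowed from the example above; 
{{cache}} is assumed to be an existing Cache):
{noformat}
// Mark Unique_Key as the sole identity field so PdxInstanceImpl equals and
// hashCode consider only it, not all 53 fields.
PdxInstanceFactory factory = cache.createPdxInstanceFactory("ServiceRequest");
factory.writeInt("Unique_Key", 25419013);
factory.writeString("Agency", "HPD");
// ... write the remaining fields ...
factory.markIdentityField("Unique_Key");
PdxInstance instance = factory.create();
{noformat}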



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (GEODE-4240) DeprecatedCacheServerLauncherIntegrationTest fails sporadically with execution timeout

2019-03-07 Thread Barry Oglesby (JIRA)


[ 
https://issues.apache.org/jira/browse/GEODE-4240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16786999#comment-16786999
 ] 

Barry Oglesby commented on GEODE-4240:
--

Reproduced in CI: 
https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-develop-main/jobs/WindowsIntegrationTestOpenJDK8/builds/323

org.apache.geode.internal.cache.DeprecatedCacheServerLauncherIntegrationTest > 
testServerPortOneCacheServer FAILED
 java.lang.AssertionError: Timed out waiting for output "CacheServer pid: \\d+ 
status: running" after 12 ms. Output: 
 Starting CacheServer with pid: 0
 at org.junit.Assert.fail(Assert.java:88)
 at 
org.apache.geode.test.process.ProcessWrapper.waitForOutputToMatch(ProcessWrapper.java:240)
 at 
org.apache.geode.internal.cache.DeprecatedCacheServerLauncherIntegrationTest.execAndValidate(DeprecatedCacheServerLauncherIntegrationTest.java:438)
 at 
org.apache.geode.internal.cache.DeprecatedCacheServerLauncherIntegrationTest.testServerPortOneCacheServer(DeprecatedCacheServerLauncherIntegrationTest.java:334)

4564 tests completed, 1 failed, 95 skipped

=-=-=-=-=-=-=-=-=-=-=-=-=-=-= Test Results URI =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
http://files.apachegeode-ci.info/builds/apache-develop-main/1.10.0-SNAPSHOT.0022/test-results/integrationTest/1551928468/
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=

Test report artifacts from this job are available at:

http://files.apachegeode-ci.info/builds/apache-develop-main/1.10.0-SNAPSHOT.0022/test-artifacts/1551928468/windows-integrationtestfiles-OpenJDK8-1.10.0-SNAPSHOT.0022.tgz

> DeprecatedCacheServerLauncherIntegrationTest fails sporadically with 
> execution timeout
> --
>
> Key: GEODE-4240
> URL: https://issues.apache.org/jira/browse/GEODE-4240
> Project: Geode
>  Issue Type: Bug
>Reporter: Patrick Rhomberg
>Assignee: Dan Smith
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> While possibly unrelated, it is worth noting other recent failures due to 
> startup timeouts.  
> ([GEODE-4236](https://issues.apache.org/jira/browse/GEODE-4236) comes to 
> mind.)  
> I have recently seen a failure in this test timing out with the following 
> stacktrace:
> {noformat}
> java.lang.AssertionError: Timed out waiting for output "CacheServer pid: \d+ 
> status: running" after 12 ms. Output: 
> Starting CacheServer with pid: 0
>   at org.junit.Assert.fail(Assert.java:88)
>   at 
> org.apache.geode.test.process.ProcessWrapper.waitForOutputToMatch(ProcessWrapper.java:222)
>   at 
> org.apache.geode.internal.cache.DeprecatedCacheServerLauncherIntegrationTest.execAndValidate(DeprecatedCacheServerLauncherIntegrationTest.java:437)
>   at 
> org.apache.geode.internal.cache.DeprecatedCacheServerLauncherIntegrationTest.testStartStatusStop(DeprecatedCacheServerLauncherIntegrationTest.java:164)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>   at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
>   at 
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
>   at org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:48)
>   at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:55)
>   at org.junit.rules.RunRules.evaluate(RunRules.java:20)
>   at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
>   at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
>   at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
>   at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
>   at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
>   at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
>   at 
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
>   at 

[jira] [Resolved] (GEODE-6435) CI Failure: SerialGatewaySenderQueueDUnitTest.testCreateMaximumSenders assertion fails

2019-02-26 Thread Barry Oglesby (JIRA)


 [ 
https://issues.apache.org/jira/browse/GEODE-6435?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Barry Oglesby resolved GEODE-6435.
--
   Resolution: Fixed
Fix Version/s: 1.9.0

> CI Failure: SerialGatewaySenderQueueDUnitTest.testCreateMaximumSenders 
> assertion fails
> --
>
> Key: GEODE-6435
> URL: https://issues.apache.org/jira/browse/GEODE-6435
> Project: Geode
>  Issue Type: Bug
>Reporter: Patrick Rhomberg
>Assignee: Barry Oglesby
>Priority: Major
> Fix For: 1.9.0
>
> Attachments: jdk11_OldGen_usedMemory_failure.gif, 
> jdk11_OldGen_usedMemory_success.gif, jdk8_OldGen_usedMemory_success.gif
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> {noformat}
> org.apache.geode.internal.cache.wan.serial.SerialGatewaySenderQueueDUnitTest 
> > testCreateMaximumSenders FAILED
> org.apache.geode.test.dunit.RMIException: While invoking 
> org.apache.geode.internal.cache.wan.serial.SerialGatewaySenderQueueDUnitTest$$Lambda$390/0x000840596840.run
>  in VM 2 running on Host 2e6d9f20266c with 8 VMs
> at org.apache.geode.test.dunit.VM.executeMethodOnObject(VM.java:579)
> at org.apache.geode.test.dunit.VM.invoke(VM.java:406)
> at 
> org.apache.geode.internal.cache.wan.serial.SerialGatewaySenderQueueDUnitTest.testCreateMaximumSenders(SerialGatewaySenderQueueDUnitTest.java:370)
> Caused by:
> org.awaitility.core.ConditionTimeoutException: Condition with lambda 
> expression in org.apache.geode.internal.cache.wan.WANTestBase that uses long 
> was not fulfilled within 300 seconds.
> at 
> org.awaitility.core.ConditionAwaiter.await(ConditionAwaiter.java:145)
> at 
> org.awaitility.core.CallableCondition.await(CallableCondition.java:79)
> at 
> org.awaitility.core.CallableCondition.await(CallableCondition.java:27)
> at 
> org.awaitility.core.ConditionFactory.until(ConditionFactory.java:902)
> at 
> org.awaitility.core.ConditionFactory.until(ConditionFactory.java:860)
> at 
> org.apache.geode.internal.cache.wan.WANTestBase.verifyListenerEvents(WANTestBase.java:3542)
> at 
> org.apache.geode.internal.cache.wan.serial.SerialGatewaySenderQueueDUnitTest.lambda$testCreateMaximumSenders$faf964a3$1(SerialGatewaySenderQueueDUnitTest.java:370)
> java.lang.AssertionError: An exception occurred during asynchronous 
> invocation.
> Caused by:
> java.rmi.ConnectIOException: error during JRMP connection 
> establishment; nested exception is: 
>   java.net.SocketTimeoutException: Read timed out
> Caused by:
> java.net.SocketTimeoutException: Read timed out
> {noformat}
>  
> See pipeline failure here:
> [https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-develop-main/jobs/DistributedTestOpenJDK11/builds/413]
> Find test results here:
> [http://files.apachegeode-ci.info/builds/apache-develop-main/1.9.0-SNAPSHOT.0456/test-results/distributedTest/1550631451/]
> Find artifacts here:
> [http://files.apachegeode-ci.info/builds/apache-develop-main/1.9.0-SNAPSHOT.0456/test-artifacts/1550631451/distributedtestfiles-OpenJDK11-1.9.0-SNAPSHOT.0456.tgz]
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (GEODE-6435) CI Failure: SerialGatewaySenderQueueDUnitTest.testCreateMaximumSenders assertion fails

2019-02-21 Thread Barry Oglesby (JIRA)


[ 
https://issues.apache.org/jira/browse/GEODE-6435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16774543#comment-16774543
 ] 

Barry Oglesby edited comment on GEODE-6435 at 2/21/19 10:12 PM:


I wrote a simple test that just creates 500 connections to a server.

With no heap settings, I see pretty much the same heap usage as JDK 8.

With these heap settings (which are what dunit jvms use): {{-Xms512m -Xmx512m 
-XX:MetaspaceSize=512m -XX:SoftRefLRUPolicyMSPerMB=1}}, I see double the heap 
usage (which causes the OOME).

No heap settings:

Live histogram:

{noformat}
 num #instances #bytes class name (module)
 ---
 1: 33610 268237040 [B (java.base@11.0.2)
 2: 4441 1618264 [I (java.base@11.0.2)
 3: 30274 726576 java.lang.String (java.base@11.0.2)
 4: 5664 684952 java.lang.Class (java.base@11.0.2)
 5: 7556 417352 [Ljava.lang.Object; (java.base@11.0.2)
 6: 618 368872 [J (java.base@11.0.2)
 7: 10848 347136 java.util.concurrent.ConcurrentHashMap$Node (java.base@11.0.2)
 8: 10114 323648 java.util.HashMap$Node (java.base@11.0.2)
 9: 2130 220952 [Ljava.util.HashMap$Node; (java.base@11.0.2)
 10: 2373 208824 java.lang.reflect.Method (java.base@11.0.2)
 Total 188802 276340528
{noformat}

vsd:

{noformat}
 currentMaxMemory=8,589,934,592
 currentUsedMemory=279,838,728
{noformat}

With heap settings:

Live histogram:

{noformat}
 num #instances #bytes class name (module)
 ---
 1: 30977 266997720 [B (java.base@11.0.2)
 2: 4934 262202008 [I (java.base@11.0.2)
 3: 28018 672432 java.lang.String (java.base@11.0.2)
 4: 5435 659304 java.lang.Class (java.base@11.0.2)
 5: 614 367784 [J (java.base@11.0.2)
 6: 11023 352736 java.util.concurrent.ConcurrentHashMap$Node (java.base@11.0.2)
 7: 4969 335792 [Ljava.lang.Object; (java.base@11.0.2)
 8: 9651 308832 java.util.HashMap$Node (java.base@11.0.2)
 9: 1728 180440 [Ljava.util.HashMap$Node; (java.base@11.0.2)
 10: 1464 128832 java.lang.reflect.Method (java.base@11.0.2)
 Total 165373 534770560
{noformat}

vsd:

{noformat}
 currentMaxMemory=536,870,912
 currentUsedMemory=535,659,832
{noformat}


was (Author: barry.oglesby):
I wrote a simple test that just creates 500 connections to a server.

With no heap settings, I see pretty much the same heap usage as JDK 8.

With these heap settings (which are what dunit jvms use): {{-Xms512m -Xmx512m 
-XX:MetaspaceSize=512m -XX:SoftRefLRUPolicyMSPerMB=1}}, I see double the heap 
usage (which causes the OOME).

No heap settings:

Live histogram:
{noformat}
 num #instances #bytes  class name (module)
---
   1: 33610  268237040  [B (java.base@11.0.2)
   2:  4441    1618264  [I (java.base@11.0.2)
   3: 30274 726576  java.lang.String (java.base@11.0.2)
   4:  5664 684952  java.lang.Class (java.base@11.0.2)
   5:  7556 417352  [Ljava.lang.Object; (java.base@11.0.2)
   6:   618 368872  [J (java.base@11.0.2)
   7: 10848 347136  java.util.concurrent.ConcurrentHashMap$Node 
(java.base@11.0.2)
   8: 10114 323648  java.util.HashMap$Node (java.base@11.0.2)
   9:  2130 220952  [Ljava.util.HashMap$Node; (java.base@11.0.2)
  10:  2373 208824  java.lang.reflect.Method (java.base@11.0.2)
Total 188802  276340528
{noformat}
vsd:
{noformat}
currentMaxMemory=8,589,934,592
currentUsedMemory=279,838,728
{noformat}
With heap settings:

Live histogram:
{noformat}
 num #instances #bytes  class name (module)
---
   1: 30977  266997720  [B (java.base@11.0.2)
   2:  4934  262202008  [I (java.base@11.0.2)
   3: 28018 672432  java.lang.String (java.base@11.0.2)
   4:  5435 659304  java.lang.Class (java.base@11.0.2)
   5:   614 367784  [J (java.base@11.0.2)
   6: 11023 352736  java.util.concurrent.ConcurrentHashMap$Node 
(java.base@11.0.2)
   7:  4969 335792  [Ljava.lang.Object; (java.base@11.0.2)
   8:  9651 308832  java.util.HashMap$Node (java.base@11.0.2)
   9:  1728 180440  [Ljava.util.HashMap$Node; (java.base@11.0.2)
  10:  1464 128832  java.lang.reflect.Method (java.base@11.0.2)
Total 165373  534770560
{noformat}
vsd:
{noformat}
currentMaxMemory=536,870,912
currentUsedMemory=535,659,832
{noformat}


> CI Failure: SerialGatewaySenderQueueDUnitTest.testCreateMaximumSenders 
> assertion fails
> --
>
> Key: GEODE-6435
> URL: https://issues.apache.org/jira/browse/GEODE-6435
> Project: Geode
>  Issue Type: 

[jira] [Comment Edited] (GEODE-6435) CI Failure: SerialGatewaySenderQueueDUnitTest.testCreateMaximumSenders assertion fails

2019-02-21 Thread Barry Oglesby (JIRA)


[ 
https://issues.apache.org/jira/browse/GEODE-6435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16774543#comment-16774543
 ] 

Barry Oglesby edited comment on GEODE-6435 at 2/21/19 10:13 PM:


I wrote a simple test that just creates 500 connections to a server.

With no heap settings, I see pretty much the same heap usage as JDK 8.

With these heap settings (which are what dunit jvms use): {{-Xms512m -Xmx512m 
-XX:MetaspaceSize=512m -XX:SoftRefLRUPolicyMSPerMB=1}}, I see double the heap 
usage (which causes the OOME).

No heap settings:

Live histogram:

{noformat}
 num #instances #bytes class name (module)
 ---
 1: 33610 268237040 [B (java.base@11.0.2)
 2: 4441 1618264 [I (java.base@11.0.2)
 3: 30274 726576 java.lang.String (java.base@11.0.2)
 4: 5664 684952 java.lang.Class (java.base@11.0.2)
 5: 7556 417352 [Ljava.lang.Object; (java.base@11.0.2)
 6: 618 368872 [J (java.base@11.0.2)
 7: 10848 347136 java.util.concurrent.ConcurrentHashMap$Node (java.base@11.0.2)
 8: 10114 323648 java.util.HashMap$Node (java.base@11.0.2)
 9: 2130 220952 [Ljava.util.HashMap$Node; (java.base@11.0.2)
 10: 2373 208824 java.lang.reflect.Method (java.base@11.0.2)
 Total 188802 276340528
{noformat}
vsd:
{noformat}
 currentMaxMemory=8,589,934,592
 currentUsedMemory=279,838,728
{noformat}

With heap settings:

Live histogram:
{noformat}
 num #instances #bytes class name (module)
 ---
 1: 30977 266997720 [B (java.base@11.0.2)
 2: 4934 262202008 [I (java.base@11.0.2)
 3: 28018 672432 java.lang.String (java.base@11.0.2)
 4: 5435 659304 java.lang.Class (java.base@11.0.2)
 5: 614 367784 [J (java.base@11.0.2)
 6: 11023 352736 java.util.concurrent.ConcurrentHashMap$Node (java.base@11.0.2)
 7: 4969 335792 [Ljava.lang.Object; (java.base@11.0.2)
 8: 9651 308832 java.util.HashMap$Node (java.base@11.0.2)
 9: 1728 180440 [Ljava.util.HashMap$Node; (java.base@11.0.2)
 10: 1464 128832 java.lang.reflect.Method (java.base@11.0.2)
 Total 165373 534770560
{noformat}
vsd:
{noformat}
 currentMaxMemory=536,870,912
 currentUsedMemory=535,659,832
{noformat}


was (Author: barry.oglesby):
I wrote a simple test that just creates 500 connections to a server.

With no heap settings, I see pretty much the same heap usage as JDK 8.

With these heap settings (which are what dunit jvms use): {{-Xms512m -Xmx512m 
-XX:MetaspaceSize=512m -XX:SoftRefLRUPolicyMSPerMB=1}}, I see double the heap 
usage (which causes the OOME).

No heap settings:

Live histogram:

{noformat}
 num #instances #bytes class name (module)
 ---
 1: 33610 268237040 [B (java.base@11.0.2)
 2: 4441 1618264 [I (java.base@11.0.2)
 3: 30274 726576 java.lang.String (java.base@11.0.2)
 4: 5664 684952 java.lang.Class (java.base@11.0.2)
 5: 7556 417352 [Ljava.lang.Object; (java.base@11.0.2)
 6: 618 368872 [J (java.base@11.0.2)
 7: 10848 347136 java.util.concurrent.ConcurrentHashMap$Node (java.base@11.0.2)
 8: 10114 323648 java.util.HashMap$Node (java.base@11.0.2)
 9: 2130 220952 [Ljava.util.HashMap$Node; (java.base@11.0.2)
 10: 2373 208824 java.lang.reflect.Method (java.base@11.0.2)
 Total 188802 276340528
{noformat}

vsd:

{noformat}
 currentMaxMemory=8,589,934,592
 currentUsedMemory=279,838,728
{noformat}

With heap settings:

Live histogram:

{noformat}
 num #instances #bytes class name (module)
 ---
 1: 30977 266997720 [B (java.base@11.0.2)
 2: 4934 262202008 [I (java.base@11.0.2)
 3: 28018 672432 java.lang.String (java.base@11.0.2)
 4: 5435 659304 java.lang.Class (java.base@11.0.2)
 5: 614 367784 [J (java.base@11.0.2)
 6: 11023 352736 java.util.concurrent.ConcurrentHashMap$Node (java.base@11.0.2)
 7: 4969 335792 [Ljava.lang.Object; (java.base@11.0.2)
 8: 9651 308832 java.util.HashMap$Node (java.base@11.0.2)
 9: 1728 180440 [Ljava.util.HashMap$Node; (java.base@11.0.2)
 10: 1464 128832 java.lang.reflect.Method (java.base@11.0.2)
 Total 165373 534770560
{noformat}

vsd:

{noformat}
 currentMaxMemory=536,870,912
 currentUsedMemory=535,659,832
{noformat}

> CI Failure: SerialGatewaySenderQueueDUnitTest.testCreateMaximumSenders 
> assertion fails
> --
>
> Key: GEODE-6435
> URL: https://issues.apache.org/jira/browse/GEODE-6435
> Project: Geode
>  Issue Type: Bug
>Reporter: Patrick Rhomberg
>Assignee: Barry Oglesby
>Priority: Major
> Attachments: jdk11_OldGen_usedMemory_failure.gif, 
> jdk11_OldGen_usedMemory_success.gif, jdk8_OldGen_usedMemory_success.gif
>
>
> {noformat}
> org.apache.geode.internal.cache.wan.serial.SerialGatewaySenderQueueDUnitTest 
> > testCreateMaximumSenders FAILED
> org.apache.geode.test.dunit.RMIException: 

[jira] [Commented] (GEODE-6435) CI Failure: SerialGatewaySenderQueueDUnitTest.testCreateMaximumSenders assertion fails

2019-02-21 Thread Barry Oglesby (JIRA)


[ 
https://issues.apache.org/jira/browse/GEODE-6435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16774543#comment-16774543
 ] 

Barry Oglesby commented on GEODE-6435:
--

I wrote a simple test that just creates 500 connections to a server.

With no heap settings, I see pretty much the same heap usage as JDK 8.

With these heap settings (which are what dunit jvms use): {{-Xms512m -Xmx512m 
-XX:MetaspaceSize=512m -XX:SoftRefLRUPolicyMSPerMB=1}}, I see double the heap 
usage (which causes the OOME).

No heap settings:

Live histogram:
{noformat}
 num #instances #bytes  class name (module)
---
   1: 33610  268237040  [B (java.base@11.0.2)
   2:  4441    1618264  [I (java.base@11.0.2)
   3: 30274 726576  java.lang.String (java.base@11.0.2)
   4:  5664 684952  java.lang.Class (java.base@11.0.2)
   5:  7556 417352  [Ljava.lang.Object; (java.base@11.0.2)
   6:   618 368872  [J (java.base@11.0.2)
   7: 10848 347136  java.util.concurrent.ConcurrentHashMap$Node 
(java.base@11.0.2)
   8: 10114 323648  java.util.HashMap$Node (java.base@11.0.2)
   9:  2130 220952  [Ljava.util.HashMap$Node; (java.base@11.0.2)
  10:  2373 208824  java.lang.reflect.Method (java.base@11.0.2)
Total 188802  276340528
{noformat}
vsd:
{noformat}
currentMaxMemory=8,589,934,592
currentUsedMemory=279,838,728
{noformat}
With heap settings:

Live histogram:
{noformat}
 num #instances #bytes  class name (module)
---
   1: 30977  266997720  [B (java.base@11.0.2)
   2:  4934  262202008  [I (java.base@11.0.2)
   3: 28018 672432  java.lang.String (java.base@11.0.2)
   4:  5435 659304  java.lang.Class (java.base@11.0.2)
   5:   614 367784  [J (java.base@11.0.2)
   6: 11023 352736  java.util.concurrent.ConcurrentHashMap$Node 
(java.base@11.0.2)
   7:  4969 335792  [Ljava.lang.Object; (java.base@11.0.2)
   8:  9651 308832  java.util.HashMap$Node (java.base@11.0.2)
   9:  1728 180440  [Ljava.util.HashMap$Node; (java.base@11.0.2)
  10:  1464 128832  java.lang.reflect.Method (java.base@11.0.2)
Total 165373  534770560
{noformat}
vsd:
{noformat}
currentMaxMemory=536,870,912
currentUsedMemory=535,659,832
{noformat}
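
For reference, a minimal sketch of that kind of test (the host, port, and the 
use of min-connections to pre-fill the pool are assumptions, not the actual 
test):
{noformat}
import org.apache.geode.cache.client.ClientCache;
import org.apache.geode.cache.client.ClientCacheFactory;

public class ManyConnectionsTest {
  public static void main(String[] args) throws InterruptedException {
    // Pre-filling the pool to 500 connections approximates "creates 500
    // connections to a server"; the pool opens connections up to the minimum.
    ClientCache cache = new ClientCacheFactory()
        .addPoolServer("localhost", 40404)
        .setPoolMinConnections(500)
        .create();
    Thread.sleep(60000); // hold the connections while capturing jmap -histo:live
    cache.close();
  }
}
{noformat}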


> CI Failure: SerialGatewaySenderQueueDUnitTest.testCreateMaximumSenders 
> assertion fails
> --
>
> Key: GEODE-6435
> URL: https://issues.apache.org/jira/browse/GEODE-6435
> Project: Geode
>  Issue Type: Bug
>Reporter: Patrick Rhomberg
>Assignee: Barry Oglesby
>Priority: Major
> Attachments: jdk11_OldGen_usedMemory_failure.gif, 
> jdk11_OldGen_usedMemory_success.gif, jdk8_OldGen_usedMemory_success.gif
>
>
> {noformat}
> org.apache.geode.internal.cache.wan.serial.SerialGatewaySenderQueueDUnitTest 
> > testCreateMaximumSenders FAILED
> org.apache.geode.test.dunit.RMIException: While invoking 
> org.apache.geode.internal.cache.wan.serial.SerialGatewaySenderQueueDUnitTest$$Lambda$390/0x000840596840.run
>  in VM 2 running on Host 2e6d9f20266c with 8 VMs
> at org.apache.geode.test.dunit.VM.executeMethodOnObject(VM.java:579)
> at org.apache.geode.test.dunit.VM.invoke(VM.java:406)
> at 
> org.apache.geode.internal.cache.wan.serial.SerialGatewaySenderQueueDUnitTest.testCreateMaximumSenders(SerialGatewaySenderQueueDUnitTest.java:370)
> Caused by:
> org.awaitility.core.ConditionTimeoutException: Condition with lambda 
> expression in org.apache.geode.internal.cache.wan.WANTestBase that uses long 
> was not fulfilled within 300 seconds.
> at 
> org.awaitility.core.ConditionAwaiter.await(ConditionAwaiter.java:145)
> at 
> org.awaitility.core.CallableCondition.await(CallableCondition.java:79)
> at 
> org.awaitility.core.CallableCondition.await(CallableCondition.java:27)
> at 
> org.awaitility.core.ConditionFactory.until(ConditionFactory.java:902)
> at 
> org.awaitility.core.ConditionFactory.until(ConditionFactory.java:860)
> at 
> org.apache.geode.internal.cache.wan.WANTestBase.verifyListenerEvents(WANTestBase.java:3542)
> at 
> org.apache.geode.internal.cache.wan.serial.SerialGatewaySenderQueueDUnitTest.lambda$testCreateMaximumSenders$faf964a3$1(SerialGatewaySenderQueueDUnitTest.java:370)
> java.lang.AssertionError: An exception occurred during asynchronous 
> invocation.
> Caused by:
> java.rmi.ConnectIOException: error 

[jira] [Commented] (GEODE-6435) CI Failure: SerialGatewaySenderQueueDUnitTest.testCreateMaximumSenders assertion fails

2019-02-20 Thread Barry Oglesby (JIRA)


[ 
https://issues.apache.org/jira/browse/GEODE-6435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16773584#comment-16773584
 ] 

Barry Oglesby commented on GEODE-6435:
--

Several threads in vm4 are throwing OutOfMemoryErrors in this test.
{noformat}
[vm4] java.lang.OutOfMemoryError: Java heap space
[vm4] Dumping heap to java_pid367.hprof ...
[vm4] Heap dump file created [558837052 bytes in 1.626 secs]
[vm4] [fatal 2019/02/20 01:51:58.219 UTC  tid=0x8ba] Fatal error from asynchronous flusher thread
[vm4] java.lang.OutOfMemoryError: Java heap space
[vm4] [error 2019/02/20 01:51:58.232 UTC  
tid=0x6ac] JGRP000190: failed receiving packet
[vm4] java.lang.OutOfMemoryError: Java heap space
[vm4] [fatal 2019/02/20 01:51:59.303 UTC  tid=0x87a] Fatal error from asynchronous flusher thread
[vm4] java.lang.OutOfMemoryError: Java heap space
[vm4] [fatal 2019/02/20 01:51:59.303 UTC  tid=0x760] Fatal error from asynchronous flusher thread
[vm4] java.lang.OutOfMemoryError: Java heap space
[vm4] [fatal 2019/02/20 01:51:59.303 UTC  tid=0x882] Fatal error from asynchronous flusher thread
[vm4] java.lang.OutOfMemoryError: Java heap space
[vm4] [fatal 2019/02/20 01:51:59.304 UTC  tid=0xa23] Fatal error from asynchronous flusher thread
[vm4] java.lang.OutOfMemoryError: Java heap space
{noformat}
When I run the test on my local machine, it is successful. The difference is 
that my local run takes ~7 seconds, while the failed CI run took ~10 seconds. 
That is long enough for the PingTask to run, and each PingTask creates an 
additional connection (doubling the connection count).

A stack trace shows the PingTask creating a Connection:
{noformat}
[vm4] java.lang.Exception: Stack trace
[vm4]   at java.base/java.lang.Thread.dumpStack(Thread.java:1387)
[vm4]   at 
org.apache.geode.cache.client.internal.ConnectionImpl.connect(ConnectionImpl.java:112)
[vm4]   at 
org.apache.geode.cache.client.internal.ConnectionConnector.connectClientToServer(ConnectionConnector.java:75)
[vm4]   at 
org.apache.geode.cache.client.internal.ConnectionFactoryImpl.createClientToServerConnection(ConnectionFactoryImpl.java:111)
[vm4]   at 
org.apache.geode.cache.client.internal.pooling.ConnectionManagerImpl.borrowConnection(ConnectionManagerImpl.java:321)
[vm4]   at 
org.apache.geode.cache.client.internal.OpExecutorImpl.executeOnServer(OpExecutorImpl.java:395)
[vm4]   at 
org.apache.geode.cache.client.internal.OpExecutorImpl.executeOn(OpExecutorImpl.java:362)
[vm4]   at 
org.apache.geode.cache.client.internal.PoolImpl.executeOn(PoolImpl.java:835)
[vm4]   at org.apache.geode.cache.client.internal.PingOp.execute(PingOp.java:36)
[vm4]   at 
org.apache.geode.cache.client.internal.LiveServerPinger$PingTask.run2(LiveServerPinger.java:90)
[vm4]   at 
org.apache.geode.cache.client.internal.PoolImpl$PoolTask.run(PoolImpl.java:1371)
[vm4]   at 
java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
[vm4]   at 
java.base/java.util.concurrent.FutureTask.runAndReset(FutureTask.java:305)
[vm4]   at 
org.apache.geode.internal.ScheduledThreadPoolExecutorWithKeepAlive$DelegatingScheduledFuture.run(ScheduledThreadPoolExecutorWithKeepAlive.java:276)
[vm4]   at 
java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
[vm4]   at 
java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
[vm4]   at java.base/java.lang.Thread.run(Thread.java:834)
{noformat}
In the failed test, there are log messages from the poolTimer threads like 
below. These are coming from the PingTask threads.
{noformat}
[vm4] [info 2019/02/20 01:51:02.547 UTC  tid=0xaa0] Socket 
receive buffer size is 212992 instead of the requested 524288.
[vm4] [info 2019/02/20 01:51:02.547 UTC  tid=0xaa0] Socket 
send buffer size is 212992 instead of the requested 524288.
[vm4] [info 2019/02/20 01:51:02.607 UTC  tid=0xaa1] Socket 
receive buffer size is 212992 instead of the requested 524288.
[vm4] [info 2019/02/20 01:51:02.608 UTC  tid=0xaa1] Socket 
send buffer size is 212992 instead of the requested 524288.
[vm4] [info 2019/02/20 01:51:02.657 UTC  tid=0xaa2] Socket 
receive buffer size is 212992 instead of the requested 524288.
[vm4] [info 2019/02/20 01:51:02.657 UTC  tid=0xaa2] Socket 
send buffer size is 212992 instead of the requested 524288.
[vm4] [info 2019/02/20 01:51:02.727 UTC  tid=0xaa4] Socket 
receive buffer size is 212992 instead of the requested 524288.
[vm4] [info 2019/02/20 01:51:02.727 UTC  tid=0xaa4] Socket 
send buffer size is 212992 instead of the requested 524288.
...
{noformat}
If I add a sleep in the test to mimic the CI test run, my test throws the same 
OutOfMemoryErrors.
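A minimal sketch of that reproduction, assuming the default 10-second ping interval (PoolFactory.DEFAULT_PING_INTERVAL) and a hypothetical helper method in the test:
{noformat}
// Hedged reproduction sketch, not the committed test change: staying alive
// past the pool's ping interval lets each LiveServerPinger PingTask fire,
// which opens an extra connection per pool and doubles the connection count.
// The 15-second value is an assumption chosen to exceed the default interval.
private void mimicSlowCiRun() throws InterruptedException {
  Thread.sleep(15_000); // > PoolFactory.DEFAULT_PING_INTERVAL (10,000 ms)
}
{noformat}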

This test does not behave the same way in JDK 8. It uses much less memory.

I attached a few charts showing the memory usage in both JDK 8 and JDK 11 with 
successful and failed tests.


> CI Failure: SerialGatewaySenderQueueDUnitTest.testCreateMaximumSenders 
> assertion fails
> 

[jira] [Updated] (GEODE-6435) CI Failure: SerialGatewaySenderQueueDUnitTest.testCreateMaximumSenders assertion fails

2019-02-20 Thread Barry Oglesby (JIRA)


 [ 
https://issues.apache.org/jira/browse/GEODE-6435?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Barry Oglesby updated GEODE-6435:
-
Attachment: jdk11_OldGen_usedMemory_success.gif
jdk11_OldGen_usedMemory_failure.gif
jdk8_OldGen_usedMemory_success.gif

> CI Failure: SerialGatewaySenderQueueDUnitTest.testCreateMaximumSenders 
> assertion fails
> --
>
> Key: GEODE-6435
> URL: https://issues.apache.org/jira/browse/GEODE-6435
> Project: Geode
>  Issue Type: Bug
>Reporter: Patrick Rhomberg
>Assignee: Barry Oglesby
>Priority: Major
> Attachments: jdk11_OldGen_usedMemory_failure.gif, 
> jdk11_OldGen_usedMemory_success.gif, jdk8_OldGen_usedMemory_success.gif
>
>
> {noformat}
> org.apache.geode.internal.cache.wan.serial.SerialGatewaySenderQueueDUnitTest 
> > testCreateMaximumSenders FAILED
> org.apache.geode.test.dunit.RMIException: While invoking 
> org.apache.geode.internal.cache.wan.serial.SerialGatewaySenderQueueDUnitTest$$Lambda$390/0x000840596840.run
>  in VM 2 running on Host 2e6d9f20266c with 8 VMs
> at org.apache.geode.test.dunit.VM.executeMethodOnObject(VM.java:579)
> at org.apache.geode.test.dunit.VM.invoke(VM.java:406)
> at 
> org.apache.geode.internal.cache.wan.serial.SerialGatewaySenderQueueDUnitTest.testCreateMaximumSenders(SerialGatewaySenderQueueDUnitTest.java:370)
> Caused by:
> org.awaitility.core.ConditionTimeoutException: Condition with lambda 
> expression in org.apache.geode.internal.cache.wan.WANTestBase that uses long 
> was not fulfilled within 300 seconds.
> at 
> org.awaitility.core.ConditionAwaiter.await(ConditionAwaiter.java:145)
> at 
> org.awaitility.core.CallableCondition.await(CallableCondition.java:79)
> at 
> org.awaitility.core.CallableCondition.await(CallableCondition.java:27)
> at 
> org.awaitility.core.ConditionFactory.until(ConditionFactory.java:902)
> at 
> org.awaitility.core.ConditionFactory.until(ConditionFactory.java:860)
> at 
> org.apache.geode.internal.cache.wan.WANTestBase.verifyListenerEvents(WANTestBase.java:3542)
> at 
> org.apache.geode.internal.cache.wan.serial.SerialGatewaySenderQueueDUnitTest.lambda$testCreateMaximumSenders$faf964a3$1(SerialGatewaySenderQueueDUnitTest.java:370)
> java.lang.AssertionError: An exception occurred during asynchronous 
> invocation.
> Caused by:
> java.rmi.ConnectIOException: error during JRMP connection 
> establishment; nested exception is: 
>   java.net.SocketTimeoutException: Read timed out
> Caused by:
> java.net.SocketTimeoutException: Read timed out
> {noformat}
>  
> See pipeline failure here:
> [https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-develop-main/jobs/DistributedTestOpenJDK11/builds/413]
> Find test results here:
> [http://files.apachegeode-ci.info/builds/apache-develop-main/1.9.0-SNAPSHOT.0456/test-results/distributedTest/1550631451/]
> Find artifacts here:
> [http://files.apachegeode-ci.info/builds/apache-develop-main/1.9.0-SNAPSHOT.0456/test-artifacts/1550631451/distributedtestfiles-OpenJDK11-1.9.0-SNAPSHOT.0456.tgz]
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (GEODE-6356) CI failure: PersistentPartitionedRegionRegressionTest.doesNotWaitForPreviousInstanceOfOnlineServer fails with suspect string

2019-02-04 Thread Barry Oglesby (JIRA)


[ 
https://issues.apache.org/jira/browse/GEODE-6356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16760238#comment-16760238
 ] 

Barry Oglesby commented on GEODE-6356:
--

=-=-=-=-=-=-=-=-=-=-=-=-=-=-=  Test Results URI 
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
http://files.apachegeode-ci.info/builds/apache-develop-main/1.9.0-SNAPSHOT.0401/test-results/distributedTest/1549307890/
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=

Test report artifacts from this job are available at:

http://files.apachegeode-ci.info/builds/apache-develop-main/1.9.0-SNAPSHOT.0401/test-artifacts/1549307890/distributedtestfiles-OpenJDK8-1.9.0-SNAPSHOT.0401.tgz

> CI failure: 
> PersistentPartitionedRegionRegressionTest.doesNotWaitForPreviousInstanceOfOnlineServer
>  fails with suspect string
> 
>
> Key: GEODE-6356
> URL: https://issues.apache.org/jira/browse/GEODE-6356
> Project: Geode
>  Issue Type: Bug
>  Components: persistence, tests
>Reporter: Barry Oglesby
>Priority: Major
>
> PersistentPartitionedRegionRegressionTest.doesNotWaitForPreviousInstanceOfOnlineServer
>  failed in DistributedTestOpenJDK8 CI run 
> [361|https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-develop-main/jobs/DistributedTestOpenJDK8/builds/361]
>  with this suspect string:
> {noformat}
> org.apache.geode.internal.cache.partitioned.PersistentPartitionedRegionRegressionTest
>  > doesNotWaitForPreviousInstanceOfOnlineServer FAILED
> java.lang.AssertionError: Suspicious strings were written to the log 
> during this run.
> Fix the strings or use IgnoredException.addIgnoredException to ignore.
> ---
> Found suspect string in log4j at line 2052
> [error 2019/02/04 18:49:32.561 UTC  
> tid=32] org.apache.geode.distributed.DistributedSystemDisconnectedException: 
> This connection to a distributed system has been disconnected.
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (GEODE-6356) CI failure: PersistentPartitionedRegionRegressionTest.doesNotWaitForPreviousInstanceOfOnlineServer fails with suspect string

2019-02-04 Thread Barry Oglesby (JIRA)
Barry Oglesby created GEODE-6356:


 Summary: CI failure: 
PersistentPartitionedRegionRegressionTest.doesNotWaitForPreviousInstanceOfOnlineServer
 fails with suspect string
 Key: GEODE-6356
 URL: https://issues.apache.org/jira/browse/GEODE-6356
 Project: Geode
  Issue Type: Bug
  Components: persistence, tests
Reporter: Barry Oglesby


PersistentPartitionedRegionRegressionTest.doesNotWaitForPreviousInstanceOfOnlineServer
 failed in DistributedTestOpenJDK8 CI run 
[361|https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-develop-main/jobs/DistributedTestOpenJDK8/builds/361]
 with this suspect string:
{noformat}
org.apache.geode.internal.cache.partitioned.PersistentPartitionedRegionRegressionTest
 > doesNotWaitForPreviousInstanceOfOnlineServer FAILED
java.lang.AssertionError: Suspicious strings were written to the log during 
this run.
Fix the strings or use IgnoredException.addIgnoredException to ignore.
---
Found suspect string in log4j at line 2052

[error 2019/02/04 18:49:32.561 UTC  
tid=32] org.apache.geode.distributed.DistributedSystemDisconnectedException: 
This connection to a distributed system has been disconnected.
{noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (GEODE-6287) When a client connects, registers interest and disconnects normally, its ClientProxyMembershipID is not cleaned up and a memory leak occurs

2019-01-30 Thread Barry Oglesby (JIRA)


 [ 
https://issues.apache.org/jira/browse/GEODE-6287?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Barry Oglesby resolved GEODE-6287.
--
   Resolution: Fixed
Fix Version/s: 1.9.0

> When a client connects, registers interest and disconnects normally, its 
> ClientProxyMembershipID is not cleaned up and a memory leak occurs
> ---
>
> Key: GEODE-6287
> URL: https://issues.apache.org/jira/browse/GEODE-6287
> Project: Geode
>  Issue Type: Bug
>  Components: client queues, client/server
>Reporter: Barry Oglesby
>Assignee: Barry Oglesby
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.9.0
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> When a client connects to a distributed system and registers interest, the 
> Region's FilterProfile's clientMap (an IDMap) registers the 
> ClientProxyMembershipID in both the realIDs and wireIDs like:
> {noformat}
> realIDs={identity(192.168.2.12(client-register:52879:loner):63013:2327c553:client-register,connection=2=1};
> wireIDs={1=identity(192.168.2.12(client-register:52879:loner):63013:2327c553:client-register,connection=2}
> {noformat}
> When the client leaves normally, the UnregisterInterest command is invoked 
> which removes the interest and the region. Part of that behavior is to remove 
> the regionName from the set of regions.
> {noformat}
> this.regions.remove(regionName)
> {noformat}
> ClientInterestList.clearClientInterestList is then invoked, which is 
> supposed to clear the FilterProfile for each region, but the regions are 
> already cleared by the UnregisterInterest command, so this method doesn't do 
> anything.
> Then, LocalRegion.cleanupForClient is invoked which invokes 
> FilterProfile.cleanupForClient. This method currently only closes CQs (which 
> also cleans up the cqMap which is also an IDMap like the clientMap).
> At the end of this, the clientMap's realIDs and wireIDs still contain the 
> ClientProxyMembershipID.
> The cleanupForClient method could be changed to also clean up the clientMap.
> Note: If the client is killed abnormally, the UnregisterInterest command is 
> not invoked, so the interest and the region are not cleaned up normally. When 
> ClientInterestList.clearClientInterestList is called, the set of regions 
> still contains the region, and the IDMap is cleaned up properly.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (GEODE-6327) There needs to be a way to specify identity fields for JSON documents converted to PDX

2019-01-28 Thread Barry Oglesby (JIRA)


[ 
https://issues.apache.org/jira/browse/GEODE-6327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16754280#comment-16754280
 ] 

Barry Oglesby commented on GEODE-6327:
--

The attached implementation was done for a POC. In this case, it reduced the 
number of fields used in hashCode and equals from 98 to 1 and reduced the query 
time from 30 ms to 12 ms.
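For illustration, the public PdxInstanceFactory API already supports identity fields for hand-built instances; the POC applies the same idea to JSONFormatter-converted documents. A minimal sketch (the type and field names are assumptions, not part of the patch):
{noformat}
import org.apache.geode.cache.Cache;
import org.apache.geode.pdx.PdxInstance;
import org.apache.geode.pdx.PdxInstanceFactory;

class IdentityFieldExample {
  // Build a PdxInstance whose hashCode/equals consider only the "id" field
  // instead of every field. "example.Document" is a hypothetical type name.
  static PdxInstance createDocument(Cache cache) {
    PdxInstanceFactory factory = cache.createPdxInstanceFactory("example.Document");
    factory.writeString("id", "doc-1");  // the single identity field
    factory.writeString("body", "...");  // one of many non-identity fields
    factory.markIdentityField("id");     // restrict hashCode/equals to "id"
    return factory.create();
  }
}
{noformat}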

> There needs to be a way to specify identity fields for JSON documents 
> converted to PDX
> --
>
> Key: GEODE-6327
> URL: https://issues.apache.org/jira/browse/GEODE-6327
> Project: Geode
>  Issue Type: New Feature
>  Components: serialization
>Reporter: Barry Oglesby
>Priority: Major
> Attachments: geode-6327-poc.patch
>
>
> In the current implementation, there is no way to prevent all fields from 
> being used when executing hashCode and equals.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (GEODE-6327) There needs to be a way to specify identity fields for JSON documents converted to PDX

2019-01-28 Thread Barry Oglesby (JIRA)


 [ 
https://issues.apache.org/jira/browse/GEODE-6327?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Barry Oglesby updated GEODE-6327:
-
Attachment: geode-6327-poc.patch

> There needs to be a way to specify identity fields for JSON documents 
> converted to PDX
> --
>
> Key: GEODE-6327
> URL: https://issues.apache.org/jira/browse/GEODE-6327
> Project: Geode
>  Issue Type: New Feature
>  Components: serialization
>Reporter: Barry Oglesby
>Priority: Major
> Attachments: geode-6327-poc.patch
>
>
> In the current implementation, there is no way to prevent all fields from 
> being used when executing hashCode and equals.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (GEODE-6327) There needs to be a way to specify identity fields for JSON documents converted to PDX

2019-01-28 Thread Barry Oglesby (JIRA)
Barry Oglesby created GEODE-6327:


 Summary: There needs to be a way to specify identity fields for 
JSON documents converted to PDX
 Key: GEODE-6327
 URL: https://issues.apache.org/jira/browse/GEODE-6327
 Project: Geode
  Issue Type: New Feature
  Components: serialization
Reporter: Barry Oglesby


In the current implementation, there is no way to prevent all fields from being 
used when executing hashCode and equals.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (GEODE-6267) Subjects are not logged out when a client departs causing a memory leak

2019-01-23 Thread Barry Oglesby (JIRA)


 [ 
https://issues.apache.org/jira/browse/GEODE-6267?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Barry Oglesby resolved GEODE-6267.
--
   Resolution: Fixed
Fix Version/s: 1.9.0

> Subjects are not logged out when a client departs causing a memory leak
> ---
>
> Key: GEODE-6267
> URL: https://issues.apache.org/jira/browse/GEODE-6267
> Project: Geode
>  Issue Type: Bug
>  Components: security
>Reporter: Barry Oglesby
>Assignee: Barry Oglesby
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.9.0
>
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> When a client with security enabled connects to a server, the 
> IntegratedSecurityService logs in a Subject. This causes a SimpleSession to 
> be created.
> The Subject is stored in ClientUserAuths.uniqueIdVsSubject.
> Here is a stack showing the SimpleSession creation:
> {noformat}
> [warning 2019/01/08 18:02:42.993 PST server1  Thread 0> tid=0x4e] SimpleSession.<init> invoked:
> java.lang.Exception
>  at org.apache.shiro.session.mgt.SimpleSession.<init>(SimpleSession.java:99)
>  at 
> org.apache.shiro.session.mgt.SimpleSessionFactory.createSession(SimpleSessionFactory.java:44)
>  at 
> org.apache.shiro.session.mgt.DefaultSessionManager.newSessionInstance(DefaultSessionManager.java:163)
>  at 
> org.apache.shiro.session.mgt.DefaultSessionManager.doCreateSession(DefaultSessionManager.java:154)
>  at 
> org.apache.shiro.session.mgt.AbstractValidatingSessionManager.createSession(AbstractValidatingSessionManager.java:136)
>  at 
> org.apache.shiro.session.mgt.AbstractNativeSessionManager.start(AbstractNativeSessionManager.java:99)
>  at 
> org.apache.shiro.mgt.SessionsSecurityManager.start(SessionsSecurityManager.java:152)
>  at 
> org.apache.shiro.subject.support.DelegatingSubject.getSession(DelegatingSubject.java:336)
>  at 
> org.apache.shiro.subject.support.DelegatingSubject.getSession(DelegatingSubject.java:312)
>  at 
> org.apache.shiro.mgt.DefaultSubjectDAO.mergePrincipals(DefaultSubjectDAO.java:204)
>  at 
> org.apache.shiro.mgt.DefaultSubjectDAO.saveToSession(DefaultSubjectDAO.java:166)
>  at org.apache.shiro.mgt.DefaultSubjectDAO.save(DefaultSubjectDAO.java:147)
>  at 
> org.apache.shiro.mgt.DefaultSecurityManager.save(DefaultSecurityManager.java:383)
>  at 
> org.apache.shiro.mgt.DefaultSecurityManager.createSubject(DefaultSecurityManager.java:350)
>  at 
> org.apache.shiro.mgt.DefaultSecurityManager.createSubject(DefaultSecurityManager.java:183)
>  at 
> org.apache.shiro.mgt.DefaultSecurityManager.login(DefaultSecurityManager.java:283)
>  at 
> org.apache.shiro.subject.support.DelegatingSubject.login(DelegatingSubject.java:256)
>  at 
> org.apache.geode.internal.security.IntegratedSecurityService.login(IntegratedSecurityService.java:139)
>  at 
> org.apache.geode.internal.cache.tier.sockets.HandShake.verifyCredentials(HandShake.java:1688)
>  at 
> org.apache.geode.internal.cache.tier.sockets.ServerConnection.setCredentials(ServerConnection.java:1044)
>  at 
> org.apache.geode.internal.cache.tier.sockets.command.PutUserCredentials.cmdExecute(PutUserCredentials.java:52)
>  at 
> org.apache.geode.internal.cache.tier.sockets.BaseCommand.execute(BaseCommand.java:163)
>  at 
> org.apache.geode.internal.cache.tier.sockets.ServerConnection.doNormalMsg(ServerConnection.java:797)
>  at 
> org.apache.geode.internal.cache.tier.sockets.LegacyServerConnection.doOneMessage(LegacyServerConnection.java:85)
>  at 
> org.apache.geode.internal.cache.tier.sockets.ServerConnection.run(ServerConnection.java:1179)
>  at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>  at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>  at 
> org.apache.geode.internal.cache.tier.sockets.AcceptorImpl$4$1.run(AcceptorImpl.java:641)
>  at java.lang.Thread.run(Thread.java:745)
> {noformat}
> When the client disconnects, the ClientUserAuths is cleaned up (in cleanup), 
> but the Subjects are not logged out.
> With subscription-enabled=true, an additional Subject is created and stored 
> in the CacheClientProxy subject. This Subject is not logged out either.
> Here is a stack showing the SimpleSession creation:
> {noformat}
> [warning 2019/01/08 18:02:43.023 PST server1  Thread 0> tid=0x52] SimpleSession.<init> invoked:
> java.lang.Exception
>  at org.apache.shiro.session.mgt.SimpleSession.<init>(SimpleSession.java:99)
>  at 
> org.apache.shiro.session.mgt.SimpleSessionFactory.createSession(SimpleSessionFactory.java:44)
>  at 
> org.apache.shiro.session.mgt.DefaultSessionManager.newSessionInstance(DefaultSessionManager.java:163)
>  at 
> org.apache.shiro.session.mgt.DefaultSessionManager.doCreateSession(DefaultSessionManager.java:154)
>  at 
> 

[jira] [Created] (GEODE-6293) Gfsh execute function command expects the function to have a result

2019-01-17 Thread Barry Oglesby (JIRA)
Barry Oglesby created GEODE-6293:


 Summary: Gfsh execute function command expects the function to 
have a result
 Key: GEODE-6293
 URL: https://issues.apache.org/jira/browse/GEODE-6293
 Project: Geode
  Issue Type: Bug
  Components: gfsh
Reporter: Barry Oglesby


Functions with hasResult returning false cause gfsh to log this exception 
message:
{noformat}
gfsh>execute function --id=TestNoResultFunction --region=/data
 Member  | Status | Message
-------- | ------ | -------------------------------------------------------------------------
server-1 | ERROR  | Exception: Cannot return any result as the Function#hasResult() is false
{noformat}
That message is coming from `UserFunctionExecution.execute` which does:
{noformat}
List results = (List) 
execution.execute(function.getId()).getResult();
{noformat}
Here is the stack where that happens:
{noformat}
java.lang.Exception: Stack trace
at java.lang.Thread.dumpStack(Thread.java:1333)
at 
org.apache.geode.internal.cache.execute.NoResult.getResult(NoResult.java:56)
at 
org.apache.geode.management.internal.cli.functions.UserFunctionExecution.execute(UserFunctionExecution.java:156)
at 
org.apache.geode.internal.cache.MemberFunctionStreamingMessage.process(MemberFunctionStreamingMessage.java:193)
at 
org.apache.geode.distributed.internal.DistributionMessage.scheduleAction(DistributionMessage.java:367)
at 
org.apache.geode.distributed.internal.DistributionMessage$1.run(DistributionMessage.java:433)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at 
org.apache.geode.distributed.internal.ClusterDistributionManager.runUntilShutdown(ClusterDistributionManager.java:956)
at 
org.apache.geode.distributed.internal.ClusterDistributionManager.doFunctionExecutionThread(ClusterDistributionManager.java:810)
at 
org.apache.geode.internal.logging.LoggingThreadFactory.lambda$newThread$0(LoggingThreadFactory.java:121)
at java.lang.Thread.run(Thread.java:745)
{noformat}
Here is a potential fix that addresses the issue:
{noformat}
List results = null;
ResultCollector rc = execution.execute(function.getId());
if (function.hasResult()) {
  results = (List) rc.getResult();
}
{noformat}
This fix causes gfsh to report an OK result:
{noformat}
gfsh>execute function --id=TestNoResultFunction --region=/data
 Member  | Status | Message
-------- | ------ | -------
server-1 | OK     | []
{noformat}
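For context, a function like TestNoResultFunction is simply one whose hasResult() returns false; a minimal sketch of its shape (an illustration, not the actual test class):
{noformat}
import org.apache.geode.cache.execute.Function;
import org.apache.geode.cache.execute.FunctionContext;

public class TestNoResultFunction implements Function<Object> {
  @Override
  public boolean hasResult() {
    return false; // callers must not invoke ResultCollector.getResult()
  }

  @Override
  public void execute(FunctionContext<Object> context) {
    // do the work; nothing is sent back to the caller
  }

  @Override
  public String getId() {
    return "TestNoResultFunction";
  }
}
{noformat}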



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (GEODE-6287) When a client connects, registers interest and disconnects normally, its ClientProxyMembershipID is not cleaned up and a memory leak occurs

2019-01-17 Thread Barry Oglesby (JIRA)


[ 
https://issues.apache.org/jira/browse/GEODE-6287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16745357#comment-16745357
 ] 

Barry Oglesby commented on GEODE-6287:
--

I made a change to FilterProfile.cleanupForClient to also clean up the 
clientMap. That addressed the leak.
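A hedged sketch of the shape of that change (clientMap is from the issue description; the method signature and helper names are hypothetical, not the committed diff):
{noformat}
// In FilterProfile: when a client departs, besides closing its CQs (which
// already cleans the cqMap), also drop its ClientProxyMembershipID from the
// clientMap's realIDs/wireIDs so the IDMap no longer pins the member.
void cleanupForClient(CacheClientProxy proxy) {
  closeClientCqs(proxy);                          // existing behavior (hypothetical name)
  clientMap.removeIDMapping(proxy.getProxyID());  // proposed addition (hypothetical helper)
}
{noformat}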

Now with 15000 client connects/registerInterests/disconnects:
{noformat}
 num #instances #bytes class name
--
 1: 29928 2706856 [C
 2: 7472 840576 java.lang.Class
 3: 29822 715728 java.lang.String
 4: 1607 478240 [B
 5: 5946 445008 [Ljava.lang.Object;
 6: 773 410136 [J
 7: 11820 378240 java.util.concurrent.ConcurrentHashMap$Node
 8: 2980 262240 java.lang.reflect.Method
 9: 7953 254496 java.util.HashMap$Node
 10: 4911 208984 [I
 11: 2138 198408 [Ljava.util.HashMap$Node;
 12: 10827 173232 java.lang.Object
 13: 2119 169520 java.lang.reflect.Constructor
 14: 4120 164800 java.util.LinkedHashMap$Entry
 15: 1726 96656 java.util.LinkedHashMap
Total 175924 9348120
{noformat}

> When a client connects, registers interest and disconnects normally, its 
> ClientProxyMembershipID is not cleaned up and a memory leak occurs
> ---
>
> Key: GEODE-6287
> URL: https://issues.apache.org/jira/browse/GEODE-6287
> Project: Geode
>  Issue Type: Bug
>  Components: client queues, client/server
>Reporter: Barry Oglesby
>Assignee: Barry Oglesby
>Priority: Major
>
> When a client connects to a distributed system and registers interest, the 
> Region's FilterProfile's clientMap (an IDMap) registers the 
> ClientProxyMembershipID in both the realIDs and wireIDs like:
> {noformat}
> realIDs={identity(192.168.2.12(client-register:52879:loner):63013:2327c553:client-register,connection=2=1};
> wireIDs={1=identity(192.168.2.12(client-register:52879:loner):63013:2327c553:client-register,connection=2}
> {noformat}
> When the client leaves normally, the UnregisterInterest command is invoked 
> which removes the interest and the region. Part of that behavior is to remove 
> the regionName from the set of regions.
> {noformat}
> this.regions.remove(regionName)
> {noformat}
> ClientInterestList.clearClientInterestList is then invoked, which is 
> supposed to clear the FilterProfile for each region, but the regions are 
> already cleared by the UnregisterInterest command, so this method doesn't do 
> anything.
> Then, LocalRegion.cleanupForClient is invoked which invokes 
> FilterProfile.cleanupForClient. This method currently only closes CQs (which 
> also cleans up the cqMap which is also an IDMap like the clientMap).
> At the end of this, the clientMap's realIDs and wireIDs still contain the 
> ClientProxyMembershipID.
> The cleanupForClient method could be changed to also clean up the clientMap.
> Note: If the client is killed abnormally, the UnregisterInterest command is 
> not invoked, so the interest and the region are not cleaned up normally. When 
> ClientInterestList.clearClientInterestList is called, the set of regions 
> still contains the region, and the IDMap is cleaned up properly.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (GEODE-6287) When a client connects, registers interest and disconnects normally, its ClientProxyMembershipID is not cleaned up and a memory leak occurs

2019-01-17 Thread Barry Oglesby (JIRA)


[ 
https://issues.apache.org/jira/browse/GEODE-6287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16745352#comment-16745352
 ] 

Barry Oglesby commented on GEODE-6287:
--

Startup:
{noformat}
 num #instances #bytes class name
--
 1: 28887 2606096 [C
 2: 7205 810664 java.lang.Class
 3: 28820 691680 java.lang.String
 4: 2004 494408 [B
 5: 5153 453464 java.lang.reflect.Method
 6: 5538 425016 [Ljava.lang.Object;
 7: 657 389224 [J
 8: 11600 371200 java.util.concurrent.ConcurrentHashMap$Node
 9: 8124 324960 java.util.LinkedHashMap$Entry
 10: 3047 277584 [Ljava.util.HashMap$Node;
 11: 8536 273152 java.util.HashMap$Node
 12: 2533 202640 java.lang.reflect.Constructor
 13: 4764 198616 [I
 14: 10575 169200 java.lang.Object
 15: 2749 153944 java.util.LinkedHashMap
Total 181128 9653224
{noformat}
With 15000 client connects/registerInterests/disconnects:
{noformat}
 num #instances #bytes class name
--
 1: 104958 9068952 [C
 2: 104852 2516448 java.lang.String
 3: 16604 2160248 [B
 4: 41802 1337664 java.util.concurrent.ConcurrentHashMap$Node
 5: 15005 1080360 org.apache.geode.distributed.internal.membership.gms.GMSMember
 6: 7473 840680 java.lang.Class
 7: 15005 720240 
org.apache.geode.distributed.internal.membership.InternalDistributedMember
 8: 15031 480992 java.net.InetAddress$InetAddressHolder
 9: 14996 479872 org.apache.geode.distributed.DurableClientAttributes
 10: 14995 479840 
org.apache.geode.internal.cache.tier.sockets.ClientProxyMembershipID
 11: 5949 445136 [Ljava.lang.Object;
 12: 25820 413120 java.lang.Object
 13: 756 407840 [J
 14: 15225 365400 java.lang.Long
 15: 15026 360624 java.net.Inet4Address
Total 501811 24712960
{noformat}

> When a client connects, registers interest and disconnects normally, its 
> ClientProxyMembershipID is not cleaned up and a memory leak occurs
> ---
>
> Key: GEODE-6287
> URL: https://issues.apache.org/jira/browse/GEODE-6287
> Project: Geode
>  Issue Type: Bug
>  Components: client queues, client/server
>Reporter: Barry Oglesby
>Assignee: Barry Oglesby
>Priority: Major
>
> When a client connects to a distributed system and registers interest, the 
> Region's FilterProfile's clientMap (an IDMap) registers the 
> ClientProxyMembershipID in both the realIDs and wireIDs like:
> {noformat}
> realIDs={identity(192.168.2.12(client-register:52879:loner):63013:2327c553:client-register,connection=2=1};
> wireIDs={1=identity(192.168.2.12(client-register:52879:loner):63013:2327c553:client-register,connection=2}
> {noformat}
> When the client leaves normally, the UnregisterInterest command is invoked 
> which removes the interest and the region. Part of that behavior is to remove 
> the regionName from the set of regions.
> {noformat}
> this.regions.remove(regionName)
> {noformat}
> ClientInterestList.clearClientInterestList is then invoked, which is 
> supposed to clear the FilterProfile for each region, but the regions are 
> already cleared by the UnregisterInterest command, so this method doesn't do 
> anything.
> Then, LocalRegion.cleanupForClient is invoked which invokes 
> FilterProfile.cleanupForClient. This method currently only closes CQs (which 
> also cleans up the cqMap which is also an IDMap like the clientMap).
> At the end of this, the clientMap's realIDs and wireIDs still contain the 
> ClientProxyMembershipID.
> The cleanupForClient method could be changed to also clean up the clientMap.
> Note: If the client is killed abnormally, the UnregisterInterest command is 
> not invoked, so the interest and the region are not cleaned up normally. When 
> ClientInterestList.clearClientInterestList is called, the set of regions 
> still contains the region, and the IDMap is cleaned up properly.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (GEODE-6287) When a client connects, registers interest and disconnects normally, its ClientProxyMembershipID is not cleaned up and a memory leak occurs

2019-01-17 Thread Barry Oglesby (JIRA)


 [ 
https://issues.apache.org/jira/browse/GEODE-6287?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Barry Oglesby updated GEODE-6287:
-
Description: 
When a client connects to a distributed system and registers interest, the 
Region's FilterProfile's clientMap (an IDMap) registers the 
ClientProxyMembershipID in both the realIDs and wireIDs like:
{noformat}
realIDs=\{identity(192.168.2.12(client-register:52879:loner):63013:2327c553:client-register,connection=2=1};
wireIDs=\{1=identity(192.168.2.12(client-register:52879:loner):63013:2327c553:client-register,connection=2}
{noformat}
When the client leaves normally, the UnregisterInterest command is invoked 
which removes the interest and the region. Part of that behavior is to remove 
the regionName from the set of regions.
{noformat}
this.regions.remove(regionName)
{noformat}
ClientInterestList.clearClientInterestList is then invoked, which is 
supposed to clear the FilterProfile for each region, but the regions are 
already cleared by the UnregisterInterest command, so this method doesn't do 
anything.

Then, LocalRegion.cleanupForClient is invoked which invokes 
FilterProfile.cleanupForClient. This method currently only closes CQs (which 
also cleans up the cqMap which is also an IDMap like the clientMap).

At the end of this, the clientMap's realIDs and wireIDs still contain the 
ClientProxyMembershipID.

The cleanupForClient method could be changed to also clean up the clientMap.

Note: If the client is killed abnormally, the UnregisterInterest command is not 
invoked, so the interest and the region are not cleaned up normally. When 
ClientInterestList.clearClientInterestList is called, the set of regions still 
contains the region, and the IDMap is cleaned up properly.

  was:
When a client connects to a distributed system and registers interest, the 
Region's FilterProfile's clientMap (an IDMap) registers the 
ClientProxyMembershipID in both the realIDs and wireIDs like:
```
realIDs=\{identity(192.168.2.12(client-register:52879:loner):63013:2327c553:client-register,connection=2=1};
wireIDs=\{1=identity(192.168.2.12(client-register:52879:loner):63013:2327c553:client-register,connection=2}
```
When the client leaves normally, the UnregisterInterest command is invoked 
which removes the interest and the region. Part of that behavior is to remove 
the regionName from the set of regions.
```
this.regions.remove(regionName)
```
ClientInterestList.clearClientInterestList is then invoked, which is 
supposed to clear the FilterProfile for each region, but the regions are 
already cleared by the UnregisterInterest command, so this method doesn't do 
anything.

Then, LocalRegion.cleanupForClient is invoked which invokes 
FilterProfile.cleanupForClient. This method currently only closes CQs (which 
also cleans up the cqMap which is also an IDMap like the clientMap).

At the end of this, the clientMap's realIDs and wireIDs still contain the 
ClientProxyMembershipID.

The cleanupForClient method could be changed to also clean up the clientMap.

Note: If the client is killed abnormally, the UnregisterInterest command is not 
invoked, so the interest and the region are not cleaned up normally. When 
ClientInterestList.clearClientInterestList is called, the set of regions still 
contains the region, and the IDMap is cleaned up properly.


> When a client connects, registers interest and disconnects normally, its 
> ClientProxyMembershipID is not cleaned up and a memory leak occurs
> ---
>
> Key: GEODE-6287
> URL: https://issues.apache.org/jira/browse/GEODE-6287
> Project: Geode
>  Issue Type: Bug
>  Components: client queues, client/server
>Reporter: Barry Oglesby
>Assignee: Barry Oglesby
>Priority: Major
>
> When a client connects to a distributed system and registers interest, the 
> Region's FilterProfile's clientMap (an IDMap) registers the 
> ClientProxyMembershipID in both the realIDs and wireIDs like:
> {noformat}
> realIDs=\{identity(192.168.2.12(client-register:52879:loner):63013:2327c553:client-register,connection=2=1};
> wireIDs=\{1=identity(192.168.2.12(client-register:52879:loner):63013:2327c553:client-register,connection=2}
> {noformat}
> When the client leaves normally, the UnregisterInterest command is invoked 
> which removes the interest and the region. Part of that behavior is to remove 
> the regionName from the set of regions.
> {noformat}
> this.regions.remove(regionName)
> {noformat}
> ClientInterestList.clearClientInterestList is then invoked, which is 
> supposed to clear the FilterProfile for each region, but the regions are 
> already cleared by the UnregisterInterest command, so this method doesn't do 
> anything.
> Then, 

[jira] [Commented] (GEODE-6287) When a client connects, registers interest and disconnects normally, its ClientProxyMembershipID is not cleaned up and a memory leak occurs

2019-01-17 Thread Barry Oglesby (JIRA)


[ 
https://issues.apache.org/jira/browse/GEODE-6287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16745349#comment-16745349
 ] 

Barry Oglesby commented on GEODE-6287:
--

When a client connects to a distributed system and registers interest, the 
Region's FilterProfile's clientMap (an IDMap) registers the 
ClientProxyMembershipID in both the realIDs and wireIDs like:
{noformat}
realIDs=\{identity(192.168.2.12(client-register:52879:loner):63013:2327c553:client-register,connection=2=1};
wireIDs=\{1=identity(192.168.2.12(client-register:52879:loner):63013:2327c553:client-register,connection=2}
{noformat}
When the client leaves normally, the UnregisterInterest command is invoked 
which removes the interest and the region. Part of that behavior is to remove 
the regionName from the set of regions.
{noformat}
this.regions.remove(regionName)
{noformat}
ClientInterestList.clearClientInterestList is then invoked, which is 
supposed to clear the FilterProfile for each region, but the regions are 
already cleared by the UnregisterInterest command, so this method doesn't do 
anything.

Then, LocalRegion.cleanupForClient is invoked which invokes 
FilterProfile.cleanupForClient. This method currently only closes CQs (which 
also cleans up the cqMap which is also an IDMap like the clientMap).

At the end of this, the clientMap's realIDs and wireIDs still contain the 
ClientProxyMembershipID.

The cleanupForClient method could be changed to also clean up the clientMap.

Note: If the client is killed abnormally, the UnregisterInterest command is not 
invoked, so the interest and the region are not cleaned up normally. When 
ClientInterestList.clearClientInterestList is called, the set of regions still 
contains the region, and the IDMap is cleaned up properly.



Here is a stack trace showing the ClientProxyMembershipID being registered:
{noformat}
java.lang.Exception: Stack trace
 at java.lang.Thread.dumpStack(Thread.java:1333)
 at 
org.apache.geode.internal.cache.FilterProfile$IDMap.getWireID(FilterProfile.java:2032)
 at 
org.apache.geode.internal.cache.FilterProfile.getClientIDForMaps(FilterProfile.java:1615)
 at 
org.apache.geode.internal.cache.FilterProfile.registerClientInterest(FilterProfile.java:261)
 at 
org.apache.geode.internal.cache.tier.sockets.CacheClientProxy$ClientInterestList.registerClientInterest(CacheClientProxy.java:2052)
 at 
org.apache.geode.internal.cache.tier.sockets.CacheClientProxy.registerClientInterest(CacheClientProxy.java:1270)
 at 
org.apache.geode.internal.cache.tier.sockets.CacheClientNotifier.registerClientInterest(CacheClientNotifier.java:1194)
 at 
org.apache.geode.internal.cache.tier.sockets.command.RegisterInterest61.cmdExecute(RegisterInterest61.java:200)
 at 
org.apache.geode.internal.cache.tier.sockets.BaseCommand.execute(BaseCommand.java:178)
 at 
org.apache.geode.internal.cache.tier.sockets.ServerConnection.doNormalMessage(ServerConnection.java:844)
 at 
org.apache.geode.internal.cache.tier.sockets.OriginalServerConnection.doOneMessage(OriginalServerConnection.java:74)
 at 
org.apache.geode.internal.cache.tier.sockets.ServerConnection.run(ServerConnection.java:1218)
 at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
 at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
 at 
org.apache.geode.internal.cache.tier.sockets.AcceptorImpl.lambda$initializeServerConnectionThreadPool$3(AcceptorImpl.java:613)
 at 
org.apache.geode.internal.logging.LoggingThreadFactory.lambda$newThread$0(LoggingThreadFactory.java:121)
 at java.lang.Thread.run(Thread.java:745)
{noformat}
Here is a stack trace showing the UnregisterInterest command unregistering the 
client interest:
{noformat}
java.lang.Exception: Stack trace
 at java.lang.Thread.dumpStack(Thread.java:1333)
 at 
org.apache.geode.internal.cache.tier.sockets.CacheClientProxy$ClientInterestList.unregisterClientInterest(CacheClientProxy.java:2085)
 at 
org.apache.geode.internal.cache.tier.sockets.CacheClientProxy.unregisterClientInterest(CacheClientProxy.java:1330)
 at 
org.apache.geode.internal.cache.tier.sockets.CacheClientNotifier.unregisterClientInterest(CacheClientNotifier.java:1245)
 at 
org.apache.geode.internal.cache.tier.sockets.command.UnregisterInterest.cmdExecute(UnregisterInterest.java:144)
 at 
org.apache.geode.internal.cache.tier.sockets.BaseCommand.execute(BaseCommand.java:178)
 at 
org.apache.geode.internal.cache.tier.sockets.ServerConnection.doNormalMessage(ServerConnection.java:844)
 at 
org.apache.geode.internal.cache.tier.sockets.OriginalServerConnection.doOneMessage(OriginalServerConnection.java:75)
 at 
org.apache.geode.internal.cache.tier.sockets.ServerConnection.run(ServerConnection.java:1215)
 at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
 at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
 at 

[jira] [Updated] (GEODE-6287) When a client connects, registers interest and disconnects normally, its ClientProxyMembershipID is not cleaned up and a memory leak occurs

2019-01-17 Thread Barry Oglesby (JIRA)


 [ 
https://issues.apache.org/jira/browse/GEODE-6287?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Barry Oglesby updated GEODE-6287:
-
Description: 
When a client connects to a distributed system and registers interest, the 
Region's FilterProfile's clientMap (an IDMap) registers the 
ClientProxyMembershipID in both the realIDs and wireIDs like:
{noformat}
realIDs={identity(192.168.2.12(client-register:52879:loner):63013:2327c553:client-register,connection=2=1};
wireIDs={1=identity(192.168.2.12(client-register:52879:loner):63013:2327c553:client-register,connection=2}
{noformat}
When the client leaves normally, the UnregisterInterest command is invoked 
which removes the interest and the region. Part of that behavior is to remove 
the regionName from the set of regions.
{noformat}
this.regions.remove(regionName)
{noformat}
ClientInterestList.clearClientInterestList is then invoked, which is 
supposed to clear the FilterProfile for each region, but the regions are 
already cleared by the UnregisterInterest command, so this method doesn't do 
anything.

Then, LocalRegion.cleanupForClient is invoked which invokes 
FilterProfile.cleanupForClient. This method currently only closes CQs (which 
also cleans up the cqMap which is also an IDMap like the clientMap).

At the end of this, the clientMap's realIDs and wireIDs still contain the 
ClientProxyMembershipID.

The cleanupForClient method could be changed to also clean up the clientMap.

Note: If the client is killed abnormally, the UnregisterInterest command is not 
invoked, so the interest and the region are not cleaned up normally. When 
ClientInterestList.clearClientInterestList is called, the set of regions still 
contains the region, and the IDMap is cleaned up properly.

  was:
When a client connects to a distributed system and registers interest, the 
Region's FilterProfile's clientMap (an IDMap) registers the 
ClientProxyMembershipID in both the realIDs and wireIDs like:
{noformat}
realIDs=\{identity(192.168.2.12(client-register:52879:loner):63013:2327c553:client-register,connection=2=1};
wireIDs=\{1=identity(192.168.2.12(client-register:52879:loner):63013:2327c553:client-register,connection=2}
{noformat}
When the client leaves normally, the UnregisterInterest command is invoked 
which removes the interest and the region. Part of that behavior is to remove 
the regionName from the set of regions.
{noformat}
this.regions.remove(regionName)
{noformat}
ClientInterestList.clearClientInterestList is then invoked, which is 
supposed to clear the FilterProfile for each region, but the regions are 
already cleared by the UnregisterInterest command, so this method doesn't do 
anything.

Then, LocalRegion.cleanupForClient is invoked which invokes 
FilterProfile.cleanupForClient. This method currently only closes CQs (which 
also cleans up the cqMap which is also an IDMap like the clientMap).

At the end of this, the clientMap's realIDs and wireIDs still contain the 
ClientProxyMembershipID.

The cleanupForClient method could be changed to also clean up the clientMap.

Note: If the client is killed abnormally, the UnregisterInterest command is not 
invoked, so the interest and the region are not cleaned up normally. When 
ClientInterestList.clearClientInterestList is called, the set of regions still 
contains the region, and the IDMap is cleaned up properly.


> When a client connects, registers interest and disconnects normally, its 
> ClientProxyMembershipID is not cleaned up and a memory leak occurs
> ---
>
> Key: GEODE-6287
> URL: https://issues.apache.org/jira/browse/GEODE-6287
> Project: Geode
>  Issue Type: Bug
>  Components: client queues, client/server
>Reporter: Barry Oglesby
>Assignee: Barry Oglesby
>Priority: Major
>
> When a client connects to a distributed system and registers interest, the 
> Region's FilterProfile's clientMap (an IDMap) registers the 
> ClientProxyMembershipID in both the realIDs and wireIDs like:
> {noformat}
> realIDs={identity(192.168.2.12(client-register:52879:loner):63013:2327c553:client-register,connection=2=1};
> wireIDs={1=identity(192.168.2.12(client-register:52879:loner):63013:2327c553:client-register,connection=2}
> {noformat}
> When the client leaves normally, the UnregisterInterest command is invoked 
> which removes the interest and the region. Part of that behavior is to remove 
> the regionName from the set of regions.
> {noformat}
> this.regions.remove(regionName)
> {noformat}
> ClientInterestList.clearClientInterestList is then invoked, which is 
> supposed to clear the FilterProfile for each region, but the regions are 
> already cleared by the UnregisterInterest command, so this method doesn't do 
> anything.

[jira] [Created] (GEODE-6287) When a client connects, registers interest and disconnects normally, its ClientProxyMembershipID is not cleaned up and a memory leak occurs

2019-01-17 Thread Barry Oglesby (JIRA)
Barry Oglesby created GEODE-6287:


 Summary: When a client connects, registers interest and 
disconnects normally, its ClientProxyMembershipID is not cleaned up and a 
memory leak occurs
 Key: GEODE-6287
 URL: https://issues.apache.org/jira/browse/GEODE-6287
 Project: Geode
  Issue Type: Bug
  Components: client queues, client/server
Reporter: Barry Oglesby


When a client connects to a distributed system and registers interest, the 
Region's FilterProfile's clientMap (an IDMap) registers the 
ClientProxyMembershipID in both the realIDs and wireIDs like:
```
realIDs=\{identity(192.168.2.12(client-register:52879:loner):63013:2327c553:client-register,connection=2=1};
wireIDs=\{1=identity(192.168.2.12(client-register:52879:loner):63013:2327c553:client-register,connection=2}
```
When the client leaves normally, the UnregisterInterest command is invoked 
which removes the interest and the region. Part of that behavior is to remove 
the regionName from the set of regions.
```
this.regions.remove(regionName)
```
ClientInterestList.clearClientInterestList is then invoked, which is 
supposed to clear the FilterProfile for each region, but the regions are 
already cleared by the UnregisterInterest command, so this method doesn't do 
anything.

Then, LocalRegion.cleanupForClient is invoked which invokes 
FilterProfile.cleanupForClient. This method currently only closes CQs (which 
also cleans up the cqMap which is also an IDMap like the clientMap).

At the end of this, the clientMap's realIDs and wireIDs still contain the 
ClientProxyMembershipID.

The cleanupForClient method could be changed to also clean up the clientMap.

Note: If the client is killed abnormally, the UnregisterInterest command is not 
invoked, so the interest and the region are not cleaned up normally. When 
ClientInterestList.clearClientInterestList is called, the set of regions still 
contains the region, and the IDMap is cleaned up properly.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (GEODE-6287) When a client connects, registers interest and disconnects normally, its ClientProxyMembershipID is not cleaned up and a memory leak occurs

2019-01-17 Thread Barry Oglesby (JIRA)


 [ 
https://issues.apache.org/jira/browse/GEODE-6287?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Barry Oglesby reassigned GEODE-6287:


Assignee: Barry Oglesby

> When a client connects, registers interest and disconnects normally, its 
> ClientProxyMembershipID is not cleaned up and a memory leak occurs
> ---
>
> Key: GEODE-6287
> URL: https://issues.apache.org/jira/browse/GEODE-6287
> Project: Geode
>  Issue Type: Bug
>  Components: client queues, client/server
>Reporter: Barry Oglesby
>Assignee: Barry Oglesby
>Priority: Major
>
> When a client connects to a distributed system and registers interest, the 
> Region's FilterProfile's clientMap (an IDMap) registers the 
> ClientProxyMembershipID in both the realIDs and wireIDs like:
> ```
> realIDs=\{identity(192.168.2.12(client-register:52879:loner):63013:2327c553:client-register,connection=2=1};
> wireIDs=\{1=identity(192.168.2.12(client-register:52879:loner):63013:2327c553:client-register,connection=2}
> ```
> When the client leaves normally, the UnregisterInterest command is invoked 
> which removes the interest and the region. Part of that behavior is to remove 
> the regionName from the set of regions.
> ```
> this.regions.remove(regionName)
> ```
> ClientInterestList.clearClientInterestList is then invoked, which is 
> supposed to clear the FilterProfile for each region, but the regions are 
> already cleared by the UnregisterInterest command, so this method doesn't do 
> anything.
> Then, LocalRegion.cleanupForClient is invoked which invokes 
> FilterProfile.cleanupForClient. This method currently only closes CQs (which 
> also cleans up the cqMap which is also an IDMap like the clientMap).
> At the end of this, the clientMap's realIDs and wireIDs still contain the 
> ClientProxyMembershipID.
> The cleanupForClient method could be changed to also clean up the clientMap.
> Note: If the client is killed abnormally, the UnregisterInterest command is 
> not invoked, so the interest and the region are not cleaned up normally. When 
> ClientInterestList.clearClientInterestList is called, the set of regions 
> still contains the region, and the IDMap is cleaned up properly.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (GEODE-670) The size of the GatewaySenderEvent is sometimes calculated by serializing its value rather than using the Sizeable interface

2019-01-16 Thread Barry Oglesby (JIRA)


 [ 
https://issues.apache.org/jira/browse/GEODE-670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Barry Oglesby resolved GEODE-670.
-
Resolution: Fixed

> The size of the GatewaySenderEvent is sometimes calculated by serializing its 
> value rather than using the Sizeable interface
> 
>
> Key: GEODE-670
> URL: https://issues.apache.org/jira/browse/GEODE-670
> Project: Geode
>  Issue Type: Bug
>  Components: wan
>Reporter: Barry Oglesby
>Assignee: Barry Oglesby
>Priority: Major
>
> {{BucketRegion calcMemSize}} special-cases {{GatewaySenderEventImpl}} to get 
> just its value. In most cases, the value is a byte[], so the size is just the 
> length of the byte[]. If the {{GatewayEventSubstitutionFilter}} is used, then 
> the event's value is null and its valueObject is a java object. In this case, 
> the valueObject is serialized and returned. {{BucketRegion calcMemSize}} then 
> just returns the length of that byte[] using {{CachedDeserializableFactory 
> calcMemSize}}.
> {{GatewaySenderEventImpl}} shouldn't be special-cased. It can be sized using 
> {{CachedDeserializableFactory calcMemSize}} just like other values. This will 
> invoke {{GatewaySenderEventImpl getSizeInBytes}} which does the right thing 
> for the valueObject by invoking {{CachedDeserializableFactory calcMemSize}} 
> on it. This method uses the {{Sizeable}} interface if appropriate. The 
> resulting size will be a bit bigger but more accurate than what is currently 
> reported.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (GEODE-6246) An EntryNotFoundException can be thrown by BucketRegionQueue.basicDestroy during GatewaySender queue initialization

2019-01-16 Thread Barry Oglesby (JIRA)


 [ 
https://issues.apache.org/jira/browse/GEODE-6246?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Barry Oglesby resolved GEODE-6246.
--
Resolution: Fixed

> An EntryNotFoundException can be thrown by BucketRegionQueue.basicDestroy 
> during GatewaySender queue initialization
> ---
>
> Key: GEODE-6246
> URL: https://issues.apache.org/jira/browse/GEODE-6246
> Project: Geode
>  Issue Type: Bug
>  Components: wan
>Reporter: Barry Oglesby
>Assignee: Barry Oglesby
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> An EntryNotFoundException like below can be thrown by 
> BucketRegionQueue.basicDestroy during GatewaySender queue initialization:
> {noformat}
> [warn 2019/01/03 15:53:00.423 PST  
> tid=0x56] Task failed with exception
> org.apache.geode.cache.EntryNotFoundException: 57546
>   at 
> org.apache.geode.internal.cache.BucketRegionQueue.basicDestroy(BucketRegionQueue.java:368)
>   at 
> org.apache.geode.internal.cache.BucketRegionQueue.destroyKey(BucketRegionQueue.java:564)
>   at 
> org.apache.geode.internal.cache.BucketRegionQueue.destroyFailedBatchRemovalMessageKeys(BucketRegionQueue.java:181)
>   at 
> org.apache.geode.internal.cache.BucketRegionQueue.initializeEventSeqNumQueue(BucketRegionQueue.java:151)
>   at 
> org.apache.geode.internal.cache.BucketRegionQueue.cleanUpDestroyedTokensAndMarkGIIComplete(BucketRegionQueue.java:89)
>   at 
> org.apache.geode.internal.cache.DistributedRegion.getInitialImageAndRecovery(DistributedRegion.java:1220)
>   at 
> org.apache.geode.internal.cache.DistributedRegion.initialize(DistributedRegion.java:1071)
>   at 
> org.apache.geode.internal.cache.BucketRegion.initialize(BucketRegion.java:256)
>   at 
> org.apache.geode.internal.cache.LocalRegion.createSubregion(LocalRegion.java:1012)
>   at 
> org.apache.geode.internal.cache.PartitionedRegionDataStore.createBucketRegion(PartitionedRegionDataStore.java:776)
>   at 
> org.apache.geode.internal.cache.PartitionedRegionDataStore.grabFreeBucket(PartitionedRegionDataStore.java:451)
>   at 
> org.apache.geode.internal.cache.PartitionedRegionDataStore.grabFreeBucketRecursively(PartitionedRegionDataStore.java:310)
>   at 
> org.apache.geode.internal.cache.PartitionedRegionDataStore.grabBucket(PartitionedRegionDataStore.java:2881)
>   at 
> org.apache.geode.internal.cache.PRHARedundancyProvider.createBackupBucketOnMember(PRHARedundancyProvider.java:1122)
>   at 
> org.apache.geode.internal.cache.partitioned.PartitionedRegionRebalanceOp.createRedundantBucketForRegion(PartitionedRegionRebalanceOp.java:511)
>   at 
> org.apache.geode.internal.cache.partitioned.rebalance.BucketOperatorImpl.createRedundantBucket(BucketOperatorImpl.java:54)
>   at 
> org.apache.geode.internal.cache.partitioned.rebalance.BucketOperatorWrapper.createRedundantBucket(BucketOperatorWrapper.java:100)
>   at 
> org.apache.geode.internal.cache.partitioned.rebalance.ParallelBucketOperator$1.run(ParallelBucketOperator.java:91)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at 
> org.apache.geode.distributed.internal.ClusterDistributionManager.runUntilShutdown(ClusterDistributionManager.java:956)
>   at 
> org.apache.geode.distributed.internal.ClusterDistributionManager.doWaitingThread(ClusterDistributionManager.java:846)
>   at 
> org.apache.geode.internal.logging.LoggingThreadFactory.lambda$newThread$0(LoggingThreadFactory.java:121)
>   at java.lang.Thread.run(Thread.java:745)
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (GEODE-6267) Subjects are not logged out when a client departs causing a memory leak

2019-01-15 Thread Barry Oglesby (JIRA)


[ 
https://issues.apache.org/jira/browse/GEODE-6267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16743391#comment-16743391
 ] 

Barry Oglesby commented on GEODE-6267:
--

I found something else while debugging this leak.

If the ClientHealthMonitor unregisters the CacheClientProxy, most of 
closeTransientFields is short-circuited. This is the normal code path.

In this code path, terminateDispatching calls closeSocket first and then, in 
its finally block, calls closeTransientFields.
{noformat}
java.lang.Exception: Stack trace
 at java.lang.Thread.dumpStack(Thread.java:1333)
 at 
org.apache.geode.internal.cache.tier.sockets.CacheClientProxy.closeTransientFields(CacheClientProxy.java:965)
 at 
org.apache.geode.internal.cache.tier.sockets.CacheClientProxy.terminateDispatching(CacheClientProxy.java:945)
 at 
org.apache.geode.internal.cache.tier.sockets.CacheClientProxy.close(CacheClientProxy.java:794)
 at 
org.apache.geode.internal.cache.tier.sockets.CacheClientNotifier.closeDeadProxies(CacheClientNotifier.java:1712)
 at 
org.apache.geode.internal.cache.tier.sockets.CacheClientNotifier.unregisterClient(CacheClientNotifier.java:724)
 at 
org.apache.geode.internal.cache.tier.sockets.ClientHealthMonitor.unregisterClient(ClientHealthMonitor.java:270)
 at 
org.apache.geode.internal.cache.tier.sockets.ServerConnection.handleTermination(ServerConnection.java:958)
 at 
org.apache.geode.internal.cache.tier.sockets.ServerConnection.handleTermination(ServerConnection.java:878)
 at 
org.apache.geode.internal.cache.tier.sockets.ServerConnection.run(ServerConnection.java:1229)
 at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
 at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
 at 
org.apache.geode.internal.cache.tier.sockets.AcceptorImpl.lambda$initializeServerConnectionThreadPool$3(AcceptorImpl.java:613)
 at 
org.apache.geode.internal.logging.LoggingThreadFactory.lambda$newThread$0(LoggingThreadFactory.java:121)
{noformat}
Since closeSocket has already been called in terminateDispatching, 
closeTransientFields short-circuits the rest of the method:
{noformat}
if (!closeSocket()) {
 // The thread who closed the socket will be responsible to
 // releaseResourcesForAddress and clearClientInterestList
 return;
}
{noformat}
This means that these methods aren't called:
{noformat}
releaseCommBuffer
releaseResourcesForAddress
{noformat}
I addressed this issue in my changes as well.
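
To make the gap concrete, here is a standalone model of that short-circuit (a 
sketch, not Geode source: the method names come from the discussion above, and 
the AtomicBoolean stands in for the real socket state):
{noformat}
import java.util.concurrent.atomic.AtomicBoolean;

class ProxyCleanupModel {
  private final AtomicBoolean socketOpen = new AtomicBoolean(true);

  // Returns true only for the thread that actually closed the socket.
  private boolean closeSocket() {
    return socketOpen.compareAndSet(true, false);
  }

  void closeTransientFields() {
    if (!closeSocket()) {
      // The socket was already closed in terminateDispatching, so this
      // branch returns early. Releasing the comm buffer and per-address
      // resources here keeps the early return from leaking them.
      releaseCommBuffer();
      releaseResourcesForAddress();
      return;
    }
    releaseCommBuffer();
    releaseResourcesForAddress();
    // ... the rest of the real method's cleanup (clearClientInterestList,
    // etc.) would follow here ...
  }

  private void releaseCommBuffer() { /* release the pooled comm buffer */ }
  private void releaseResourcesForAddress() { /* drop per-client state */ }
}
{noformat}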

> Subjects are not logged out when a client departs causing a memory leak
> ---
>
> Key: GEODE-6267
> URL: https://issues.apache.org/jira/browse/GEODE-6267
> Project: Geode
>  Issue Type: Bug
>  Components: security
>Reporter: Barry Oglesby
>Assignee: Barry Oglesby
>Priority: Major
>
> When a client with security enabled connects to a server, the 
> IntegratedSecurityService logs in a Subject. This causes a SimpleSession to 
> be created.
> The Subject is stored in ClientUserAuths.uniqueIdVsSubject.
> Here is a stack showing the SimpleSession creation:
> {noformat}
> [warning 2019/01/08 18:02:42.993 PST server1  Thread 0> tid=0x4e] SimpleSession.<init> invoked:
> java.lang.Exception
>  at org.apache.shiro.session.mgt.SimpleSession.<init>(SimpleSession.java:99)
>  at 
> org.apache.shiro.session.mgt.SimpleSessionFactory.createSession(SimpleSessionFactory.java:44)
>  at 
> org.apache.shiro.session.mgt.DefaultSessionManager.newSessionInstance(DefaultSessionManager.java:163)
>  at 
> org.apache.shiro.session.mgt.DefaultSessionManager.doCreateSession(DefaultSessionManager.java:154)
>  at 
> org.apache.shiro.session.mgt.AbstractValidatingSessionManager.createSession(AbstractValidatingSessionManager.java:136)
>  at 
> org.apache.shiro.session.mgt.AbstractNativeSessionManager.start(AbstractNativeSessionManager.java:99)
>  at 
> org.apache.shiro.mgt.SessionsSecurityManager.start(SessionsSecurityManager.java:152)
>  at 
> org.apache.shiro.subject.support.DelegatingSubject.getSession(DelegatingSubject.java:336)
>  at 
> org.apache.shiro.subject.support.DelegatingSubject.getSession(DelegatingSubject.java:312)
>  at 
> org.apache.shiro.mgt.DefaultSubjectDAO.mergePrincipals(DefaultSubjectDAO.java:204)
>  at 
> org.apache.shiro.mgt.DefaultSubjectDAO.saveToSession(DefaultSubjectDAO.java:166)
>  at org.apache.shiro.mgt.DefaultSubjectDAO.save(DefaultSubjectDAO.java:147)
>  at 
> org.apache.shiro.mgt.DefaultSecurityManager.save(DefaultSecurityManager.java:383)
>  at 
> org.apache.shiro.mgt.DefaultSecurityManager.createSubject(DefaultSecurityManager.java:350)
>  at 
> org.apache.shiro.mgt.DefaultSecurityManager.createSubject(DefaultSecurityManager.java:183)
>  at 
> org.apache.shiro.mgt.DefaultSecurityManager.login(DefaultSecurityManager.java:283)
>  at 
> 

[jira] [Commented] (GEODE-6267) Subjects are not logged out when a client departs causing a memory leak

2019-01-15 Thread Barry Oglesby (JIRA)


[ 
https://issues.apache.org/jira/browse/GEODE-6267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16743383#comment-16743383
 ] 

Barry Oglesby commented on GEODE-6267:
--

Here is a concise list of instances:

Startup:
Total 186115 9987688

After 1 clients:
Total 1403107 61797808

After subject logout:
Total 391292 19641912

> Subjects are not logged out when a client departs causing a memory leak
> ---
>
> Key: GEODE-6267
> URL: https://issues.apache.org/jira/browse/GEODE-6267
> Project: Geode
>  Issue Type: Bug
>  Components: security
>Reporter: Barry Oglesby
>Assignee: Barry Oglesby
>Priority: Major
>
> When a client with security enabled connects to a server, the 
> IntegratedSecurityService logs in a Subject. This causes a SimpleSession to 
> be created.
> The Subject is stored in ClientUserAuths.uniqueIdVsSubject.
> Here is a stack showing the SimpleSession creation:
> {noformat}
> [warning 2019/01/08 18:02:42.993 PST server1  Thread 0> tid=0x4e] SimpleSession.<init> invoked:
> java.lang.Exception
>  at org.apache.shiro.session.mgt.SimpleSession.<init>(SimpleSession.java:99)
>  at 
> org.apache.shiro.session.mgt.SimpleSessionFactory.createSession(SimpleSessionFactory.java:44)
>  at 
> org.apache.shiro.session.mgt.DefaultSessionManager.newSessionInstance(DefaultSessionManager.java:163)
>  at 
> org.apache.shiro.session.mgt.DefaultSessionManager.doCreateSession(DefaultSessionManager.java:154)
>  at 
> org.apache.shiro.session.mgt.AbstractValidatingSessionManager.createSession(AbstractValidatingSessionManager.java:136)
>  at 
> org.apache.shiro.session.mgt.AbstractNativeSessionManager.start(AbstractNativeSessionManager.java:99)
>  at 
> org.apache.shiro.mgt.SessionsSecurityManager.start(SessionsSecurityManager.java:152)
>  at 
> org.apache.shiro.subject.support.DelegatingSubject.getSession(DelegatingSubject.java:336)
>  at 
> org.apache.shiro.subject.support.DelegatingSubject.getSession(DelegatingSubject.java:312)
>  at 
> org.apache.shiro.mgt.DefaultSubjectDAO.mergePrincipals(DefaultSubjectDAO.java:204)
>  at 
> org.apache.shiro.mgt.DefaultSubjectDAO.saveToSession(DefaultSubjectDAO.java:166)
>  at org.apache.shiro.mgt.DefaultSubjectDAO.save(DefaultSubjectDAO.java:147)
>  at 
> org.apache.shiro.mgt.DefaultSecurityManager.save(DefaultSecurityManager.java:383)
>  at 
> org.apache.shiro.mgt.DefaultSecurityManager.createSubject(DefaultSecurityManager.java:350)
>  at 
> org.apache.shiro.mgt.DefaultSecurityManager.createSubject(DefaultSecurityManager.java:183)
>  at 
> org.apache.shiro.mgt.DefaultSecurityManager.login(DefaultSecurityManager.java:283)
>  at 
> org.apache.shiro.subject.support.DelegatingSubject.login(DelegatingSubject.java:256)
>  at 
> org.apache.geode.internal.security.IntegratedSecurityService.login(IntegratedSecurityService.java:139)
>  at 
> org.apache.geode.internal.cache.tier.sockets.HandShake.verifyCredentials(HandShake.java:1688)
>  at 
> org.apache.geode.internal.cache.tier.sockets.ServerConnection.setCredentials(ServerConnection.java:1044)
>  at 
> org.apache.geode.internal.cache.tier.sockets.command.PutUserCredentials.cmdExecute(PutUserCredentials.java:52)
>  at 
> org.apache.geode.internal.cache.tier.sockets.BaseCommand.execute(BaseCommand.java:163)
>  at 
> org.apache.geode.internal.cache.tier.sockets.ServerConnection.doNormalMsg(ServerConnection.java:797)
>  at 
> org.apache.geode.internal.cache.tier.sockets.LegacyServerConnection.doOneMessage(LegacyServerConnection.java:85)
>  at 
> org.apache.geode.internal.cache.tier.sockets.ServerConnection.run(ServerConnection.java:1179)
>  at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>  at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>  at 
> org.apache.geode.internal.cache.tier.sockets.AcceptorImpl$4$1.run(AcceptorImpl.java:641)
>  at java.lang.Thread.run(Thread.java:745)
> {noformat}
> When the client disconnects, the ClientUserAuths is cleaned up (in cleanup), 
> but the Subjects are not logged out.
> With subscription-enabled=true, an additional Subject is created and stored 
> in the CacheClientProxy subject. This Subject is not logged out either.
> Here is a stack showing the SimpleSession creation:
> {noformat}
> [warning 2019/01/08 18:02:43.023 PST server1  Thread 0> tid=0x52] SimpleSession.<init> invoked:
> java.lang.Exception
>  at org.apache.shiro.session.mgt.SimpleSession.<init>(SimpleSession.java:99)
>  at 
> org.apache.shiro.session.mgt.SimpleSessionFactory.createSession(SimpleSessionFactory.java:44)
>  at 
> org.apache.shiro.session.mgt.DefaultSessionManager.newSessionInstance(DefaultSessionManager.java:163)
>  at 
> org.apache.shiro.session.mgt.DefaultSessionManager.doCreateSession(DefaultSessionManager.java:154)
>  at 
> 

[jira] [Comment Edited] (GEODE-6267) Subjects are not logged out when a client departs causing a memory leak

2019-01-15 Thread Barry Oglesby (JIRA)


[ 
https://issues.apache.org/jira/browse/GEODE-6267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16743383#comment-16743383
 ] 

Barry Oglesby edited comment on GEODE-6267 at 1/15/19 8:41 PM:
---

Here is a concise list of instances and memory used:

Startup:
 Total 186115 9987688

After 1 clients:
 Total 1403107 61797808

After subject logout:
 Total 391292 19641912


was (Author: barry.oglesby):
Here is a concise list of instances:

Startup:
Total 186115 9987688

After 1 clients:
Total 1403107 61797808

After subject logout:
Total 391292 19641912

> Subjects are not logged out when a client departs causing a memory leak
> ---
>
> Key: GEODE-6267
> URL: https://issues.apache.org/jira/browse/GEODE-6267
> Project: Geode
>  Issue Type: Bug
>  Components: security
>Reporter: Barry Oglesby
>Assignee: Barry Oglesby
>Priority: Major
>
> When a client with security enabled connects to a server, the 
> IntegratedSecurityService logs in a Subject. This causes a SimpleSession to 
> be created.
> The Subject is stored in ClientUserAuths.uniqueIdVsSubject.
> Here is a stack showing the SimpleSession creation:
> {noformat}
> [warning 2019/01/08 18:02:42.993 PST server1  Thread 0> tid=0x4e] SimpleSession.<init> invoked:
> java.lang.Exception
>  at org.apache.shiro.session.mgt.SimpleSession.<init>(SimpleSession.java:99)
>  at 
> org.apache.shiro.session.mgt.SimpleSessionFactory.createSession(SimpleSessionFactory.java:44)
>  at 
> org.apache.shiro.session.mgt.DefaultSessionManager.newSessionInstance(DefaultSessionManager.java:163)
>  at 
> org.apache.shiro.session.mgt.DefaultSessionManager.doCreateSession(DefaultSessionManager.java:154)
>  at 
> org.apache.shiro.session.mgt.AbstractValidatingSessionManager.createSession(AbstractValidatingSessionManager.java:136)
>  at 
> org.apache.shiro.session.mgt.AbstractNativeSessionManager.start(AbstractNativeSessionManager.java:99)
>  at 
> org.apache.shiro.mgt.SessionsSecurityManager.start(SessionsSecurityManager.java:152)
>  at 
> org.apache.shiro.subject.support.DelegatingSubject.getSession(DelegatingSubject.java:336)
>  at 
> org.apache.shiro.subject.support.DelegatingSubject.getSession(DelegatingSubject.java:312)
>  at 
> org.apache.shiro.mgt.DefaultSubjectDAO.mergePrincipals(DefaultSubjectDAO.java:204)
>  at 
> org.apache.shiro.mgt.DefaultSubjectDAO.saveToSession(DefaultSubjectDAO.java:166)
>  at org.apache.shiro.mgt.DefaultSubjectDAO.save(DefaultSubjectDAO.java:147)
>  at 
> org.apache.shiro.mgt.DefaultSecurityManager.save(DefaultSecurityManager.java:383)
>  at 
> org.apache.shiro.mgt.DefaultSecurityManager.createSubject(DefaultSecurityManager.java:350)
>  at 
> org.apache.shiro.mgt.DefaultSecurityManager.createSubject(DefaultSecurityManager.java:183)
>  at 
> org.apache.shiro.mgt.DefaultSecurityManager.login(DefaultSecurityManager.java:283)
>  at 
> org.apache.shiro.subject.support.DelegatingSubject.login(DelegatingSubject.java:256)
>  at 
> org.apache.geode.internal.security.IntegratedSecurityService.login(IntegratedSecurityService.java:139)
>  at 
> org.apache.geode.internal.cache.tier.sockets.HandShake.verifyCredentials(HandShake.java:1688)
>  at 
> org.apache.geode.internal.cache.tier.sockets.ServerConnection.setCredentials(ServerConnection.java:1044)
>  at 
> org.apache.geode.internal.cache.tier.sockets.command.PutUserCredentials.cmdExecute(PutUserCredentials.java:52)
>  at 
> org.apache.geode.internal.cache.tier.sockets.BaseCommand.execute(BaseCommand.java:163)
>  at 
> org.apache.geode.internal.cache.tier.sockets.ServerConnection.doNormalMsg(ServerConnection.java:797)
>  at 
> org.apache.geode.internal.cache.tier.sockets.LegacyServerConnection.doOneMessage(LegacyServerConnection.java:85)
>  at 
> org.apache.geode.internal.cache.tier.sockets.ServerConnection.run(ServerConnection.java:1179)
>  at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>  at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>  at 
> org.apache.geode.internal.cache.tier.sockets.AcceptorImpl$4$1.run(AcceptorImpl.java:641)
>  at java.lang.Thread.run(Thread.java:745)
> {noformat}
> When the client disconnects, the ClientUserAuths is cleaned up (in cleanup), 
> but the Subjects are not logged out.
> With subscription-enabled=true, an additional Subject is created and stored 
> in the CacheClientProxy subject. This Subject is not logged out either.
> Here is a stack showing the SimpleSession creation:
> {noformat}
> [warning 2019/01/08 18:02:43.023 PST server1  Thread 0> tid=0x52] SimpleSession.<init> invoked:
> java.lang.Exception
>  at org.apache.shiro.session.mgt.SimpleSession.<init>(SimpleSession.java:99)
>  at 
> 

[jira] [Commented] (GEODE-6267) Subjects are not logged out when a client departs causing a memory leak

2019-01-15 Thread Barry Oglesby (JIRA)


[ 
https://issues.apache.org/jira/browse/GEODE-6267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16743380#comment-16743380
 ] 

Barry Oglesby commented on GEODE-6267:
--

After adding code to log out the Subjects in ClientUserAuths and 
CacheClientProxy:
{noformat}
 num #instances #bytes class name
--
 1: 79717 7093992 [C
 2: 79610 1910640 java.lang.String
 3: 11596 1599472 [B
 4: 31794 1017408 java.util.concurrent.ConcurrentHashMap$Node
 5: 7477 841096 java.lang.Class
 6: 9998 719856 org.apache.geode.distributed.internal.membership.gms.GMSMember
 7: 9998 479904 
org.apache.geode.distributed.internal.membership.InternalDistributedMember
 8: 5956 445384 [Ljava.lang.Object;
 9: 753 408176 [J
 10: 20810 332960 java.lang.Object
 11: 10022 320704 java.net.InetAddress$InetAddressHolder
 12: 9989 319648 org.apache.geode.distributed.DurableClientAttributes
 13: 9988 319616 
org.apache.geode.internal.cache.tier.sockets.ClientProxyMembershipID
 14: 2980 262240 java.lang.reflect.Method
 15: 7956 254592 java.util.HashMap$Node
 16: 10221 245304 java.lang.Long
 17: 10017 240408 java.net.Inet4Address
 18: 122 223824 [Ljava.util.concurrent.ConcurrentHashMap$Node;
 19: 4881 208488 [I
 20: 11264 202016 [Ljava.lang.String;
 21: 2139 198488 [Ljava.util.HashMap$Node;
 22: 2119 169520 java.lang.reflect.Constructor
 23: 4120 164800 java.util.LinkedHashMap$Entry
 24: 1726 96656 java.util.LinkedHashMap
 25: 1760 84480 java.util.HashMap
Total 391292 19641912
{noformat}
There is still a leak here, I think in ClientProxyMembershipIDs. I'll take a 
look at that.
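
For reference, a minimal sketch of the logout step described above 
(uniqueIdVsSubject is the map named in this ticket; the helper signature and 
key type are assumptions):
{noformat}
import java.util.Map;
import org.apache.shiro.subject.Subject;

// Hypothetical helper: log out every Subject a departing client left
// behind so its Shiro SimpleSession can be released.
static void logoutSubjects(Map<Long, Subject> uniqueIdVsSubject) {
  for (Subject subject : uniqueIdVsSubject.values()) {
    subject.logout();
  }
  uniqueIdVsSubject.clear();
}
{noformat}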

> Subjects are not logged out when a client departs causing a memory leak
> ---
>
> Key: GEODE-6267
> URL: https://issues.apache.org/jira/browse/GEODE-6267
> Project: Geode
>  Issue Type: Bug
>  Components: security
>Reporter: Barry Oglesby
>Assignee: Barry Oglesby
>Priority: Major
>
> When a client with security enabled connects to a server, the 
> IntegratedSecurityService logs in a Subject. This causes a SimpleSession to 
> be created.
> The Subject is stored in ClientUserAuths.uniqueIdVsSubject.
> Here is a stack showing the SimpleSession creation:
> {noformat}
> [warning 2019/01/08 18:02:42.993 PST server1  Thread 0> tid=0x4e] SimpleSession.<init> invoked:
> java.lang.Exception
>  at org.apache.shiro.session.mgt.SimpleSession.<init>(SimpleSession.java:99)
>  at 
> org.apache.shiro.session.mgt.SimpleSessionFactory.createSession(SimpleSessionFactory.java:44)
>  at 
> org.apache.shiro.session.mgt.DefaultSessionManager.newSessionInstance(DefaultSessionManager.java:163)
>  at 
> org.apache.shiro.session.mgt.DefaultSessionManager.doCreateSession(DefaultSessionManager.java:154)
>  at 
> org.apache.shiro.session.mgt.AbstractValidatingSessionManager.createSession(AbstractValidatingSessionManager.java:136)
>  at 
> org.apache.shiro.session.mgt.AbstractNativeSessionManager.start(AbstractNativeSessionManager.java:99)
>  at 
> org.apache.shiro.mgt.SessionsSecurityManager.start(SessionsSecurityManager.java:152)
>  at 
> org.apache.shiro.subject.support.DelegatingSubject.getSession(DelegatingSubject.java:336)
>  at 
> org.apache.shiro.subject.support.DelegatingSubject.getSession(DelegatingSubject.java:312)
>  at 
> org.apache.shiro.mgt.DefaultSubjectDAO.mergePrincipals(DefaultSubjectDAO.java:204)
>  at 
> org.apache.shiro.mgt.DefaultSubjectDAO.saveToSession(DefaultSubjectDAO.java:166)
>  at org.apache.shiro.mgt.DefaultSubjectDAO.save(DefaultSubjectDAO.java:147)
>  at 
> org.apache.shiro.mgt.DefaultSecurityManager.save(DefaultSecurityManager.java:383)
>  at 
> org.apache.shiro.mgt.DefaultSecurityManager.createSubject(DefaultSecurityManager.java:350)
>  at 
> org.apache.shiro.mgt.DefaultSecurityManager.createSubject(DefaultSecurityManager.java:183)
>  at 
> org.apache.shiro.mgt.DefaultSecurityManager.login(DefaultSecurityManager.java:283)
>  at 
> org.apache.shiro.subject.support.DelegatingSubject.login(DelegatingSubject.java:256)
>  at 
> org.apache.geode.internal.security.IntegratedSecurityService.login(IntegratedSecurityService.java:139)
>  at 
> org.apache.geode.internal.cache.tier.sockets.HandShake.verifyCredentials(HandShake.java:1688)
>  at 
> org.apache.geode.internal.cache.tier.sockets.ServerConnection.setCredentials(ServerConnection.java:1044)
>  at 
> org.apache.geode.internal.cache.tier.sockets.command.PutUserCredentials.cmdExecute(PutUserCredentials.java:52)
>  at 
> org.apache.geode.internal.cache.tier.sockets.BaseCommand.execute(BaseCommand.java:163)
>  at 
> org.apache.geode.internal.cache.tier.sockets.ServerConnection.doNormalMsg(ServerConnection.java:797)
>  at 
> org.apache.geode.internal.cache.tier.sockets.LegacyServerConnection.doOneMessage(LegacyServerConnection.java:85)
>  at 
> 

[jira] [Commented] (GEODE-6267) Subjects are not logged out when a client departs causing a memory leak

2019-01-15 Thread Barry Oglesby (JIRA)


[ 
https://issues.apache.org/jira/browse/GEODE-6267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16743377#comment-16743377
 ] 

Barry Oglesby commented on GEODE-6267:
--

Here are some histograms.

Startup:
{noformat}
 num #instances #bytes class name
--
 1: 66767 8069280 [C
 2: 4565 5288072 [B
 3: 36908 2952640 java.util.zip.ZipEntry
 4: 66702 1600848 java.lang.String
 5: 7002 789552 java.lang.Class
 6: 5084 567984 [Ljava.lang.Object;
 7: 5153 453464 java.lang.reflect.Method
 8: 656 388464 [J
 9: 11897 380704 java.util.concurrent.ConcurrentHashMap$Node
 10: 3954 336880 [Ljava.util.HashMap$Node;
 11: 8124 324960 java.util.LinkedHashMap$Entry
 12: 10121 323872 java.util.HashMap$Node
 13: 4751 198016 [I
 14: 10665 170640 java.lang.Object
 15: 2131 170480 java.lang.reflect.Constructor
 16: 2749 153944 java.util.LinkedHashMap
 17: 1619 116568 java.lang.reflect.Field
 18: 5503 115384 [Ljava.lang.Class;
 19: 2202 105696 java.util.HashMap
 20: 118 94864 [Ljava.util.concurrent.ConcurrentHashMap$Node;
 21: 2096 80416 [Ljava.lang.String;
 22: 1871 74840 java.lang.ref.Finalizer
 23: 1175 65800 java.lang.Class$ReflectionData
 24: 1409 56360 java.lang.ref.SoftReference
 25: 607 53160 [Ljava.lang.reflect.Method;
Total 186115 9987688
{noformat}
After connecting and disconnecting a client 1 times:
{noformat}
 num #instances #bytes class name
--
 1: 229739 15010256 [C
 2: 92139 7398488 [Ljava.util.HashMap$Node;
 3: 229634 5511216 java.lang.String
 4: 61726 3456656 java.util.LinkedHashMap
 5: 97956 3134592 java.util.HashMap$Node
 6: 64120 2564800 java.util.LinkedHashMap$Entry
 7: 35955 2125344 [Ljava.lang.Object;
 8: 61804 1977728 java.util.concurrent.ConcurrentHashMap$Node
 9: 30485 1954128 [Ljava.util.Hashtable$Entry;
 10: 60493 1935776 java.util.Hashtable$Entry
 11: 11601 1597640 [B
 12: 31760 1524480 java.util.HashMap
 13: 30072 1443456 java.util.Properties
 14: 3 144 org.apache.shiro.session.mgt.SimpleSession
 15: 7477 841096 java.lang.Class
 16: 30464 731136 java.util.ArrayList
 17: 30009 720216 java.util.Date
 18: 10003 720216 org.apache.geode.distributed.internal.membership.gms.GMSMember
 19: 3 72 TestPrincipal
 20: 3 72 org.apache.shiro.subject.SimplePrincipalCollection
 21: 122 485904 [Ljava.util.concurrent.ConcurrentHashMap$Node;
 22: 10003 480144 
org.apache.geode.distributed.internal.membership.InternalDistributedMember
 23: 30007 480112 java.util.LinkedHashSet
 24: 732 405536 [J
 25: 20815 333040 java.lang.Object
Total 1403107 61797808
{noformat}
After connecting and disconnecting a client enough times to create ~182k 
SimpleSessions:
{noformat}
 num #instances #bytes class name
--
 1: 1252634 78052152 [C
 2: 552473 44224760 [Ljava.util.HashMap$Node;
 3: 1252511 30060264 java.lang.String
 4: 368616 20642496 java.util.LinkedHashMap
 5: 558197 17862304 java.util.HashMap$Node
 6: 371010 14840400 java.util.LinkedHashMap$Entry
 7: 183930 11774608 [Ljava.util.Hashtable$Entry;
 8: 367383 11756256 java.util.Hashtable$Entry
 9: 189397 10719072 [Ljava.lang.Object;
 10: 317462 10158784 java.util.concurrent.ConcurrentHashMap$Node
 11: 185203 8889744 java.util.HashMap
 12: 183517 8808816 java.util.Properties
 13: 183445 8805360 org.apache.shiro.session.mgt.SimpleSession
 14: 62707 7321352 [B
 15: 183909 4413816 java.util.ArrayList
 16: 183454 4402896 java.util.Date
 17: 183445 4402680 TestPrincipal
 18: 183445 4402680 org.apache.shiro.subject.SimplePrincipalCollection
 19: 61108 4399776 
org.apache.geode.distributed.internal.membership.gms.GMSMember
 20: 183452 2935232 java.util.LinkedHashSet
Total 7692368 330322672
{noformat}

> Subjects are not logged out when a client departs causing a memory leak
> ---
>
> Key: GEODE-6267
> URL: https://issues.apache.org/jira/browse/GEODE-6267
> Project: Geode
>  Issue Type: Bug
>  Components: security
>Reporter: Barry Oglesby
>Assignee: Barry Oglesby
>Priority: Major
>
> When a client with security enabled connects to a server, the 
> IntegratedSecurityService logs in a Subject. This causes a SimpleSession to 
> be created.
> The Subject is stored in ClientUserAuths.uniqueIdVsSubject.
> Here is a stack showing the SimpleSession creation:
> {noformat}
> [warning 2019/01/08 18:02:42.993 PST server1  Thread 0> tid=0x4e] SimpleSession.<init> invoked:
> java.lang.Exception
>  at org.apache.shiro.session.mgt.SimpleSession.<init>(SimpleSession.java:99)
>  at 
> org.apache.shiro.session.mgt.SimpleSessionFactory.createSession(SimpleSessionFactory.java:44)
>  at 
> org.apache.shiro.session.mgt.DefaultSessionManager.newSessionInstance(DefaultSessionManager.java:163)
>  at 
> 

[jira] [Assigned] (GEODE-6267) Subjects are not logged out when a client departs causing a memory leak

2019-01-11 Thread Barry Oglesby (JIRA)


 [ 
https://issues.apache.org/jira/browse/GEODE-6267?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Barry Oglesby reassigned GEODE-6267:


Assignee: Barry Oglesby

> Subjects are not logged out when a client departs causing a memory leak
> ---
>
> Key: GEODE-6267
> URL: https://issues.apache.org/jira/browse/GEODE-6267
> Project: Geode
>  Issue Type: Bug
>  Components: security
>Reporter: Barry Oglesby
>Assignee: Barry Oglesby
>Priority: Major
>
> When a client with security enabled connects to a server, the 
> IntegratedSecurityService logs in a Subject. This causes a SimpleSession to 
> be created.
> The Subject is stored in ClientUserAuths.uniqueIdVsSubject.
> Here is a stack showing the SimpleSession creation:
> {noformat}
> [warning 2019/01/08 18:02:42.993 PST server1  Thread 0> tid=0x4e] SimpleSession.<init> invoked:
> java.lang.Exception
>  at org.apache.shiro.session.mgt.SimpleSession.<init>(SimpleSession.java:99)
>  at 
> org.apache.shiro.session.mgt.SimpleSessionFactory.createSession(SimpleSessionFactory.java:44)
>  at 
> org.apache.shiro.session.mgt.DefaultSessionManager.newSessionInstance(DefaultSessionManager.java:163)
>  at 
> org.apache.shiro.session.mgt.DefaultSessionManager.doCreateSession(DefaultSessionManager.java:154)
>  at 
> org.apache.shiro.session.mgt.AbstractValidatingSessionManager.createSession(AbstractValidatingSessionManager.java:136)
>  at 
> org.apache.shiro.session.mgt.AbstractNativeSessionManager.start(AbstractNativeSessionManager.java:99)
>  at 
> org.apache.shiro.mgt.SessionsSecurityManager.start(SessionsSecurityManager.java:152)
>  at 
> org.apache.shiro.subject.support.DelegatingSubject.getSession(DelegatingSubject.java:336)
>  at 
> org.apache.shiro.subject.support.DelegatingSubject.getSession(DelegatingSubject.java:312)
>  at 
> org.apache.shiro.mgt.DefaultSubjectDAO.mergePrincipals(DefaultSubjectDAO.java:204)
>  at 
> org.apache.shiro.mgt.DefaultSubjectDAO.saveToSession(DefaultSubjectDAO.java:166)
>  at org.apache.shiro.mgt.DefaultSubjectDAO.save(DefaultSubjectDAO.java:147)
>  at 
> org.apache.shiro.mgt.DefaultSecurityManager.save(DefaultSecurityManager.java:383)
>  at 
> org.apache.shiro.mgt.DefaultSecurityManager.createSubject(DefaultSecurityManager.java:350)
>  at 
> org.apache.shiro.mgt.DefaultSecurityManager.createSubject(DefaultSecurityManager.java:183)
>  at 
> org.apache.shiro.mgt.DefaultSecurityManager.login(DefaultSecurityManager.java:283)
>  at 
> org.apache.shiro.subject.support.DelegatingSubject.login(DelegatingSubject.java:256)
>  at 
> org.apache.geode.internal.security.IntegratedSecurityService.login(IntegratedSecurityService.java:139)
>  at 
> org.apache.geode.internal.cache.tier.sockets.HandShake.verifyCredentials(HandShake.java:1688)
>  at 
> org.apache.geode.internal.cache.tier.sockets.ServerConnection.setCredentials(ServerConnection.java:1044)
>  at 
> org.apache.geode.internal.cache.tier.sockets.command.PutUserCredentials.cmdExecute(PutUserCredentials.java:52)
>  at 
> org.apache.geode.internal.cache.tier.sockets.BaseCommand.execute(BaseCommand.java:163)
>  at 
> org.apache.geode.internal.cache.tier.sockets.ServerConnection.doNormalMsg(ServerConnection.java:797)
>  at 
> org.apache.geode.internal.cache.tier.sockets.LegacyServerConnection.doOneMessage(LegacyServerConnection.java:85)
>  at 
> org.apache.geode.internal.cache.tier.sockets.ServerConnection.run(ServerConnection.java:1179)
>  at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>  at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>  at 
> org.apache.geode.internal.cache.tier.sockets.AcceptorImpl$4$1.run(AcceptorImpl.java:641)
>  at java.lang.Thread.run(Thread.java:745)
> {noformat}
> When the client disconnects, the ClientUserAuths is cleaned up (in cleanup), 
> but the Subjects are not logged out.
> With subscription-enabled=true, an additional Subject is created and stored 
> in the CacheClientProxy subject. This Subject is not logged out either.
> Here is a stack showing the SimpleSession creation:
> {noformat}
> [warning 2019/01/08 18:02:43.023 PST server1  Thread 0> tid=0x52] SimpleSession.<init> invoked:
> java.lang.Exception
>  at org.apache.shiro.session.mgt.SimpleSession.<init>(SimpleSession.java:99)
>  at 
> org.apache.shiro.session.mgt.SimpleSessionFactory.createSession(SimpleSessionFactory.java:44)
>  at 
> org.apache.shiro.session.mgt.DefaultSessionManager.newSessionInstance(DefaultSessionManager.java:163)
>  at 
> org.apache.shiro.session.mgt.DefaultSessionManager.doCreateSession(DefaultSessionManager.java:154)
>  at 
> org.apache.shiro.session.mgt.AbstractValidatingSessionManager.createSession(AbstractValidatingSessionManager.java:136)
>  at 
> 

[jira] [Created] (GEODE-6267) Subjects are not logged out when a client departs causing a memory leak

2019-01-11 Thread Barry Oglesby (JIRA)
Barry Oglesby created GEODE-6267:


 Summary: Subjects are not logged out when a client departs causing 
a memory leak
 Key: GEODE-6267
 URL: https://issues.apache.org/jira/browse/GEODE-6267
 Project: Geode
  Issue Type: Bug
  Components: security
Reporter: Barry Oglesby


When a client with security enabled connects to a server, the 
IntegratedSecurityService logs in a Subject. This causes a SimpleSession to be 
created.

The Subject is stored in ClientUserAuths.uniqueIdVsSubject.

Here is a stack showing the SimpleSession creation:
{noformat}
[warning 2019/01/08 18:02:42.993 PST server1  tid=0x4e] SimpleSession.<init> invoked:
java.lang.Exception
 at org.apache.shiro.session.mgt.SimpleSession.<init>(SimpleSession.java:99)
 at 
org.apache.shiro.session.mgt.SimpleSessionFactory.createSession(SimpleSessionFactory.java:44)
 at 
org.apache.shiro.session.mgt.DefaultSessionManager.newSessionInstance(DefaultSessionManager.java:163)
 at 
org.apache.shiro.session.mgt.DefaultSessionManager.doCreateSession(DefaultSessionManager.java:154)
 at 
org.apache.shiro.session.mgt.AbstractValidatingSessionManager.createSession(AbstractValidatingSessionManager.java:136)
 at 
org.apache.shiro.session.mgt.AbstractNativeSessionManager.start(AbstractNativeSessionManager.java:99)
 at 
org.apache.shiro.mgt.SessionsSecurityManager.start(SessionsSecurityManager.java:152)
 at 
org.apache.shiro.subject.support.DelegatingSubject.getSession(DelegatingSubject.java:336)
 at 
org.apache.shiro.subject.support.DelegatingSubject.getSession(DelegatingSubject.java:312)
 at 
org.apache.shiro.mgt.DefaultSubjectDAO.mergePrincipals(DefaultSubjectDAO.java:204)
 at 
org.apache.shiro.mgt.DefaultSubjectDAO.saveToSession(DefaultSubjectDAO.java:166)
 at org.apache.shiro.mgt.DefaultSubjectDAO.save(DefaultSubjectDAO.java:147)
 at 
org.apache.shiro.mgt.DefaultSecurityManager.save(DefaultSecurityManager.java:383)
 at 
org.apache.shiro.mgt.DefaultSecurityManager.createSubject(DefaultSecurityManager.java:350)
 at 
org.apache.shiro.mgt.DefaultSecurityManager.createSubject(DefaultSecurityManager.java:183)
 at 
org.apache.shiro.mgt.DefaultSecurityManager.login(DefaultSecurityManager.java:283)
 at 
org.apache.shiro.subject.support.DelegatingSubject.login(DelegatingSubject.java:256)
 at 
org.apache.geode.internal.security.IntegratedSecurityService.login(IntegratedSecurityService.java:139)
 at 
org.apache.geode.internal.cache.tier.sockets.HandShake.verifyCredentials(HandShake.java:1688)
 at 
org.apache.geode.internal.cache.tier.sockets.ServerConnection.setCredentials(ServerConnection.java:1044)
 at 
org.apache.geode.internal.cache.tier.sockets.command.PutUserCredentials.cmdExecute(PutUserCredentials.java:52)
 at 
org.apache.geode.internal.cache.tier.sockets.BaseCommand.execute(BaseCommand.java:163)
 at 
org.apache.geode.internal.cache.tier.sockets.ServerConnection.doNormalMsg(ServerConnection.java:797)
 at 
org.apache.geode.internal.cache.tier.sockets.LegacyServerConnection.doOneMessage(LegacyServerConnection.java:85)
 at 
org.apache.geode.internal.cache.tier.sockets.ServerConnection.run(ServerConnection.java:1179)
 at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
 at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
 at 
org.apache.geode.internal.cache.tier.sockets.AcceptorImpl$4$1.run(AcceptorImpl.java:641)
 at java.lang.Thread.run(Thread.java:745)
{noformat}
When the client disconnects, the ClientUserAuths is cleaned up (in cleanup), 
but the Subjects are not logged out.

With subscription-enabled=true, an additional Subject is created and stored in 
the CacheClientProxy subject. This Subject is not logged out either.

Here is a stack showing the SimpleSession creation:
{noformat}
[warning 2019/01/08 18:02:43.023 PST server1  tid=0x52] SimpleSession.<init> invoked:
java.lang.Exception
 at org.apache.shiro.session.mgt.SimpleSession.<init>(SimpleSession.java:99)
 at 
org.apache.shiro.session.mgt.SimpleSessionFactory.createSession(SimpleSessionFactory.java:44)
 at 
org.apache.shiro.session.mgt.DefaultSessionManager.newSessionInstance(DefaultSessionManager.java:163)
 at 
org.apache.shiro.session.mgt.DefaultSessionManager.doCreateSession(DefaultSessionManager.java:154)
 at 
org.apache.shiro.session.mgt.AbstractValidatingSessionManager.createSession(AbstractValidatingSessionManager.java:136)
 at 
org.apache.shiro.session.mgt.AbstractNativeSessionManager.start(AbstractNativeSessionManager.java:99)
 at 
org.apache.shiro.mgt.SessionsSecurityManager.start(SessionsSecurityManager.java:152)
 at 
org.apache.shiro.subject.support.DelegatingSubject.getSession(DelegatingSubject.java:336)
 at 
org.apache.shiro.subject.support.DelegatingSubject.getSession(DelegatingSubject.java:312)
 at 
org.apache.shiro.mgt.DefaultSubjectDAO.mergePrincipals(DefaultSubjectDAO.java:204)
 at 

[jira] [Assigned] (GEODE-6246) An EntryNotFoundException can be thrown by BucketRegionQueue.basicDestroy during GatewaySender queue initialization

2019-01-03 Thread Barry Oglesby (JIRA)


 [ 
https://issues.apache.org/jira/browse/GEODE-6246?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Barry Oglesby reassigned GEODE-6246:


Assignee: Barry Oglesby

> An EntryNotFoundException can be thrown by BucketRegionQueue.basicDestroy 
> during GatewaySender queue initialization
> ---
>
> Key: GEODE-6246
> URL: https://issues.apache.org/jira/browse/GEODE-6246
> Project: Geode
>  Issue Type: Bug
>  Components: wan
>Reporter: Barry Oglesby
>Assignee: Barry Oglesby
>Priority: Major
>
> An EntryNotFoundException like below can be thrown by 
> BucketRegionQueue.basicDestroy during GatewaySender queue initialization:
> {noformat}
> [warn 2019/01/03 15:53:00.423 PST  
> tid=0x56] Task failed with exception
> org.apache.geode.cache.EntryNotFoundException: 57546
>   at 
> org.apache.geode.internal.cache.BucketRegionQueue.basicDestroy(BucketRegionQueue.java:368)
>   at 
> org.apache.geode.internal.cache.BucketRegionQueue.destroyKey(BucketRegionQueue.java:564)
>   at 
> org.apache.geode.internal.cache.BucketRegionQueue.destroyFailedBatchRemovalMessageKeys(BucketRegionQueue.java:181)
>   at 
> org.apache.geode.internal.cache.BucketRegionQueue.initializeEventSeqNumQueue(BucketRegionQueue.java:151)
>   at 
> org.apache.geode.internal.cache.BucketRegionQueue.cleanUpDestroyedTokensAndMarkGIIComplete(BucketRegionQueue.java:89)
>   at 
> org.apache.geode.internal.cache.DistributedRegion.getInitialImageAndRecovery(DistributedRegion.java:1220)
>   at 
> org.apache.geode.internal.cache.DistributedRegion.initialize(DistributedRegion.java:1071)
>   at 
> org.apache.geode.internal.cache.BucketRegion.initialize(BucketRegion.java:256)
>   at 
> org.apache.geode.internal.cache.LocalRegion.createSubregion(LocalRegion.java:1012)
>   at 
> org.apache.geode.internal.cache.PartitionedRegionDataStore.createBucketRegion(PartitionedRegionDataStore.java:776)
>   at 
> org.apache.geode.internal.cache.PartitionedRegionDataStore.grabFreeBucket(PartitionedRegionDataStore.java:451)
>   at 
> org.apache.geode.internal.cache.PartitionedRegionDataStore.grabFreeBucketRecursively(PartitionedRegionDataStore.java:310)
>   at 
> org.apache.geode.internal.cache.PartitionedRegionDataStore.grabBucket(PartitionedRegionDataStore.java:2881)
>   at 
> org.apache.geode.internal.cache.PRHARedundancyProvider.createBackupBucketOnMember(PRHARedundancyProvider.java:1122)
>   at 
> org.apache.geode.internal.cache.partitioned.PartitionedRegionRebalanceOp.createRedundantBucketForRegion(PartitionedRegionRebalanceOp.java:511)
>   at 
> org.apache.geode.internal.cache.partitioned.rebalance.BucketOperatorImpl.createRedundantBucket(BucketOperatorImpl.java:54)
>   at 
> org.apache.geode.internal.cache.partitioned.rebalance.BucketOperatorWrapper.createRedundantBucket(BucketOperatorWrapper.java:100)
>   at 
> org.apache.geode.internal.cache.partitioned.rebalance.ParallelBucketOperator$1.run(ParallelBucketOperator.java:91)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at 
> org.apache.geode.distributed.internal.ClusterDistributionManager.runUntilShutdown(ClusterDistributionManager.java:956)
>   at 
> org.apache.geode.distributed.internal.ClusterDistributionManager.doWaitingThread(ClusterDistributionManager.java:846)
>   at 
> org.apache.geode.internal.logging.LoggingThreadFactory.lambda$newThread$0(LoggingThreadFactory.java:121)
>   at java.lang.Thread.run(Thread.java:745)
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (GEODE-6205) The cluster configuration service create disk-store command saves the relative path name rather than the absolute one in the disk-dir

2018-12-21 Thread Barry Oglesby (JIRA)


 [ 
https://issues.apache.org/jira/browse/GEODE-6205?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Barry Oglesby resolved GEODE-6205.
--
   Resolution: Fixed
Fix Version/s: 1.9.0

> The cluster configuration service create disk-store command saves the 
> relative path name rather than the absolute one in the disk-dir
> -
>
> Key: GEODE-6205
> URL: https://issues.apache.org/jira/browse/GEODE-6205
> Project: Geode
>  Issue Type: Bug
>  Components: gfsh
>Reporter: Barry Oglesby
>Assignee: Barry Oglesby
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.9.0
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> This command:
> {noformat}
> create disk-store --name=data_store --dir=/path/to/geode_data
> {noformat}
> Creates this disk-store configuration:
> {noformat}
> <disk-store name="data_store">
>   <disk-dirs>
>     <disk-dir>geode_data</disk-dir>
>   </disk-dirs>
> </disk-store>
> {noformat}
> Only the relative file name is saved.
> The CreateDiskStoreCommand createDiskStoreType sets the relative path by 
> calling getName:
> {noformat}
>  for (int i = 0; i < diskStoreAttributes.getDiskDirs().length; i++) {
>DiskDirType diskDir = new DiskDirType();
> -> diskDir.setContent(diskStoreAttributes.getDiskDirs()[i].getName());
>
> diskDir.setDirSize(Integer.toString(diskStoreAttributes.getDiskDirSizes()[i]));
>diskDirs.add(diskDir);
>  }
> {noformat}
> If instead it called getAbsolutePath, the disk-store configuration would be:
> {noformat}
> <disk-store name="data_store">
>   <disk-dirs>
>     <disk-dir>/path/to/geode_data</disk-dir>
>   </disk-dirs>
> </disk-store>
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (GEODE-6205) The cluster configuration service create disk-store command saves the relative path name rather than the absolute one in the disk-dir

2018-12-17 Thread Barry Oglesby (JIRA)


 [ 
https://issues.apache.org/jira/browse/GEODE-6205?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Barry Oglesby reassigned GEODE-6205:


Assignee: Barry Oglesby

> The cluster configuration service create disk-store command saves the 
> relative path name rather than the absolute one in the disk-dir
> -
>
> Key: GEODE-6205
> URL: https://issues.apache.org/jira/browse/GEODE-6205
> Project: Geode
>  Issue Type: Bug
>  Components: gfsh
>Reporter: Barry Oglesby
>Assignee: Barry Oglesby
>Priority: Major
>
> This command:
> {noformat}
> create disk-store --name=data_store --dir=/path/to/geode_data
> {noformat}
> Creates this disk-store configuration:
> {noformat}
> <disk-store name="data_store">
>   <disk-dirs>
>     <disk-dir>geode_data</disk-dir>
>   </disk-dirs>
> </disk-store>
> {noformat}
> Only the relative file name is saved.
> The CreateDiskStoreCommand createDiskStoreType sets the relative path by 
> calling getName:
> {noformat}
>  for (int i = 0; i < diskStoreAttributes.getDiskDirs().length; i++) {
>DiskDirType diskDir = new DiskDirType();
> -> diskDir.setContent(diskStoreAttributes.getDiskDirs()[i].getName());
>
> diskDir.setDirSize(Integer.toString(diskStoreAttributes.getDiskDirSizes()[i]));
>diskDirs.add(diskDir);
>  }
> {noformat}
> If instead it called getAbsolutePath, the disk-store configuration would be:
> {noformat}
> <disk-store name="data_store">
>   <disk-dirs>
>     <disk-dir>/path/to/geode_data</disk-dir>
>   </disk-dirs>
> </disk-store>
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (GEODE-6205) The cluster configuration service create disk-store command saves the relative path name rather than the absolute one in the disk-dir

2018-12-14 Thread Barry Oglesby (JIRA)


[ 
https://issues.apache.org/jira/browse/GEODE-6205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16721954#comment-16721954
 ] 

Barry Oglesby commented on GEODE-6205:
--

Always using the absolute path doesn't allow multiple servers to start on the 
same host. Setting the disk-dir should depend on the input file name: if it is 
relative, the disk-dir should be relative; if it is absolute, the disk-dir 
should be absolute.
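
A minimal sketch of that behavior (DiskDirType and setContent are from the 
snippet quoted below; userSpecifiedDir and the surrounding code are 
assumptions, not the actual fix):
{noformat}
import java.io.File;

// Preserve whatever form the user supplied for --dir: a relative path
// stays relative, an absolute path stays absolute.
File dir = new File(userSpecifiedDir); // the raw --dir value
DiskDirType diskDir = new DiskDirType();
diskDir.setContent(dir.isAbsolute() ? dir.getAbsolutePath() : dir.getPath());
{noformat}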

> The cluster configuration service create disk-store command saves the 
> relative path name rather than the absolute one in the disk-dir
> -
>
> Key: GEODE-6205
> URL: https://issues.apache.org/jira/browse/GEODE-6205
> Project: Geode
>  Issue Type: Bug
>  Components: gfsh
>Reporter: Barry Oglesby
>Priority: Major
>
> This command:
> {noformat}
> create disk-store --name=data_store --dir=/path/to/geode_data
> {noformat}
> Creates this disk-store configuration:
> {noformat}
> <disk-store name="data_store">
>   <disk-dirs>
>     <disk-dir>geode_data</disk-dir>
>   </disk-dirs>
> </disk-store>
> {noformat}
> Only the relative file name is saved.
> The CreateDiskStoreCommand createDiskStoreType sets the relative path by 
> calling getName:
> {noformat}
>  for (int i = 0; i < diskStoreAttributes.getDiskDirs().length; i++) {
>DiskDirType diskDir = new DiskDirType();
> -> diskDir.setContent(diskStoreAttributes.getDiskDirs()[i].getName());
>
> diskDir.setDirSize(Integer.toString(diskStoreAttributes.getDiskDirSizes()[i]));
>diskDirs.add(diskDir);
>  }
> {noformat}
> If instead it called getAbsolutePath, the disk-store configuration would be:
> {noformat}
> <disk-store name="data_store">
>   <disk-dirs>
>     <disk-dir>/path/to/geode_data</disk-dir>
>   </disk-dirs>
> </disk-store>
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (GEODE-6205) The cluster configuration service create disk-store command saves the relative path name rather than the absolute one in the disk-dir

2018-12-14 Thread Barry Oglesby (JIRA)


 [ 
https://issues.apache.org/jira/browse/GEODE-6205?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Barry Oglesby updated GEODE-6205:
-
Description: 
This command:
{noformat}
create disk-store --name=data_store --dir=/path/to/geode_data
{noformat}
Creates this disk-store configuration:
{noformat}
<disk-store name="data_store">
  <disk-dirs>
    <disk-dir>geode_data</disk-dir>
  </disk-dirs>
</disk-store>
{noformat}
Only the relative file name is saved.

The CreateDiskStoreCommand createDiskStoreType sets the relative path by 
calling getName:
{noformat}
 for (int i = 0; i < diskStoreAttributes.getDiskDirs().length; i++) {
   DiskDirType diskDir = new DiskDirType();
-> diskDir.setContent(diskStoreAttributes.getDiskDirs()[i].getName());
   
diskDir.setDirSize(Integer.toString(diskStoreAttributes.getDiskDirSizes()[i]));
   diskDirs.add(diskDir);
 }
{noformat}
If instead it called getAbsolutePath, the disk-store configuration would be:
{noformat}
<disk-store name="data_store">
  <disk-dirs>
    <disk-dir>/path/to/geode_data</disk-dir>
  </disk-dirs>
</disk-store>
{noformat}

  was:
This command:
{noformat}
create disk-store --name=data_store --dir=/path/to/geode_data
{noformat}
Creates this disk-store configuration:
{noformat}
<disk-store name="data_store">
  <disk-dirs>
    <disk-dir>geode_data</disk-dir>
  </disk-dirs>
</disk-store>
{noformat}
Only the relative file name is saved.

The CreateDiskStoreCommand createDiskStoreType sets the relative path by 
calling getName:
{noformat}
 for (int i = 0; i < diskStoreAttributes.getDiskDirs().length; i++) {
 DiskDirType diskDir = new DiskDirType();
-> diskDir.setContent(diskStoreAttributes.getDiskDirs()[i].getName());
 diskDir.setDirSize(Integer.toString(diskStoreAttributes.getDiskDirSizes()[i]));
 diskDirs.add(diskDir);
 }
{noformat}
If instead it called getAbsolutePath, the disk-store configuration would be:
{noformat}
<disk-store name="data_store">
  <disk-dirs>
    <disk-dir>/path/to/geode_data</disk-dir>
  </disk-dirs>
</disk-store>
{noformat}


> The cluster configuration service create disk-store command saves the 
> relative path name rather than the absolute one in the disk-dir
> -
>
> Key: GEODE-6205
> URL: https://issues.apache.org/jira/browse/GEODE-6205
> Project: Geode
>  Issue Type: Bug
>  Components: gfsh
>Reporter: Barry Oglesby
>Priority: Major
>
> This command:
> {noformat}
> create disk-store --name=data_store --dir=/path/to/geode_data
> {noformat}
> Creates this disk-store configuration:
> {noformat}
> <disk-store name="data_store">
>   <disk-dirs>
>     <disk-dir>geode_data</disk-dir>
>   </disk-dirs>
> </disk-store>
> {noformat}
> Only the relative file name is saved.
> The CreateDiskStoreCommand createDiskStoreType sets the relative path by 
> calling getName:
> {noformat}
>  for (int i = 0; i < diskStoreAttributes.getDiskDirs().length; i++) {
>DiskDirType diskDir = new DiskDirType();
> -> diskDir.setContent(diskStoreAttributes.getDiskDirs()[i].getName());
>
> diskDir.setDirSize(Integer.toString(diskStoreAttributes.getDiskDirSizes()[i]));
>diskDirs.add(diskDir);
>  }
> {noformat}
> If instead it called getAbsolutePath, the disk-store configuration would be:
> {noformat}
> <disk-store name="data_store">
>   <disk-dirs>
>     <disk-dir>/path/to/geode_data</disk-dir>
>   </disk-dirs>
> </disk-store>
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (GEODE-6205) The cluster configuration service create disk-store command saves the relative path name rather than the absolute one in the disk-dir

2018-12-14 Thread Barry Oglesby (JIRA)


 [ 
https://issues.apache.org/jira/browse/GEODE-6205?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Barry Oglesby updated GEODE-6205:
-
Description: 
This command:
{noformat}
create disk-store --name=data_store --dir=/path/to/geode_data
{noformat}
Creates this disk-store configuration:
{noformat}
<disk-store name="data_store">
  <disk-dirs>
    <disk-dir>geode_data</disk-dir>
  </disk-dirs>
</disk-store>
{noformat}
Only the relative file name is saved.

The CreateDiskStoreCommand createDiskStoreType sets the relative path by 
calling getName:
{noformat}
 for (int i = 0; i < diskStoreAttributes.getDiskDirs().length; i++) {
 DiskDirType diskDir = new DiskDirType();
-> diskDir.setContent(diskStoreAttributes.getDiskDirs()[i].getName());
 diskDir.setDirSize(Integer.toString(diskStoreAttributes.getDiskDirSizes()[i]));
 diskDirs.add(diskDir);
 }
{noformat}
If instead it called getAbsolutePath, the disk-store configuration would be:
{noformat}
<disk-store name="data_store">
  <disk-dirs>
    <disk-dir>/path/to/geode_data</disk-dir>
  </disk-dirs>
</disk-store>
{noformat}

  was:
This command:
{noformat}
create disk-store --name=data_store --dir=/path/to/geode_data
{noformat}
Creates this disk-store configuration:
{noformat}
<disk-store name="data_store">
  <disk-dirs>
    <disk-dir>geode_data</disk-dir>
  </disk-dirs>
</disk-store>
{noformat}
Only the relative file name is saved.

The CreateDiskStoreCommand createDiskStoreType sets the relative path by 
calling getName:
{noformat}
 for (int i = 0; i < diskStoreAttributes.getDiskDirs().length; i++) {
 DiskDirType diskDir = new DiskDirType();
-> diskDir.setContent(diskStoreAttributes.getDiskDirs()[i].getName());
 diskDir.setDirSize(Integer.toString(diskStoreAttributes.getDiskDirSizes()[i]));
 diskDirs.add(diskDir);
 }
{noformat}
If instead it called getAbsolutePath, the disk-store configuration would be:
{noformat}
<disk-store name="data_store">
  <disk-dirs>
    <disk-dir>/path/to/gemfire_data</disk-dir>
  </disk-dirs>
</disk-store>
{noformat}


> The cluster configuration service create disk-store command saves the 
> relative path name rather than the absolute one in the disk-dir
> -
>
> Key: GEODE-6205
> URL: https://issues.apache.org/jira/browse/GEODE-6205
> Project: Geode
>  Issue Type: Bug
>  Components: gfsh
>Reporter: Barry Oglesby
>Priority: Major
>
> This command:
> {noformat}
> create disk-store --name=data_store --dir=/path/to/geode_data
> {noformat}
> Creates this disk-store configuration:
> {noformat}
> <disk-store name="data_store">
>   <disk-dirs>
>     <disk-dir>geode_data</disk-dir>
>   </disk-dirs>
> </disk-store>
> {noformat}
> Only the relative file name is saved.
> The CreateDiskStoreCommand createDiskStoreType sets the relative path by 
> calling getName:
> {noformat}
>  for (int i = 0; i < diskStoreAttributes.getDiskDirs().length; i++) {
>  DiskDirType diskDir = new DiskDirType();
> -> diskDir.setContent(diskStoreAttributes.getDiskDirs()[i].getName());
>  
> diskDir.setDirSize(Integer.toString(diskStoreAttributes.getDiskDirSizes()[i]));
>  diskDirs.add(diskDir);
>  }
> {noformat}
> If instead it called getAbsolutePath, the disk-store configuration would be:
> {noformat}
> <disk-store name="data_store">
>   <disk-dirs>
>     <disk-dir>/path/to/geode_data</disk-dir>
>   </disk-dirs>
> </disk-store>
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (GEODE-6205) The cluster configuration service create disk-store command saves the relative path name rather than the absolute one in the disk-dir

2018-12-14 Thread Barry Oglesby (JIRA)
Barry Oglesby created GEODE-6205:


 Summary: The cluster configuration service create disk-store 
command saves the relative path name rather than the absolute one in the 
disk-dir
 Key: GEODE-6205
 URL: https://issues.apache.org/jira/browse/GEODE-6205
 Project: Geode
  Issue Type: Bug
  Components: gfsh
Reporter: Barry Oglesby


This command:
{noformat}
create disk-store --name=data_store --dir=/path/to/geode_data
{noformat}
Creates this disk-store configuration:
{noformat}
<disk-store name="data_store">
  <disk-dirs>
    <disk-dir>geode_data</disk-dir>
  </disk-dirs>
</disk-store>
{noformat}
Only the relative file name is saved.

The CreateDiskStoreCommand createDiskStoreType sets the relative path by 
calling getName:
{noformat}
 for (int i = 0; i < diskStoreAttributes.getDiskDirs().length; i++) {
 DiskDirType diskDir = new DiskDirType();
-> diskDir.setContent(diskStoreAttributes.getDiskDirs()[i].getName());
 diskDir.setDirSize(Integer.toString(diskStoreAttributes.getDiskDirSizes()[i]));
 diskDirs.add(diskDir);
 }
{noformat}
If instead it called getAbsolutePath, the disk-store configuration would be:
{noformat}
<disk-store name="data_store">
  <disk-dirs>
    <disk-dir>/path/to/gemfire_data</disk-dir>
  </disk-dirs>
</disk-store>
{noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (GEODE-6205) The cluster configuration service create disk-store command saves the relative path name rather than the absolute one in the disk-dir

2018-12-14 Thread Barry Oglesby (JIRA)


 [ 
https://issues.apache.org/jira/browse/GEODE-6205?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Barry Oglesby updated GEODE-6205:
-
Description: 
This command:
{noformat}
create disk-store --name=data_store --dir=/path/to/geode_data
{noformat}
Creates this disk-store configuration:
{noformat}
<disk-store name="data_store">
  <disk-dirs>
    <disk-dir>geode_data</disk-dir>
  </disk-dirs>
</disk-store>
{noformat}
Only the relative file name is saved.

The CreateDiskStoreCommand createDiskStoreType sets the relative path by 
calling getName:
{noformat}
 for (int i = 0; i < diskStoreAttributes.getDiskDirs().length; i++) {
 DiskDirType diskDir = new DiskDirType();
-> diskDir.setContent(diskStoreAttributes.getDiskDirs()[i].getName());
 diskDir.setDirSize(Integer.toString(diskStoreAttributes.getDiskDirSizes()[i]));
 diskDirs.add(diskDir);
 }
{noformat}
If instead it called getAbsolutePath, the disk-store configuration would be:
{noformat}
<disk-store name="data_store">
  <disk-dirs>
    <disk-dir>/path/to/geode_data</disk-dir>
  </disk-dirs>
</disk-store>
{noformat}

  was:
This command:
{noformat}
create disk-store --name=data_store --dir=/path/to/geode_data
{noformat}
Creates this disk-store configuration:
{noformat}
<disk-store name="data_store">
  <disk-dirs>
    <disk-dir>geode_data</disk-dir>
  </disk-dirs>
</disk-store>
{noformat}
Only the relative file name is saved.

The CreateDiskStoreCommand createDiskStoreType sets the relative path by 
calling getName:
{noformat}
 for (int i = 0; i < diskStoreAttributes.getDiskDirs().length; i++) {
 DiskDirType diskDir = new DiskDirType();
-> diskDir.setContent(diskStoreAttributes.getDiskDirs()[i].getName());
 diskDir.setDirSize(Integer.toString(diskStoreAttributes.getDiskDirSizes()[i]));
 diskDirs.add(diskDir);
 }
{noformat}
If instead it called getAbsolutePath, the disk-store configuration would be:
{noformat}
<disk-store name="data_store">
  <disk-dirs>
    <disk-dir>/path/to/geode_data</disk-dir>
  </disk-dirs>
</disk-store>
{noformat}


> The cluster configuration service create disk-store command saves the 
> relative path name rather than the absolute one in the disk-dir
> -
>
> Key: GEODE-6205
> URL: https://issues.apache.org/jira/browse/GEODE-6205
> Project: Geode
>  Issue Type: Bug
>  Components: gfsh
>Reporter: Barry Oglesby
>Priority: Major
>
> This command:
> {noformat}
> create disk-store --name=data_store --dir=/path/to/geode_data
> {noformat}
> Creates this disk-store configuration:
> {noformat}
> <disk-store name="data_store">
>   <disk-dirs>
>     <disk-dir>geode_data</disk-dir>
>   </disk-dirs>
> </disk-store>
> {noformat}
> Only the relative file name is saved.
> The CreateDiskStoreCommand createDiskStoreType sets the relative path by 
> calling getName:
> {noformat}
>  for (int i = 0; i < diskStoreAttributes.getDiskDirs().length; i++) {
>  DiskDirType diskDir = new DiskDirType();
> -> diskDir.setContent(diskStoreAttributes.getDiskDirs()[i].getName());
>  
> diskDir.setDirSize(Integer.toString(diskStoreAttributes.getDiskDirSizes()[i]));
>  diskDirs.add(diskDir);
>  }
> {noformat}
> If instead it called getAbsolutePath, the disk-store configuration would be:
> {noformat}
> <disk-store name="data_store">
>   <disk-dirs>
>     <disk-dir>/path/to/geode_data</disk-dir>
>   </disk-dirs>
> </disk-store>
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (GEODE-6186) Reduce the number of EntryNotFoundExceptions during AsyncEventQueue batch processing with conflation enabled

2018-12-11 Thread Barry Oglesby (JIRA)
Barry Oglesby created GEODE-6186:


 Summary: Reduce the number of EntryNotFoundExceptions during 
AsyncEventQueue batch processing with conflation enabled
 Key: GEODE-6186
 URL: https://issues.apache.org/jira/browse/GEODE-6186
 Project: Geode
  Issue Type: Bug
  Components: wan
Reporter: Barry Oglesby


Reduce the number of EntryNotFoundExceptions during AsyncEventQueue batch 
processing with conflation enabled

This test runs 3000 iterations of putAlls with the same 1500 keys into a 
partitioned region attached to an async-event-queue with conflation enabled 
(a sketch of this kind of configuration follows the results below).

It produces these numbers in the current code (4 different runs):
{noformat}
numBatches=645; numENFEs=8622196; totalPeekTime=178517; averagePeekTime=276; 
totalProcessBatchTime=38936; averageProcessBatchTime=60
numBatches=660; numENFEs=8467986; totalPeekTime=182985; averagePeekTime=277; 
totalProcessBatchTime=34335; averageProcessBatchTime=52
numBatches=646; numENFEs=8563364; totalPeekTime=179624; averagePeekTime=278; 
totalProcessBatchTime=37342; averageProcessBatchTime=57
numBatches=632; numENFEs=8716942; totalPeekTime=175570; averagePeekTime=277; 
totalProcessBatchTime=39732; averageProcessBatchTime=62
{noformat}
After some changes mainly in BucketRegionQueue:
{noformat}
numBatches=782; numENFEs=3621039; totalPeekTime=195760; averagePeekTime=250; 
totalProcessBatchTime=18724; averageProcessBatchTime=23
numBatches=791; numENFEs=3604933; totalPeekTime=197980; averagePeekTime=250; 
totalProcessBatchTime=18587; averageProcessBatchTime=23
numBatches=790; numENFEs=3600038; totalPeekTime=197774; averagePeekTime=250; 
totalProcessBatchTime=18611; averageProcessBatchTime=23
numBatches=795; numENFEs=3584490; totalPeekTime=199060; averagePeekTime=250; 
totalProcessBatchTime=18063; averageProcessBatchTime=22
{noformat}
numBatches is the number of batches peeked
numENFEs is the number of EntryNotFoundExceptions thrown
totalPeekTime is the total time to peek all batches
averagePeekTime is the average time to peek a batch
totalProcessBatchTime is the total time to process all batches
averageProcessBatchTime is the average time to process a batch (includes 
listener callback and remove from queue)
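
For context, a rough sketch of this kind of setup using the public API (the 
queue id, region name, and no-op listener are assumptions, not the test's 
actual configuration):
{noformat}
import java.util.List;
import org.apache.geode.cache.RegionShortcut;
import org.apache.geode.cache.asyncqueue.AsyncEvent;
import org.apache.geode.cache.asyncqueue.AsyncEventListener;
import org.apache.geode.cache.asyncqueue.AsyncEventQueueFactory;

// cache is an existing org.apache.geode.cache.Cache
AsyncEventQueueFactory factory = cache.createAsyncEventQueueFactory();
factory.setParallel(true);               // queue on a partitioned region
factory.setBatchConflationEnabled(true); // conflation enabled
factory.create("aeq", new AsyncEventListener() {
  @Override
  public boolean processEvents(List<AsyncEvent> events) {
    return true; // no-op listener; just acknowledge the batch
  }

  @Override
  public void close() {}
});

cache.createRegionFactory(RegionShortcut.PARTITION)
    .addAsyncEventQueueId("aeq")
    .create("data");
{noformat}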



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (GEODE-5959) Nested function executions can cause a performance issue

2018-11-12 Thread Barry Oglesby (JIRA)


 [ 
https://issues.apache.org/jira/browse/GEODE-5959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Barry Oglesby resolved GEODE-5959.
--
   Resolution: Fixed
Fix Version/s: 1.9.0

> Nested function executions can cause a performance issue
> 
>
> Key: GEODE-5959
> URL: https://issues.apache.org/jira/browse/GEODE-5959
> Project: Geode
>  Issue Type: Bug
>  Components: functions
>Reporter: Barry Oglesby
>Assignee: Barry Oglesby
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.9.0
>
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> When a client executes a function, the server does:
> 1. The ServerConnection receives the function request, creates a runnable 
> task and executes it on the thread pool.
> 2a. If there are available threads in the pool, one is used
> 2b. If there are no available threads in the pool but the pool has not 
> reached its maximum size, then a thread is created and used
> 2c. If there are no available threads in the pool and all the threads are in 
> use, then:
>  - the task is put into a queue (a BlockingQueue)
>  - a thread called Function Execution Processor1 takes the task from that 
> queue and offers it to another queue. This other queue is a SynchronousQueue 
> (an insert waits for a removal). So, basically a thread has to be available 
> for the offer to succeed.
>  - after 5 seconds by default (controlled by gemfire.RETRY_INTERVAL), the 
> offer fails and the rejectedExecutionHandler is invoked. This handler spins 
> off a thread to process that task.
> Once the thread pool is in the state where no threads are available, every 
> new function execution will take at least 5 seconds plus the time it takes to 
> execute the function.
> If MAX_FE_THREADS is 32 and I run a test like:
> - launch 50 ParentFunctions onRegion against a replicated region, each of which 
> executes a ChildFunction on the same region
> - launch 1000 (or some number) of other functions that execute quickly
> All 32 threads in the pool will be in use immediately. These threads will be 
> processing ParentFunctions which have invoked the ChildFunction and be 
> waiting for the result. The next 18 (making 50) will cause threads to be spun 
> off after a 5-second wait for each. These will also get stuck waiting for the 
> ChildFunctions to execute. The next 1000 will each take 5 seconds to offer, 
> then spin off a thread that executes quickly. These are all processed 
> sequentially. If the function processes quickly enough, it won't show up in 
> thread dumps.
> When the thread pool is in this state, the server will contain threads like 
> below.
> For each client request, there will be a ServerConnection thread waiting for 
> the function execution request to complete here:
> {noformat}
> "ServerConnection on port 62483 Thread 42" #155 daemon prio=5 os_prio=31 
> tid=0x7fdf072b6800 nid=0x15703 waiting on condition [0x700018bb7000]
>  java.lang.Thread.State: WAITING (parking)
>  at sun.misc.Unsafe.park(Native Method)
>  - parking to wait for <0x0006c01c1378> (a 
> java.util.concurrent.CountDownLatch$Sync)
>  at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
>  at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
>  at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:997)
>  at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304)
>  at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:231)
>  at 
> org.apache.geode.internal.cache.execute.LocalResultCollectorImpl.getResult(LocalResultCollectorImpl.java:110)
>  at 
> org.apache.geode.internal.cache.tier.sockets.command.ExecuteRegionFunction66.cmdExecute(ExecuteRegionFunction66.java:255)
>  at 
> org.apache.geode.internal.cache.tier.sockets.BaseCommand.execute(BaseCommand.java:178)
>  at 
> org.apache.geode.internal.cache.tier.sockets.ServerConnection.doNormalMessage(ServerConnection.java:844)
>  at 
> org.apache.geode.internal.cache.tier.sockets.OriginalServerConnection.doOneMessage(OriginalServerConnection.java:74)
>  at 
> org.apache.geode.internal.cache.tier.sockets.ServerConnection.run(ServerConnection.java:1214)
>  at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>  at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>  at 
> org.apache.geode.internal.cache.tier.sockets.AcceptorImpl.lambda$initializeServerConnectionThreadPool$3(AcceptorImpl.java:593)
>  at 
> org.apache.geode.internal.cache.tier.sockets.AcceptorImpl$$Lambda$107/881662115.invoke(Unknown
>  Source)
>  at 
> 

[jira] [Assigned] (GEODE-5959) Nested function executions can cause a performance issue

2018-10-30 Thread Barry Oglesby (JIRA)


 [ 
https://issues.apache.org/jira/browse/GEODE-5959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Barry Oglesby reassigned GEODE-5959:


Assignee: Barry Oglesby

> Nested function executions can cause a performance issue
> 
>
> Key: GEODE-5959
> URL: https://issues.apache.org/jira/browse/GEODE-5959
> Project: Geode
>  Issue Type: Bug
>  Components: functions
>Reporter: Barry Oglesby
>Assignee: Barry Oglesby
>Priority: Major
>
> When a client executes a function, the server does:
> 1. The ServerConnection receives the function request, creates a runnable 
> task and executes it on the thread pool.
> 2a. If there are available threads in the pool, one is used
> 2b. If there are no available threads in the pool but the pool has not 
> reached its maximum size, then a thread is created and used
> 2c. If there are no available threads in the pool and all the threads are in 
> use, then:
>  - the task is put into a queue (a BlockingQueue)
>  - a thread called Function Execution Processor1 takes the task from that 
> queue and offers it to another queue. This other queue is a SynchronousQueue 
> (an insert waits for a removal). So, basically a thread has to be available 
> for the offer to succeed.
>  - after 5 seconds by default (controlled by gemfire.RETRY_INTERVAL), the 
> offer fails and the rejectedExecutionHandler is invoked. This handler spins 
> off a thread to process that task.
> Once the thread pool is in the state where no threads are available, every 
> new function execution will take at least 5 seconds plus the time it takes to 
> execute the function.
> If MAX_FE_THREADS is 32 and I run a test like:
> - launch 50 ParentFunctions onRegion against a replicated region, each of which 
> executes a ChildFunction on the same region
> - launch 1000 (or some number) of other functions that execute quickly
> All 32 threads in the pool will be in use immediately. These threads will be 
> processing ParentFunctions which have invoked the ChildFunction and be 
> waiting for the result. The next 18 (making 50) will cause threads to be spun 
> off after a 5-second wait for each. These will also get stuck waiting for the 
> ChildFunctions to execute. The next 1000 will each take 5 seconds to offer, 
> then spin off a thread that executes quickly. These are all processed 
> sequentially. If the function processes quickly enough, it won't show up in 
> thread dumps.
> When the thread pool is in this state, the server will contain threads like 
> below.
> For each client request, there will be a ServerConnection thread waiting for 
> the function execution request to complete here:
> {noformat}
> "ServerConnection on port 62483 Thread 42" #155 daemon prio=5 os_prio=31 
> tid=0x7fdf072b6800 nid=0x15703 waiting on condition [0x700018bb7000]
>  java.lang.Thread.State: WAITING (parking)
>  at sun.misc.Unsafe.park(Native Method)
>  - parking to wait for <0x0006c01c1378> (a 
> java.util.concurrent.CountDownLatch$Sync)
>  at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
>  at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
>  at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:997)
>  at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304)
>  at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:231)
>  at 
> org.apache.geode.internal.cache.execute.LocalResultCollectorImpl.getResult(LocalResultCollectorImpl.java:110)
>  at 
> org.apache.geode.internal.cache.tier.sockets.command.ExecuteRegionFunction66.cmdExecute(ExecuteRegionFunction66.java:255)
>  at 
> org.apache.geode.internal.cache.tier.sockets.BaseCommand.execute(BaseCommand.java:178)
>  at 
> org.apache.geode.internal.cache.tier.sockets.ServerConnection.doNormalMessage(ServerConnection.java:844)
>  at 
> org.apache.geode.internal.cache.tier.sockets.OriginalServerConnection.doOneMessage(OriginalServerConnection.java:74)
>  at 
> org.apache.geode.internal.cache.tier.sockets.ServerConnection.run(ServerConnection.java:1214)
>  at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>  at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>  at 
> org.apache.geode.internal.cache.tier.sockets.AcceptorImpl.lambda$initializeServerConnectionThreadPool$3(AcceptorImpl.java:593)
>  at 
> org.apache.geode.internal.cache.tier.sockets.AcceptorImpl$$Lambda$107/881662115.invoke(Unknown
>  Source)
>  at 
> org.apache.geode.internal.logging.LoggingThreadFactory.lambda$newThread$0(LoggingThreadFactory.java:121)
>  at 
> 

[jira] [Created] (GEODE-5959) Nested function executions can cause a performance issue

2018-10-30 Thread Barry Oglesby (JIRA)
Barry Oglesby created GEODE-5959:


 Summary: Nested function executions can cause a performance issue
 Key: GEODE-5959
 URL: https://issues.apache.org/jira/browse/GEODE-5959
 Project: Geode
  Issue Type: Bug
  Components: functions
Reporter: Barry Oglesby


When a client executes a function, the server does:

1. The ServerConnection receives the function request, creates a runnable task 
and executes it on the thread pool.
2a. If there are available threads in the pool, one is used
2b. If there are no available threads in the pool but the pool has not reached 
its maximum size, then a thread is created and used
2c. If there are no available threads in the pool and all the threads are in 
use, then:
 - the task is put into a queue (a BlockingQueue)
 - a thread called Function Execution Processor1 takes the task from that queue 
and offers it to another queue. This other queue is a SynchronousQueue (an 
insert waits for a removal). So, basically a thread has to be available for the 
offer to succeed.
 - after 5 seconds by default (controlled by gemfire.RETRY_INTERVAL), the offer 
fails and the rejectedExecutionHandler is invoked. This handler spins off a 
thread to process that task.

Once the thread pool is in the state where no threads are available, every new 
function execution will take at least 5 seconds plus the time it takes to 
execute the function.
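
A minimal sketch of the 2c hand-off (handOffOrSpinOff is a hypothetical name; 
SynchronousQueue, the timed offer, and Long.getLong are standard Java API):
{noformat}
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.TimeUnit;

static void handOffOrSpinOff(SynchronousQueue<Runnable> handOff, Runnable task)
    throws InterruptedException {
  // 5 seconds by default, per gemfire.RETRY_INTERVAL above
  long retryIntervalMs = Long.getLong("gemfire.RETRY_INTERVAL", 5000L);
  // A SynchronousQueue has no capacity: the offer succeeds only if a pool
  // thread is already blocked in take() waiting for work.
  if (!handOff.offer(task, retryIntervalMs, TimeUnit.MILLISECONDS)) {
    // Timed out: mimic the rejectedExecutionHandler by spinning off a
    // dedicated thread for this one task.
    new Thread(task).start();
  }
}
{noformat}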

If MAX_FE_THREADS is 32 and I run a test like:

- launch 50 ParentFunctions onRegion against a replicated region, each of which 
executes a ChildFunction on the same region
- launch 1000 (or some number) of other functions that execute quickly

All 32 threads in the pool will be in use immediately. These threads will be 
processing ParentFunctions which have invoked the ChildFunction and be waiting 
for the result. The next 18 (making 50) will cause threads to be spun off after 
a 5-second wait for each. These will also get stuck waiting for the 
ChildFunctions to execute. The next 1000 will each take 5 seconds to offer, 
then spin off a thread that executes quickly. These are all processed 
sequentially. If the function processes quickly enough, it won't show up in 
thread dumps.

When the thread pool is in this state, the server will contain threads like 
below.

For each client request, there will be a ServerConnection thread waiting for 
the function execution request to complete here:
{noformat}
"ServerConnection on port 62483 Thread 42" #155 daemon prio=5 os_prio=31 
tid=0x7fdf072b6800 nid=0x15703 waiting on condition [0x700018bb7000]
 java.lang.Thread.State: WAITING (parking)
 at sun.misc.Unsafe.park(Native Method)
 - parking to wait for <0x0006c01c1378> (a 
java.util.concurrent.CountDownLatch$Sync)
 at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
 at 
java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
 at 
java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:997)
 at 
java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304)
 at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:231)
 at 
org.apache.geode.internal.cache.execute.LocalResultCollectorImpl.getResult(LocalResultCollectorImpl.java:110)
 at 
org.apache.geode.internal.cache.tier.sockets.command.ExecuteRegionFunction66.cmdExecute(ExecuteRegionFunction66.java:255)
 at 
org.apache.geode.internal.cache.tier.sockets.BaseCommand.execute(BaseCommand.java:178)
 at 
org.apache.geode.internal.cache.tier.sockets.ServerConnection.doNormalMessage(ServerConnection.java:844)
 at 
org.apache.geode.internal.cache.tier.sockets.OriginalServerConnection.doOneMessage(OriginalServerConnection.java:74)
 at 
org.apache.geode.internal.cache.tier.sockets.ServerConnection.run(ServerConnection.java:1214)
 at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
 at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
 at 
org.apache.geode.internal.cache.tier.sockets.AcceptorImpl.lambda$initializeServerConnectionThreadPool$3(AcceptorImpl.java:593)
 at 
org.apache.geode.internal.cache.tier.sockets.AcceptorImpl$$Lambda$107/881662115.invoke(Unknown
 Source)
 at 
org.apache.geode.internal.logging.LoggingThreadFactory.lambda$newThread$0(LoggingThreadFactory.java:121)
 at 
org.apache.geode.internal.logging.LoggingThreadFactory$$Lambda$18/49222910.run(Unknown
 Source)
 at java.lang.Thread.run(Thread.java:745)
{noformat}
There will be a corresponding Function Execution Processor thread in the middle 
of executing the parent function and waiting for child function execution:
{noformat}
"Function Execution Processor12" #158 daemon prio=5 os_prio=31 
tid=0x7fdf072a6000 nid=0xd707 waiting on condition [0x700014af7000]
 

[jira] [Resolved] (GEODE-5917) Gfsh query results show a mix of PdxInstances and PreferBytesCachedDeserializables with read-serialized=true

2018-10-30 Thread Barry Oglesby (JIRA)


 [ 
https://issues.apache.org/jira/browse/GEODE-5917?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Barry Oglesby resolved GEODE-5917.
--
   Resolution: Fixed
Fix Version/s: 1.8.0

> Gfsh query results show a mix of PdxInstances and 
> PreferBytesCachedDeserializables with read-serialized=true
> 
>
> Key: GEODE-5917
> URL: https://issues.apache.org/jira/browse/GEODE-5917
> Project: Geode
>  Issue Type: Bug
>  Components: querying
>Reporter: Barry Oglesby
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.8.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Gfsh query results show a mix of PdxInstances and 
> PreferBytesCachedDeserializables with read-serialized=true
> A gfsh query on a partitioned region with pdx read-serialized=true shows 
> results like:
> {noformat}
> shares | price             | id | cusip | serializedValue    | sizeInBytes | stringForm                                                                 | valueSizeInBytes | DSFID | deserializedForReading                        | value              | serialized
> ------ | ----------------- | -- | ----- | ------------------ | ----------- | -------------------------------------------------------------------------- | ---------------- | ----- | --------------------------------------------- | ------------------ | ----------
> 70     | 590.923583984375  | 0  | MCD   |                    |             |                                                                            |                  |       |                                               |                    |
> 77     | 740.6094970703125 | 3  | MGM   |                    |             |                                                                            |                  |       |                                               |                    |
>        |                   |    |       | org.json.JSONArray | 56          | PDX[4456129,TradePdx]{cusip=GGB, id=1, price=26.52454376220703, shares=49} | 44               | -65   | org.apache.geode.pdx.internal.PdxInstanceImpl | org.json.JSONArray | true
>        |                   |    |       | org.json.JSONArray | 56          | PDX[4456129,TradePdx]{cusip=STO, id=2, price=643.344482421875, shares=85}  | 44               | -65   | org.apache.geode.pdx.internal.PdxInstanceImpl | org.json.JSONArray | true
>        |                   |    |       | org.json.JSONArray | 56          | PDX[4456129,TradePdx]{cusip=MGM, id=4, price=724.223388671875, shares=0}   | 44               | -65   | org.apache.geode.pdx.internal.PdxInstanceImpl | org.json.JSONArray | true
> {noformat}
> In this case, there are 2 servers and no redundant copies.
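> A minimal sketch of the cache setup behind this report (the region name 
> "data" comes from the query above; setPdxReadSerialized, createRegionFactory, 
> and RegionShortcut.PARTITION are standard Geode API):
> {noformat}
> import org.apache.geode.cache.Cache;
> import org.apache.geode.cache.CacheFactory;
> import org.apache.geode.cache.Region;
> import org.apache.geode.cache.RegionShortcut;
> 
> Cache cache = new CacheFactory()
>     .setPdxReadSerialized(true) // pdx read-serialized=true
>     .create();
> Region<String, Object> data =
>     cache.<String, Object>createRegionFactory(RegionShortcut.PARTITION)
>         .create("data");
> {noformat}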
> The DataCommandFunction.select query returns:
> {noformat}
> DataCommandFunction.select results=CumulativeNonDistinctResults::[
> PDX[4456129,TradePdx]{cusip=MCD, id=0, price=590.923583984375, shares=70},
> PDX[4456129,TradePdx]{cusip=MGM, id=3, price=740.6094970703125, shares=77},
> PreferBytesCachedDeserializable@1599752189,
> PreferBytesCachedDeserializable@1120782877,
> PreferBytesCachedDeserializable@1023583807
> ]
> {noformat}
> The local query returns the 2 PdxInstances, and the remote query returns the 
> 3 PreferBytesCachedDeserializables:
> {noformat}
> [info 2018/10/23 13:53:18.046 PDT   tid=0x4f] 
> Trace Info for Query: SELECT * FROM /data limit 100
> Local 192.168.2.6(76490):1026 took 6.887ms and returned 2 results; 
> Remote 192.168.2.6(76479):1025 took 45.164ms and returned 3 results;  
> indexesUsed(0)
> {noformat}
> The 3 PreferBytesCachedDeserializables are not converted to PdxInstances 
> before they are returned.
> PartitionedRegionQueryEvaluator.addResultsToResultSet adds the results to the 
> CumulativeNonDistinctResults result set.
> The CumulativeCollectionIterator iterates the 
> CumulativeNonDistinctResultsCollection and converts the objects to PDX here:
> {noformat}
> java.lang.Exception: Stack trace
>   at java.lang.Thread.dumpStack(Thread.java:1333)
>   at 
> org.apache.geode.cache.query.internal.utils.PDXUtils.convertPDX(PDXUtils.java:83)
>   at 
> org.apache.geode.cache.query.internal.CumulativeNonDistinctResults$CumulativeNonDistinctResultsCollection$CumulativeCollectionIterator.next(CumulativeNonDistinctResults.java:259)
>   at 
> org.apache.geode.cache.query.internal.utils.LimitIterator.next(LimitIterator.java:49)
>   at 
> org.apache.geode.management.internal.cli.functions.DataCommandFunction.select_SelectResults(DataCommandFunction.java:271)
>   at 
> org.apache.geode.management.internal.cli.functions.DataCommandFunction.select(DataCommandFunction.java:226)
>   at 
> 
