[jira] [Reopened] (GEODE-7091) Add Micrometer binders to default meter registry

2019-08-16 Thread Aaron Lindsey (JIRA)


 [ 
https://issues.apache.org/jira/browse/GEODE-7091?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron Lindsey reopened GEODE-7091:
--

Still adding tests for this.

> Add Micrometer binders to default meter registry
> 
>
> Key: GEODE-7091
> URL: https://issues.apache.org/jira/browse/GEODE-7091
> Project: Geode
>  Issue Type: Improvement
>  Components: statistics
>Reporter: Aaron Lindsey
>Assignee: Aaron Lindsey
>Priority: Major
> Fix For: 1.11.0
>
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> As a user, I want access to the specific JVM, GC, uptime, and file descriptor 
> metrics that help indicate and track down cluster health issues, so that I 
> can understand the health of my cluster.
> Add the following Micrometer binders:
> * JvmGcMetrics
> * ProcessorMetrics
> * JvmThreadMetrics
> * UptimeMetrics
> * FileDescriptorMetrics
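
For readers unfamiliar with what these binders report, the following is a minimal sketch of the same kinds of readings taken directly from the JDK's java.lang.management beans. It does not use Micrometer or Geode. The class name and metric names are made up for illustration, and it is only an assumption that the binders surface roughly this data.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;
import java.util.LinkedHashMap;
import java.util.Map;

class JvmHealthSnapshot {
  // Collects a few JVM health readings by name. The metric names are
  // illustrative only; they are not the names Micrometer or Geode publish.
  static Map<String, Number> read() {
    Map<String, Number> values = new LinkedHashMap<>();

    long gcCount = 0;
    for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
      // getCollectionCount() returns -1 when the collector does not report a count.
      gcCount += Math.max(0, gc.getCollectionCount());
    }
    values.put("gc.collections", gcCount);

    OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
    values.put("cpu.count", os.getAvailableProcessors());
    values.put("threads.live", ManagementFactory.getThreadMXBean().getThreadCount());
    values.put("uptime.millis", ManagementFactory.getRuntimeMXBean().getUptime());

    // Open file descriptor counts are only exposed on Unix-like JVMs,
    // so the reading is simply absent elsewhere (for example on Windows).
    if (os instanceof com.sun.management.UnixOperatingSystemMXBean) {
      values.put("files.open",
          ((com.sun.management.UnixOperatingSystemMXBean) os).getOpenFileDescriptorCount());
    }
    return values;
  }

  public static void main(String[] args) {
    read().forEach((name, value) -> System.out.println(name + " = " + value));
  }
}
```

The Unix-only check mirrors why a file descriptor reading may need to be skipped on some platforms.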



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Assigned] (GEODE-7099) Clean up MeterSubregistryReconnectDistributedTest

2019-08-16 Thread Dale Emery (JIRA)


 [ 
https://issues.apache.org/jira/browse/GEODE-7099?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dale Emery reassigned GEODE-7099:
-

Assignee: Dale Emery

> Clean up MeterSubregistryReconnectDistributedTest
> -
>
> Key: GEODE-7099
> URL: https://issues.apache.org/jira/browse/GEODE-7099
> Project: Geode
>  Issue Type: Test
>  Components: statistics
>Reporter: Dale Emery
>Assignee: Dale Emery
>Priority: Major
>
> Improve readability of the test.



[jira] [Created] (GEODE-7099) Clean up MeterSubregistryReconnectDistributedTest

2019-08-16 Thread Dale Emery (JIRA)
Dale Emery created GEODE-7099:
-

 Summary: Clean up MeterSubregistryReconnectDistributedTest
 Key: GEODE-7099
 URL: https://issues.apache.org/jira/browse/GEODE-7099
 Project: Geode
  Issue Type: Test
  Components: statistics
Reporter: Dale Emery


Improve readability of the test.



[jira] [Commented] (GEODE-7092) ReconnectWithUDPSecurityDUnitTest fails due to insufficient MEMBER_TIMEOUT

2019-08-16 Thread ASF subversion and git services (JIRA)


[ 
https://issues.apache.org/jira/browse/GEODE-7092?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16909482#comment-16909482
 ] 

ASF subversion and git services commented on GEODE-7092:


Commit 97aa017c324b74fe53bf8203c2631a836b9714c5 in geode's branch 
refs/heads/develop from Bill Burcham
[ https://gitbox.apache.org/repos/asf?p=geode.git;h=97aa017 ]

GEODE-7092: increase flaky test MEMBER_TIMEOUT



> ReconnectWithUDPSecurityDUnitTest fails due to insufficient MEMBER_TIMEOUT
> --
>
> Key: GEODE-7092
> URL: https://issues.apache.org/jira/browse/GEODE-7092
> Project: Geode
>  Issue Type: Bug
>  Components: tests
>Reporter: Bill Burcham
>Assignee: Bill Burcham
>Priority: Major
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Flaky test. In this CI run:
>  
> [https://concourse.gemfire-ci.info/teams/main/pipelines/gemfire-develop-main/jobs/DistributedTestOpenJDK8/builds/879]
>  
> These two tests failed:
>  
> ReconnectWithUDPSecurityDUnitTest > testReconnectOnForcedDisconnect
> ReconnectWithUDPSecurityDUnitTest > testReconnectCollidesWithApplication
>  
> They both boil down to:
>  
> doTestReconnectOnForcedDisconnect(), which calls 
> getDistributedSystemProperties() to get the property values to set. 
> MEMBER_TIMEOUT is set to 1 second, which is insufficient.
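
As a rough illustration of the kind of change described, the sketch below overrides the member timeout in a set of distributed system properties. It assumes the property key is "member-timeout" with a value in milliseconds. The class and method names are hypothetical, and the timeout value is left as a parameter because the issue does not state the value the fix chose.

```java
import java.util.Properties;

class MemberTimeoutSketch {
  // Assumed property key for Geode's member timeout; the value is in milliseconds.
  static final String MEMBER_TIMEOUT = "member-timeout";

  // Returns a copy of the base properties with the member timeout overridden.
  // The base properties are left untouched.
  static Properties withMemberTimeout(Properties base, int timeoutMillis) {
    Properties props = new Properties();
    props.putAll(base);
    props.setProperty(MEMBER_TIMEOUT, Integer.toString(timeoutMillis));
    return props;
  }
}
```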



[jira] [Reopened] (GEODE-4264) GEODE-4264 : [CI Failure] ResourceManagerWithQueryMonitorDUnitTest. testRMButDisabledQueryMonitorForLowMemAndTimeoutSetOnServer

2019-08-16 Thread Lynn Gallinat (JIRA)


 [ 
https://issues.apache.org/jira/browse/GEODE-4264?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lynn Gallinat reopened GEODE-4264:
--

org.apache.geode.cache.query.dunit.ResourceManagerWithQueryMonitorDUnitTest > testRMButDisabledQueryMonitorForLowMemAndNoTimeoutSetOnServer FAILED
java.lang.AssertionError: queryExecution.getResult() threw Exception java.lang.AssertionError: An exception occurred during asynchronous invocation.
    at org.junit.Assert.fail(Assert.java:88)
    at org.apache.geode.cache.query.dunit.ResourceManagerWithQueryMonitorDUnitTest.doTestCriticalHeapAndQueryTimeout(ResourceManagerWithQueryMonitorDUnitTest.java:837)
    at org.apache.geode.cache.query.dunit.ResourceManagerWithQueryMonitorDUnitTest.doCriticalMemoryHitTestOnServer(ResourceManagerWithQueryMonitorDUnitTest.java:725)
    at org.apache.geode.cache.query.dunit.ResourceManagerWithQueryMonitorDUnitTest.testRMButDisabledQueryMonitorForLowMemAndNoTimeoutSetOnServer(ResourceManagerWithQueryMonitorDUnitTest.java:221)

> GEODE-4264 : [CI Failure] ResourceManagerWithQueryMonitorDUnitTest. 
> testRMButDisabledQueryMonitorForLowMemAndTimeoutSetOnServer
> ---
>
> Key: GEODE-4264
> URL: https://issues.apache.org/jira/browse/GEODE-4264
> Project: Geode
>  Issue Type: Bug
>  Components: querying
>Reporter: nabarun
>Priority: Major
>
> *Stacktrace*
> {noformat}
> java.lang.AssertionError: queryExecution.getResult() threw Exception 
> java.lang.AssertionError: An exception occurred during asynchronous 
> invocation.
>   at org.junit.Assert.fail(Assert.java:88)
>   at 
> org.apache.geode.cache.query.dunit.ResourceManagerWithQueryMonitorDUnitTest.doTestCriticalHeapAndQueryTimeout(ResourceManagerWithQueryMonitorDUnitTest.java:738)
>   at 
> org.apache.geode.cache.query.dunit.ResourceManagerWithQueryMonitorDUnitTest.doCriticalMemoryHitTestOnServer(ResourceManagerWithQueryMonitorDUnitTest.java:681)
>   at 
> org.apache.geode.cache.query.dunit.ResourceManagerWithQueryMonitorDUnitTest.testRMButDisabledQueryMonitorForLowMemAndTimeoutSetOnServer(ResourceManagerWithQueryMonitorDUnitTest.java:209)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>   at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
>   at 
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
>   at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:55)
>   at org.junit.rules.RunRules.evaluate(RunRules.java:20)
>   at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
>   at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
>   at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
>   at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
>   at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
>   at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
>   at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
>   at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
>   at 
> org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecuter.runTestClass(JUnitTestClassExecuter.java:114)
>   at 
> org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecuter.execute(JUnitTestClassExecuter.java:57)
>   at 
> org.gradle.api.internal.tasks.testing.junit.JUnitTestClassProcessor.processTestClass(JUnitTestClassProcessor.java:66)
>   at 
> org.gradle.api.internal.tasks.testing.SuiteTestClassProcessor.processTestClass(SuiteTestClassProcessor.java:51)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)

[jira] [Updated] (GEODE-7080) EntryDestroyedException can be thrown during exportOfflineSnapshot if a key was destroyed

2019-08-16 Thread Eric Shu (JIRA)


 [ 
https://issues.apache.org/jira/browse/GEODE-7080?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Shu updated GEODE-7080:

Fix Version/s: (was: 1.11.0)
   1.10.0

> EntryDestroyedException can be thrown during exportOfflineSnapshot if a key 
> was destroyed
> -
>
> Key: GEODE-7080
> URL: https://issues.apache.org/jira/browse/GEODE-7080
> Project: Geode
>  Issue Type: Bug
>  Components: snapshot
>Affects Versions: 1.1.0
>Reporter: Eric Shu
>Assignee: Eric Shu
>Priority: Major
> Fix For: 1.10.0
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> ExportOfflineSnapshot is aborted if the exception is thrown.
> org.apache.geode.cache.EntryDestroyedException
> at 
> org.apache.geode.internal.cache.snapshot.SnapshotPacket$SnapshotRecord.convertToBytes(SnapshotPacket.java:161)
> at 
> org.apache.geode.internal.cache.snapshot.SnapshotPacket$SnapshotRecord.<init>(SnapshotPacket.java:62)
> at 
> org.apache.geode.internal.cache.DiskStoreImpl$2.writeBatch(DiskStoreImpl.java:3849)
> at 
> org.apache.geode.internal.cache.ExportDiskRegion.oplogRecovered(ExportDiskRegion.java:67)
> at 
> org.apache.geode.internal.cache.PersistentOplogSet.recoverOplogs(PersistentOplogSet.java:468)
> at 
> org.apache.geode.internal.cache.PersistentOplogSet.recoverRegionsThatAreReady(PersistentOplogSet.java:367)
> at 
> org.apache.geode.internal.cache.DiskStoreImpl.recoverRegionsThatAreReady(DiskStoreImpl.java:2043)
> at 
> org.apache.geode.internal.cache.DiskStoreImpl.exportSnapshot(DiskStoreImpl.java:3855)
> at 
> org.apache.geode.internal.cache.DiskStoreImpl.exportOfflineSnapshot(DiskStoreImpl.java:4147)
> at 
> org.apache.geode.management.internal.cli.commands.ExportOfflineDiskStoreCommand.exportOfflineDiskStore(ExportOfflineDiskStoreCommand.java:53)



[jira] [Commented] (GEODE-7080) EntryDestroyedException can be thrown during exportOfflineSnapshot if a key was destroyed

2019-08-16 Thread ASF subversion and git services (JIRA)


[ 
https://issues.apache.org/jira/browse/GEODE-7080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16909427#comment-16909427
 ] 

ASF subversion and git services commented on GEODE-7080:


Commit fbe8ef58abd1b9cd73eb1fd4b47a08f8c4aa5515 in geode's branch 
refs/heads/release/1.10.0 from Eric Shu
[ https://gitbox.apache.org/repos/asf?p=geode.git;h=fbe8ef5 ]

GEODE-7080: Do not add removed entry into offline export file. (#3912)

* GEODE-7080: Do not add removed entry into offline export file.

  Co-authored-by: Donal Evans doev...@pivotal.io

  (cherry picked from commit aa33060fd099254b08a659e9d8267a9b3c236b91)
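
The idea in the commit title can be sketched as a filter over recovered entries: an entry whose key was destroyed is skipped rather than being allowed to abort the export. This is a simplified model under stated assumptions, not the code from the commit. The class, the DESTROYED marker, and the map-based representation are all hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class OfflineExportSketch {
  // Hypothetical marker for an entry whose key was destroyed; not a Geode type.
  static final Object DESTROYED = new Object();

  // Returns only the entries that still hold a value, skipping destroyed ones
  // instead of failing the whole export when one is encountered.
  static Map<String, Object> exportableEntries(Map<String, Object> recovered) {
    Map<String, Object> out = new LinkedHashMap<>();
    for (Map.Entry<String, Object> entry : recovered.entrySet()) {
      if (entry.getValue() == DESTROYED) {
        continue; // removed entry: leave it out of the export file
      }
      out.put(entry.getKey(), entry.getValue());
    }
    return out;
  }
}
```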


> EntryDestroyedException can be thrown during exportOfflineSnapshot if a key 
> was destroyed
> -
>
> Key: GEODE-7080
> URL: https://issues.apache.org/jira/browse/GEODE-7080
> Project: Geode
>  Issue Type: Bug
>  Components: snapshot
>Affects Versions: 1.1.0
>Reporter: Eric Shu
>Assignee: Eric Shu
>Priority: Major
> Fix For: 1.11.0
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> ExportOfflineSnapshot is aborted if the exception is thrown.
> org.apache.geode.cache.EntryDestroyedException
> at 
> org.apache.geode.internal.cache.snapshot.SnapshotPacket$SnapshotRecord.convertToBytes(SnapshotPacket.java:161)
> at 
> org.apache.geode.internal.cache.snapshot.SnapshotPacket$SnapshotRecord.<init>(SnapshotPacket.java:62)
> at 
> org.apache.geode.internal.cache.DiskStoreImpl$2.writeBatch(DiskStoreImpl.java:3849)
> at 
> org.apache.geode.internal.cache.ExportDiskRegion.oplogRecovered(ExportDiskRegion.java:67)
> at 
> org.apache.geode.internal.cache.PersistentOplogSet.recoverOplogs(PersistentOplogSet.java:468)
> at 
> org.apache.geode.internal.cache.PersistentOplogSet.recoverRegionsThatAreReady(PersistentOplogSet.java:367)
> at 
> org.apache.geode.internal.cache.DiskStoreImpl.recoverRegionsThatAreReady(DiskStoreImpl.java:2043)
> at 
> org.apache.geode.internal.cache.DiskStoreImpl.exportSnapshot(DiskStoreImpl.java:3855)
> at 
> org.apache.geode.internal.cache.DiskStoreImpl.exportOfflineSnapshot(DiskStoreImpl.java:4147)
> at 
> org.apache.geode.management.internal.cli.commands.ExportOfflineDiskStoreCommand.exportOfflineDiskStore(ExportOfflineDiskStoreCommand.java:53)



[jira] [Commented] (GEODE-7091) Add Micrometer binders to default meter registry

2019-08-16 Thread ASF subversion and git services (JIRA)


[ 
https://issues.apache.org/jira/browse/GEODE-7091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16909390#comment-16909390
 ] 

ASF subversion and git services commented on GEODE-7091:


Commit c2e1fab0427a252b88145203d9ada2fb918ff62b in geode's branch 
refs/heads/develop from Aaron Lindsey
[ https://gitbox.apache.org/repos/asf?p=geode.git;h=c2e1fab ]

GEODE-7091: Add Micrometer binders to meter registry

Add the following Micrometer binders:
- JvmGcMetrics
- ProcessorMetrics
- JvmThreadMetrics
- UptimeMetrics
- FileDescriptorMetrics

Ignore FileDescriptorMetrics binder tests on Windows

Co-authored-by: Aaron Lindsey 
Co-authored-by: Kirk Lund 


> Add Micrometer binders to default meter registry
> 
>
> Key: GEODE-7091
> URL: https://issues.apache.org/jira/browse/GEODE-7091
> Project: Geode
>  Issue Type: Improvement
>  Components: statistics
>Reporter: Aaron Lindsey
>Assignee: Aaron Lindsey
>Priority: Major
> Fix For: 1.11.0
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> As a user, I want access to the specific JVM, GC, uptime, and file descriptor 
> metrics that help indicate and track down cluster health issues, so that I 
> can understand the health of my cluster.
> Add the following Micrometer binders:
> * JvmGcMetrics
> * ProcessorMetrics
> * JvmThreadMetrics
> * UptimeMetrics
> * FileDescriptorMetrics



[jira] [Resolved] (GEODE-6613) ClientServerTransactionFailoverDistributedTest multipleClientLongTransactionsCanFailoverMultipleTimesWithoutLosingOperations is flaky

2019-08-16 Thread Eric Shu (JIRA)


 [ 
https://issues.apache.org/jira/browse/GEODE-6613?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Shu resolved GEODE-6613.
-
   Resolution: Fixed
Fix Version/s: 1.11.0

> ClientServerTransactionFailoverDistributedTest 
> multipleClientLongTransactionsCanFailoverMultipleTimesWithoutLosingOperations 
> is flaky
> -
>
> Key: GEODE-6613
> URL: https://issues.apache.org/jira/browse/GEODE-6613
> Project: Geode
>  Issue Type: Bug
>  Components: client/server, transactions
>Reporter: Darrel Schneider
>Assignee: Eric Shu
>Priority: Major
>  Labels: GeodeCommons, flaky
> Fix For: 1.11.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> ClientServerTransactionFailoverDistributedTest 
> multipleClientLongTransactionsCanFailoverMultipleTimesWithoutLosingOperations 
> has recently started failing once in a while.
> This may have been caused by: GEODE-6515
> When it fails, it is because at least one of the transaction's changes does 
> not happen. Here is an example of how it fails:
> {noformat}
> org.apache.geode.internal.cache.ClientServerTransactionFailoverDistributedTest
>  > 
> multipleClientLongTransactionsCanFailoverMultipleTimesWithoutLosingOperations 
> FAILED
> org.apache.geode.test.dunit.RMIException: While invoking 
> org.apache.geode.internal.cache.ClientServerTransactionFailoverDistributedTest$$Lambda$144/1667766308.run
>  in VM 3 running on Host 671fd56a424e with 4 VMs
> at org.apache.geode.test.dunit.VM.executeMethodOnObject(VM.java:579)
> at org.apache.geode.test.dunit.VM.invoke(VM.java:406)
> at 
> org.apache.geode.internal.cache.ClientServerTransactionFailoverDistributedTest.multipleClientLongTransactionsCanFailoverMultipleTimesWithoutLosingOperations(ClientServerTransactionFailoverDistributedTest.java:294)
> Caused by:
> org.junit.ComparisonFailure: expected:<[value1]> but 
> was:<[originalValue]>
> at org.junit.Assert.assertEquals(Assert.java:115)
> at org.junit.Assert.assertEquals(Assert.java:144)
> at 
> org.apache.geode.internal.cache.ClientServerTransactionFailoverDistributedTest.verifyTransactionResult(ClientServerTransactionFailoverDistributedTest.java:222)
> at 
> org.apache.geode.internal.cache.ClientServerTransactionFailoverDistributedTest.lambda$multipleClientLongTransactionsCanFailoverMultipleTimesWithoutLosingOperations$2967fbd$1(ClientServerTransactionFailoverDistributedTest.java:294){noformat}



[jira] [Commented] (GEODE-6613) ClientServerTransactionFailoverDistributedTest multipleClientLongTransactionsCanFailoverMultipleTimesWithoutLosingOperations is flaky

2019-08-16 Thread ASF subversion and git services (JIRA)


[ 
https://issues.apache.org/jira/browse/GEODE-6613?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16909344#comment-16909344
 ] 

ASF subversion and git services commented on GEODE-6613:


Commit e5540724ea9000b789c01d4073504f84b191e055 in geode's branch 
refs/heads/develop from Eric Shu
[ https://gitbox.apache.org/repos/asf?p=geode.git;h=e554072 ]

GEODE-6613: Do not set TransactionTimeout in the test. (#3934)



> ClientServerTransactionFailoverDistributedTest 
> multipleClientLongTransactionsCanFailoverMultipleTimesWithoutLosingOperations 
> is flaky
> -
>
> Key: GEODE-6613
> URL: https://issues.apache.org/jira/browse/GEODE-6613
> Project: Geode
>  Issue Type: Bug
>  Components: client/server, transactions
>Reporter: Darrel Schneider
>Assignee: Eric Shu
>Priority: Major
>  Labels: GeodeCommons, flaky
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> ClientServerTransactionFailoverDistributedTest 
> multipleClientLongTransactionsCanFailoverMultipleTimesWithoutLosingOperations 
> has recently started failing once in a while.
> This may have been caused by: GEODE-6515
> When it fails, it is because at least one of the transaction's changes does 
> not happen. Here is an example of how it fails:
> {noformat}
> org.apache.geode.internal.cache.ClientServerTransactionFailoverDistributedTest
>  > 
> multipleClientLongTransactionsCanFailoverMultipleTimesWithoutLosingOperations 
> FAILED
> org.apache.geode.test.dunit.RMIException: While invoking 
> org.apache.geode.internal.cache.ClientServerTransactionFailoverDistributedTest$$Lambda$144/1667766308.run
>  in VM 3 running on Host 671fd56a424e with 4 VMs
> at org.apache.geode.test.dunit.VM.executeMethodOnObject(VM.java:579)
> at org.apache.geode.test.dunit.VM.invoke(VM.java:406)
> at 
> org.apache.geode.internal.cache.ClientServerTransactionFailoverDistributedTest.multipleClientLongTransactionsCanFailoverMultipleTimesWithoutLosingOperations(ClientServerTransactionFailoverDistributedTest.java:294)
> Caused by:
> org.junit.ComparisonFailure: expected:<[value1]> but 
> was:<[originalValue]>
> at org.junit.Assert.assertEquals(Assert.java:115)
> at org.junit.Assert.assertEquals(Assert.java:144)
> at 
> org.apache.geode.internal.cache.ClientServerTransactionFailoverDistributedTest.verifyTransactionResult(ClientServerTransactionFailoverDistributedTest.java:222)
> at 
> org.apache.geode.internal.cache.ClientServerTransactionFailoverDistributedTest.lambda$multipleClientLongTransactionsCanFailoverMultipleTimesWithoutLosingOperations$2967fbd$1(ClientServerTransactionFailoverDistributedTest.java:294){noformat}



[jira] [Commented] (GEODE-7091) Add Micrometer binders to default meter registry

2019-08-16 Thread ASF subversion and git services (JIRA)


[ 
https://issues.apache.org/jira/browse/GEODE-7091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16909277#comment-16909277
 ] 

ASF subversion and git services commented on GEODE-7091:


Commit 2b6e6c09e530297a97d564f215394a210986a130 in geode's branch 
refs/heads/feature/GEODE-7066 from Kirk Lund
[ https://gitbox.apache.org/repos/asf?p=geode.git;h=2b6e6c0 ]

Revert "GEODE-7091: Add Micrometer binders to meter registry"

This reverts commit d0585f499c8865085f510cd948f5560fe7554655.


> Add Micrometer binders to default meter registry
> 
>
> Key: GEODE-7091
> URL: https://issues.apache.org/jira/browse/GEODE-7091
> Project: Geode
>  Issue Type: Improvement
>  Components: statistics
>Reporter: Aaron Lindsey
>Assignee: Aaron Lindsey
>Priority: Major
> Fix For: 1.11.0
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> As a user, I want access to the specific JVM, GC, uptime, and file descriptor 
> metrics that help indicate and track down cluster health issues, so that I 
> can understand the health of my cluster.
> Add the following Micrometer binders:
> * JvmGcMetrics
> * ProcessorMetrics
> * JvmThreadMetrics
> * UptimeMetrics
> * FileDescriptorMetrics



[jira] [Commented] (GEODE-7066) Events can be lost in a gateway batch containing duplicate non-conflatable events with conflation enabled

2019-08-16 Thread ASF subversion and git services (JIRA)


[ 
https://issues.apache.org/jira/browse/GEODE-7066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16909278#comment-16909278
 ] 

ASF subversion and git services commented on GEODE-7066:


Commit f13e5493526b8fd803c5e02125b37bcffca50c65 in geode's branch 
refs/heads/feature/GEODE-7066 from Barry Oglesby
[ https://gitbox.apache.org/repos/asf?p=geode.git;h=f13e549 ]

GEODE-7066: Modified batch conflation to use event id instead of shadow key


> Events can be lost in a gateway batch containing duplicate non-conflatable 
> events with conflation enabled
> -
>
> Key: GEODE-7066
> URL: https://issues.apache.org/jira/browse/GEODE-7066
> Project: Geode
>  Issue Type: Bug
>  Components: wan
>Affects Versions: 1.9.0
>Reporter: Barry Oglesby
>Assignee: Barry Oglesby
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> If a batch contains duplicate CREATE and DESTROY events on key 1736 like 
> below and conflation is enabled, the earlier events will be overwritten by 
> the later events.
> {noformat}
> GatewaySenderEventImpl[id=EventID[id=31bytes;threadID=0x30004|6;sequenceID=6072];operation=CREATE;region=/SESSIONS;key=6079],
> GatewaySenderEventImpl[id=EventID[id=31bytes;threadID=0x30004|6;sequenceID=6073];operation=UPDATE;region=/SESSIONS;key=6079],
> GatewaySenderEventImpl[id=EventID[id=31bytes;threadID=0x30004|5;sequenceID=6009];operation=CREATE;region=/SESSIONS;key=1736],
> GatewaySenderEventImpl[id=EventID[id=31bytes;threadID=0x30004|6;sequenceID=6074];operation=DESTROY;region=/SESSIONS;key=6079],
> GatewaySenderEventImpl[id=EventID[id=31bytes;threadID=0x30004|5;sequenceID=6011];operation=DESTROY;region=/SESSIONS;key=1736],
> GatewaySenderEventImpl[id=EventID[id=31bytes;threadID=0x30004|6;sequenceID=6087];operation=CREATE;region=/SESSIONS;key=1736],
> GatewaySenderEventImpl[id=EventID[id=31bytes;threadID=0x30004|6;sequenceID=6089];operation=DESTROY;region=/SESSIONS;key=1736]
> {noformat}
> The batch will look like this after conflation:
> {noformat}
> GatewaySenderEventImpl[id=EventID[id=31 
> bytes;threadID=0x30004|6;sequenceID=6072];operation=CREATE;region=/SESSIONS;key=6079],
>  
> GatewaySenderEventImpl[id=EventID[id=31 
> bytes;threadID=0x30004|6;sequenceID=6073];operation=UPDATE;region=/SESSIONS;key=6079],
>  
> GatewaySenderEventImpl[id=EventID[id=31 
> bytes;threadID=0x30004|6;sequenceID=6087];operation=CREATE;region=/SESSIONS;key=1736],
>  
> GatewaySenderEventImpl[id=EventID[id=31 
> bytes;threadID=0x30004|6;sequenceID=6074];operation=DESTROY;region=/SESSIONS;key=6079],
>  
> GatewaySenderEventImpl[id=EventID[id=31 
> bytes;threadID=0x30004|6;sequenceID=6089];operation=DESTROY;region=/SESSIONS;key=1736]
> {noformat}
> All the events from threadID=0x30004|5 are gone.
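
The loss described above can be reproduced with a small model. The sketch below is not Geode's conflation code: the Event class and both methods are hypothetical, and it assumes the faulty behavior is equivalent to keeping events in an insertion-ordered map whose key ignores the originating thread. The second method models the direction named in the commit title by identifying each event by thread ID and sequence ID. It does not model conflation of UPDATE events.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class ConflationSketch {
  // Hypothetical stand-in for a gateway event; not Geode's GatewaySenderEventImpl.
  static final class Event {
    final String threadId;
    final long sequenceId;
    final String operation;
    final int key;

    Event(String threadId, long sequenceId, String operation, int key) {
      this.threadId = threadId;
      this.sequenceId = sequenceId;
      this.operation = operation;
      this.key = key;
    }
  }

  // The seven events from the issue description, in batch order.
  static List<Event> sampleBatch() {
    List<Event> batch = new ArrayList<>();
    batch.add(new Event("0x30004|6", 6072, "CREATE", 6079));
    batch.add(new Event("0x30004|6", 6073, "UPDATE", 6079));
    batch.add(new Event("0x30004|5", 6009, "CREATE", 1736));
    batch.add(new Event("0x30004|6", 6074, "DESTROY", 6079));
    batch.add(new Event("0x30004|5", 6011, "DESTROY", 1736));
    batch.add(new Event("0x30004|6", 6087, "CREATE", 1736));
    batch.add(new Event("0x30004|6", 6089, "DESTROY", 1736));
    return batch;
  }

  // Assumed model of the faulty behavior: the map key ignores which thread
  // produced the event, so a later event with the same key and operation
  // replaces the earlier one while keeping the earlier one's position.
  static List<Event> conflateByKeyAndOperation(List<Event> batch) {
    Map<String, Event> kept = new LinkedHashMap<>();
    for (Event e : batch) {
      kept.put(e.key + "/" + e.operation, e);
    }
    return new ArrayList<>(kept.values());
  }

  // Assumed model of the fix: each event is identified by its thread ID and
  // sequence ID, so distinct events never overwrite each other.
  static List<Event> conflateByEventId(List<Event> batch) {
    Map<String, Event> kept = new LinkedHashMap<>();
    for (Event e : batch) {
      kept.put(e.threadId + "/" + e.sequenceId, e);
    }
    return new ArrayList<>(kept.values());
  }
}
```

With the sample batch, conflateByKeyAndOperation keeps five events in the same order as the conflated batch shown in the issue, none of them from threadID=0x30004|5, while conflateByEventId keeps all seven.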



[jira] [Commented] (GEODE-7091) Add Micrometer binders to default meter registry

2019-08-16 Thread ASF subversion and git services (JIRA)


[ 
https://issues.apache.org/jira/browse/GEODE-7091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16909276#comment-16909276
 ] 

ASF subversion and git services commented on GEODE-7091:


Commit d0585f499c8865085f510cd948f5560fe7554655 in geode's branch 
refs/heads/feature/GEODE-7066 from Aaron Lindsey
[ https://gitbox.apache.org/repos/asf?p=geode.git;h=d0585f4 ]

GEODE-7091: Add Micrometer binders to meter registry

Add the following Micrometer binders:
- JvmGcMetrics
- ProcessorMetrics
- JvmThreadMetrics
- UptimeMetrics
- FileDescriptorMetrics

Co-authored-by: Aaron Lindsey 
Co-authored-by: Kirk Lund 


> Add Micrometer binders to default meter registry
> 
>
> Key: GEODE-7091
> URL: https://issues.apache.org/jira/browse/GEODE-7091
> Project: Geode
>  Issue Type: Improvement
>  Components: statistics
>Reporter: Aaron Lindsey
>Assignee: Aaron Lindsey
>Priority: Major
> Fix For: 1.11.0
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> As a user, I want access to the specific JVM, GC, uptime, and file descriptor 
> metrics that help indicate and track down cluster health issues, so that I 
> can understand the health of my cluster.
> Add the following Micrometer binders:
> * JvmGcMetrics
> * ProcessorMetrics
> * JvmThreadMetrics
> * UptimeMetrics
> * FileDescriptorMetrics



[jira] [Commented] (GEODE-3780) suspected member is never watched again after passing final check

2019-08-16 Thread ASF subversion and git services (JIRA)


[ 
https://issues.apache.org/jira/browse/GEODE-3780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16909273#comment-16909273
 ] 

ASF subversion and git services commented on GEODE-3780:


Commit 8e9b04470264983d0aa1c7900f6e9be2374549d9 in geode's branch 
refs/heads/feature/GEODE-7066 from Bruce Schuchardt
[ https://gitbox.apache.org/repos/asf?p=geode.git;h=8e9b044 ]

GEODE-3780 suspected member is never watched again after passing final check 
(#3917)

* GEODE-3780 suspected member is never watched again after passing final check

After passing a "final check" a member will be subject to suspect
processing again but we weren't processing the suspect message locally.
This caused JoinLeave to never be notified of the suspect so that
removal could be initiated.

I also noticed that a method in HealthMonitor was misnamed.  It claimed
to return the set of members that had failed availability checks but
instead it was returning a set of members currently under suspicion.  I
renamed the method for clarity.

* empty commit

* removing getSuspectMembers - it could kick out a suspect member too easily

* removing unused method and commented-out code

* revising test


> suspected member is never watched again after passing final check
> -
>
> Key: GEODE-3780
> URL: https://issues.apache.org/jira/browse/GEODE-3780
> Project: Geode
>  Issue Type: Bug
>  Components: membership
>Reporter: Bruce Schuchardt
>Assignee: Bruce Schuchardt
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.11.0
>
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> In a network-down test we saw a node on the losing side of the network 
> partition perform final checks on members on the winning side.  One of the 
> final checks mysteriously succeeded:
> [info 2017/09/17 12:24:45.552 PDT 
> gemfire1_rs-FullRegression-2017-09-15-21-00-35-client-10_8941  Detection thread 4> tid=0x128] Final check failed but detected recent message 
> traffic for suspect member 
> 10.32.109.252(gemfire3_rs-FullRegression-2017-09-15-21-00-35-client-16_6135:6135):1026
> [info 2017/09/17 12:24:45.552 PDT 
> gemfire1_rs-FullRegression-2017-09-15-21-00-35-client-10_8941  Detection thread 4> tid=0x128] Final check passed for suspect member 
> 10.32.109.252(gemfire3_rs-FullRegression-2017-09-15-21-00-35-client-16_6135:6135):1026
> After this the suspected member was never checked again and the losing side 
> failed to shut down.



[jira] [Commented] (GEODE-7072) CI Failure: WANRollingUpgradeEventProcessingMixedSiteOneCurrentSiteTwo > EventProcessingMixedSiteOneCurrentSiteTwo[from_v130] FAILED

2019-08-16 Thread ASF subversion and git services (JIRA)


[ 
https://issues.apache.org/jira/browse/GEODE-7072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16909272#comment-16909272
 ] 

ASF subversion and git services commented on GEODE-7072:


Commit 86fd74db98b5dff0e92ea4985651f3955c1a3420 in geode's branch 
refs/heads/feature/GEODE-7066 from Bruce Schuchardt
[ https://gitbox.apache.org/repos/asf?p=geode.git;h=86fd74d ]

GEODE-7072 CI Failure: 
WANRollingUpgradeEventProcessingMixedSiteOneCurrentSiteTwo (#3908)

* GEODE-7072 CI Failure: 
WANRollingUpgradeEventProcessingMixedSiteOneCurrentSiteTwo

A number of tests were attempting to delete old locator state files
containing membership views in order to ensure that artifacts from
previously run tests were not around to infect the current test.

Unfortunately, the calls to DistributedTestUtils.deleteLocatorStateFile()
were being made from the wrong working directory.  Instead of looking
for the file(s) in the directory that the test's locator would use, they
were looking in the unit test VM's working directory.

* adding another test
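
A minimal sketch of the working-directory problem described above, using
only java.nio.file. The class and method names are hypothetical and are not
the DistributedTestUtils API; the file name pattern is an assumption inferred
from "locator26547view.dat" in the stack trace. It shows that a delete
resolved against the wrong directory finds nothing, so the stale view file
survives for the next test.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

final class LocatorStateFileCleanup {
  // Assumed naming pattern, inferred from "locator26547view.dat" in the trace.
  static String stateFileName(int port) {
    return "locator" + port + "view.dat";
  }

  // Deletes the state file relative to an explicit directory instead of
  // relying on the current JVM's working directory.
  // Returns true only if a file was actually removed.
  static boolean deleteStateFile(Path workingDir, int port) {
    try {
      return Files.deleteIfExists(workingDir.resolve(stateFileName(port)));
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  // Returns {deleted from the test VM directory, deleted from the locator directory}.
  static boolean[] demo() {
    try {
      Path locatorDir = Files.createTempDirectory("locator-dir");
      Path testVmDir = Files.createTempDirectory("test-vm-dir");
      Files.createFile(locatorDir.resolve(stateFileName(26547)));

      // Wrong directory: nothing to delete, the stale file is left behind.
      boolean fromTestVmDir = deleteStateFile(testVmDir, 26547);
      // Directory the locator actually uses: the file is removed.
      boolean fromLocatorDir = deleteStateFile(locatorDir, 26547);

      // Both temporary directories are empty at this point.
      Files.delete(locatorDir);
      Files.delete(testVmDir);
      return new boolean[] {fromTestVmDir, fromLocatorDir};
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    boolean[] result = demo();
    System.out.println("deleted from test VM dir: " + result[0]);
    System.out.println("deleted from locator dir: " + result[1]);
  }
}
```

Passing the directory explicitly is the design point of the sketch: the
caller has to say where the locator keeps its state.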


> CI Failure: WANRollingUpgradeEventProcessingMixedSiteOneCurrentSiteTwo > 
> EventProcessingMixedSiteOneCurrentSiteTwo[from_v130] FAILED
> 
>
> Key: GEODE-7072
> URL: https://issues.apache.org/jira/browse/GEODE-7072
> Project: Geode
>  Issue Type: Test
>  Components: wan
>Reporter: Owen Nichols
>Priority: Major
> Fix For: 1.11.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> {noformat}
> org.apache.geode.cache.wan.WANRollingUpgradeEventProcessingMixedSiteOneCurrentSiteTwo
>  > EventProcessingMixedSiteOneCurrentSiteTwo[from_v130] FAILED
> org.apache.geode.test.dunit.RMIException: While invoking 
> org.apache.geode.cache.wan.WANRollingUpgradeEventProcessingMixedSiteOneCurrentSiteTwo$$Lambda$47/1509632157.run
>  in VM 0 running on Host aac3b458d9ea with 7 VMs with version 130
> at org.apache.geode.test.dunit.VM.executeMethodOnObject(VM.java:579)
> at org.apache.geode.test.dunit.VM.invoke(VM.java:406)
> at 
> org.apache.geode.cache.wan.WANRollingUpgradeEventProcessingMixedSiteOneCurrentSiteTwo.EventProcessingMixedSiteOneCurrentSiteTwo(WANRollingUpgradeEventProcessingMixedSiteOneCurrentSiteTwo.java:63)
> Caused by:
> org.apache.geode.InternalGemFireException: Unable to recover previous 
> membership view from locator26547view.dat
> at 
> org.apache.geode.distributed.internal.membership.gms.locator.GMSLocator.recoverFromFile(GMSLocator.java:462)
> at 
> org.apache.geode.distributed.internal.membership.gms.locator.GMSLocator.recover(GMSLocator.java:387)
> at 
> org.apache.geode.distributed.internal.membership.gms.locator.GMSLocator.init(GMSLocator.java:146)
> at 
> org.apache.geode.distributed.internal.InternalLocator$PrimaryHandler.init(InternalLocator.java:1225)
> at 
> org.apache.geode.distributed.internal.tcpserver.TcpServer.start(TcpServer.java:232)
> at 
> org.apache.geode.distributed.internal.InternalLocator.startTcpServer(InternalLocator.java:517)
> at 
> org.apache.geode.distributed.internal.InternalLocator.startPeerLocation(InternalLocator.java:575)
> at 
> org.apache.geode.distributed.internal.InternalLocator.startLocator(InternalLocator.java:321)
> at 
> org.apache.geode.distributed.Locator.startLocator(Locator.java:253)
> at 
> org.apache.geode.distributed.Locator.startLocatorAndDS(Locator.java:140)
> at 
> org.apache.geode.cache.wan.WANRollingUpgradeDUnitTest.startLocator(WANRollingUpgradeDUnitTest.java:105)
> at 
> org.apache.geode.cache.wan.WANRollingUpgradeDUnitTest.startLocator(WANRollingUpgradeDUnitTest.java:97)
> at 
> org.apache.geode.cache.wan.WANRollingUpgradeEventProcessingMixedSiteOneCurrentSiteTwo.lambda$EventProcessingMixedSiteOneCurrentSiteTwo$6f8ee815$1(WANRollingUpgradeEventProcessingMixedSiteOneCurrentSiteTwo.java:63)
> Caused by:
> org.apache.geode.SerializationException: Could not create an 
> instance of  org.apache.geode.distributed.internal.membership.NetView .
> at 
> org.apache.geode.internal.InternalDataSerializer.invokeFromData(InternalDataSerializer.java:2381)
> at 
> org.apache.geode.internal.DSFIDFactory.create(DSFIDFactory.java:986)
> at 
> org.apache.geode.internal.InternalDataSerializer.basicReadObject(InternalDataSerializer.java:2693)
> at 
> org.apache.geode.DataSerializer.readObject(DataSerializer.java:2961)
> at 
> org.apache.geode.distributed.internal.membership.gms.locator.GMSLocator.recoverFromFile(GMSLocator.java:440)

[jira] [Commented] (GEODE-6945) geode-managment should create its own set of configuration objects for IndexConfiguration

2019-08-16 Thread ASF subversion and git services (JIRA)


[ 
https://issues.apache.org/jira/browse/GEODE-6945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16909275#comment-16909275
 ] 

ASF subversion and git services commented on GEODE-6945:


Commit 550e19e9c9bfd147a387c56019f00dbf162a2b26 in geode's branch 
refs/heads/feature/GEODE-7066 from Jinmei Liao
[ https://gitbox.apache.org/repos/asf?p=geode.git;h=550e19e ]

GEODE-6945:geode-managment should create its own set of configuration… (#3928)

Co-authored-by: Darrel Schneider 

* do not use xml domain object for region configuration
* add RegionType.UNSUPPORTED


> geode-managment should create its own set of configuration objects for 
> IndexConfiguration
> -
>
> Key: GEODE-6945
> URL: https://issues.apache.org/jira/browse/GEODE-6945
> Project: Geode
>  Issue Type: Sub-task
>  Components: management
>Reporter: Jinmei Liao
>Priority: Major
> Fix For: 1.11.0
>
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> IndexConfiguration
> only add the supported attributes
> modify the implementation of ConfigurationManager to do the bridging between 
> the configuration objects and xml domain objects





[jira] [Commented] (GEODE-3780) suspected member is never watched again after passing final check

2019-08-16 Thread ASF subversion and git services (JIRA)


[ 
https://issues.apache.org/jira/browse/GEODE-3780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16909274#comment-16909274
 ] 

ASF subversion and git services commented on GEODE-3780:


Commit 8e9b04470264983d0aa1c7900f6e9be2374549d9 in geode's branch 
refs/heads/feature/GEODE-7066 from Bruce Schuchardt
[ https://gitbox.apache.org/repos/asf?p=geode.git;h=8e9b044 ]

GEODE-3780 suspected member is never watched again after passing final check 
(#3917)

* GEODE-3780 suspected member is never watched again after passing final check

After passing a "final check" a member will be subject to suspect
processing again, but we weren't processing the suspect message locally.
As a result JoinLeave was never notified of the suspect, so removal
could not be initiated.

I also noticed that a method in HealthMonitor was misnamed.  It claimed
to return the set of members that had failed availability checks but
instead it was returning a set of members currently under suspicion.  I
renamed the method for clarity.

* empty commit

* removing getSuspectMembers - it could kick out a suspect member too easily

* removing unused method and commented-out code

* revising test


> suspected member is never watched again after passing final check
> -
>
> Key: GEODE-3780
> URL: https://issues.apache.org/jira/browse/GEODE-3780
> Project: Geode
>  Issue Type: Bug
>  Components: membership
>Reporter: Bruce Schuchardt
>Assignee: Bruce Schuchardt
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.11.0
>
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> In a network-down test we saw a node on the losing side of the network 
> partition perform final checks on members on the winning side.  One of the 
> final checks mysteriously succeeded:
> [info 2017/09/17 12:24:45.552 PDT 
> gemfire1_rs-FullRegression-2017-09-15-21-00-35-client-10_8941  Detection thread 4> tid=0x128] Final check failed but detected recent message 
> traffic for suspect member 
> 10.32.109.252(gemfire3_rs-FullRegression-2017-09-15-21-00-35-client-16_6135:6135):1026
> [info 2017/09/17 12:24:45.552 PDT 
> gemfire1_rs-FullRegression-2017-09-15-21-00-35-client-10_8941  Detection thread 4> tid=0x128] Final check passed for suspect member 
> 10.32.109.252(gemfire3_rs-FullRegression-2017-09-15-21-00-35-client-16_6135:6135):1026
> After this the suspected member was never checked again and the losing side 
> failed to shut down.





[jira] [Commented] (GEODE-6945) geode-managment should create its own set of configuration objects for IndexConfiguration

2019-08-16 Thread ASF subversion and git services (JIRA)


[ 
https://issues.apache.org/jira/browse/GEODE-6945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16909266#comment-16909266
 ] 

ASF subversion and git services commented on GEODE-6945:


Commit 374eff722708947570eb9367fc626181a1c9f4ce in geode's branch 
refs/heads/feature/GEODE-7066 from Jinmei Liao
[ https://gitbox.apache.org/repos/asf?p=geode.git;h=374eff7 ]

GEODE-6945:geode-managment should create its own set of configuration… (#3923)

Co-authored-by: Darrel Schneider 

* remove regionName in Index configuration object, use regionPath to infer it.
* support listing indexes for all regions


> geode-managment should create its own set of configuration objects for 
> IndexConfiguration
> -
>
> Key: GEODE-6945
> URL: https://issues.apache.org/jira/browse/GEODE-6945
> Project: Geode
>  Issue Type: Sub-task
>  Components: management
>Reporter: Jinmei Liao
>Priority: Major
> Fix For: 1.11.0
>
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> IndexConfiguration
> only add the supported attributes
> modify the implementation of ConfigurationManager to do the bridging between 
> the configuration objects and xml domain objects





[jira] [Commented] (GEODE-7079) NPE Upon Restart When Using Asynchronous Event Distribution & Conflation

2019-08-16 Thread ASF subversion and git services (JIRA)


[ 
https://issues.apache.org/jira/browse/GEODE-7079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16909267#comment-16909267
 ] 

ASF subversion and git services commented on GEODE-7079:


Commit 6f4bbbd96bcecdb82cf7753ce1dae9fa6baebf9b in geode's branch 
refs/heads/feature/GEODE-7066 from Juan José Ramos
[ https://gitbox.apache.org/repos/asf?p=geode.git;h=6f4bbbd ]

GEODE-7079: Prevent NPE During Queue Conflation (#3911)

* GEODE-7079: Prevent NPE During Queue Conflation

- Added tests.
- Fixed minor warnings.
- Use the cached region name when doing conflation instead of the actual region 
so the processor doesn't need to wait for the actual region to be fully 
initialized.

Co-authored-by: Benjamin Ross 


> NPE Upon Restart When Using Asynchronous Event Distribution & Conflation
> 
>
> Key: GEODE-7079
> URL: https://issues.apache.org/jira/browse/GEODE-7079
> Project: Geode
>  Issue Type: Bug
>  Components: wan
>Reporter: Juan José Ramos Cassella
>Assignee: Juan José Ramos Cassella
>Priority: Major
>  Labels: GeodeCommons
> Fix For: 1.11.0
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> The following combination of features cannot be safely configured when using 
> pure Geode Java API:
> * Replicated Region.
> * Serial Gateway Sender or Serial Asynchronous Event Queue.
> * Serial Gateway Sender / Serial Asynchronous Event Queue is Persistent.
> * Conflation is Enabled for the Serial Gateway Sender / Serial Asynchronous 
> Event Queue.
> The problem is that, after a restart, events left over in the persistent 
> queue begin processing before their source {{Region}} is instantiated, 
> causing a {{NullPointerException}} while executing the conflation logic.
> The {{Region}} is only required because internally we need its name, but the 
> name itself is already stored within the actual event so it should be safe to 
> replace {{gsEvent.getRegion().getFullPath()}} by 
> {{gsEvent.getRegionToConflate()}} or {{gsEvent.getRegionPath()}}.
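
A small sketch of why reading the stored path avoids the NPE. Region,
QueuedEvent and the two key-building methods below are hypothetical
stand-ins written for illustration; they are not Geode's classes, and the
key format is an assumption. The only point carried over from the
description is that the region reference can still be null after a restart
while the path string stored in the event is already available.

```java
final class ConflationKeySketch {
  // Hypothetical stand-in for a region that may not exist yet after restart.
  static final class Region {
    private final String fullPath;

    Region(String fullPath) {
      this.fullPath = fullPath;
    }

    String getFullPath() {
      return fullPath;
    }
  }

  // Hypothetical stand-in for an event recovered from a persistent queue.
  static final class QueuedEvent {
    private final String regionPath; // stored with the event itself
    private final Region region;     // null until the region is instantiated
    private final Object key;

    QueuedEvent(String regionPath, Region region, Object key) {
      this.regionPath = regionPath;
      this.region = region;
      this.key = key;
    }

    String getRegionPath() {
      return regionPath;
    }

    Region getRegion() {
      return region;
    }

    Object getKey() {
      return key;
    }
  }

  // Dereferences the region: throws NullPointerException while it is null.
  static String unsafeConflationKey(QueuedEvent event) {
    return event.getRegion().getFullPath() + ":" + event.getKey();
  }

  // Uses only data stored in the event: safe before the region exists.
  static String safeConflationKey(QueuedEvent event) {
    return event.getRegionPath() + ":" + event.getKey();
  }

  public static void main(String[] args) {
    QueuedEvent recovered = new QueuedEvent("/orders", null, "42");
    System.out.println(safeConflationKey(recovered));
    try {
      unsafeConflationKey(recovered);
    } catch (NullPointerException e) {
      System.out.println("region not instantiated yet");
    }
  }
}
```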





[jira] [Commented] (GEODE-7072) CI Failure: WANRollingUpgradeEventProcessingMixedSiteOneCurrentSiteTwo > EventProcessingMixedSiteOneCurrentSiteTwo[from_v130] FAILED

2019-08-16 Thread ASF subversion and git services (JIRA)


[ 
https://issues.apache.org/jira/browse/GEODE-7072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16909271#comment-16909271
 ] 

ASF subversion and git services commented on GEODE-7072:


Commit 86fd74db98b5dff0e92ea4985651f3955c1a3420 in geode's branch 
refs/heads/feature/GEODE-7066 from Bruce Schuchardt
[ https://gitbox.apache.org/repos/asf?p=geode.git;h=86fd74d ]

GEODE-7072 CI Failure: 
WANRollingUpgradeEventProcessingMixedSiteOneCurrentSiteTwo (#3908)

* GEODE-7072 CI Failure: 
WANRollingUpgradeEventProcessingMixedSiteOneCurrentSiteTwo

A number of tests were attempting to delete old locator state files
containing membership views in order to ensure that artifacts from
previously run tests were not around to infect the current test.

Unfortunately, the calls to DistributedTestUtils.deleteLocatorStateFile()
were being made from the wrong working directory.  Instead of looking
for the file(s) in the directory that the test's locator would use, they
were looking in the unit test VM's working directory.

* adding another test


> CI Failure: WANRollingUpgradeEventProcessingMixedSiteOneCurrentSiteTwo > 
> EventProcessingMixedSiteOneCurrentSiteTwo[from_v130] FAILED
> 
>
> Key: GEODE-7072
> URL: https://issues.apache.org/jira/browse/GEODE-7072
> Project: Geode
>  Issue Type: Test
>  Components: wan
>Reporter: Owen Nichols
>Priority: Major
> Fix For: 1.11.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> {noformat}
> org.apache.geode.cache.wan.WANRollingUpgradeEventProcessingMixedSiteOneCurrentSiteTwo
>  > EventProcessingMixedSiteOneCurrentSiteTwo[from_v130] FAILED
> org.apache.geode.test.dunit.RMIException: While invoking 
> org.apache.geode.cache.wan.WANRollingUpgradeEventProcessingMixedSiteOneCurrentSiteTwo$$Lambda$47/1509632157.run
>  in VM 0 running on Host aac3b458d9ea with 7 VMs with version 130
> at org.apache.geode.test.dunit.VM.executeMethodOnObject(VM.java:579)
> at org.apache.geode.test.dunit.VM.invoke(VM.java:406)
> at 
> org.apache.geode.cache.wan.WANRollingUpgradeEventProcessingMixedSiteOneCurrentSiteTwo.EventProcessingMixedSiteOneCurrentSiteTwo(WANRollingUpgradeEventProcessingMixedSiteOneCurrentSiteTwo.java:63)
> Caused by:
> org.apache.geode.InternalGemFireException: Unable to recover previous 
> membership view from locator26547view.dat
> at 
> org.apache.geode.distributed.internal.membership.gms.locator.GMSLocator.recoverFromFile(GMSLocator.java:462)
> at 
> org.apache.geode.distributed.internal.membership.gms.locator.GMSLocator.recover(GMSLocator.java:387)
> at 
> org.apache.geode.distributed.internal.membership.gms.locator.GMSLocator.init(GMSLocator.java:146)
> at 
> org.apache.geode.distributed.internal.InternalLocator$PrimaryHandler.init(InternalLocator.java:1225)
> at 
> org.apache.geode.distributed.internal.tcpserver.TcpServer.start(TcpServer.java:232)
> at 
> org.apache.geode.distributed.internal.InternalLocator.startTcpServer(InternalLocator.java:517)
> at 
> org.apache.geode.distributed.internal.InternalLocator.startPeerLocation(InternalLocator.java:575)
> at 
> org.apache.geode.distributed.internal.InternalLocator.startLocator(InternalLocator.java:321)
> at 
> org.apache.geode.distributed.Locator.startLocator(Locator.java:253)
> at 
> org.apache.geode.distributed.Locator.startLocatorAndDS(Locator.java:140)
> at 
> org.apache.geode.cache.wan.WANRollingUpgradeDUnitTest.startLocator(WANRollingUpgradeDUnitTest.java:105)
> at 
> org.apache.geode.cache.wan.WANRollingUpgradeDUnitTest.startLocator(WANRollingUpgradeDUnitTest.java:97)
> at 
> org.apache.geode.cache.wan.WANRollingUpgradeEventProcessingMixedSiteOneCurrentSiteTwo.lambda$EventProcessingMixedSiteOneCurrentSiteTwo$6f8ee815$1(WANRollingUpgradeEventProcessingMixedSiteOneCurrentSiteTwo.java:63)
> Caused by:
> org.apache.geode.SerializationException: Could not create an 
> instance of  org.apache.geode.distributed.internal.membership.NetView .
> at 
> org.apache.geode.internal.InternalDataSerializer.invokeFromData(InternalDataSerializer.java:2381)
> at 
> org.apache.geode.internal.DSFIDFactory.create(DSFIDFactory.java:986)
> at 
> org.apache.geode.internal.InternalDataSerializer.basicReadObject(InternalDataSerializer.java:2693)
> at 
> org.apache.geode.DataSerializer.readObject(DataSerializer.java:2961)
> at 
> org.apache.geode.distributed.internal.membership.gms.locator.GMSLocator.recoverFromFile(GMSLocator.java:440)

[jira] [Commented] (GEODE-7062) CI Failure: DistributedLockServiceDUnitTest.testSuspendLockingBlocksUntilNoLocks

2019-08-16 Thread ASF subversion and git services (JIRA)


[ 
https://issues.apache.org/jira/browse/GEODE-7062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16909269#comment-16909269
 ] 

ASF subversion and git services commented on GEODE-7062:


Commit c8837668182452e0083c36072cfe115287130e99 in geode's branch 
refs/heads/feature/GEODE-7066 from Juan José Ramos
[ https://gitbox.apache.org/repos/asf?p=geode.git;h=c883766 ]

GEODE-7062: Fix Race Condition in DUnit Test (#3897)

- Use Awaitility instead of Thread.sleep to make sure the asynchronous
  task is alive before continuing with the test.

> CI Failure: 
> DistributedLockServiceDUnitTest.testSuspendLockingBlocksUntilNoLocks
> 
>
> Key: GEODE-7062
> URL: https://issues.apache.org/jira/browse/GEODE-7062
> Project: Geode
>  Issue Type: Bug
>  Components: tests
>Reporter: Juan José Ramos Cassella
>Assignee: Juan José Ramos Cassella
>Priority: Major
>  Labels: GeodeCommons, flaky-test
> Fix For: 1.11.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> The test {{testSuspendLockingBlocksUntilNoLocks}} from class 
> {{DistributedLockServiceDUnitTest}} failed twice in CI runs 
> [967|https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-develop-main/jobs/DistributedTestOpenJDK8/builds/967]
>  and 
> [969|https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-develop-main/jobs/DistributedTestOpenJDK8/builds/969].
> Results for the first failure are available 
> [here|http://files.apachegeode-ci.info/builds/apache-develop-main/1.11.0-SNAPSHOT.0015/test-results/distributedTest/1565222926/]
>  and for the second one 
> [here|http://files.apachegeode-ci.info/builds/apache-develop-main/1.11.0-SNAPSHOT.0015/test-results/distributedTest/1565246507/].
> Archived artifacts for the first failure are available 
> [here|http://files.apachegeode-ci.info/builds/apache-develop-main/1.11.0-SNAPSHOT.0015/test-artifacts/1565222926/distributedtestfiles-OpenJDK8-1.11.0-SNAPSHOT.0015.tgz]
>  and for the second one 
> [here|http://files.apachegeode-ci.info/builds/apache-develop-main/1.11.0-SNAPSHOT.0015/test-artifacts/1565246507/distributedtestfiles-OpenJDK8-1.11.0-SNAPSHOT.0015.tgz].
> The issue appears to be a race condition while firing an asynchronous thread 
> on a remote {{VM}} through the following code:
> {code:title=DistributedLockServiceDUnitTest.java|borderStyle=solid}
> VM vm1 = getVM(1);
> vm1.invokeAsync(new SerializableRunnable("Lock & unlock in vm1") {
>   @Override
>   public void run() {
>     DistributedLockService service2 = getServiceNamed(name);
>     assertThat(service2.lock("lock", -1, -1)).isTrue();
>     synchronized (monitor) {
>       try {
>         monitor.wait();
>       } catch (InterruptedException ex) {
>         out.println("Unexpected InterruptedException");
>         fail("interrupted");
>       }
>     }
>     service2.unlock("lock");
>   }
> });
> // Let vm1's thread get the lock and go into wait()
> sleep(100);
> {code}
> If the thread is not launched on the remote {{VM}} after sleeping for 100 
> milliseconds, the test will fail as the thread on the local {{VM}} will be 
> able to invoke {{suspendLocking}} right away:
> {code:title=DistributedLockServiceDUnitTest.java|borderStyle=solid}
> Thread thread = new Thread(new Runnable() {
>   @Override
>   public void run() {
>     setGot(service.suspendLocking(-1));
>     setDone(true);
>     service.resumeLocking();
>   }
> });
> setGot(false);
> setDone(false);
> thread.start();
> // Let thread start, make sure it's blocked in suspendLocking
> sleep(100);
> assertThat(getGot() || getDone())
>     .withFailMessage("Before release, got: " + getGot() + ", done: " + getDone())
>     .isFalse();
> {code}
> Increasing the sleep time might reduce recurrences of the issue. Another 
> option would be to investigate how to make the test wait *until* the 
> asynchronous invocation has been started on the remote {{VM}} instead of 
> arbitrarily sleeping 100 milliseconds.
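
The "wait until" alternative can be sketched as a bounded poll on an
observable condition. The commit above uses Awaitility; the sketch below
deliberately swaps in a stdlib-only helper so that it is self-contained.
WaitUntil, await and the holdingLock flag are hypothetical names, and how the
remote VM would expose such a flag is left open here.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.BooleanSupplier;

final class WaitUntil {
  // Polls the condition until it holds or the timeout elapses.
  // Returns true if the condition was observed, false on timeout or interrupt.
  static boolean await(BooleanSupplier condition, long timeoutMillis, long pollMillis) {
    long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
    while (!condition.getAsBoolean()) {
      if (System.nanoTime() - deadline >= 0) {
        return false;
      }
      try {
        Thread.sleep(pollMillis);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        return false;
      }
    }
    return true;
  }

  public static void main(String[] args) throws InterruptedException {
    // Stand-in for "the asynchronous task has acquired the lock".
    AtomicBoolean holdingLock = new AtomicBoolean(false);
    Thread worker = new Thread(() -> holdingLock.set(true));
    worker.start();

    // Continue only once the worker has reached the expected state.
    boolean ready = await(holdingLock::get, 5_000, 10);
    worker.join();
    System.out.println("worker reached the expected state: " + ready);
  }
}
```

Unlike a fixed sleep(100), the poll returns as soon as the condition holds
and only fails once the whole timeout has been spent.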





[jira] [Commented] (GEODE-7085) Cannot recover from disk store if region version is greater than Integer.MAX_VALUE

2019-08-16 Thread ASF subversion and git services (JIRA)


[ 
https://issues.apache.org/jira/browse/GEODE-7085?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16909270#comment-16909270
 ] 

ASF subversion and git services commented on GEODE-7085:


Commit f58710116db1cd8c509b59a43ffa050a073234d7 in geode's branch 
refs/heads/feature/GEODE-7066 from Dan Smith
[ https://gitbox.apache.org/repos/asf?p=geode.git;h=f587101 ]

 GEODE-7085: Ensuring the bitset stays within BIT_SET_WIDTH (#3922)

Ensuring that when we call recordVersion on a RegionVersionHolder,
we appropriately move the bitSet to match the new version we are
recording, rather than trying to expand it. In particular, if the new
version is greater than Integer.MAX_VALUE, we can't record that in our
integer-indexed bit set.

This change rewrites addBitSetExceptions. The logic is now broken into a
BitSetExceptionIterator, which converts some or all of the bit set into
RVVException objects, and the logic to slide the bit set forward to a
new bitSetVersion.

Adding unit tests that show that large versions cause an
IndexOutOfBounds exception from recordGCVersion. Adding more unit tests
for the internal state of the bitset.
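
A simplified sketch of the idea, not the RegionVersionHolder code: a long
version narrowed to int wraps to a negative index, whereas a fixed-width
window that slides forward keeps every index inside the window. The class
name, the window width of 64 and the choice to simply drop bits that slide
out of the window are assumptions made for illustration; the commit above
describes converting those bits into RVVException objects instead.

```java
import java.util.BitSet;

final class SlidingVersionBitSet {
  static final int BIT_SET_WIDTH = 64; // illustrative width only

  private long bitSetVersion = 1; // version represented by bit 0
  private BitSet bitSet = new BitSet(BIT_SET_WIDTH);

  // What goes wrong without a window: narrowing a large long wraps around.
  static int naiveIndex(long version) {
    return (int) version;
  }

  void recordVersion(long version) {
    if (version < bitSetVersion) {
      return; // below the window; this sketch ignores such versions
    }
    long offset = version - bitSetVersion;
    if (offset >= BIT_SET_WIDTH) {
      // Slide forward so that the new version lands on the last bit.
      slide(offset - (BIT_SET_WIDTH - 1));
      offset = version - bitSetVersion;
    }
    bitSet.set((int) offset); // always within 0..BIT_SET_WIDTH-1
  }

  boolean isRecorded(long version) {
    long offset = version - bitSetVersion;
    return offset >= 0 && offset < BIT_SET_WIDTH && bitSet.get((int) offset);
  }

  long bitSetVersion() {
    return bitSetVersion;
  }

  private void slide(long shift) {
    if (shift >= BIT_SET_WIDTH) {
      bitSet.clear(); // every existing bit falls out of the window
    } else {
      // Bits shift..WIDTH-1 move down to positions 0..WIDTH-1-shift.
      bitSet = bitSet.get((int) shift, BIT_SET_WIDTH);
    }
    bitSetVersion += shift;
  }

  public static void main(String[] args) {
    long large = Integer.MAX_VALUE + 10L;
    System.out.println("naive index is negative: " + (naiveIndex(large) < 0));

    SlidingVersionBitSet holder = new SlidingVersionBitSet();
    holder.recordVersion(large);
    System.out.println("recorded: " + holder.isRecorded(large));
  }
}
```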

> Cannot recover from disk store if region version is greater than 
> Integer.MAX_VALUE
> --
>
> Key: GEODE-7085
> URL: https://issues.apache.org/jira/browse/GEODE-7085
> Project: Geode
>  Issue Type: Bug
>  Components: membership, persistence
>Reporter: Dan Smith
>Assignee: Dan Smith
>Priority: Major
> Fix For: 1.11.0
>
>  Time Spent: 3h 10m
>  Remaining Estimate: 0h
>
> We hit an issue where a member failed to recover due to an 
> IndexOutOfBoundsException while recording a version during recovery.
> Looking closer, it looks like the issue is due to the fact that a 
> RegionVersionHolder cannot record a version greater than Integer.MAX_VALUE if 
> it was just constructed.
> When we are recovering from disk, the first thing we read from is the .drf 
> files. The first thing in those drf files is RVV information. We read the RVV 
> records and call recordRecoveredGCVersion.
> When that call gets down inside RegionVersionHolder.recordVersion, there is 
> some logic that is supposed to flush out the bitSet and advance the 
> bitSetVersion. Unfortunately it looks like flushBitSetDuringRecording is not 
> actually doing that. So if the version we read from disk is greater than 
> Integer.MAX_VALUE, we wrap around and try to set a negative index in the 
> bitset.
> I can reproduce this with a unit test of RegionVersionVector that records a 
> version greater than Integer.MAX_VALUE. I’m looking into how to fix the 
> flushBitSetDuringRecording method.





[jira] [Commented] (GEODE-7079) NPE Upon Restart When Using Asynchronous Event Distribution & Conflation

2019-08-16 Thread ASF subversion and git services (JIRA)


[ 
https://issues.apache.org/jira/browse/GEODE-7079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16909268#comment-16909268
 ] 

ASF subversion and git services commented on GEODE-7079:


Commit 6f4bbbd96bcecdb82cf7753ce1dae9fa6baebf9b in geode's branch 
refs/heads/feature/GEODE-7066 from Juan José Ramos
[ https://gitbox.apache.org/repos/asf?p=geode.git;h=6f4bbbd ]

GEODE-7079: Prevent NPE During Queue Conflation (#3911)

* GEODE-7079: Prevent NPE During Queue Conflation

- Added tests.
- Fixed minor warnings.
- Use the cached region name when doing conflation instead of the actual region 
so the processor doesn't need to wait for the actual region to be fully 
initialized.

Co-authored-by: Benjamin Ross 


> NPE Upon Restart When Using Asynchronous Event Distribution & Conflation
> 
>
> Key: GEODE-7079
> URL: https://issues.apache.org/jira/browse/GEODE-7079
> Project: Geode
>  Issue Type: Bug
>  Components: wan
>Reporter: Juan José Ramos Cassella
>Assignee: Juan José Ramos Cassella
>Priority: Major
>  Labels: GeodeCommons
> Fix For: 1.11.0
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> The following combination of features cannot be safely configured when using 
> pure Geode Java API:
> * Replicated Region.
> * Serial Gateway Sender or Serial Asynchronous Event Queue.
> * Serial Gateway Sender / Serial Asynchronous Event Queue is Persistent.
> * Conflation is Enabled for the Serial Gateway Sender / Serial Asynchronous 
> Event Queue.
> The problem is that, after a restart, events left over in the persistent 
> queue begin processing before their source {{Region}} is instantiated, 
> causing a {{NullPointerException}} while executing the conflation logic.
> The {{Region}} is only required because internally we need its name, but the 
> name itself is already stored within the actual event so it should be safe to 
> replace {{gsEvent.getRegion().getFullPath()}} by 
> {{gsEvent.getRegionToConflate()}} or {{gsEvent.getRegionPath()}}.





[jira] [Created] (GEODE-7098) Tomcat8SessionsClientServerDUnitTest fails with ConnectException

2019-08-16 Thread Benjamin P Ross (JIRA)
Benjamin P Ross created GEODE-7098:
--

 Summary: Tomcat8SessionsClientServerDUnitTest fails with 
ConnectException
 Key: GEODE-7098
 URL: https://issues.apache.org/jira/browse/GEODE-7098
 Project: Geode
  Issue Type: Bug
Reporter: Benjamin P Ross


We've seen Tomcat8SessionsClientServerDUnitTest.testExtraSessionsNotCreated 
fail with
a ConnectException

org.apache.geode.modules.session.Tomcat8SessionsClientServerDUnitTest > 
testExtraSessionsNotCreated FAILED
java.net.ConnectException: Connection refused (Connection refused)

Caused by:
java.net.ConnectException: Connection refused (Connection refused)

This could be an environmental error, but if a pattern develops it could be due 
to a flaky test.





[jira] [Commented] (GEODE-7097) ParallelGatewaySenderOperationsDUnitTest > testParallelPropagationSenderStop fails due to DistributedSystemDisconnectedException

2019-08-16 Thread nabarun (JIRA)


[ 
https://issues.apache.org/jira/browse/GEODE-7097?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16909237#comment-16909237
 ] 

nabarun commented on GEODE-7097:


At this moment, we are attributing this to a resource issue on the machine 
running the test. 
 * We are ignoring this failure at the moment
 * It will be prioritized if this is reproduced.

> ParallelGatewaySenderOperationsDUnitTest > testParallelPropagationSenderStop 
> fails due to DistributedSystemDisconnectedException
> 
>
> Key: GEODE-7097
> URL: https://issues.apache.org/jira/browse/GEODE-7097
> Project: Geode
>  Issue Type: Bug
>Reporter: Bill Burcham
>Priority: Major
>
> In this build: 
> https://concourse.gemfire-ci.info/teams/main/pipelines/gemfire-develop-main/jobs/DistributedTestOpenJDK8/builds/884
> {code}
> org.apache.geode.internal.cache.wan.parallel.ParallelGatewaySenderOperationsDUnitTest
>  > testParallelPropagationSenderStop FAILED
> org.apache.geode.test.dunit.RMIException: While invoking 
> org.apache.geode.internal.cache.wan.WANTestBase$$Lambda$61/93791847.run in VM 
> 6 running on Host 4b34e1ec0e1c with 8 VMs
> at org.apache.geode.test.dunit.VM.executeMethodOnObject(VM.java:579)
> at org.apache.geode.test.dunit.VM.invoke(VM.java:406)
> at 
> org.apache.geode.internal.cache.wan.WANTestBase.createCacheInVMs(WANTestBase.java:870)
> at 
> org.apache.geode.internal.cache.wan.parallel.ParallelGatewaySenderOperationsDUnitTest.createSendersAndReceivers(ParallelGatewaySenderOperationsDUnitTest.java:806)
> at 
> org.apache.geode.internal.cache.wan.parallel.ParallelGatewaySenderOperationsDUnitTest.createSendersReceiversAndPartitionedRegion(ParallelGatewaySenderOperationsDUnitTest.java:793)
> at 
> org.apache.geode.internal.cache.wan.parallel.ParallelGatewaySenderOperationsDUnitTest.testParallelPropagationSenderStop(ParallelGatewaySenderOperationsDUnitTest.java:300)
> Caused by:
> org.apache.geode.distributed.DistributedSystemDisconnectedException: 
> Distributed System is shutting down, caused by 
> org.apache.geode.distributed.DistributedSystemDisconnectedException: 
> org.apache.geode.ForcedDisconnectException: Member isn't responding to 
> heartbeat requests
> Caused by:
> 
> org.apache.geode.distributed.DistributedSystemDisconnectedException: 
> org.apache.geode.ForcedDisconnectException: Member isn't responding to 
> heartbeat requests
> {code}





[jira] [Commented] (GEODE-7090) Remove dependency on DataSerializer from membership classes

2019-08-16 Thread Bruce Schuchardt (JIRA)


[ 
https://issues.apache.org/jira/browse/GEODE-7090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16909147#comment-16909147
 ] 

Bruce Schuchardt commented on GEODE-7090:
-

InternalDataSerializer is a subclass of DataSerializer, so continued use of it 
in GMS classes is a non-starter.  Another solution needs to be devised to 
remove use of DataSerializer.

> Remove dependency on DataSerializer from membership classes
> ---
>
> Key: GEODE-7090
> URL: https://issues.apache.org/jira/browse/GEODE-7090
> Project: Geode
>  Issue Type: Sub-task
>  Components: membership
>Reporter: Bruce Schuchardt
>Assignee: Bruce Schuchardt
>Priority: Major
>  Time Spent: 20m
>  Remaining Estimate: 0h
>






[jira] [Resolved] (GEODE-7095) addsMetersForFileDescriptorMetricsBinder test fails: meters is empty

2019-08-16 Thread Aaron Lindsey (JIRA)


 [ 
https://issues.apache.org/jira/browse/GEODE-7095?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron Lindsey resolved GEODE-7095.
--
   Resolution: Fixed
Fix Version/s: 1.11.0

> addsMetersForFileDescriptorMetricsBinder test fails: meters is empty
> 
>
> Key: GEODE-7095
> URL: https://issues.apache.org/jira/browse/GEODE-7095
> Project: Geode
>  Issue Type: Bug
>Reporter: Bill Burcham
>Assignee: Aaron Lindsey
>Priority: Major
> Fix For: 1.11.0
>
>
> In this CI build: 
> https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-develop-main/jobs/WindowsUnitTestOpenJDK8/builds/766
> The brand new test fails:
> {code}
> org.apache.geode.internal.metrics.CacheMeterRegistryFactoryTest > 
> addsMetersForFileDescriptorMetricsBinder FAILED
> java.lang.AssertionError: 
> Expecting actual not to be empty
> at 
> org.apache.geode.internal.metrics.CacheMeterRegistryFactoryTest.addsMetersForFileDescriptorMetricsBinder(CacheMeterRegistryFactoryTest.java:181)
> {code}
> 10 runs on laptop did not reproduce the issue.





[jira] [Resolved] (GEODE-7079) NPE Upon Restart When Using Asynchronous Event Distribution & Conflation

2019-08-16 Thread JIRA


 [ 
https://issues.apache.org/jira/browse/GEODE-7079?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Juan José Ramos Cassella resolved GEODE-7079.
-
   Resolution: Fixed
Fix Version/s: 1.11.0

> NPE Upon Restart When Using Asynchronous Event Distribution & Conflation
> 
>
> Key: GEODE-7079
> URL: https://issues.apache.org/jira/browse/GEODE-7079
> Project: Geode
>  Issue Type: Bug
>  Components: wan
>Reporter: Juan José Ramos Cassella
>Assignee: Juan José Ramos Cassella
>Priority: Major
>  Labels: GeodeCommons
> Fix For: 1.11.0
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> The following combination of features cannot be safely configured when using 
> pure Geode Java API:
> * Replicated Region.
> * Serial Gateway Sender or Serial Asynchronous Event Queue.
> * Serial Gateway Sender / Serial Asynchronous Event Queue is Persistent.
> * Conflation is Enabled for the Serial Gateway Sender / Serial Asynchronous 
> Event Queue.
> The problem is that, after a restart, events left over in the persistent 
> queue begin processing before their source {{Region}} is instantiated, 
> causing a {{NullPointerException}} while executing the conflation logic.
> The {{Region}} is only required because internally we need its name, but the 
> name itself is already stored within the actual event, so it should be safe 
> to replace {{gsEvent.getRegion().getFullPath()}} with 
> {{gsEvent.getRegionToConflate()}} or {{gsEvent.getRegionPath()}}.
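
The difference between the two lookups can be illustrated with a small sketch. RegionStub and EventStub below are hypothetical stand-ins, not the real Geode classes; they only model an event whose region reference is still null after restart while its path is stored in the event itself.

```java
public class ConflationKeySketch {

    /** Hypothetical stand-in for a Region; only its path matters here. */
    static final class RegionStub {
        private final String fullPath;

        RegionStub(String fullPath) {
            this.fullPath = fullPath;
        }

        String getFullPath() {
            return fullPath;
        }
    }

    /** Hypothetical stand-in for an event recovered from a persistent queue. */
    static final class EventStub {
        private final RegionStub region;  // null until the source region is re-created
        private final String regionPath;  // stored with the event, so available right away

        EventStub(RegionStub region, String regionPath) {
            this.region = region;
            this.regionPath = regionPath;
        }

        RegionStub getRegion() {
            return region;
        }

        String getRegionPath() {
            return regionPath;
        }
    }

    // Dereferences the region: throws NullPointerException if it is not yet instantiated.
    static String pathViaRegion(EventStub event) {
        return event.getRegion().getFullPath();
    }

    // Reads the path stored in the event: does not depend on the region existing.
    static String pathViaEvent(EventStub event) {
        return event.getRegionPath();
    }

    public static void main(String[] args) {
        EventStub recovered = new EventStub(null, "/orders");
        System.out.println("path from event: " + pathViaEvent(recovered));
        try {
            pathViaRegion(recovered);
        } catch (NullPointerException e) {
            System.out.println("region lookup failed: region not yet instantiated");
        }
    }
}
```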





[jira] [Updated] (GEODE-7039) Server recovery severely degrades client read traffic (no SingleHop no TX) on redundant partitioned persistent regions

2019-08-16 Thread Mario Ivanac (JIRA)


 [ 
https://issues.apache.org/jira/browse/GEODE-7039?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mario Ivanac updated GEODE-7039:

Summary: Server recovery severely degrades client read traffic (no 
SingleHop no TX) on redundant partitioned persistent regions  (was: Server 
recovery severely degrades client read traffic (no SingleHop no TX) on 
redundant partitioned regions)

> Server recovery severely degrades client read traffic (no SingleHop no TX) on 
> redundant partitioned persistent regions
> --
>
> Key: GEODE-7039
> URL: https://issues.apache.org/jira/browse/GEODE-7039
> Project: Geode
>  Issue Type: Improvement
>  Components: client/server
>Reporter: Mario Ivanac
>Assignee: Mario Ivanac
>Priority: Major
>
> A client using neither single hop nor transactions experiences severe 
> throttling from the cluster when getting data from a partitioned region while 
> a server hosting one of the redundant buckets is recovering (in the process 
> of image recovery). Get operations that have not landed on a server hosting 
> the bucket are proxied, at random, to other members that do have the bucket. 
> This random picking has the nasty consequence that the chosen server might be 
> the one currently recovering, where the bucket is not yet ready 
> (BucketNotFoundException), which means the local server will handle the 
> ForceReattemptException by sleeping 100ms before another (random) attempt. 
> This sleeping is devastating for the throughput observed by the client.





[jira] [Updated] (GEODE-7039) Server recovery severely degrades client read traffic (no SingleHop no TX) on redundant partitioned persistent regions

2019-08-16 Thread Mario Ivanac (JIRA)


 [ 
https://issues.apache.org/jira/browse/GEODE-7039?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mario Ivanac updated GEODE-7039:

Description: A client using neither single hop nor transactions experiences 
severe throttling from the cluster when getting data from a partitioned 
persistent region while a server hosting one of the redundant buckets is 
recovering (in the process of image recovery). Get operations that have not 
landed on a server hosting the bucket are proxied, at random, to other members 
that do have the bucket. This random picking has the nasty consequence that the 
chosen server might be the one currently recovering, where the bucket is not 
yet ready (BucketNotFoundException), which means the local server will handle 
the ForceReattemptException by sleeping 100ms before another (random) attempt. 
This sleeping is devastating for the throughput observed by the client.  
(was: A client using neither single hop nor transactions experiences severe 
throttling from the cluster when getting data from a partitioned region while 
a server hosting one of the redundant buckets is recovering (in the process of 
image recovery). Get operations that have not landed on a server hosting the 
bucket are proxied, at random, to other members that do have the bucket. This 
random picking has the nasty consequence that the chosen server might be the 
one currently recovering, where the bucket is not yet ready 
(BucketNotFoundException), which means the local server will handle the 
ForceReattemptException by sleeping 100ms before another (random) attempt. This 
sleeping is devastating for the throughput observed by the client.)

> Server recovery severely degrades client read traffic (no SingleHop no TX) on 
> redundant partitioned persistent regions
> --
>
> Key: GEODE-7039
> URL: https://issues.apache.org/jira/browse/GEODE-7039
> Project: Geode
>  Issue Type: Improvement
>  Components: client/server
>Reporter: Mario Ivanac
>Assignee: Mario Ivanac
>Priority: Major
>
> A client using neither single hop nor transactions experiences severe 
> throttling from the cluster when getting data from a partitioned persistent 
> region while a server hosting one of the redundant buckets is recovering (in 
> the process of image recovery). Get operations that have not landed on a 
> server hosting the bucket are proxied, at random, to other members that do 
> have the bucket. This random picking has the nasty consequence that the 
> chosen server might be the one currently recovering, where the bucket is not 
> yet ready (BucketNotFoundException), which means the local server will handle 
> the ForceReattemptException by sleeping 100ms before another (random) attempt. 
> This sleeping is devastating for the throughput observed by the client.
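
The cost of the behavior described above can be estimated with a rough simulation. This is a sketch under stated assumptions, not Geode code: the class RandomProxySketch, its method names, and the boolean "ready" flag per member are all hypothetical, and the 100 ms sleep is accumulated as a number rather than actually slept.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Random;

public class RandomProxySketch {

    // Retry sleep taken from the issue description.
    static final long RETRY_SLEEP_MS = 100;

    /**
     * Picks a bucket host uniformly at random until a ready one is chosen.
     * Returns the simulated time (ms) spent sleeping between attempts.
     */
    static long simulatedDelayForGet(List<Boolean> memberReady, Random random) {
        if (!memberReady.contains(Boolean.TRUE)) {
            throw new IllegalArgumentException("at least one member must be ready");
        }
        long delayMs = 0;
        while (true) {
            int pick = random.nextInt(memberReady.size());
            if (memberReady.get(pick)) {
                return delayMs; // landed on a member whose bucket is ready
            }
            // Picked the recovering member: model the retry sleep and try again.
            delayMs += RETRY_SLEEP_MS;
        }
    }

    public static void main(String[] args) {
        Random random = new Random(42);
        // One ready host and one still recovering (redundancy of 1).
        List<Boolean> members = Arrays.asList(true, false);
        int gets = 1000;
        long totalMs = 0;
        for (int i = 0; i < gets; i++) {
            totalMs += simulatedDelayForGet(members, random);
        }
        System.out.println("average simulated delay per get (ms): " + (double) totalMs / gets);
    }
}
```

With one of two hosts recovering, each attempt has an even chance of hitting the unready member, so a large share of gets pay at least one 100 ms sleep in this model.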


