[jira] [Resolved] (GEODE-8346) NonTXEntry.getValue() may throw EntryDestroyedException during CQ execution

2020-07-09 Thread Donal Evans (Jira)


 [ 
https://issues.apache.org/jira/browse/GEODE-8346?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Donal Evans resolved GEODE-8346.

Fix Version/s: 1.14.0
   Resolution: Fixed

> NonTXEntry.getValue() may throw EntryDestroyedException during CQ execution
> ---
>
> Key: GEODE-8346
> URL: https://issues.apache.org/jira/browse/GEODE-8346
> Project: Geode
>  Issue Type: Bug
>  Components: cq
>Affects Versions: 1.14.0
>Reporter: Donal Evans
>Assignee: Donal Evans
>Priority: Major
> Fix For: 1.14.0
>
>
> If a region entry is destroyed at the same time that a CQ is executed, there 
> exists a race condition where a non-destroyed {{NonTXEntry}} is retrieved 
> during iteration of results in {{CompiledSelect.doNestedIterations()}} but is 
> marked as destroyed/removed before {{NonTXEntry.getValue()}} is called in 
> {{CompiledComparison.evaluate()}}, which results in an 
> {{EntryDestroyedException}} being thrown.
> {noformat}
> org.apache.geode.cache.query.CqException: Failed to execute the CQ. CqName: 
> testCQ, Query String is: SELECT * FROM /testRegion entry WHERE entry = NULL, 
> Error from last server: remote server on 
> 10.212.3.32(84004:loner):49205:d737a530: While performing a remote 
> createCQfetchInitialResult
>   at 
> org.apache.geode.cache.query.cq.internal.ClientCQImpl.executeCqOnRedundantsAndPrimary(ClientCQImpl.java:435)
>   at 
> org.apache.geode.cache.query.cq.internal.ClientCQImpl.executeWithInitialResults(ClientCQImpl.java:303)
>   at 
> org.apache.geode.cache.query.cq.DonalCQTest.lambda$test$bb17a952$2(DonalCQTest.java:84)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.apache.geode.test.dunit.internal.MethodInvoker.executeObject(MethodInvoker.java:123)
>   at 
> org.apache.geode.test.dunit.internal.RemoteDUnitVM.executeMethodOnObject(RemoteDUnitVM.java:78)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at sun.rmi.server.UnicastServerRef.dispatch(UnicastServerRef.java:357)
>   at sun.rmi.transport.Transport$1.run(Transport.java:200)
>   at sun.rmi.transport.Transport$1.run(Transport.java:197)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at sun.rmi.transport.Transport.serviceCall(Transport.java:196)
>   at 
> sun.rmi.transport.tcp.TCPTransport.handleMessages(TCPTransport.java:573)
>   at 
> sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run0(TCPTransport.java:834)
>   at 
> sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.lambda$run$0(TCPTransport.java:688)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at 
> sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run(TCPTransport.java:687)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> Caused by: org.apache.geode.cache.client.ServerOperationException: remote 
> server on 10.212.3.32(84004:loner):49205:d737a530: While performing a remote 
> createCQfetchInitialResult
>   at 
> org.apache.geode.cache.client.internal.AbstractOp.processChunkedResponse(AbstractOp.java:340)
>   at 
> org.apache.geode.cache.client.internal.QueryOp$QueryOpImpl.processResponse(QueryOp.java:168)
>   at 
> org.apache.geode.cache.client.internal.AbstractOp.processResponse(AbstractOp.java:222)
>   at 
> org.apache.geode.cache.client.internal.AbstractOp.attemptReadResponse(AbstractOp.java:195)
>   at 
> org.apache.geode.cache.client.internal.AbstractOp.attempt(AbstractOp.java:382)
>   at 
> org.apache.geode.cache.client.internal.ConnectionImpl.execute(ConnectionImpl.java:283)
>   at 
> org.apache.geode.cache.client.internal.QueueConnectionImpl.execute(QueueConnectionImpl.java:191)
>   at 
> org.apache.geode.cache.client.internal.OpExecutorImpl.executeWithPossibleReAuthentication(OpExecutorImpl.java:753)
>   at 
> org.apache.geode.cache.client.internal.OpExecutorImpl.executeOnQueuesAndReturnPrimaryResult(OpExecutorImpl.java:454)
>   at 
> 
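The race is between retrieving a live entry and reading its value. A minimal sketch of the defensive pattern (an illustrative helper, not the actual Geode fix):

{code:java}
import org.apache.geode.cache.EntryDestroyedException;
import org.apache.geode.cache.Region;

public final class SafeEntryReader {
  // Read an entry's value, treating a concurrently destroyed entry as absent
  // instead of letting EntryDestroyedException abort the whole CQ result set.
  public static Object valueOrNull(Region.Entry<?, ?> entry) {
    try {
      return entry.getValue(); // may throw if the entry was destroyed after retrieval
    } catch (EntryDestroyedException ignored) {
      return null; // entry vanished between iteration and evaluation
    }
  }
}
{code}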

[jira] [Commented] (GEODE-7678) Partitioned Region clear operations must invoke cache level listeners

2020-07-09 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/GEODE-7678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17154931#comment-17154931
 ] 

ASF subversion and git services commented on GEODE-7678:


Commit 8eeaac4b53a4a4dd3e91722855de6bcc6da04883 in geode's branch 
refs/heads/feature/GEODE-8334 from agingade
[ https://gitbox.apache.org/repos/asf?p=geode.git;h=8eeaac4 ]

GEODE-7678 (2nd PR) - Support for cache-listener and client-notification for 
Partitioned Region Clear operation  (#5124)

* GEODE-7678: Add support for cache listener and client notification for PR 
clear

The changes are made to the PR clear messaging and locking mechanism to preserve
cache-listener and client-event ordering during concurrent cache operations
while a clear is in progress.


> Partitioned Region clear operations must invoke cache level listeners
> -
>
> Key: GEODE-7678
> URL: https://issues.apache.org/jira/browse/GEODE-7678
> Project: Geode
>  Issue Type: Sub-task
>  Components: regions
>Reporter: Nabarun Nag
>Assignee: Anilkumar Gingade
>Priority: Major
>  Labels: GeodeCommons, GeodeOperationAPI
>
> Clear operations complete successfully, and CacheListener.afterRegionClear() and 
> CacheWriter.beforeRegionClear() are invoked.
>  
> Acceptance:
>  * DUnit tests validating the above behavior.
>  * Test coverage for when a member departs in this scenario
>  * Test coverage for when a member restarts in this scenario
>  * Unit tests with complete code coverage for the newly written code.
>  
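For reference, a minimal listener of the kind such a DUnit test might register to validate the callback behavior above (an illustrative sketch, not code from the PR):

{code:java}
import org.apache.geode.cache.RegionEvent;
import org.apache.geode.cache.util.CacheListenerAdapter;

// Records whether afterRegionClear fired, so a test can assert on it.
public class ClearTrackingListener extends CacheListenerAdapter<Object, Object> {
  private volatile boolean cleared;

  @Override
  public void afterRegionClear(RegionEvent<Object, Object> event) {
    cleared = true; // invoked once the PR clear completes
  }

  public boolean wasCleared() {
    return cleared;
  }
}
{code}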



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (GEODE-8349) reinstate use of SSLSocket for cluster communication

2020-07-09 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/GEODE-8349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17154955#comment-17154955
 ] 

ASF GitHub Bot commented on GEODE-8349:
---

lgtm-com[bot] commented on pull request #5363:
URL: https://github.com/apache/geode/pull/5363#issuecomment-656369983


   This pull request **introduces 1 alert** and **fixes 1** when merging 
7f84de83311e0df3061b1ae4e3f2d9160b8df364 into 
62ee81fa428a80470083d1a304d7704b15658d2c - [view on 
LGTM.com](https://lgtm.com/projects/g/apache/geode/rev/pr-c38d1d44d923ee99f0c2d149525d016d1ef02129)
   
   **new alerts:**
   
   * 1 for Potential input resource leak
   
   **fixed alerts:**
   
   * 1 for Potential input resource leak
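For context, LGTM's "Potential input resource leak" query flags a stream or reader that is not closed on every path; the usual remedy is try-with-resources. A generic illustration (not the actual Geode change):

{code:java}
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

public final class FirstLine {
  // try-with-resources closes the reader on all paths, normal and exceptional.
  static String firstLine(String path) throws IOException {
    try (BufferedReader reader = Files.newBufferedReader(Paths.get(path))) {
      return reader.readLine();
    }
  }
}
{code}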



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> reinstate use of SSLSocket for cluster communication
> 
>
> Key: GEODE-8349
> URL: https://issues.apache.org/jira/browse/GEODE-8349
> Project: Geode
>  Issue Type: Bug
>  Components: membership, messaging
>Reporter: Bruce J Schuchardt
>Assignee: Bruce J Schuchardt
>Priority: Major
>
> We've found problems with "new IO"'s SSLEngine with respect to support for 
> TLSv1, and we've also seen anomalous performance using that secure 
> communications mechanism.  The "new IO" SSLEngine was originally introduced 
> to 1) reduce code complexity in the org.apache.geode.internal.tcp package 
> and 2) set the stage for its use in client/server communications, so that 
> selectors could be used there as well.
> This ticket aims to reintroduce the use of SSLSocket in cluster 
> communications without restoring the old, poorly tested SSL code paths.  The 
> new implementation should perform as well as or better than both the 
> previous "old IO" implementation and the more recent "new IO" SSLEngine 
> implementation.  This should be apparent in the CI benchmark jobs.
>  
>  
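For contrast with the SSLEngine approach, a minimal sketch of the blocking SSLSocket style being reinstated (illustrative only; Geode's socket creation and configuration are far more involved):

{code:java}
import javax.net.ssl.SSLContext;
import javax.net.ssl.SSLSocket;

public final class BlockingTlsConnect {
  // JSSE drives the handshake and record framing internally, so the caller
  // never has to pump an SSLEngine state machine.
  static SSLSocket connect(String host, int port) throws Exception {
    SSLContext context = SSLContext.getDefault();
    SSLSocket socket = (SSLSocket) context.getSocketFactory().createSocket(host, port);
    socket.setEnabledProtocols(new String[] {"TLSv1"}); // the protocol that was problematic with SSLEngine
    socket.startHandshake();
    return socket;
  }
}
{code}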



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (GEODE-8351) DUnit tests for Delta Propagation

2020-07-09 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/GEODE-8351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17154968#comment-17154968
 ] 

ASF GitHub Bot commented on GEODE-8351:
---

dschneider-pivotal commented on a change in pull request #5364:
URL: https://github.com/apache/geode/pull/5364#discussion_r452526273



##
File path: 
geode-redis/src/distributedTest/java/org/apache/geode/redis/internal/data/DeltaDUnitTest.java
##
@@ -0,0 +1,339 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more contributor license
+ * agreements. See the NOTICE file distributed with this work for additional information regarding
+ * copyright ownership. The ASF licenses this file to You under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance with the License. You may obtain a
+ * copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software distributed under the License
+ * is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express
+ * or implied. See the License for the specific language governing permissions and limitations under
+ * the License.
+ */
+
+package org.apache.geode.redis.internal.data;
+
+import static org.apache.geode.distributed.ConfigurationProperties.MAX_WAIT_TIME_RECONNECT;
+import static org.assertj.core.api.Assertions.assertThat;
+
+import java.util.ArrayList;
+import java.util.Collection;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+import java.util.Properties;
+import java.util.Set;
+
+import org.junit.AfterClass;
+import org.junit.Before;
+import org.junit.BeforeClass;
+import org.junit.ClassRule;
+import org.junit.Test;
+import redis.clients.jedis.Jedis;
+
+import org.apache.geode.cache.Region;
+import org.apache.geode.cache.partition.PartitionRegionHelper;
+import org.apache.geode.internal.cache.InternalCache;
+import org.apache.geode.test.awaitility.GeodeAwaitility;
+import org.apache.geode.test.dunit.rules.ClusterStartupRule;
+import org.apache.geode.test.dunit.rules.MemberVM;
+import org.apache.geode.test.dunit.rules.RedisClusterStartupRule;
+
+public class DeltaDUnitTest {
+
+  @ClassRule
+  public static RedisClusterStartupRule clusterStartUp = new RedisClusterStartupRule(4);
+
+  private static final String LOCAL_HOST = "127.0.0.1";
+  private static final int SET_SIZE = 10;
+  private static final int JEDIS_TIMEOUT =
+      Math.toIntExact(GeodeAwaitility.getTimeout().toMillis());
+  private static Jedis jedis1;
+  private static Jedis jedis2;
+
+  private static Properties locatorProperties;
+
+  private static MemberVM locator;
+  private static MemberVM server1;
+  private static MemberVM server2;
+
+  private static int redisServerPort1;
+  private static int redisServerPort2;
+
+  @BeforeClass
+  public static void classSetup() {
+    locatorProperties = new Properties();
+    locatorProperties.setProperty(MAX_WAIT_TIME_RECONNECT, "15000");
+
+    locator = clusterStartUp.startLocatorVM(0, locatorProperties);
+    server1 = clusterStartUp.startRedisVM(1, locator.getPort());
+    server2 = clusterStartUp.startRedisVM(2, locator.getPort());
+
+    redisServerPort1 = clusterStartUp.getRedisPort(1);
+    redisServerPort2 = clusterStartUp.getRedisPort(2);
+
+    jedis1 = new Jedis(LOCAL_HOST, redisServerPort1, JEDIS_TIMEOUT);
+    jedis2 = new Jedis(LOCAL_HOST, redisServerPort2, JEDIS_TIMEOUT);
+  }
+
+  @Before
+  public void testSetup() {
+    jedis1.flushAll();
+  }
+
+  @AfterClass
+  public static void tearDown() {
+    jedis1.disconnect();
+    jedis2.disconnect();
+
+    server1.stop();
+    server2.stop();
+  }
+
+  @Test
+  public void shouldCorrectlyPropagateDeltaToSecondaryServer_whenAppending() {
+    String key = "key";
+    String baseValue = "value-";
+    jedis1.set(key, baseValue);
+    for (int i = 0; i < SET_SIZE; i++) {
+      jedis1.set(key, String.valueOf(i));
+
+      String server1LocalValue = server1.invoke(() -> {

Review comment:
   It seems like you could write this in a way that is the same for each 
test. You really don't care in this test what is in each server, just that 
whatever one server has, the other has as well.
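
A hypothetical helper in the spirit of this suggestion: assert only that the two servers agree on a key, regardless of the value (getLocalValue is an assumed helper that reads the key from the server's local bucket data):

{code:java}
import static org.assertj.core.api.Assertions.assertThat;

private void assertServersHoldSameValue(String key) {
  String valueOnServer1 = server1.invoke(() -> getLocalValue(key));
  String valueOnServer2 = server2.invoke(() -> getLocalValue(key));
  // The test only cares that the primary and secondary copies agree.
  assertThat(valueOnServer1).isEqualTo(valueOnServer2);
}
{code}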





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> DUnit tests for Delta Propagation
> -
>
> Key: GEODE-8351
> URL: https://issues.apache.org/jira/browse/GEODE-8351
> Project: Geode
>  Issue Type: Test
>  Components: redis, tests
>Reporter: Sarah Abbey
>  

[jira] [Commented] (GEODE-8334) Primary and secondary bucket data mismatch with concurrent putAll/removeAll and PR.clear

2020-07-09 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/GEODE-8334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17154970#comment-17154970
 ] 

ASF subversion and git services commented on GEODE-8334:


Commit 4e6705335950d30fcc1e5831ca2d09612e47b78f in geode's branch 
refs/heads/feature/GEODE-8334 from zhouxh
[ https://gitbox.apache.org/repos/asf?p=geode.git;h=4e67053 ]

GEODE-8334: PR.clear should sync with putAll or removeAll on rvvLock
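
The locking order this enforces on the bulk-op side can be sketched as follows (method names are taken from the accompanying tests; bulkOp and releaseKeys are assumed stand-ins for the real putAll/removeAll body and key-lock release):

{code:java}
import org.apache.geode.internal.cache.BucketRegion;

public final class BulkOpLockOrder {
  static void doBulkOpUnderLocks(BucketRegion bucketRegion, Object[] keys,
      Runnable bulkOp, Runnable releaseKeys) {
    boolean locked = bucketRegion.waitUntilLocked(keys); // per-key locks first
    try {
      bucketRegion.lockRVVForBulkOp(); // then the rvvLock that PR.clear contends on
      try {
        bulkOp.run(); // the putAll/removeAll itself
      } finally {
        bucketRegion.unlockRVVForBulkOp(); // RVV released before the key locks
      }
    } finally {
      if (locked) {
        releaseKeys.run();
      }
    }
  }
}
{code}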


> Primary and secondary bucket data mismatch with concurrent putAll/removeAll 
> and PR.clear 
> -
>
> Key: GEODE-8334
> URL: https://issues.apache.org/jira/browse/GEODE-8334
> Project: Geode
>  Issue Type: Sub-task
>  Components: regions
>Affects Versions: 1.14.0
>Reporter: Xiaojian Zhou
>Assignee: Xiaojian Zhou
>Priority: Major
>  Labels: GeodeOperationAPI
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (GEODE-8200) Rebalance operations stuck in "IN_PROGRESS" state forever

2020-07-09 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/GEODE-8200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17154971#comment-17154971
 ] 

ASF GitHub Bot commented on GEODE-8200:
---

agingade commented on a change in pull request #5350:
URL: https://github.com/apache/geode/pull/5350#discussion_r452518577



##
File path: 
geode-core/src/main/java/org/apache/geode/management/internal/operation/OperationHistoryManager.java
##
@@ -90,6 +95,27 @@ private static boolean isExpired(long expirationTime, OperationState opera
     return operationEnd.getTime() <= expirationTime;
   }
 
+  private OperationState validateLocator(OperationState operationState) {
+    if (isLocatorOffline(operationState)) {
+      operationState.setOperationEnd(new Date(), null,
+          new RuntimeException("Locator that initiated the Rest API operation is offline: "
+              + operationState.getLocator()));
+    }
+
+    return operationState;
+  }
+
+  private boolean isLocatorOffline(OperationState operationState) {
+    if (operationState.getOperationEnd() == null
+        && (operationState.getLocator() != null)
+        && cache.getMyId().toString().compareTo(operationState.getLocator()) != 0

Review comment:
   Does it need to be compared? Can it be changed to "equals"?
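
A minimal illustration of the suggested change (the surrounding names are from the diff above):

{code:java}
// compareTo(...) != 0 and !equals(...) are equivalent here for non-null
// strings; equals states the intent directly.
static boolean isDifferentLocator(String myId, String initiatingLocator) {
  return !myId.equals(initiatingLocator);
}
{code}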

##
File path: 
geode-core/src/main/java/org/apache/geode/management/internal/operation/OperationState.java
##
@@ -28,12 +28,25 @@
  */
 public class OperationState<A extends ClusterManagementOperation<V>, V extends OperationResult>
     implements Identifiable<String> {
+  private static final long serialVersionUID = 8212319653561969588L;
   private final String opId;
   private final A operation;
   private final Date operationStart;
   private Date operationEnd;
   private V result;
   private Throwable throwable;
+  private String locator;

Review comment:
   Can this be a DistributedID rather than a String ID? That way we can avoid 
converting to string in other places.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Rebalance operations stuck in "IN_PROGRESS" state forever
> -
>
> Key: GEODE-8200
> URL: https://issues.apache.org/jira/browse/GEODE-8200
> Project: Geode
>  Issue Type: Bug
>  Components: management
>Reporter: Aaron Lindsey
>Assignee: Jianxia Chen
>Priority: Major
>  Labels: GeodeOperationAPI
> Attachments: GEODE-8200-exportedLogs.zip
>
>
> We use the management REST API to call rebalance immediately before stopping 
> a server to limit the possibility of data loss. In a cluster with 3 locators, 
> 3 servers, and no regions, we noticed that sometimes the rebalance operation 
> never ends if one of the locators is restarting concurrently with the 
> rebalance operation.
> More specifically, the scenario where we see this issue crop up is during an 
> automated "rolling restart" operation in a Kubernetes environment which 
> proceeds as follows:
> * At most one locator and one server are restarting at any point in time
> * Each locator/server waits until the previous locator/server is fully online 
> before restarting
> * Immediately before stopping a server, a rebalance operation is performed 
> and the server is not stopped until the rebalance operation is completed
> The impact of this issue is that the "rolling restart" operation will never 
> complete, because it cannot proceed with stopping a server until the 
> rebalance operation is completed. A human is then required to intervene and 
> manually trigger a rebalance and stop the server. This type of "rolling 
> restart" operation is triggered fairly often in Kubernetes — any time part of 
> the configuration of the locators or servers changes. 
> The following JSON is a sample response from the management REST API that 
> shows the rebalance operation stuck in "IN_PROGRESS".
> {code}
> {
>   "statusCode": "IN_PROGRESS",
>   "links": {
>     "self": 
> "http://geodecluster-sample-locator.default/management/v1/operations/rebalances/a47f23c8-02b3-443c-a367-636fd6921ea7",
>     "list": 
> "http://geodecluster-sample-locator.default/management/v1/operations/rebalances"
>   },
>   "operationStart": "2020-05-27T22:38:30.619Z",
>   "operationId": "a47f23c8-02b3-443c-a367-636fd6921ea7",
>   "operation": {
>     "simulate": false
>   }
> }
> {code}
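
A hypothetical poller for the operation shown above, using the "self" link from the response (Java 11 HttpClient, for illustration only):

{code:java}
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public final class RebalancePoller {
  // In this bug, the returned statusCode stays "IN_PROGRESS" forever.
  static String pollRebalance(String selfLink) throws Exception {
    HttpRequest request = HttpRequest.newBuilder(URI.create(selfLink)).GET().build();
    HttpResponse<String> response =
        HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString());
    return response.body();
  }
}
{code}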



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (GEODE-8334) Primary and secondary bucket data mismatch with concurrent putAll/removeAll and PR.clear

2020-07-09 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/GEODE-8334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17154984#comment-17154984
 ] 

ASF GitHub Bot commented on GEODE-8334:
---

DonalEvans commented on a change in pull request #5365:
URL: https://github.com/apache/geode/pull/5365#discussion_r452538880



##
File path: 
geode-core/src/test/java/org/apache/geode/internal/cache/partitioned/RemoveAllPRMessageTest.java
##
@@ -131,4 +134,35 @@ public void removeAndNotifyKeysIsNotInvokedIfKeysNotLocked() throws Exception {
     verify(dataStore).checkRegionDestroyedOnBucket(eq(bucketRegion), eq(true),
         eq(regionDestroyedException));
   }
+
+  @Test
+  public void rvvLockedAfterKeysAreLockedAndUnlockRVVBeforeKeys() throws Exception {
+    RemoveAllPRMessage message =
+        spy(new RemoveAllPRMessage(bucketId, 1, false, false, false, null));
+    message.addEntry(entryData);
+    doReturn(keys).when(message).getKeysToBeLocked();
+    when(bucketRegion.waitUntilLocked(keys)).thenReturn(true);
+    when(bucketRegion.doLockForPrimary(false)).thenThrow(new PrimaryBucketException());
+    doNothing().when(bucketRegion).lockRVVForBulkOp();
+    doNothing().when(bucketRegion).unlockRVVForBulkOp();
+
+    InternalCache cache = mock(InternalCache.class);
+    InternalDistributedSystem ids = mock(InternalDistributedSystem.class);
+    when(bucketRegion.getCache()).thenReturn(cache);
+    when(cache.getDistributedSystem()).thenReturn(ids);
+    when(ids.getOffHeapStore()).thenReturn(null);
+
+    try {
+      message.doLocalRemoveAll(partitionedRegion, mock(InternalDistributedMember.class), true);
+      fail("Expect PrimaryBucketException");
+    } catch (Exception e) {
+      assertThat(e instanceof PrimaryBucketException);
+    }

Review comment:
   This can be replaced with `assertThatThrownBy(() -> message.doLocalRemoveAll(partitionedRegion, mock(InternalDistributedMember.class), true)).isInstanceOf(PrimaryBucketException.class);` to make things a bit neater.

##
File path: 
geode-core/src/test/java/org/apache/geode/internal/cache/partitioned/PutAllPRMessageTest.java
##
@@ -119,4 +122,34 @@ public void removeAndNotifyKeysIsNotInvokedIfKeysNotLocked() throws Exception {
     eq(regionDestroyedException));
   }
 
+  @Test
+  public void rvvLockedAfterKeysAreLockedAndUnlockRVVBeforeKeys() throws Exception {
+    PutAllPRMessage message = spy(new PutAllPRMessage(bucketId, 1, false, false, false, null));
+    message.addEntry(entryData);
+    doReturn(keys).when(message).getKeysToBeLocked();
+    when(bucketRegion.waitUntilLocked(keys)).thenReturn(true);
+    when(bucketRegion.doLockForPrimary(false)).thenThrow(new PrimaryBucketException());
+    doNothing().when(bucketRegion).lockRVVForBulkOp();
+    doNothing().when(bucketRegion).unlockRVVForBulkOp();
+
+    InternalCache cache = mock(InternalCache.class);
+    InternalDistributedSystem ids = mock(InternalDistributedSystem.class);
+    when(bucketRegion.getCache()).thenReturn(cache);
+    when(cache.getDistributedSystem()).thenReturn(ids);
+    when(ids.getOffHeapStore()).thenReturn(null);
+
+    try {
+      message.doLocalPutAll(partitionedRegion, mock(InternalDistributedMember.class), 1);
+      fail("Expect PrimaryBucketException");
+    } catch (Exception e) {
+      assertThat(e instanceof PrimaryBucketException);
+    }

Review comment:
   This can be replaced with `assertThatThrownBy(() -> message.doLocalPutAll(partitionedRegion, mock(InternalDistributedMember.class), 1)).isInstanceOf(PrimaryBucketException.class);` to make things a bit neater.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Primary and secondary bucket data mismatch with concurrent putAll/removeAll 
> and PR.clear 
> -
>
> Key: GEODE-8334
> URL: https://issues.apache.org/jira/browse/GEODE-8334
> Project: Geode
>  Issue Type: Sub-task
>  Components: regions
>Affects Versions: 1.14.0
>Reporter: Xiaojian Zhou
>Assignee: Xiaojian Zhou
>Priority: Major
>  Labels: GeodeOperationAPI
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (GEODE-8351) DUnit tests for Delta Propagation

2020-07-09 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/GEODE-8351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17154948#comment-17154948
 ] 

ASF GitHub Bot commented on GEODE-8351:
---

sabbeyPivotal opened a new pull request #5364:
URL: https://github.com/apache/geode/pull/5364


   Need to confirm that when deltas are propagated, the data is correctly 
stored on the secondary



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> DUnit tests for Delta Propagation
> -
>
> Key: GEODE-8351
> URL: https://issues.apache.org/jira/browse/GEODE-8351
> Project: Geode
>  Issue Type: Test
>  Components: redis, tests
>Reporter: Sarah Abbey
>Priority: Major
>
> Need to confirm that when deltas are propagated, the data is correctly stored 
> on the secondary



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (GEODE-8352) The rest api command does expireHistory for every operation

2020-07-09 Thread Anilkumar Gingade (Jira)


 [ 
https://issues.apache.org/jira/browse/GEODE-8352?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anilkumar Gingade updated GEODE-8352:
-
Labels: GeodeOperationAPI  (was: )

> The rest api command does expireHistory for every operation
> ---
>
> Key: GEODE-8352
> URL: https://issues.apache.org/jira/browse/GEODE-8352
> Project: Geode
>  Issue Type: Bug
>  Components: rest (admin)
>Reporter: Anilkumar Gingade
>Priority: Major
>  Labels: GeodeOperationAPI
>
> The rest API command does expireHistory for every rest command executed. If 
> the metadata region grows, this could take a long time. Also, executing a 
> command and expiring pending tasks should be two separate operations.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (GEODE-8352) The rest api command does expireHistory for every operation

2020-07-09 Thread Anilkumar Gingade (Jira)
Anilkumar Gingade created GEODE-8352:


 Summary: The rest api command does expireHistory for every 
operation
 Key: GEODE-8352
 URL: https://issues.apache.org/jira/browse/GEODE-8352
 Project: Geode
  Issue Type: Bug
  Components: rest (admin)
Reporter: Anilkumar Gingade


The rest API command does expireHistory for every rest command executed. If the 
metadata region grows, this could take a long time. Also, executing a command and 
expiring pending tasks should be two separate operations.
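
One illustrative way to separate the two concerns (an assumption about a possible fix, not the Geode implementation): run the expiry pass on a background schedule instead of inline with every REST command.

{code:java}
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

class HistoryExpiryScheduler {
  private final ScheduledExecutorService executor =
      Executors.newSingleThreadScheduledExecutor();

  // expiryTask stands in for OperationHistoryManager's expireHistory pass.
  void start(Runnable expiryTask) {
    executor.scheduleAtFixedRate(expiryTask, 0, 10, TimeUnit.MINUTES);
  }
}
{code}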



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (GEODE-8352) The rest api command does expireHistory for every operation

2020-07-09 Thread Anilkumar Gingade (Jira)


 [ 
https://issues.apache.org/jira/browse/GEODE-8352?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anilkumar Gingade updated GEODE-8352:
-
Affects Version/s: 1.13.0

> The rest api command does expireHistory for every operation
> ---
>
> Key: GEODE-8352
> URL: https://issues.apache.org/jira/browse/GEODE-8352
> Project: Geode
>  Issue Type: Bug
>  Components: rest (admin)
>Affects Versions: 1.13.0
>Reporter: Anilkumar Gingade
>Priority: Major
>  Labels: GeodeOperationAPI
>
> The rest API command does expireHistory for every rest command executed. If 
> the metadata region grows, this could take a long time. Also, executing a 
> command and expiring pending tasks should be two separate operations.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (GEODE-8200) Rebalance operations stuck in "IN_PROGRESS" state forever

2020-07-09 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/GEODE-8200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17154974#comment-17154974
 ] 

ASF GitHub Bot commented on GEODE-8200:
---

jchen21 commented on a change in pull request #5350:
URL: https://github.com/apache/geode/pull/5350#discussion_r452534574



##
File path: 
geode-core/src/main/java/org/apache/geode/management/internal/operation/OperationHistoryManager.java
##
@@ -90,6 +95,27 @@ private static boolean isExpired(long expirationTime, OperationState opera
     return operationEnd.getTime() <= expirationTime;
   }
 
+  private OperationState validateLocator(OperationState operationState) {
+    if (isLocatorOffline(operationState)) {
+      operationState.setOperationEnd(new Date(), null,
+          new RuntimeException("Locator that initiated the Rest API operation is offline: "
+              + operationState.getLocator()));
+    }
+
+    return operationState;
+  }
+
+  private boolean isLocatorOffline(OperationState operationState) {
+    if (operationState.getOperationEnd() == null
+        && (operationState.getLocator() != null)
+        && cache.getMyId().toString().compareTo(operationState.getLocator()) != 0

Review comment:
   Good point!





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Rebalance operations stuck in "IN_PROGRESS" state forever
> -
>
> Key: GEODE-8200
> URL: https://issues.apache.org/jira/browse/GEODE-8200
> Project: Geode
>  Issue Type: Bug
>  Components: management
>Reporter: Aaron Lindsey
>Assignee: Jianxia Chen
>Priority: Major
>  Labels: GeodeOperationAPI
> Attachments: GEODE-8200-exportedLogs.zip
>
>
> We use the management REST API to call rebalance immediately before stopping 
> a server to limit the possibility of data loss. In a cluster with 3 locators, 
> 3 servers, and no regions, we noticed that sometimes the rebalance operation 
> never ends if one of the locators is restarting concurrently with the 
> rebalance operation.
> More specifically, the scenario where we see this issue crop up is during an 
> automated "rolling restart" operation in a Kubernetes environment which 
> proceeds as follows:
> * At most one locator and one server are restarting at any point in time
> * Each locator/server waits until the previous locator/server is fully online 
> before restarting
> * Immediately before stopping a server, a rebalance operation is performed 
> and the server is not stopped until the rebalance operation is completed
> The impact of this issue is that the "rolling restart" operation will never 
> complete, because it cannot proceed with stopping a server until the 
> rebalance operation is completed. A human is then required to intervene and 
> manually trigger a rebalance and stop the server. This type of "rolling 
> restart" operation is triggered fairly often in Kubernetes — any time part of 
> the configuration of the locators or servers changes. 
> The following JSON is a sample response from the management REST API that 
> shows the rebalance operation stuck in "IN_PROGRESS".
> {code}
> {
>   "statusCode": "IN_PROGRESS",
>   "links": {
>     "self": 
> "http://geodecluster-sample-locator.default/management/v1/operations/rebalances/a47f23c8-02b3-443c-a367-636fd6921ea7",
>     "list": 
> "http://geodecluster-sample-locator.default/management/v1/operations/rebalances"
>   },
>   "operationStart": "2020-05-27T22:38:30.619Z",
>   "operationId": "a47f23c8-02b3-443c-a367-636fd6921ea7",
>   "operation": {
>     "simulate": false
>   }
> }
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (GEODE-8334) Primary and secondary bucket data mismatch with concurrent putAll/removeAll and PR.clear

2020-07-09 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/GEODE-8334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17154973#comment-17154973
 ] 

ASF GitHub Bot commented on GEODE-8334:
---

gesterzhou opened a new pull request #5365:
URL: https://github.com/apache/geode/pull/5365


   Co-authored-by: Xiaojian Zhou 
   Co-authored-by: Anil 
   
   Thank you for submitting a contribution to Apache Geode.
   
   In order to streamline the review of the contribution we ask you
   to ensure the following steps have been taken:
   
   ### For all changes:
   - [ ] Is there a JIRA ticket associated with this PR? Is it referenced in 
the commit message?
   
   - [ ] Has your PR been rebased against the latest commit within the target 
branch (typically `develop`)?
   
   - [ ] Is your initial contribution a single, squashed commit?
   
   - [ ] Does `gradlew build` run cleanly?
   
   - [ ] Have you written or updated unit tests to verify your changes?
   
   - [ ] If adding new dependencies to the code, are these dependencies 
licensed in a way that is compatible for inclusion under [ASF 
2.0](http://www.apache.org/legal/resolved.html#category-a)?
   
   ### Note:
   Please ensure that once the PR is submitted, check Concourse for build 
issues and
   submit an update to your PR as soon as possible. If you need help, please 
send an
   email to d...@geode.apache.org.
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Primary and secondary bucket data mismatch with concurrent putAll/removeAll 
> and PR.clear 
> -
>
> Key: GEODE-8334
> URL: https://issues.apache.org/jira/browse/GEODE-8334
> Project: Geode
>  Issue Type: Sub-task
>  Components: regions
>Affects Versions: 1.14.0
>Reporter: Xiaojian Zhou
>Assignee: Xiaojian Zhou
>Priority: Major
>  Labels: GeodeOperationAPI
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (GEODE-8329) Durable CQ not registered as durable after server failover

2020-07-09 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/GEODE-8329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17154983#comment-17154983
 ] 

ASF GitHub Bot commented on GEODE-8329:
---

agingade commented on a change in pull request #5360:
URL: https://github.com/apache/geode/pull/5360#discussion_r452538829



##
File path: 
geode-core/src/main/java/org/apache/geode/cache/client/internal/QueueManagerImpl.java
##
@@ -1112,7 +1112,8 @@ private void recoverCqs(Connection recoveredConnection, boolean isDurable) {
           .set(((DefaultQueryService) this.pool.getQueryService()).getUserAttributes(name));
       }
       try {
-        if (((CqStateImpl) cqi.getState()).getState() != CqStateImpl.INIT) {
+        if (((CqStateImpl) cqi.getState()).getState() != CqStateImpl.INIT

Review comment:
   The value for "isDurable" is passed by the caller. If you look into the 
only caller of this method; it calls the "recoverCQs" twice if the client is 
durable, with isDurable value as false.
   
   This method is also called while satisfying the redundancy-level, which is 
not related to client durability.
   Say if the redundancy is set to 5 and there are only 3 servers available; 
when a new server is added to the cluster this code is executed to satisfy the 
redundancy.
   
   Also, isDurable is the meta-info sent to server to say if its durable client 
(in this context).





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Durable CQ not registered as durable after server failover
> --
>
> Key: GEODE-8329
> URL: https://issues.apache.org/jira/browse/GEODE-8329
> Project: Geode
>  Issue Type: Bug
>Reporter: Jakov Varenina
>Assignee: Jakov Varenina
>Priority: Major
>
> {color:#172b4d}It seems that after server failover the java client is 
> wrongly re-registering the CQ on the new server as not durable. The command *list 
> durable-cqs* prints that there are no durable CQs, which is consistent with the CQ 
> being wrongly registered by the client as not durable, and therefore the following 
> printout appears:{color}
> {code:java}
> gfsh>list durable-cqs --durable-client-id=AppCounters
> Member | Status | CQ Name
> --- | --- | 
> server1 | OK  | randomTracker
> server2 | IGNORED | No client found with client-id : AppCounters
> server3 | IGNORED | No client found with client-id : AppCounters
>  
> after shutdown of server1:
>  
> gfsh>list durable-cqs --durable-client-id=AppCounters
> Member | Status | CQ Name
> --- | --- | 
> ---
> server2 | IGNORED | No durable cqs found for durable-client-id : 
> "AppCounters". --> server2 is hosting CQ, but it is not flagged as durable
> server3 | IGNORED | No client found with client-id : AppCounters{code}
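
For reference, the client registers a durable CQ through the final boolean of QueryService.newCq; this is the flag that should survive failover (the region name below is an assumption):

{code:java}
import org.apache.geode.cache.query.CqAttributes;
import org.apache.geode.cache.query.CqAttributesFactory;
import org.apache.geode.cache.query.CqQuery;
import org.apache.geode.cache.query.QueryService;

public final class DurableCqRegistration {
  // The last argument marks the CQ durable; re-registration after failover
  // should preserve it.
  static CqQuery registerDurableCq(QueryService queryService) throws Exception {
    CqAttributes attributes = new CqAttributesFactory().create();
    return queryService.newCq("randomTracker", "SELECT * FROM /exampleRegion",
        attributes, true);
  }
}
{code}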



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (GEODE-8329) Durable CQ not registered as durable after server failover

2020-07-09 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/GEODE-8329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17154985#comment-17154985
 ] 

ASF GitHub Bot commented on GEODE-8329:
---

agingade commented on a change in pull request #5360:
URL: https://github.com/apache/geode/pull/5360#discussion_r452538829



##
File path: 
geode-core/src/main/java/org/apache/geode/cache/client/internal/QueueManagerImpl.java
##
@@ -1112,7 +1112,8 @@ private void recoverCqs(Connection recoveredConnection, boolean isDurable) {
           .set(((DefaultQueryService) this.pool.getQueryService()).getUserAttributes(name));
       }
       try {
-        if (((CqStateImpl) cqi.getState()).getState() != CqStateImpl.INIT) {
+        if (((CqStateImpl) cqi.getState()).getState() != CqStateImpl.INIT

Review comment:
   The value for "isDurable" is passed by the caller. If you look into the 
only caller of this method; it calls the "recoverCQs" twice if the client is 
durable, with isDurable value as true. Not sure why its doing twice...
   
   This method is also called while satisfying the redundancy-level, which is 
not related to client durability.
   Say if the redundancy is set to 5 and there are only 3 servers available; 
when a new server is added to the cluster this code is executed to satisfy the 
redundancy. You could try adding test scenario for this.
   
   Also, isDurable is the meta-info sent to server to say if its durable client 
(in this context).





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Durable CQ not registered as durable after server failover
> --
>
> Key: GEODE-8329
> URL: https://issues.apache.org/jira/browse/GEODE-8329
> Project: Geode
>  Issue Type: Bug
>Reporter: Jakov Varenina
>Assignee: Jakov Varenina
>Priority: Major
>
> {color:#172b4d}It seems that after server failover the java client is 
> wrongly re-registering the CQ on the new server as not durable. The command *list 
> durable-cqs* prints that there are no durable CQs, which is consistent with the CQ 
> being wrongly registered by the client as not durable, and therefore the following 
> printout appears:{color}
> {code:java}
> gfsh>list durable-cqs --durable-client-id=AppCounters
> Member | Status | CQ Name
> --- | --- | 
> server1 | OK  | randomTracker
> server2 | IGNORED | No client found with client-id : AppCounters
> server3 | IGNORED | No client found with client-id : AppCounters
>  
> after shutdown of server1:
>  
> gfsh>list durable-cqs --durable-client-id=AppCounters
> Member | Status | CQ Name
> --- | --- | 
> ---
> server2 | IGNORED | No durable cqs found for durable-client-id : 
> "AppCounters". --> server2 is hosting CQ, but it is not flagged as durable
> server3 | IGNORED | No client found with client-id : AppCounters{code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (GEODE-8326) CI Failure: FixedPartitioningWithTransactionDistributedTest.clientCanRollbackFunctionOnRegionWithoutFilterAndWithSingleHopEnabled times out waiting for client metadata

2020-07-09 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/GEODE-8326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17154990#comment-17154990
 ] 

ASF GitHub Bot commented on GEODE-8326:
---

agingade commented on a change in pull request #5358:
URL: https://github.com/apache/geode/pull/5358#discussion_r452541982



##
File path: 
geode-core/src/distributedTest/java/org/apache/geode/internal/cache/partitioned/fixed/FixedPartitioningWithTransactionDistributedTest.java
##
@@ -238,7 +238,7 @@ private void forceClientMetadataUpdate(Region region) {
     ClientMetadataService clientMetadataService =
         ((InternalCache) clientCacheRule.getClientCache()).getClientMetadataService();
     clientMetadataService.scheduleGetPRMetaData((InternalRegion) region, true);
-    await().atMost(5, MINUTES).until(clientMetadataService::isMetadataStable);
+    await().atMost(5, HOURS).until(clientMetadataService::isMetadataStable);

Review comment:
   5 hours is a long time... Is this change going to be merged to develop? 
Can the debugging be tried with a local/private build?
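
For comparison, the usual alternative to a hand-picked bound is the suite-wide GeodeAwaitility default, which CI can tune without touching the test (a sketch, assuming the clientMetadataService from the diff):

{code:java}
import static org.apache.geode.test.awaitility.GeodeAwaitility.await;

import org.apache.geode.cache.client.internal.ClientMetadataService;

public final class MetadataWait {
  // Uses GeodeAwaitility's default timeout instead of 5 minutes or 5 hours.
  static void awaitStableMetadata(ClientMetadataService clientMetadataService) {
    await().until(clientMetadataService::isMetadataStable);
  }
}
{code}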





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> CI Failure: 
> FixedPartitioningWithTransactionDistributedTest.clientCanRollbackFunctionOnRegionWithoutFilterAndWithSingleHopEnabled
>  times out waiting for client metadata
> ---
>
> Key: GEODE-8326
> URL: https://issues.apache.org/jira/browse/GEODE-8326
> Project: Geode
>  Issue Type: Bug
>  Components: client/server, tests
>Affects Versions: 1.13.0
>Reporter: Kirk Lund
>Assignee: Eric Shu
>Priority: Major
>  Labels: caching-applications
>
> CI Failure: 
> http://files.apachegeode-ci.info/builds/apache-support-1-13-main/1.13.0-build.0296/test-results/distributedTest/1592846714/
> {noformat}
> org.apache.geode.internal.cache.partitioned.fixed.FixedPartitioningWithTransactionDistributedTest
>  > 
> clientCanRollbackFunctionOnRegionWithoutFilterAndWithSingleHopEnabled[ExecuteFunctionByObject]
>  FAILED
> org.awaitility.core.ConditionTimeoutException: Condition with lambda 
> expression in 
> org.apache.geode.internal.cache.partitioned.fixed.FixedPartitioningWithTransactionDistributedTest
>  that uses org.apache.geode.cache.client.internal.ClientMetadataService was 
> not fulfilled within 5 minutes.
> at 
> org.awaitility.core.ConditionAwaiter.await(ConditionAwaiter.java:165)
> at 
> org.awaitility.core.CallableCondition.await(CallableCondition.java:78)
> at 
> org.awaitility.core.CallableCondition.await(CallableCondition.java:26)
> at 
> org.awaitility.core.ConditionFactory.until(ConditionFactory.java:895)
> at 
> org.awaitility.core.ConditionFactory.until(ConditionFactory.java:864)
> at 
> org.apache.geode.internal.cache.partitioned.fixed.FixedPartitioningWithTransactionDistributedTest.forceClientMetadataUpdate(FixedPartitioningWithTransactionDistributedTest.java:241)
> at 
> org.apache.geode.internal.cache.partitioned.fixed.FixedPartitioningWithTransactionDistributedTest.doFunctionTransactionAndSuspend(FixedPartitioningWithTransactionDistributedTest.java:458)
> at 
> org.apache.geode.internal.cache.partitioned.fixed.FixedPartitioningWithTransactionDistributedTest.clientCanRollbackFunctionOnRegionWithoutFilterAndWithSingleHopEnabled(FixedPartitioningWithTransactionDistributedTest.java:254)
> {noformat}
> The failure occurs after waiting 5 minutes for the ClientMetadataService to 
> stabilize. See ClientMetadataService#isMetadataStable.
> The timeout occurs within a block of test code that was introduced by Jake in 
> PR #3840:
> {noformat}
> GEODE-7006: Fixes function execution by id with transactions. (#3840)  
> * Fixes test to force and wait for PR metadata to update.
> {noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (GEODE-8337) Rename Version enum to KnownVersion; VersionOrdinal to Version

2020-07-09 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/GEODE-8337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17154208#comment-17154208
 ] 

ASF GitHub Bot commented on GEODE-8337:
---

albertogpz commented on pull request #5355:
URL: https://github.com/apache/geode/pull/5355#issuecomment-655921588


   > @albertogpz for some reason I can't add you as a reviewer, but thought you 
might be interested in this, the "last" PR associated with GEODE-8240. This one 
finalizing the renaming of the types in the new versioning hierarchy (and the 
associated `Versioning` factory) etc.
   
   @Bill it might be because I am not a committer. Thanks for taking me into 
account because I am definitely interested.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Rename Version enum to KnownVersion; VersionOrdinal to Version
> --
>
> Key: GEODE-8337
> URL: https://issues.apache.org/jira/browse/GEODE-8337
> Project: Geode
>  Issue Type: Improvement
>  Components: serialization
>Reporter: Bill Burcham
>Assignee: Bill Burcham
>Priority: Major
> Attachments: screenshot-1.png, screenshot-2.png
>
>
> As a follow-on to GEODE-8240 and GEODE-8330, this is the final ticket, to 
> rename:
> {{Version}} -> {{KnownVersion}}
> {{VersionOrdinal}} -> {{Version}}
> With this ticket, the work started in GEODE-8240 is complete.
> After this change, the versioning hierarchy will be:
>  !screenshot-1.png! 
> Before this change, the hierarchy was:
>  !screenshot-2.png! 
> As part of this story we'll also harmonize version access methods on 
> MemberIdentifier, InternalDistributedMember, and GMSMemberData:
> getVersionOrdinalObject() becomes getVersion()
> On GMSMemberData:
> setVersionObjectForTest() becomes setVersionForTest()



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (GEODE-8349) reinstate use of SSLSocket for cluster communication

2020-07-09 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/GEODE-8349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17154995#comment-17154995
 ] 

ASF GitHub Bot commented on GEODE-8349:
---

bschuchardt commented on pull request #5363:
URL: https://github.com/apache/geode/pull/5363#issuecomment-656401747


   > org.apache.geode.benchmark.tests.PartitionedGetBenchmark
   >   average ops/second             Baseline:    331221.55  Test:    342201.08  Difference:  +3.3%
   >   ops/second standard error      Baseline:       521.34  Test:       555.60  Difference:  +6.6%
   >   ops/second standard deviation  Baseline:      9014.73  Test:      9607.16  Difference:  +6.6%
   >   YS 99th percentile latency     Baseline:     20071.00  Test:     20071.00  Difference:  +0.0%
   >   median latency                 Baseline:    362239.00  Test:    361471.00  Difference:  -0.2%
   >   90th percentile latency        Baseline:   7036927.00  Test:   6995967.00  Difference:  -0.6%
   >   99th percentile latency        Baseline:  24428543.00  Test:  23085055.00  Difference:  -5.5%
   >   99.9th percentile latency      Baseline:  34963455.00  Test:  33259519.00  Difference:  -4.9%
   >   average latency                Baseline:   2171050.56  Test:   2101421.48  Difference:  -3.2%
   >   latency standard deviation     Baseline:   5334288.43  Test:   5076754.60  Difference:  -4.8%
   >   latency standard error         Baseline:       535.20  Test:       501.11  Difference:  -6.4%
   
   > org.apache.geode.benchmark.tests.PartitionedPutBenchmark
   >   average ops/second             Baseline:    178773.53  Test:    182095.99  Difference:  +1.9%
   >   ops/second standard error      Baseline:       599.58  Test:       835.58  Difference: +39.4%
   >   ops/second standard deviation  Baseline:     10367.72  Test:     14448.49  Difference: +39.4%
   >   YS 99th percentile latency     Baseline:     20083.00  Test:     20082.80  Difference:  -0.0%
   >   median latency                 Baseline:    765951.00  Test:    767999.00  Difference:  +0.3%
   >   90th percentile latency        Baseline:  13459455.00  Test:  13107199.00  Difference:  -2.6%
   >   99th percentile latency        Baseline:  28983295.00  Test:  27787263.00  Difference:  -4.1%
   >   99.9th percentile latency      Baseline: 107151359.00  Test: 109379583.00  Difference:  +2.1%
   >   average latency                Baseline:   4024484.55  Test:   3951012.79  Difference:  -1.8%
   >   latency standard deviation     Baseline:   9229899.71  Test:   8986494.06  Difference:  -2.6%
   >   latency standard error         Baseline:      1260.59  Test:      1216.02  Difference:  -3.5%
   
   > org.apache.geode.benchmark.tests.ReplicatedGetBenchmark
   >   average ops/second             Baseline:    339091.57  Test:    351035.30  Difference:  +3.5%
   >   ops/second standard error      Baseline:       450.59  Test:       371.17  Difference: -17.6%
   >   ops/second standard deviation  Baseline:      7791.42  Test:      6418.07  Difference: -17.6%
   >   YS 99th percentile latency     Baseline:     20071.00  Test:     20071.33  Difference:  +0.0%
   >   median latency                 Baseline:    355839.00  Test:    357375.00  Difference:  +0.4%
   >   90th percentile latency        Baseline:   6524927.00  Test:   5992447.00  Difference:  -8.2%
   >   99th percentile latency        Baseline:  24526847.00  Test:  23691263.00  Difference:  -3.4%
   >   99.9th percentile latency      Baseline:  34013183.00  Test:  32948223.00  Difference:  -3.1%
   >   average latency                Baseline:   2120692.40  Test:   2048843.90  Difference:  -3.4%
   >   latency standard deviation     Baseline:   5263600.14  Test:   5054339.01  Difference:  -4.0%
   >   latency standard error         Baseline:       521.94  Test:       492.61  Difference:  -5.6%
   
   > org.apache.geode.benchmark.tests.ReplicatedPutBenchmark
   >   average ops/second             Baseline:    192443.22  Test:    194403.34  Difference:  +1.0%
   >   ops/second standard error      Baseline:       726.17  Test:       788.60  Difference:  +8.6%
   >   ops/second standard deviation  Baseline:     12556.67  Test:     13636.14  Difference:  +8.6%
   >   YS 99th percentile latency     Baseline:     20060.00  Test:     20072.00  Difference:  +0.1%
   >   median latency                 Baseline:    809471.00  Test:    781823.00  Difference:  -3.4%
   >   90th percentile latency        Baseline:  10690559.00  Test:  11599871.00  Difference:  +8.5%
   >   99th percentile latency        Baseline:  23724031.00  Test:  24379391.00  Difference:  +2.8%
   >   99.9th percentile latency      Baseline: 106692607.00  Test: 110166015.00  Difference:  +3.3%
   >   average latency                Baseline:   3739364.07  Test:   3701351.14  Difference:  -1.0%
   >   latency standard deviation     Baseline:  

[jira] [Commented] (GEODE-8334) Primary and secondary bucket data mismatch with concurrent putAll/removeAll and PR.clear

2020-07-09 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/GEODE-8334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17154992#comment-17154992
 ] 

ASF GitHub Bot commented on GEODE-8334:
---

agingade commented on a change in pull request #5365:
URL: https://github.com/apache/geode/pull/5365#discussion_r452543360



##
File path: 
geode-core/src/test/java/org/apache/geode/internal/cache/partitioned/PutAllPRMessageTest.java
##
@@ -119,4 +122,34 @@ public void removeAndNotifyKeysIsNotInvokedIfKeysNotLocked() throws Exception {
     eq(regionDestroyedException));
   }
 
+  @Test
+  public void rvvLockedAfterKeysAreLockedAndUnlockRVVBeforeKeys() throws Exception {
+    PutAllPRMessage message = spy(new PutAllPRMessage(bucketId, 1, false, false, false, null));
+    message.addEntry(entryData);
+    doReturn(keys).when(message).getKeysToBeLocked();
+    when(bucketRegion.waitUntilLocked(keys)).thenReturn(true);
+    when(bucketRegion.doLockForPrimary(false)).thenThrow(new PrimaryBucketException());
+    doNothing().when(bucketRegion).lockRVVForBulkOp();
+    doNothing().when(bucketRegion).unlockRVVForBulkOp();
+
+    InternalCache cache = mock(InternalCache.class);
+    InternalDistributedSystem ids = mock(InternalDistributedSystem.class);
+    when(bucketRegion.getCache()).thenReturn(cache);
+    when(cache.getDistributedSystem()).thenReturn(ids);
+    when(ids.getOffHeapStore()).thenReturn(null);
+
+    try {
+      message.doLocalPutAll(partitionedRegion, mock(InternalDistributedMember.class), 1);
+      fail("Expect PrimaryBucketException");
+    } catch (Exception e) {
+      assertThat(e instanceof PrimaryBucketException);
+    }
+
+    InOrder inOrder = inOrder(bucketRegion);
+    inOrder.verify(bucketRegion).waitUntilLocked(keys);
+    inOrder.verify(bucketRegion).lockRVVForBulkOp();
+    inOrder.verify(bucketRegion).unlockRVVForBulkOp();

Review comment:
   Having the actual operation "put" in between the lock and unlock makes 
sure the operation is operated under expected locking.
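
Concretely, the verification could include a call that stands in for the operation itself; per the follow-up reply below, doLockForPrimary plays that role in this test (a sketch of the resulting InOrder block):

{code:java}
InOrder inOrder = inOrder(bucketRegion);
inOrder.verify(bucketRegion).waitUntilLocked(keys);
inOrder.verify(bucketRegion).lockRVVForBulkOp();
inOrder.verify(bucketRegion).doLockForPrimary(false); // stands in for the put itself
inOrder.verify(bucketRegion).unlockRVVForBulkOp();
{code}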

##
File path: 
geode-core/src/test/java/org/apache/geode/internal/cache/partitioned/RemoveAllPRMessageTest.java
##
@@ -131,4 +134,35 @@ public void removeAndNotifyKeysIsNotInvokedIfKeysNotLocked() throws Exception {
     verify(dataStore).checkRegionDestroyedOnBucket(eq(bucketRegion), eq(true),
         eq(regionDestroyedException));
   }
+
+  @Test
+  public void rvvLockedAfterKeysAreLockedAndUnlockRVVBeforeKeys() throws Exception {
+    RemoveAllPRMessage message =
+        spy(new RemoveAllPRMessage(bucketId, 1, false, false, false, null));
+    message.addEntry(entryData);
+    doReturn(keys).when(message).getKeysToBeLocked();
+    when(bucketRegion.waitUntilLocked(keys)).thenReturn(true);
+    when(bucketRegion.doLockForPrimary(false)).thenThrow(new PrimaryBucketException());
+    doNothing().when(bucketRegion).lockRVVForBulkOp();
+    doNothing().when(bucketRegion).unlockRVVForBulkOp();
+
+    InternalCache cache = mock(InternalCache.class);
+    InternalDistributedSystem ids = mock(InternalDistributedSystem.class);
+    when(bucketRegion.getCache()).thenReturn(cache);
+    when(cache.getDistributedSystem()).thenReturn(ids);
+    when(ids.getOffHeapStore()).thenReturn(null);
+
+    try {
+      message.doLocalRemoveAll(partitionedRegion, mock(InternalDistributedMember.class), true);
+      fail("Expect PrimaryBucketException");
+    } catch (Exception e) {
+      assertThat(e instanceof PrimaryBucketException);
+    }
+
+    InOrder inOrder = inOrder(bucketRegion);
+    inOrder.verify(bucketRegion).waitUntilLocked(keys);
+    inOrder.verify(bucketRegion).lockRVVForBulkOp();
+    inOrder.verify(bucketRegion).unlockRVVForBulkOp();

Review comment:
   Having the actual operation, "remove", in between the lock and unlock 
verifications makes sure the operation runs under the expected locking.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Primary and secondary bucket data mismatch with concurrent putAll/removeAll 
> and PR.clear 
> -
>
> Key: GEODE-8334
> URL: https://issues.apache.org/jira/browse/GEODE-8334
> Project: Geode
>  Issue Type: Sub-task
>  Components: regions
>Affects Versions: 1.14.0
>Reporter: Xiaojian Zhou
>Assignee: Xiaojian Zhou
>Priority: Major
>  Labels: GeodeOperationAPI
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (GEODE-8334) Primary and secondary bucket data mismatch with concurrent putAll/removeAll and PR.clear

2020-07-09 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/GEODE-8334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17154997#comment-17154997
 ] 

ASF GitHub Bot commented on GEODE-8334:
---

gesterzhou commented on a change in pull request #5365:
URL: https://github.com/apache/geode/pull/5365#discussion_r452547691



##
File path: 
geode-core/src/test/java/org/apache/geode/internal/cache/partitioned/RemoveAllPRMessageTest.java
##
@@ -131,4 +134,35 @@ public void removeAndNotifyKeysIsNotInvokedIfKeysNotLocked() throws Exception {
     verify(dataStore).checkRegionDestroyedOnBucket(eq(bucketRegion), eq(true),
         eq(regionDestroyedException));
   }
+
+  @Test
+  public void rvvLockedAfterKeysAreLockedAndUnlockRVVBeforeKeys() throws Exception {
+    RemoveAllPRMessage message =
+        spy(new RemoveAllPRMessage(bucketId, 1, false, false, false, null));
+    message.addEntry(entryData);
+    doReturn(keys).when(message).getKeysToBeLocked();
+    when(bucketRegion.waitUntilLocked(keys)).thenReturn(true);
+    when(bucketRegion.doLockForPrimary(false)).thenThrow(new PrimaryBucketException());
+    doNothing().when(bucketRegion).lockRVVForBulkOp();
+    doNothing().when(bucketRegion).unlockRVVForBulkOp();
+
+    InternalCache cache = mock(InternalCache.class);
+    InternalDistributedSystem ids = mock(InternalDistributedSystem.class);
+    when(bucketRegion.getCache()).thenReturn(cache);
+    when(cache.getDistributedSystem()).thenReturn(ids);
+    when(ids.getOffHeapStore()).thenReturn(null);
+
+    try {
+      message.doLocalRemoveAll(partitionedRegion, mock(InternalDistributedMember.class), true);
+      fail("Expect PrimaryBucketException");
+    } catch (Exception e) {
+      assertThat(e instanceof PrimaryBucketException);
+    }
+
+    InOrder inOrder = inOrder(bucketRegion);
+    inOrder.verify(bucketRegion).waitUntilLocked(keys);
+    inOrder.verify(bucketRegion).lockRVVForBulkOp();
+    inOrder.verify(bucketRegion).unlockRVVForBulkOp();

Review comment:
   doLockForPrimary acts as the operation, to save a lot of trouble with 
mocking. The test expects this exception to happen.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Primary and secondary bucket data mismatch with concurrent putAll/removeAll 
> and PR.clear 
> -
>
> Key: GEODE-8334
> URL: https://issues.apache.org/jira/browse/GEODE-8334
> Project: Geode
>  Issue Type: Sub-task
>  Components: regions
>Affects Versions: 1.14.0
>Reporter: Xiaojian Zhou
>Assignee: Xiaojian Zhou
>Priority: Major
>  Labels: GeodeOperationAPI
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (GEODE-8347) use benchmarks branch corresponding to geode branch

2020-07-09 Thread Owen Nichols (Jira)


 [ 
https://issues.apache.org/jira/browse/GEODE-8347?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Owen Nichols updated GEODE-8347:

Fix Version/s: 1.13.0
   1.12.1

> use benchmarks branch corresponding to geode branch
> ---
>
> Key: GEODE-8347
> URL: https://issues.apache.org/jira/browse/GEODE-8347
> Project: Geode
>  Issue Type: Improvement
>  Components: ci
>Reporter: Owen Nichols
>Assignee: Owen Nichols
>Priority: Major
> Fix For: 1.12.1, 1.13.0, 1.14.0
>
>
> The Geode 1.12 release included geode-benchmarks from support/1.12, but the 
> pipeline definition is still using benchmarks from develop, as is 1.13. The fix 
> is to use matching branch names between geode and geode-benchmarks. We also need 
> to rebalance max_in_flight based on how long each benchmark job takes.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (GEODE-8067) ClassLoader Isolation

2020-07-09 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/GEODE-8067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17155028#comment-17155028
 ] 

ASF GitHub Bot commented on GEODE-8067:
---

lgtm-com[bot] commented on pull request #5357:
URL: https://github.com/apache/geode/pull/5357#issuecomment-656430092


   This pull request **introduces 2 alerts** and **fixes 2** when merging 
05510648ef7e1710fa2fd9cf5c0b51b7728b4bb2 into 
daa70d729b98f8edc3791a0dfccbf102ab94dd94 - [view on 
LGTM.com](https://lgtm.com/projects/g/apache/geode/rev/pr-6fae7a00aab914b200c0c60eb166f81647a31270)
   
   **new alerts:**
   
   * 2 for Potential input resource leak
   
   **fixed alerts:**
   
   * 2 for Unused variable, import, function or class
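   
   For context, a "potential input resource leak" alert typically flags a stream 
or reader that is opened but not closed on all paths. A generic illustration of 
the pattern and its fix (not the actual flagged code from this PR):
```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

class ResourceLeakExample {
  // Pattern the alert flags: the reader is never closed, either because
  // close() is missing or because readLine() throws before it is reached.
  static String firstLineLeaky(Path p) throws IOException {
    BufferedReader reader = Files.newBufferedReader(p);
    return reader.readLine();
  }

  // Fix: try-with-resources closes the reader on every path.
  static String firstLine(Path p) throws IOException {
    try (BufferedReader reader = Files.newBufferedReader(p)) {
      return reader.readLine();
    }
  }
}
```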



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> ClassLoader Isolation
> -
>
> Key: GEODE-8067
> URL: https://issues.apache.org/jira/browse/GEODE-8067
> Project: Geode
>  Issue Type: New Feature
>  Components: client/server
>Reporter: Udo Kohlmeyer
>Assignee: Udo Kohlmeyer
>Priority: Major
>
> This is the root Jira issue for the first-pass implementation of [ClassLoader 
> Isolation|https://cwiki.apache.org/confluence/display/GEODE/ClassLoader+Isolation]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (GEODE-8334) Primary and secondary bucket data mismatch with concurrent putAll/removeAll and PR.clear

2020-07-09 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/GEODE-8334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17154991#comment-17154991
 ] 

ASF GitHub Bot commented on GEODE-8334:
---

agingade commented on a change in pull request #5365:
URL: https://github.com/apache/geode/pull/5365#discussion_r452542790



##
File path: 
geode-core/src/test/java/org/apache/geode/internal/cache/partitioned/PutAllPRMessageTest.java
##
@@ -119,4 +122,34 @@ public void removeAndNotifyKeysIsNotInvokedIfKeysNotLocked() throws Exception {
         eq(regionDestroyedException));
   }
 
+  @Test
+  public void rvvLockedAfterKeysAreLockedAndUnlockRVVBeforeKeys() throws Exception {
+    PutAllPRMessage message = spy(new PutAllPRMessage(bucketId, 1, false, false, false, null));
+    message.addEntry(entryData);
+    doReturn(keys).when(message).getKeysToBeLocked();
+    when(bucketRegion.waitUntilLocked(keys)).thenReturn(true);
+    when(bucketRegion.doLockForPrimary(false)).thenThrow(new PrimaryBucketException());
+    doNothing().when(bucketRegion).lockRVVForBulkOp();
+    doNothing().when(bucketRegion).unlockRVVForBulkOp();
+
+    InternalCache cache = mock(InternalCache.class);
+    InternalDistributedSystem ids = mock(InternalDistributedSystem.class);
+    when(bucketRegion.getCache()).thenReturn(cache);
+    when(cache.getDistributedSystem()).thenReturn(ids);
+    when(ids.getOffHeapStore()).thenReturn(null);
+
+    try {
+      message.doLocalPutAll(partitionedRegion, mock(InternalDistributedMember.class), 1);
+      fail("Expect PrimaryBucketException");
+    } catch (Exception e) {
+      assertThat(e).isInstanceOf(PrimaryBucketException.class);
+    }

Review comment:
   +1





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Primary and secondary bucket data mismatch with concurrent putAll/removeAll 
> and PR.clear 
> -
>
> Key: GEODE-8334
> URL: https://issues.apache.org/jira/browse/GEODE-8334
> Project: Geode
>  Issue Type: Sub-task
>  Components: regions
>Affects Versions: 1.14.0
>Reporter: Xiaojian Zhou
>Assignee: Xiaojian Zhou
>Priority: Major
>  Labels: GeodeOperationAPI
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (GEODE-8067) ClassLoader Isolation

2020-07-09 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/GEODE-8067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17155001#comment-17155001
 ] 

ASF GitHub Bot commented on GEODE-8067:
---

lgtm-com[bot] commented on pull request #5357:
URL: https://github.com/apache/geode/pull/5357#issuecomment-656405998


   This pull request **introduces 2 alerts** and **fixes 2** when merging 
fa991b29babee1c160fd8e734d56f990750d85fe into 
daa70d729b98f8edc3791a0dfccbf102ab94dd94 - [view on 
LGTM.com](https://lgtm.com/projects/g/apache/geode/rev/pr-49691015b85b3016f0cd12c4b48d968180a69392)
   
   **new alerts:**
   
   * 2 for Potential input resource leak
   
   **fixed alerts:**
   
   * 2 for Unused variable, import, function or class



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> ClassLoader Isolation
> -
>
> Key: GEODE-8067
> URL: https://issues.apache.org/jira/browse/GEODE-8067
> Project: Geode
>  Issue Type: New Feature
>  Components: client/server
>Reporter: Udo Kohlmeyer
>Assignee: Udo Kohlmeyer
>Priority: Major
>
> This is the root Jira issue for the first-pass implementation of [ClassLoader 
> Isolation|https://cwiki.apache.org/confluence/display/GEODE/ClassLoader+Isolation]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (GEODE-8347) use benchmarks branch corresponding to geode branch

2020-07-09 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/GEODE-8347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17155010#comment-17155010
 ] 

ASF subversion and git services commented on GEODE-8347:


Commit c7b82a538cc24c22d0b1fa4380676c2ba9ac5270 in geode's branch 
refs/heads/support/1.12 from Owen Nichols
[ https://gitbox.apache.org/repos/asf?p=geode.git;h=c7b82a5 ]

GEODE-8347: use same benchmarks branch as geode branch, since that's the way we 
release (#5361)

also balance max_in_flight according to how long each job takes

(cherry picked from commit daa70d729b98f8edc3791a0dfccbf102ab94dd94)
(cherry picked from commit ecd697625bb799d82dc41158d51baa73ede97bf8)


> use benchmarks branch corresponding to geode branch
> ---
>
> Key: GEODE-8347
> URL: https://issues.apache.org/jira/browse/GEODE-8347
> Project: Geode
>  Issue Type: Improvement
>  Components: ci
>Reporter: Owen Nichols
>Assignee: Owen Nichols
>Priority: Major
> Fix For: 1.14.0
>
>
> The Geode 1.12 release included geode-benchmarks from support/1.12, but the 
> pipeline definition is still using benchmarks from develop, as is 1.13. The fix 
> is to use matching branch names between geode and geode-benchmarks. We also need 
> to rebalance max_in_flight based on how long each benchmark job takes.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (GEODE-8347) use benchmarks branch corresponding to geode branch

2020-07-09 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/GEODE-8347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17155009#comment-17155009
 ] 

ASF subversion and git services commented on GEODE-8347:


Commit ecd697625bb799d82dc41158d51baa73ede97bf8 in geode's branch 
refs/heads/support/1.13 from Owen Nichols
[ https://gitbox.apache.org/repos/asf?p=geode.git;h=ecd6976 ]

GEODE-8347: use same benchmarks branch as geode branch, since that's the way we 
release (#5361)

also balance max_in_flight according to how long each job takes

(cherry picked from commit daa70d729b98f8edc3791a0dfccbf102ab94dd94)


> use benchmarks branch corresponding to geode branch
> ---
>
> Key: GEODE-8347
> URL: https://issues.apache.org/jira/browse/GEODE-8347
> Project: Geode
>  Issue Type: Improvement
>  Components: ci
>Reporter: Owen Nichols
>Assignee: Owen Nichols
>Priority: Major
> Fix For: 1.14.0
>
>
> The Geode 1.12 release included geode-benchmarks from support/1.12, but the 
> pipeline definition is still using benchmarks from develop, as is 1.13. The fix 
> is to use matching branch names between geode and geode-benchmarks. We also need 
> to rebalance max_in_flight based on how long each benchmark job takes.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (GEODE-8329) Durable CQ not registered as durable after server failover

2020-07-09 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/GEODE-8329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17155169#comment-17155169
 ] 

ASF GitHub Bot commented on GEODE-8329:
---

agingade commented on a change in pull request #5360:
URL: https://github.com/apache/geode/pull/5360#discussion_r452538829



##
File path: 
geode-core/src/main/java/org/apache/geode/cache/client/internal/QueueManagerImpl.java
##
@@ -1112,7 +1112,8 @@ private void recoverCqs(Connection recoveredConnection, boolean isDurable) {
             .set(((DefaultQueryService) this.pool.getQueryService()).getUserAttributes(name));
       }
       try {
-        if (((CqStateImpl) cqi.getState()).getState() != CqStateImpl.INIT) {
+        if (((CqStateImpl) cqi.getState()).getState() != CqStateImpl.INIT

Review comment:
   The value for "isDurable" is passed in by the caller. If you look at the 
only caller of this method, you'll see it calls recoverCqs twice if the client 
is durable, the second time with isDurable set to true. I'm not sure why it 
does this twice...
   
   This method is also called while satisfying the redundancy level, which is 
not related to client durability. Say the redundancy is set to 5 and only 3 
servers are available; when a new server is added to the cluster, this code is 
executed to satisfy the redundancy. You could try adding a test scenario for 
this.
   
   Also, isDurable is the meta-info sent to the server to indicate whether it 
is a durable client (in this context).
   
   In QueueManagerImpl can you try changing the following method:
```java
private void recoverAllInterestTypes(final Connection recoveredConnection,
    boolean isFirstNewConnection) {
  if (PoolImpl.BEFORE_RECOVER_INTEREST_CALLBACK_FLAG) {
    ClientServerObserver bo = ClientServerObserverHolder.getInstance();
    bo.beforeInterestRecovery();
  }
  recoverInterestList(recoveredConnection, false, true, isFirstNewConnection);
  recoverInterestList(recoveredConnection, false, false, isFirstNewConnection);
  recoverCqs(recoveredConnection, false);
  if (getPool().isDurableClient()) {
    recoverInterestList(recoveredConnection, true, true, isFirstNewConnection);
    recoverInterestList(recoveredConnection, true, false, isFirstNewConnection);
    recoverCqs(recoveredConnection, true);
  }
}
```
   
   TO:
   
```java
private void recoverAllInterestTypes(final Connection recoveredConnection,
    boolean isFirstNewConnection) {
  if (PoolImpl.BEFORE_RECOVER_INTEREST_CALLBACK_FLAG) {
    ClientServerObserver bo = ClientServerObserverHolder.getInstance();
    bo.beforeInterestRecovery();
  }

  boolean isDurableClient = getPool().isDurableClient();
  recoverInterestList(recoveredConnection, isDurableClient, true, isFirstNewConnection);
  recoverInterestList(recoveredConnection, isDurableClient, false, isFirstNewConnection);
  recoverCqs(recoveredConnection, isDurableClient);
}
```
   





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Durable CQ not registered as durable after server failover
> --
>
> Key: GEODE-8329
> URL: https://issues.apache.org/jira/browse/GEODE-8329
> Project: Geode
>  Issue Type: Bug
>Reporter: Jakov Varenina
>Assignee: Jakov Varenina
>Priority: Major
>
> It seems that after server failover the Java client wrongly re-registers the 
> CQ on the new server as not durable. The command *list durable-cqs* then 
> reports that there are no durable CQs, which is consistent, since the CQ was 
> wrongly registered by the client as not durable; hence the following 
> printout:
> {code:java}
> gfsh>list durable-cqs --durable-client-id=AppCounters
> Member  | Status  | CQ Name
> ------- | ------- | -------
> server1 | OK      | randomTracker
> server2 | IGNORED | No client found with client-id : AppCounters
> server3 | IGNORED | No client found with client-id : AppCounters
>  
> after shutdown of server1:
>  
> gfsh>list durable-cqs --durable-client-id=AppCounters
> Member  | Status  | CQ Name
> ------- | ------- | -------
> server2 | IGNORED | No durable cqs found for durable-client-id : 
> "AppCounters". --> server2 is hosting the CQ, but it is not flagged as durable
> server3 | IGNORED | No client found with client-id : AppCounters{code}
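
For context, durable CQ registration happens on the client side. A minimal sketch of
how a durable client registers a durable CQ; the locator address, region, and query
string are illustrative assumptions, and only the CQ name "randomTracker" and the
client id "AppCounters" come from the report above:
{code:java}
import org.apache.geode.cache.client.ClientCache;
import org.apache.geode.cache.client.ClientCacheFactory;
import org.apache.geode.cache.query.CqAttributesFactory;
import org.apache.geode.cache.query.CqQuery;
import org.apache.geode.cache.query.QueryService;

public class DurableCqExample {
  public static void main(String[] args) throws Exception {
    // A durable client sets durable-client-id so servers keep its event queue.
    ClientCache cache = new ClientCacheFactory()
        .set("durable-client-id", "AppCounters")
        .addPoolLocator("localhost", 10334) // illustrative locator
        .setPoolSubscriptionEnabled(true)
        .create();

    QueryService queryService = cache.getQueryService();
    // The final 'true' marks the CQ itself as durable; after failover the
    // client must re-register it with the same flag, which is what this bug
    // report says is not happening.
    CqQuery cq = queryService.newCq("randomTracker",
        "SELECT * FROM /example", new CqAttributesFactory().create(), true);
    cq.executeWithInitialResults();

    // Tell the servers this durable client is ready to receive events.
    cache.readyForEvents();
  }
}
{code}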



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

