[GitHub] [geode] albertogpz commented on a change in pull request #7330: GEODE-9984: Avoid flakyness of testCommandDoesNotCopyEntriesUpdatedAf…

2022-02-22 Thread GitBox


albertogpz commented on a change in pull request #7330:
URL: https://github.com/apache/geode/pull/7330#discussion_r812174465



##
File path: 
geode-wan/src/distributedTest/java/org/apache/geode/cache/wan/internal/cli/commands/WanCopyRegionCommandDUnitTest.java
##
@@ -1460,11 +1460,15 @@ public void createServersAndRegions(int locatorPort, 
List servers,
   }
 
   private void waitForWanCopyRegionCommandToStart(boolean useParallel, boolean 
usePartitionedRegion,
-  List servers) {
+  List servers) throws InterruptedException {
+// Wait for the command execution to be registered in the service
 final int executionsExpected = useParallel && usePartitionedRegion ? 
servers.size() : 1;
 await().untilAsserted(
 () -> assertThat(getNumberOfCurrentExecutionsInServers(servers))
 .isEqualTo(executionsExpected));
+// Wait some extra milliseconds to allow for the command to actually start 
to
+// avoid flakyness in the tests.
+Thread.sleep(100);

Review comment:
   > @albertogpz I have not considered working on any of them at the 
moment. So feel free to pick up any of them. If you feel you have gotten stuck 
or need to chat to anyone about this, please feel free to reach out to me 
directly.
   > 
   > [GEODE-10071](https://issues.apache.org/jira/browse/GEODE-10071) and 
[GEODE-10068](https://issues.apache.org/jira/browse/GEODE-10068) should be 
reasonably simple and to apply
   > 
   > [GEODE-10067](https://issues.apache.org/jira/browse/GEODE-10067) and 
[GEODE-10069](https://issues.apache.org/jira/browse/GEODE-10069) would be good 
tasks to complete to improve overall simplicity and performance of the current 
implementation
   > 
   > [GEODE-10070](https://issues.apache.org/jira/browse/GEODE-10070) is a task 
that we need to carefully consider how it is implemented. Please feel free to 
reach out to the "experts" on this via the dev-list. Completing this task will 
be a huge step in the right direction when it comes to SoC and modularity
   
   @kohlmu-pivotal I will try to find some time to work on some of these items.
   I consider [GEODE-10067](https://issues.apache.org/jira/browse/GEODE-10067) 
and [GEODE-10069](https://issues.apache.org/jira/browse/GEODE-10069) the most 
important ones.
   I was thinking that maybe the first one could be solved by the proposal on 
the second one, i.e., calling `onRegion()` from one of the members so that when 
the function is executed in the corresponding member it could iterate over the 
keys in the filter and copy the entries by getting them without deserializing 
them.

   
   




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscr...@geode.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [geode] albertogpz commented on a change in pull request #7330: GEODE-9984: Avoid flakyness of testCommandDoesNotCopyEntriesUpdatedAf…

2022-02-22 Thread GitBox


albertogpz commented on a change in pull request #7330:
URL: https://github.com/apache/geode/pull/7330#discussion_r812120574



##
File path: 
geode-wan/src/distributedTest/java/org/apache/geode/cache/wan/internal/cli/commands/WanCopyRegionCommandDUnitTest.java
##
@@ -1460,11 +1460,15 @@ public void createServersAndRegions(int locatorPort, 
List servers,
   }
 
   private void waitForWanCopyRegionCommandToStart(boolean useParallel, boolean 
usePartitionedRegion,
-  List servers) {
+  List servers) throws InterruptedException {
+// Wait for the command execution to be registered in the service
 final int executionsExpected = useParallel && usePartitionedRegion ? 
servers.size() : 1;
 await().untilAsserted(
 () -> assertThat(getNumberOfCurrentExecutionsInServers(servers))
 .isEqualTo(executionsExpected));
+// Wait some extra milliseconds to allow for the command to actually start 
to
+// avoid flakyness in the tests.
+Thread.sleep(100);

Review comment:
   > @agingade, let me try to explain why this is not an bug in the command 
but a problem with timing in the test case.
   > 
   > At the beginning, the test puts 5 entries whose keys are in the range 
of 0-4.
   > 
   > Then, the wan-copy region command is executed and, in parallel, other puts 
and deletes are done. The keys for these entries puts/deleted are in the range 
of 5-52000 (no intersection with the first entries put).
   > 
   > The test tries to verify that only the entries put at the beginning are 
copied (none of the entries put while the command is running). It does so by 
verifying that the number of entries copied is exactly 5 as reported by the 
command. If any entry put while the command is running is copied by the 
command, we should see that the number of entries copied is greater than 5, 
which is what we see when the test fails (50001). But the test needs to make 
sure that the puts for the 5-52000 range are done after the copy command 
has started and not before. Otherwise, we could get a failure of the test 
(which is what we see sometimes).
   > 
   > The way to make sure that the second group of puts starts after the 
command has started is by means of the `await()` you mention. But when the 
`await()` returns, it only guarantees that the `Callable` that will execute the 
command has been submitted to the pool (see 
`WanCopyRegionFunctionService.execute()`) although the command may not have 
effectively started.
   > 
   > At that point, random operations (puts and deletes) can be started. Let's 
say at t=x, entry with key=5 is put.
   > 
   > The time at which the Callable object that in turn calls the 
`WanCopyRegionFunctionDelegate.wanCopyRegion()`is executed depends on the 
scheduling. Let's say it executes its first line at t=x + 1. At that point, the 
current time is recorded as (`functionStartTimestamp`). and a sleep of 500ms 
will be done before entries are started to be copied.
   > 
   > Then the command starts to copy entries. Any entry with a timestamp 
greater than `functionStartTimestamp` will not be copied. In this case, the 
timestamp for entry with key=5 is x and given that `functionStartTimestamp` 
is x+1, the entry will be copied. This would not point to an bug in the 
command. Rather, that the entry with key=5 was put before the command was 
started.
   > 
   > With the 100ms extra added to the wait we can be more certain that second 
group of puts will be started after the `functionStartTimestamp` is recorded 
and therefore, the timestamp for the entries of the second group of puts will 
be higher than the `functionStartTimestamp`.
   
   @agingade I have removed the sleep and changed the check to avoid the 
flakyness by looking at the region size at the remote site.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscr...@geode.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [geode] albertogpz commented on a change in pull request #7330: GEODE-9984: Avoid flakyness of testCommandDoesNotCopyEntriesUpdatedAf…

2022-02-22 Thread GitBox


albertogpz commented on a change in pull request #7330:
URL: https://github.com/apache/geode/pull/7330#discussion_r812112607



##
File path: 
geode-wan/src/distributedTest/java/org/apache/geode/cache/wan/internal/cli/commands/WanCopyRegionCommandDUnitTest.java
##
@@ -1460,11 +1460,15 @@ public void createServersAndRegions(int locatorPort, 
List servers,
   }
 
   private void waitForWanCopyRegionCommandToStart(boolean useParallel, boolean 
usePartitionedRegion,
-  List servers) {
+  List servers) throws InterruptedException {
+// Wait for the command execution to be registered in the service
 final int executionsExpected = useParallel && usePartitionedRegion ? 
servers.size() : 1;
 await().untilAsserted(
 () -> assertThat(getNumberOfCurrentExecutionsInServers(servers))
 .isEqualTo(executionsExpected));
+// Wait some extra milliseconds to allow for the command to actually start 
to
+// avoid flakyness in the tests.
+Thread.sleep(100);

Review comment:
   > @albertogpz a possible way to test this is to confirm that the two 
clusters have the same entries, rather than comparing against a counter that is 
constantly moving
   
   @kohlmu-pivotal I have changed the way to check that the command started 
copying entries by looking at the size of the remote site. If the remote site 
has at least one entry, it means that the command already started.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscr...@geode.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [geode] albertogpz commented on a change in pull request #7330: GEODE-9984: Avoid flakyness of testCommandDoesNotCopyEntriesUpdatedAf…

2022-02-21 Thread GitBox


albertogpz commented on a change in pull request #7330:
URL: https://github.com/apache/geode/pull/7330#discussion_r811346516



##
File path: 
geode-wan/src/distributedTest/java/org/apache/geode/cache/wan/internal/cli/commands/WanCopyRegionCommandDUnitTest.java
##
@@ -1460,11 +1460,15 @@ public void createServersAndRegions(int locatorPort, 
List servers,
   }
 
   private void waitForWanCopyRegionCommandToStart(boolean useParallel, boolean 
usePartitionedRegion,
-  List servers) {
+  List servers) throws InterruptedException {
+// Wait for the command execution to be registered in the service
 final int executionsExpected = useParallel && usePartitionedRegion ? 
servers.size() : 1;
 await().untilAsserted(
 () -> assertThat(getNumberOfCurrentExecutionsInServers(servers))
 .isEqualTo(executionsExpected));
+// Wait some extra milliseconds to allow for the command to actually start 
to
+// avoid flakyness in the tests.
+Thread.sleep(100);

Review comment:
   @kohlmu-pivotal Thanks for the improvements suggested. Are you planning 
on taking any of them? I am saying in order to tackle whichever you are not 
taking or all if you take none.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscr...@geode.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [geode] albertogpz commented on a change in pull request #7330: GEODE-9984: Avoid flakyness of testCommandDoesNotCopyEntriesUpdatedAf…

2022-02-21 Thread GitBox


albertogpz commented on a change in pull request #7330:
URL: https://github.com/apache/geode/pull/7330#discussion_r811345803



##
File path: 
geode-wan/src/distributedTest/java/org/apache/geode/cache/wan/internal/cli/commands/WanCopyRegionCommandDUnitTest.java
##
@@ -1460,11 +1460,15 @@ public void createServersAndRegions(int locatorPort, 
List servers,
   }
 
   private void waitForWanCopyRegionCommandToStart(boolean useParallel, boolean 
usePartitionedRegion,
-  List servers) {
+  List servers) throws InterruptedException {
+// Wait for the command execution to be registered in the service
 final int executionsExpected = useParallel && usePartitionedRegion ? 
servers.size() : 1;
 await().untilAsserted(
 () -> assertThat(getNumberOfCurrentExecutionsInServers(servers))
 .isEqualTo(executionsExpected));
+// Wait some extra milliseconds to allow for the command to actually start 
to
+// avoid flakyness in the tests.
+Thread.sleep(100);

Review comment:
   Thanks for the comment, @kohlmu-pivotal . I will try to think about a 
different approach.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscr...@geode.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [geode] albertogpz commented on a change in pull request #7330: GEODE-9984: Avoid flakyness of testCommandDoesNotCopyEntriesUpdatedAf…

2022-02-16 Thread GitBox


albertogpz commented on a change in pull request #7330:
URL: https://github.com/apache/geode/pull/7330#discussion_r808736289



##
File path: 
geode-wan/src/distributedTest/java/org/apache/geode/cache/wan/internal/cli/commands/WanCopyRegionCommandDUnitTest.java
##
@@ -1460,11 +1460,15 @@ public void createServersAndRegions(int locatorPort, 
List servers,
   }
 
   private void waitForWanCopyRegionCommandToStart(boolean useParallel, boolean 
usePartitionedRegion,
-  List servers) {
+  List servers) throws InterruptedException {
+// Wait for the command execution to be registered in the service
 final int executionsExpected = useParallel && usePartitionedRegion ? 
servers.size() : 1;
 await().untilAsserted(
 () -> assertThat(getNumberOfCurrentExecutionsInServers(servers))
 .isEqualTo(executionsExpected));
+// Wait some extra milliseconds to allow for the command to actually start 
to
+// avoid flakyness in the tests.
+Thread.sleep(100);

Review comment:
   @agingade Did my explanation answer your concerns?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscr...@geode.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [geode] albertogpz commented on a change in pull request #7330: GEODE-9984: Avoid flakyness of testCommandDoesNotCopyEntriesUpdatedAf…

2022-02-01 Thread GitBox


albertogpz commented on a change in pull request #7330:
URL: https://github.com/apache/geode/pull/7330#discussion_r796909436



##
File path: 
geode-wan/src/distributedTest/java/org/apache/geode/cache/wan/internal/cli/commands/WanCopyRegionCommandDUnitTest.java
##
@@ -1460,11 +1460,15 @@ public void createServersAndRegions(int locatorPort, 
List servers,
   }
 
   private void waitForWanCopyRegionCommandToStart(boolean useParallel, boolean 
usePartitionedRegion,
-  List servers) {
+  List servers) throws InterruptedException {
+// Wait for the command execution to be registered in the service
 final int executionsExpected = useParallel && usePartitionedRegion ? 
servers.size() : 1;
 await().untilAsserted(
 () -> assertThat(getNumberOfCurrentExecutionsInServers(servers))
 .isEqualTo(executionsExpected));
+// Wait some extra milliseconds to allow for the command to actually start 
to
+// avoid flakyness in the tests.
+Thread.sleep(100);

Review comment:
   @agingade, let me try to explain why this is not an bug in the command 
but a problem with timing in the test case.
   
   At the beginning, the test puts 5 entries whose keys are in the range of 
0-4.
   
   Then, the wan-copy region command is executed and, in parallel, other puts 
and deletes are done. The keys for these entries puts/deleted are in the range 
of 5-52000 (no intersection with the first entries put).
   
   The test tries to verify that only the entries put at the beginning are 
copied (none of the entries put while the command is running). It does so by 
verifying that the number of entries copied is exactly 5 as reported by the 
command. If any entry put while the command is running is copied by the 
command, we should see that the number of entries copied is greater than 5, 
which is what we see when the test fails (50001). But the test needs to make 
sure that the puts for the 5-52000 range are done after the copy command 
has started and not before. Otherwise, we could get a failure of the test 
(which is what we see sometimes).
   
   The way to make sure that the second group of puts starts after the command 
has started is by means of the `await()` you mention. But when the `await()` 
returns, it only guarantees that the `Callable` that will execute the command 
has been submitted to the pool (see `WanCopyRegionFunctionService.execute()`) 
although the command may not have effectively started.
   
   At that point, random operations (puts and deletes) can be started. Let's 
say at t=x, entry with key=5 is put.
   
   The time at which the Callable object that in turn calls the 
`WanCopyRegionFunctionDelegate.wanCopyRegion()`is executed depends on the 
scheduling. Let's say it executes its first line at t=x + 1. At that point, the 
current time is recorded as (`functionStartTimestamp`). and a sleep of 500ms 
will be done before entries are started to be copied.
   
   Then the command starts to copy entries. Any entry with a timestamp greater 
than `functionStartTimestamp` will not be copied. In this case, the timestamp 
for entry with key=5 is x and given that `functionStartTimestamp` is x+1, 
the entry will be copied. This would not point to an bug in the command. 
Rather, that the entry with key=5 was put before the command was started.
   
   With the 100ms extra added to the wait we can be more certain that second 
group of puts will be started after the `functionStartTimestamp` is recorded 
and therefore, the timestamp for the entries of the second group of puts will 
be higher than the `functionStartTimestamp`.
   




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscr...@geode.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org