zpinto opened a new pull request, #2451:
URL: https://github.com/apache/helix/pull/2451

   ### Issues
   
   - [x] Fixes #2404 
   - [x] Fixes #2229 
   - [x] Fixes #1979 
   - [x] Fixes #1832
   
   (#200 - Link your issue number here: You can write "Fixes #XXX". Please use 
the proper keyword so that the issue gets closed automatically. See 
https://docs.github.com/en/github/managing-your-work-on-github/linking-a-pull-request-to-an-issue
   Any of the following keywords can be used: close, closes, closed, fix, 
fixes, fixed, resolve, resolves, resolved)
   
   ### Description
   
   #### Fixing Flaky Unit Test TestClusterAccessor#testClusterFreeze
   
   #### What is this test doing?

   
   This integration test is making a POST request to helix-rest to place 
cluster TestCluster_0 into CLUSTER_FREEZE status. It then attempts to verify 
that the following GET request for clusterStatus is in CLUSTER_FREEZE mode.
   
   #### Why is this test likely flaky?
   
   Following the POST request, we check that the `pauseSignal` which we added 
exists under the CONTROLLER znode, this will succeed because this znode is 
written to before the POST responds.
   
   The next test is to see if the `clusterMode` is in `CLUSTER_FREEZE`. We also 
check if the `clusterStatus` is either `IN_PROGRESS` or `COMPLETED`(meaning all 
state transitions are either canceled or completed and participants are now 
frozen/paused). The reason the GET request can sometimes return `clusterMode == 
NORMAL` is because the verification we are doing before we make the GET request 
is not a definitive signal that the `PAUSE` signal has caused the `clusterMode` 
to change to `CLUSTER_FREEZE`. This is because a `clusterPause` signal/event 
has to be enqueued to the `_managementModeEventQueue` and processed before we 
make the GET for `clusterStatus`. Until it has been processed, the 
`clusterStatus` will be NORMAL.
   
   Previously we were running this verification to check if the clusterStatus 
znode existed.
   
   ```
   TestHelper.verify(() -> dataAccessor.getBaseDataAccessor()
           .exists(dataAccessor.keyBuilder().clusterStatus().getPath(), 
AccessOption.PERSISTENT),
       TestHelper.WAIT_DURATION);
   ```
   
   This will not necessarily indicate that the pause event has been processed, 
as there could be an earlier event passed through the 
`_managementEventPipeline` that causes the znode to be created with 
`clusterMode == NORMAL`. This would lead us to make the GET request too early 
and the test to fail here:
   
   ```
   Assert.assertEquals(responseMap.get("mode"), 
ClusterManagementMode.Type.CLUSTER_FREEZE.name());
   ```
   
   Instead, we will verify that the `clusterStatus` znode has `CLUSTER_FREEZE` 
mode before we make the GET request. This will ensure that the pause event was 
already processed. 
   
   ### Tests
   
   - [x] Fix TestClusterAccessor#testClusterFreeze
   
   (List the names of added unit/integration tests)
   
   - The following is the result of the "mvn test" command on the appropriate 
module:
   
   TBD
   
   (If CI test fails due to known issue, please specify the issue and test PR 
locally. Then copy & paste the result of "mvn test" to here.)
   
   ### Changes that Break Backward Compatibility (Optional)
   
   NA
   
   
   
   ### Documentation (Optional)
   
   NA
   
   ### Commits
   
   - My commits all reference appropriate Apache Helix GitHub issues in their 
subject lines. In addition, my commits follow the guidelines from "[How to 
write a good git commit message](http://chris.beams.io/posts/git-commit/)":
     1. Subject is separated from body by a blank line
     1. Subject is limited to 50 characters (not including Jira issue reference)
     1. Subject does not end with a period
     1. Subject uses the imperative mood ("add", not "adding")
     1. Body wraps at 72 characters
     1. Body explains "what" and "why", not "how"
   
   ### Code Quality
   
   - My diff has been formatted using helix-style.xml 
   (helix-style-intellij.xml if IntelliJ IDE is used)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to