zpinto opened a new pull request, #2776: URL: https://github.com/apache/helix/pull/2776
- Fix BestPossibleExternalViewVerifier to use a ZkClient that has the serializer set to ByteArraySerializer so it can read the assignment meta store best possible state. - Fix BestPossibleExternalViewVerifier to actually calculate BEST_POSSIBLE instead of returning last persisted to ZK because we now need to consider handleDelayedRebalanceMinActiveReplica not being persisted to ZK(#2447). - Fix handleDelayedRebalanceMinActiveReplica modifying in-memory _bestPossibleState in the _assignmentMetadataStore which was causing best possible state to continuosly be persisted until handleDelayedRebalanceMinActiveReplica wasn't kicking in anymore. ### Issues - [x] Fix BestPossibleExternalViewVerifier to use a ZkClient that has the serializer set to ByteArraySerializer so it can read the assignment meta store best possible state. - [x] Fix BestPossibleExternalViewVerifier to actually calculate BEST_POSSIBLE instead of returning last persisted to ZK because we now need to consider handleDelayedRebalanceMinActiveReplica not being persisted to ZK(#2447). - [x] Fix handleDelayedRebalanceMinActiveReplica modifying in-memory _bestPossibleState in the _assignmentMetadataStore which was causing best possible state to continuosly be persisted until handleDelayedRebalanceMinActiveReplica wasn't kicking in anymore. ### Description PR #2180 introduced a bug to BestPossibleExternalViewVerifier by reusing the zkClient passed to the builder. This can be an issue because the serializer is often set to ZNRecordSerializer instead of ByteArraySerializer. This failing verifier was not accurate since it was unable to read from the persisted best possible state in the assignment metadata store. PR #2447 moves the logic for handling min active replicas into the emergency rebalance method. This will make the DryRunWagedRebalancer inaccurate because it would previously just return the ZK persisted best possible state for due to the override of `computeBestPossibleAssignment`. We are removing this method because we need to recompute the idealAssignment in addition to just reading the ZK persisted best possible state in order to retain the overwrites for handling min active replicas. handleDelayedRebalanceMinActiveReplica modifies the currentResourceAssignment map passed into it. Because the reference actually points to map in the assignmentMetaStore, hanledDelayedRebalanceMinActiveReplica is actually modifying the cache. This causes isBestPossibleChanged to evaluate to true whenever hanledDelayedRebalanceMinActiveReplica is kicking in and causes continuous writes of the same best possible produced by partial rebalance over and over. ### Tests NA, issue with hanledDelayedRebalanceMinActiveReplica was caught after fixing BestPossibleExternalViewVerifier. Current tests are sufficient. ### Changes that Break Backward Compatibility (Optional) ### Commits - My commits all reference appropriate Apache Helix GitHub issues in their subject lines. In addition, my commits follow the guidelines from "[How to write a good git commit message](http://chris.beams.io/posts/git-commit/)": 1. Subject is separated from body by a blank line 1. Subject is limited to 50 characters (not including Jira issue reference) 1. Subject does not end with a period 1. Subject uses the imperative mood ("add", not "adding") 1. Body wraps at 72 characters 1. Body explains "what" and "why", not "how" ### Code Quality - My diff has been formatted using helix-style.xml (helix-style-intellij.xml if IntelliJ IDE is used) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
