jiajunwang commented on a change in pull request #639: Refine the WAGED 
rebalancer to minimize the partial rebalance workload.
URL: https://github.com/apache/helix/pull/639#discussion_r352878366
 
 

 ##########
 File path: 
helix-core/src/main/java/org/apache/helix/controller/rebalancer/waged/model/ClusterModelProvider.java
 ##########
 @@ -43,7 +45,83 @@
  */
 public class ClusterModelProvider {
 
+  private enum RebalanceScopeType {
+    // Set the rebalance scope to cover the difference between the current assignment and the
+    // Baseline assignment only.
+    PARTIAL,
+    // Set the rebalance scope to cover all replicas that need relocation based on the cluster
+    // changes.
+    GLOBAL
+  }
+
+  /**
+   * Generate a new Cluster Model object according to the current cluster status for partial
+   * rebalance. The rebalance scope is configured for recovering the missing replicas only.
+   * @param dataProvider           The controller's data cache.
+   * @param resourceMap            The full list of the resources to be rebalanced. Note that any
+   *                               resources that are not in this list will be removed from the
+   *                               final assignment.
+   * @param activeInstances        The active instances that will be used in the calculation.
+   *                               Note this list can be different from the real active node list
+   *                               according to the rebalancer logic.
+   * @param baselineAssignment     The persisted Baseline assignment.
+   * @param bestPossibleAssignment The persisted Best Possible assignment that was generated in the
+   *                               previous rebalance.
+   * @return
+   */
+  public static ClusterModel generateClusterModelForPartialRebalance(
+      ResourceControllerDataProvider dataProvider, Map<String, Resource> resourceMap,
+      Set<String> activeInstances, Map<String, ResourceAssignment> baselineAssignment,
+      Map<String, ResourceAssignment> bestPossibleAssignment) {
+    return generateClusterModel(dataProvider, resourceMap, activeInstances, Collections.emptyMap(),
+        baselineAssignment, bestPossibleAssignment, RebalanceScopeType.PARTIAL);
+  }
+
+  /**
+   * Generate a new Cluster Model object according to the current cluster status for the Baseline
+   * calculation. The rebalance scope is determined according to the cluster changes.
+   * @param dataProvider           The controller's data cache.
+   * @param resourceMap            The full list of the resources to be rebalanced. Note that any
+   *                               resources that are not in this list will be removed from the
+   *                               final assignment.
+   * @param activeInstances        The active instances that will be used in the calculation.
+   *                               Note this list can be different from the real active node list
+   *                               according to the rebalancer logic.
+   * @param clusterChanges         All the cluster changes that happened after the previous rebalance.
+   * @param baselineAssignment     The persisted Baseline assignment.
+   * @param bestPossibleAssignment The persisted Best Possible assignment that was generated in the
+   *                               previous rebalance.
+   * @return the new cluster model
+   */
+  public static ClusterModel generateClusterModelForBaseline(
+      ResourceControllerDataProvider dataProvider, Map<String, Resource> resourceMap,
+      Set<String> activeInstances, Map<HelixConstants.ChangeType, Set<String>> clusterChanges,
+      Map<String, ResourceAssignment> baselineAssignment,
+      Map<String, ResourceAssignment> bestPossibleAssignment) {
+    return generateClusterModel(dataProvider, resourceMap, activeInstances, clusterChanges,
+        baselineAssignment, bestPossibleAssignment, RebalanceScopeType.GLOBAL);
+  }
+
+  /**
+   * Generate a cluster model based on the current state output and data cache. The rebalance scope
+   * is configured for recovering the missing replicas only.
+   * @param dataProvider           The controller's data cache.
+   * @param resourceMap            The full list of the resources to be rebalanced. Note that any
+   *                               resources that are not in this list will be removed from the
+   *                               final assignment.
+   * @param existingAssignment The resource assignment built from current state output.
+   * @return the new cluster model
+   */
+  public static ClusterModel generateClusterModelFromExistingAssignment(
+      ResourceControllerDataProvider dataProvider, Map<String, Resource> resourceMap,
+      Map<String, ResourceAssignment> existingAssignment) {
+    return generateClusterModel(dataProvider, resourceMap, dataProvider.getEnabledLiveInstances(),
+        Collections.emptyMap(), Collections.emptyMap(), existingAssignment,
+        RebalanceScopeType.GLOBAL);
 
 Review comment:
   Here's the tricky part: there are only two sets of computation logic here,
   but we have three methods.
   The key difference is whether we ignore unknown replicas or not. I will
   update the generateClusterModelForPartialRebalance description accordingly.
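
   To illustrate the "three methods, two sets of logic" shape, here is a
   minimal sketch. It is not the Helix implementation: the class ScopeSketch,
   its method names, the replica-to-instance string maps, and both scope rules
   are assumptions made up for this example, and it does not model how unknown
   replicas are actually handled. It only shows three public entry points that
   delegate to one private method, which branches on a two-valued scope type.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical, simplified stand-in for ClusterModelProvider (not the Helix code).
// An assignment is modeled as a map of replica name -> instance name.
class ScopeSketch {
  private enum ScopeType { PARTIAL, GLOBAL }

  // Entry point 1: partial rebalance, uses the PARTIAL logic.
  static Set<String> forPartialRebalance(Set<String> activeInstances,
      Map<String, String> baseline, Map<String, String> bestPossible) {
    return replicasInScope(activeInstances, baseline, bestPossible, ScopeType.PARTIAL);
  }

  // Entry point 2: Baseline calculation, uses the GLOBAL logic.
  static Set<String> forBaseline(Set<String> activeInstances,
      Map<String, String> baseline, Map<String, String> bestPossible) {
    return replicasInScope(activeInstances, baseline, bestPossible, ScopeType.GLOBAL);
  }

  // Entry point 3: existing assignment only, also GLOBAL, with an empty baseline.
  static Set<String> fromExistingAssignment(Set<String> activeInstances,
      Map<String, String> existing) {
    return replicasInScope(activeInstances, Collections.emptyMap(), existing, ScopeType.GLOBAL);
  }

  // The single shared implementation: only two branches of logic exist.
  private static Set<String> replicasInScope(Set<String> activeInstances,
      Map<String, String> baseline, Map<String, String> bestPossible, ScopeType scope) {
    Set<String> inScope = new TreeSet<>();
    if (scope == ScopeType.PARTIAL) {
      // Assumed rule: replicas whose best possible placement differs from the baseline.
      for (Map.Entry<String, String> entry : baseline.entrySet()) {
        if (!entry.getValue().equals(bestPossible.get(entry.getKey()))) {
          inScope.add(entry.getKey());
        }
      }
    } else {
      // Assumed rule: replicas that are unplaced or placed on an inactive instance.
      Set<String> allReplicas = new TreeSet<>(baseline.keySet());
      allReplicas.addAll(bestPossible.keySet());
      for (String replica : allReplicas) {
        String instance = bestPossible.get(replica);
        if (instance == null || !activeInstances.contains(instance)) {
          inScope.add(replica);
        }
      }
    }
    return inScope;
  }

  public static void main(String[] args) {
    Set<String> active = new HashSet<>(Arrays.asList("i1", "i2"));
    Map<String, String> baseline = new HashMap<>();
    baseline.put("r1", "i1");
    baseline.put("r2", "i2");
    Map<String, String> bestPossible = new HashMap<>();
    bestPossible.put("r1", "i2");
    bestPossible.put("r2", "i3");
    System.out.println(forPartialRebalance(active, baseline, bestPossible));
    System.out.println(forBaseline(active, baseline, bestPossible));
    System.out.println(fromExistingAssignment(active, bestPossible));
  }
}
```

   In this sketch the second and third entry points share the GLOBAL branch
   and differ only in the inputs they pass, which is one way three methods can
   sit on top of two sets of logic.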

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services
