junkaixue commented on code in PR #2447:
URL: https://github.com/apache/helix/pull/2447#discussion_r1171623918
##########
helix-core/src/main/java/org/apache/helix/controller/rebalancer/util/DelayedRebalanceUtil.java:
##########
@@ -335,6 +335,42 @@ public static Set<AssignableReplica>
findToBeAssignedReplicasForMinActiveReplica
return toBeAssignedReplicas;
}
+ /**
+ * Merge entries from currentResourceAssignment to newAssignment.
+ * To handle minActiveReplica for delayed rebalance, new assignment is
computed based on enabled live instances, but
+ * could miss out current partition allocation still on offline instances
(within delayed window).
+ * The merge process is independent for each resource; for each
resource-partition, it adds the <instance, state> pair
+ * to newAssignment if it's not there yet; in other word, the entries in
newAssignment won't be override.
+ * @param newAssignment newAssignment to merge, this map is getting updated
during this method.
+ * @param currentResourceAssignment the current resource assignment
+ * @param enabledLiveInstances the set of enabled live instance
+ */
+ public static void mergeAssignments(Map<String, ResourceAssignment>
newAssignment,
+ Map<String, ResourceAssignment> currentResourceAssignment,
+ Set<String> enabledLiveInstances) {
+ // merge with current assignment for partitions assigned on rest of the
instances (not immediately live)
+ currentResourceAssignment.entrySet().parallelStream().forEach(entry -> {
+ String resourceName = entry.getKey();
+ ResourceAssignment currentAssignment = entry.getValue();
+ for (Partition partition : currentAssignment.getMappedPartitions()) {
+ currentAssignment.getReplicaMap(partition).entrySet().stream() //
<instance, state>
+ // the existing partitions on the enabledLiveInstances are
pre-allocated, only process for the rest
+ .filter(e -> !enabledLiveInstances.contains(e.getKey()) ||
+ !newAssignment.containsKey(resourceName) ||
+
!newAssignment.get(resourceName).getReplicaMap(partition).containsKey(e.getKey()))
Review Comment:
I forgot the last PR. When we recompute on the new assignment, was that
partition map only contains the single messed up replica or entire reassigned
partition.
##########
helix-core/src/main/java/org/apache/helix/controller/rebalancer/waged/WagedRebalancer.java:
##########
@@ -432,6 +479,17 @@ protected Map<String, ResourceAssignment>
emergencyRebalance(
// Step 3: persist result to metadata store
persistBestPossibleAssignment(newAssignment);
+
+ // Step 4: handle delayed rebalance minActiveReplica
+ // Note this result is one step branching from the main calculation and
SHOULD NOT be persisted -- it is temporary,
+ // and only apply during the delayed window of those offline yet active
nodes, a definitive resolution will happen
+ // once the node comes back of remain offline after the delayed window.
+ Map<String, ResourceAssignment> assignmentWithDelayedRebalanceAdjust =
newAssignment;
+ if (_partialRebalanceRunner.isAsyncPartialRebalanceEnabled()) {
+ assignmentWithDelayedRebalanceAdjust =
+ handleDelayedRebalanceMinActiveReplica(clusterData, resourceMap,
activeNodes, newAssignment, algorithm);
+ }
Review Comment:
We changed the check condition. Previous one means if we have delayed nodes,
then we perform the recompute, now it is if delay is enabled then we just
compute it. It would be adding more computing time.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]