xyuanlu commented on code in PR #2772:
URL: https://github.com/apache/helix/pull/2772#discussion_r1513583433


##########
helix-core/src/main/java/org/apache/helix/controller/stages/BestPossibleStateCalcStage.java:
##########
@@ -327,21 +318,28 @@ public Object call() {
   // if yes, auto enable maintenance mode, and use the maintenance rebalancer 
for this pipeline.
   private boolean validateOfflineInstancesLimit(final 
ResourceControllerDataProvider cache,
       final HelixManager manager) {
-    int maxOfflineInstancesAllowed = 
cache.getClusterConfig().getMaxOfflineInstancesAllowed();
-    if (maxOfflineInstancesAllowed >= 0) {
-      int offlineCount =
-          cache.getAssignableInstances().size() - 
cache.getAssignableEnabledLiveInstances().size();
-      if (offlineCount > maxOfflineInstancesAllowed) {
+    int maxInstancesUnableToAcceptOnlineReplicas =
+        cache.getClusterConfig().getMaxOfflineInstancesAllowed();
+    if (maxInstancesUnableToAcceptOnlineReplicas >= 0) {
+      // Instead of only checking the offline instances, we consider how many 
instances in the cluster
+      // are not assignable and live. This is because some instances may be 
online but have an unassignable
+      // InstanceOperation such as EVACUATE, DISABLE, or UNKNOWN. We will 
exclude SWAP_IN instances from

Review Comment:
   We may add capacity with UNKNOWN instance operation. Probably consider not 
include UNKNOWN as down instances when auto EMM.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to