MarkGaox commented on code in PR #2736:
URL: https://github.com/apache/helix/pull/2736#discussion_r1465508076


##########
helix-rest/src/main/java/org/apache/helix/rest/clusterMaintenanceService/StoppableInstancesSelector.java:
##########
@@ -129,32 +136,53 @@ public ObjectNode getStoppableInstancesCrossZones(List<String> instances,
       if (instanceSet.isEmpty()) {
         continue;
       }
-      populateStoppableInstances(new ArrayList<>(instanceSet), toBeStoppedInstancesSet, stoppableInstances,
-          failedStoppableInstances);
+      populateStoppableInstances(new ArrayList<>(instanceSet), toBeStoppedInstancesSet,
+          stoppableInstances, failedStoppableInstances,
+          _maxAdditionalOfflineInstances - toBeStoppedInstancesSet.size());
     }
     processNonexistentInstances(instances, failedStoppableInstances);
     return result;
   }
 
   private void populateStoppableInstances(List<String> instances, Set<String> toBeStoppedInstances,
-      ArrayNode stoppableInstances, ObjectNode failedStoppableInstances) throws IOException {
+      ArrayNode stoppableInstances, ObjectNode failedStoppableInstances,
+      int allowedOfflineInstances) throws IOException {
     Map<String, StoppableCheck> instancesStoppableChecks =
         _maintenanceService.batchGetInstancesStoppableChecks(_clusterId, instances,
             _customizedInput, toBeStoppedInstances);
 
     for (Map.Entry<String, StoppableCheck> instanceStoppableCheck : instancesStoppableChecks.entrySet()) {
       String instance = instanceStoppableCheck.getKey();
       StoppableCheck stoppableCheck = instanceStoppableCheck.getValue();
-      if (!stoppableCheck.isStoppable()) {
-        ArrayNode failedReasonsNode = failedStoppableInstances.putArray(instance);
-        for (String failedReason : stoppableCheck.getFailedChecks()) {
-          failedReasonsNode.add(JsonNodeFactory.instance.textNode(failedReason));
-        }
-      } else {
+      if (stoppableCheck.isStoppable() && allowedOfflineInstances > 0) {
         stoppableInstances.add(instance);
         // Update the toBeStoppedInstances set with the currently identified stoppable instance.
         // This ensures that subsequent checks in other zones are aware of this instance's stoppable status.
         toBeStoppedInstances.add(instance);
+        allowedOfflineInstances--;
+        continue;
+      }
+      ArrayNode failedReasonsNode = failedStoppableInstances.putArray(instance);
+      boolean failedHelixOwnChecks = false;
+      if (allowedOfflineInstances <= 0) {
+        failedReasonsNode.add(JsonNodeFactory.instance.textNode(EXCEED_MAX_OFFLINE_INSTANCES));
+        failedHelixOwnChecks = true;
+      }
+
+      if (!stoppableCheck.isStoppable()) {
+        for (String failedReason : stoppableCheck.getFailedChecks()) {
+          // HELIX_OWN_CHECK can always be added to the failedReasonsNode.
+          if (failedReason.startsWith(StoppableCheck.Category.HELIX_OWN_CHECK.getPrefix())) {
+            failedReasonsNode.add(JsonNodeFactory.instance.textNode(failedReason));
+            failedHelixOwnChecks = true;
+            continue;
+          }
+          // CUSTOM_INSTANCE_CHECK and CUSTOM_PARTITION_CHECK can only be added to the failedReasonsNode
+          // if continueOnFailure is true and there is no failed Helix_OWN_CHECKS.
+          if (_continueOnFailure && !failedHelixOwnChecks) {

Review Comment:
   Another way to handle this is to process instances in batches of size 
   `maxAllowedOffline`. Say `maxAllowedOffline` = 3 and there are 10 instances 
   in the same zone. In the first iteration we process instances 1-3. If the 
   cumulative number of stoppable instances has not yet reached 
   `maxAllowedOffline`, we process instances 4-6 in the next iteration, and so 
   on. My concern with this design is performance: the API could then only 
   batch-process `maxAllowedOffline` instances in parallel, so a very long 
   instance list could take many iterations to finish.
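
   To make the trade-off concrete, here is a rough sketch of the batched 
   alternative. It is not code from this PR: the class name, `Result`, 
   `selectStoppable`, and the `batchCheck` callback are hypothetical stand-ins 
   (the callback plays the role of a batch stoppable check and returns the 
   stoppable subset of the batch). One assumption beyond the description 
   above: each batch is shrunk to the remaining budget so that the total can 
   never overshoot `maxAllowedOffline`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

public class BatchedStoppableCheckSketch {

  /** Selected instances plus the number of sequential check rounds that were needed. */
  public static final class Result {
    public final List<String> stoppable;
    public final int rounds;

    public Result(List<String> stoppable, int rounds) {
      this.stoppable = stoppable;
      this.rounds = rounds;
    }
  }

  /**
   * Walks the instance list in batches no larger than the remaining offline budget.
   * batchCheck is a hypothetical stand-in that returns the stoppable subset of a batch.
   */
  public static Result selectStoppable(List<String> instances, int maxAllowedOffline,
      Function<List<String>, List<String>> batchCheck) {
    List<String> stoppable = new ArrayList<>();
    int rounds = 0;
    int cursor = 0;
    // Stop once the budget is used up or every instance has been checked.
    while (cursor < instances.size() && stoppable.size() < maxAllowedOffline) {
      int batchSize = maxAllowedOffline - stoppable.size();
      int end = Math.min(cursor + batchSize, instances.size());
      List<String> batch = new ArrayList<>(instances.subList(cursor, end));
      // Every instance approved here fits in the budget because the batch was capped.
      stoppable.addAll(batchCheck.apply(batch));
      cursor = end;
      rounds++;
    }
    return new Result(stoppable, rounds);
  }
}
```

   With `maxAllowedOffline` = 3 and 10 instances of which none is stoppable, 
   this loop needs 4 sequential rounds (3 + 3 + 3 + 1 instances), whereas 
   checking the whole list in one batch needs a single call. That is the 
   performance cost I am worried about.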



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

