zpinto commented on code in PR #2961:
URL: https://github.com/apache/helix/pull/2961#discussion_r1842753309


##########
helix-rest/src/main/java/org/apache/helix/rest/clusterMaintenanceService/StoppableInstancesSelector.java:
##########
@@ -136,6 +136,40 @@ public ObjectNode 
getStoppableInstancesCrossZones(List<String> instances,
     return result;
   }
 
+  /**
+   * Evaluates and collects stoppable instances without respecting the zone 
order.
+   * The method iterates through instances, performing stoppable checks, and 
records reasons for
+   * non-stoppability.
+   *
+   * @param instances A list of instance to be evaluated.
+   * @param toBeStoppedInstances A list of instances presumed to be already 
stopped
+   * @return An ObjectNode containing:
+   *         - 'stoppableNode': List of instances that can be stopped.
+   *         - 'instance_not_stoppable_with_reasons': A map with the instance 
name as the key and
+   *         a list of reasons for non-stoppability as the value.
+   * @throws IOException
+   */
+  public ObjectNode getStoppableInstancesWithoutTopology(List<String> 
instances,
+      List<String> toBeStoppedInstances) throws IOException {
+    ObjectNode result = JsonNodeFactory.instance.objectNode();
+    ArrayNode stoppableInstances =
+        
result.putArray(InstancesAccessor.InstancesProperties.instance_stoppable_parallel.name());
+    ObjectNode failedStoppableInstances = result.putObject(
+        
InstancesAccessor.InstancesProperties.instance_not_stoppable_with_reasons.name());
+    Set<String> toBeStoppedInstancesSet = new HashSet<>(toBeStoppedInstances);
+    collectEvacuatingInstances(toBeStoppedInstancesSet);

Review Comment:
   Should included evacuating, unknown, or swap in because we don't want the 
case where a replica on an evacuating, unknown, or swap-in instance allows 
another instance that is enabled to be stoppable. By the time the instance that 
was reported as stoppable gets taken down, the replica could have been dropped 
from the evacuating or unknown instance. Also, the swap-in replica is not 
actually serving if the use case is using routing table provider and it cannot 
ever become top-state until the swap is finished.
   
   Let's discuss more offline. I think there may be a better way to account for 
these edge cases/race conditions without explicitly marking instances as 
toBeStopped based on their InstanceOperation



##########
helix-rest/src/test/java/org/apache/helix/rest/server/AbstractTestClass.java:
##########
@@ -711,6 +713,62 @@ private void 
preSetupForCrosszoneParallelInstancesStoppableTest(String clusterNa
     _clusters.add(clusterName);
     _workflowMap.put(clusterName, createWorkflows(clusterName, 3));
   }
+
+  private void preSetupForNonTopoAwareInstancesStoppableTest(String 
clusterName,
+      List<String> instances) throws Exception {
+    _gSetupTool.addCluster(clusterName, true);
+    ClusterConfig clusterConfig = 
_configAccessor.getClusterConfig(clusterName);
+    clusterConfig.setFaultZoneType("helixZoneId");

Review Comment:
   Based on my reading of the ClusterTopology class, I don't think that the 
changes in this PR will work in the case that clusterConfig does not have fault 
zone type set. We need to make sure that stoppable check works for clusters 
that DON'T have TOPOLOGY, TOPOLOGY_AWARE_ENABLED, and FAULT_ZONE_TYPE configs 
set.
   
   Please add tests for this case and make the necessary changes to support 
this.



##########
helix-rest/src/main/java/org/apache/helix/rest/server/resources/helix/InstancesAccessor.java:
##########
@@ -285,6 +294,14 @@ private Response batchGetStoppableInstances(String 
clusterId, JsonNode node, boo
         }
       }
 
+      if (selectionBase != InstanceHealthSelectionBase.non_topo_based && 
clusterTopology.getZones()

Review Comment:
   Why do we need the user to explicitly provide a sectionBase? Can't we 
simplify this API by figuring out the InstanceHealthSelectionBase internally?
   
   if orderOfZone is provided we use zone_based
   if necessary topology configs are set, we use cross_zone_based
   else we use non_topo_based
   
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to