zpinto commented on code in PR #2961:
URL: https://github.com/apache/helix/pull/2961#discussion_r1842753309
##########
helix-rest/src/main/java/org/apache/helix/rest/clusterMaintenanceService/StoppableInstancesSelector.java:
##########
@@ -136,6 +136,40 @@ public ObjectNode
getStoppableInstancesCrossZones(List<String> instances,
return result;
}
+ /**
+ * Evaluates and collects stoppable instances without respecting the zone
order.
+ * The method iterates through instances, performing stoppable checks, and
records reasons for
+ * non-stoppability.
+ *
+ * @param instances A list of instance to be evaluated.
+ * @param toBeStoppedInstances A list of instances presumed to be already
stopped
+ * @return An ObjectNode containing:
+ * - 'stoppableNode': List of instances that can be stopped.
+ * - 'instance_not_stoppable_with_reasons': A map with the instance
name as the key and
+ * a list of reasons for non-stoppability as the value.
+ * @throws IOException
+ */
+ public ObjectNode getStoppableInstancesWithoutTopology(List<String>
instances,
+ List<String> toBeStoppedInstances) throws IOException {
+ ObjectNode result = JsonNodeFactory.instance.objectNode();
+ ArrayNode stoppableInstances =
+
result.putArray(InstancesAccessor.InstancesProperties.instance_stoppable_parallel.name());
+ ObjectNode failedStoppableInstances = result.putObject(
+
InstancesAccessor.InstancesProperties.instance_not_stoppable_with_reasons.name());
+ Set<String> toBeStoppedInstancesSet = new HashSet<>(toBeStoppedInstances);
+ collectEvacuatingInstances(toBeStoppedInstancesSet);
Review Comment:
Should included evacuating, unknown, or swap in because we don't want the
case where a replica on an evacuating, unknown, or swap-in instance allows
another instance that is enabled to be stoppable. By the time the instance that
was reported as stoppable gets taken down, the replica could have been dropped
from the evacuating or unknown instance. Also, the swap-in replica is not
actually serving if the use case is using routing table provider and it cannot
ever become top-state until the swap is finished.
Let's discuss more offline. I think there may be a better way to account for
these edge cases/race conditions without explicitly marking instances as
toBeStopped based on their InstanceOperation
##########
helix-rest/src/test/java/org/apache/helix/rest/server/AbstractTestClass.java:
##########
@@ -711,6 +713,62 @@ private void
preSetupForCrosszoneParallelInstancesStoppableTest(String clusterNa
_clusters.add(clusterName);
_workflowMap.put(clusterName, createWorkflows(clusterName, 3));
}
+
+ private void preSetupForNonTopoAwareInstancesStoppableTest(String
clusterName,
+ List<String> instances) throws Exception {
+ _gSetupTool.addCluster(clusterName, true);
+ ClusterConfig clusterConfig =
_configAccessor.getClusterConfig(clusterName);
+ clusterConfig.setFaultZoneType("helixZoneId");
Review Comment:
Based on my reading of the ClusterTopology class, I don't think that the
changes in this PR will work in the case that clusterConfig does not have fault
zone type set. We need to make sure that stoppable check works for clusters
that DON'T have TOPOLOGY, TOPOLOGY_AWARE_ENABLED, and FAULT_ZONE_TYPE configs
set.
Please add tests for this case and make the necessary changes to support
this.
##########
helix-rest/src/main/java/org/apache/helix/rest/server/resources/helix/InstancesAccessor.java:
##########
@@ -285,6 +294,14 @@ private Response batchGetStoppableInstances(String
clusterId, JsonNode node, boo
}
}
+ if (selectionBase != InstanceHealthSelectionBase.non_topo_based &&
clusterTopology.getZones()
Review Comment:
Why do we need the user to explicitly provide a sectionBase? Can't we
simplify this API by figuring out the InstanceHealthSelectionBase internally?
if orderOfZone is provided we use zone_based
if necessary topology configs are set, we use cross_zone_based
else we use non_topo_based
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]