zpinto commented on code in PR #2680:
URL: https://github.com/apache/helix/pull/2680#discussion_r1373474902
##########
helix-core/src/main/java/org/apache/helix/util/InstanceValidationUtil.java:
##########
@@ -370,7 +370,8 @@ public static boolean isInstanceStable(HelixDataAccessor
dataAccessor, String in
* @param instanceName
* @return
*/
- public static boolean siblingNodesActiveReplicaCheck(HelixDataAccessor
dataAccessor, String instanceName) {
+ public static boolean siblingNodesActiveReplicaCheck(HelixDataAccessor
dataAccessor,
+ String instanceName, Set<String> toBeStoppedInstances) {
Review Comment:
It is backwards incompatible to change the signature of a public method.
We should keep a `siblingNodesActiveReplicaCheck(HelixDataAccessor
dataAccessor, String instanceName)` and have it call
`siblingNodesActiveReplicaCheck(HelixDataAccessor dataAccessor,
String instanceName, Set<String> toBeStoppedInstances)` passing empty
map for toBeStoppedInstances.
##########
helix-core/src/test/java/org/apache/helix/util/TestInstanceValidationUtil.java:
##########
@@ -428,7 +502,7 @@ public void TestSiblingNodesActiveReplicaCheck_fail() {
.getProperty(argThat(new
PropertyKeyArgument(PropertyType.STATEMODELDEFS)));
boolean result =
-
InstanceValidationUtil.siblingNodesActiveReplicaCheck(mock.dataAccessor,
TEST_INSTANCE);
+
InstanceValidationUtil.siblingNodesActiveReplicaCheck(mock.dataAccessor,
TEST_INSTANCE, Collections.emptySet());
Review Comment:
This shouldn't have to be changed if backwards compatibility is preserved.
Refer to previous comment.
##########
helix-rest/src/main/java/org/apache/helix/rest/clusterMaintenanceService/MaintenanceManagementService.java:
##########
@@ -339,17 +339,17 @@ private List<OperationInterface>
getAllOperationClasses(List<String> operations)
*/
public StoppableCheck getInstanceStoppableCheck(String clusterId, String
instanceName,
String jsonContent) throws IOException {
- return batchGetInstancesStoppableChecks(clusterId,
ImmutableList.of(instanceName), jsonContent)
+ return batchGetInstancesStoppableChecks(clusterId,
ImmutableList.of(instanceName), jsonContent, Collections.emptySet())
.get(instanceName);
}
public Map<String, StoppableCheck> batchGetInstancesStoppableChecks(String
clusterId,
- List<String> instances, String jsonContent) throws IOException {
+ List<String> instances, String jsonContent, Set<String>
toBeStoppedInstances) throws IOException {
Review Comment:
Changing method signature of public method breaks backwards compatibility.
##########
helix-rest/src/main/java/org/apache/helix/rest/clusterMaintenanceService/StoppableInstancesSelector.java:
##########
@@ -87,8 +120,14 @@ public void getStoppableInstancesInSingleZone(List<String>
instances) throws IOE
}
} else {
_stoppableInstances.add(instance);
Review Comment:
I think that the way the selector works can be confusing to another user. My
intuition would be, I use the builder to set the configuration for how the
selector will select instances. Once I do this, all the attributes of the
selector object are immutable.
I should then be able to call something like, getStoppableResult which can
return an object of type StoppableResult. Then my selector instance can be
reused over an over again. It seems very weird to pass a reference to an empty
ArrayList and then call methods which have side effects to modify that
ArrayList by reference.
Just my personal opinion, curious for input from others.
##########
helix-rest/src/main/java/org/apache/helix/rest/clusterMaintenanceService/MaintenanceManagementService.java:
##########
@@ -745,7 +745,7 @@ protected Map<String, Boolean>
getInstanceHealthStatus(String clusterId, String
break;
case MIN_ACTIVE_REPLICA_CHECK_FAILED:
healthStatus.put(HealthCheck.MIN_ACTIVE_REPLICA_CHECK_FAILED.name(),
-
InstanceValidationUtil.siblingNodesActiveReplicaCheck(_dataAccessor,
instanceName));
+
InstanceValidationUtil.siblingNodesActiveReplicaCheck(_dataAccessor,
instanceName, toBeStoppedInstances));
Review Comment:
This is not necessarily a part of the change, but I think we should
re-evaluate whether the `EMPTY_RESOURCE_ASSIGNMENT` should keep an instance
from being stoppable. There could be legitimate reason there is no resource
assignment. Why should that block a deployment or other operation?
##########
helix-rest/src/main/java/org/apache/helix/rest/clusterMaintenanceService/StoppableInstancesSelector.java:
##########
@@ -87,8 +120,14 @@ public void getStoppableInstancesInSingleZone(List<String>
instances) throws IOE
}
} else {
_stoppableInstances.add(instance);
Review Comment:
Is there a reason to not return _stoppableInstances and not store the state
in the object?
If you instantiate this class once and keep calling `getStoppableInstances`
you will keep adding to `_stoppableInstances` but you will lose the state of
`toBeStoppedInstances` in between calls unless the caller remembers to pass it
on the next call.
If this is used incorrectly without thinking of this, it could lead to
unexpected results.
##########
helix-core/src/main/java/org/apache/helix/util/InstanceValidationUtil.java:
##########
@@ -370,7 +370,8 @@ public static boolean isInstanceStable(HelixDataAccessor
dataAccessor, String in
* @param instanceName
* @return
*/
- public static boolean siblingNodesActiveReplicaCheck(HelixDataAccessor
dataAccessor, String instanceName) {
+ public static boolean siblingNodesActiveReplicaCheck(HelixDataAccessor
dataAccessor,
+ String instanceName, Set<String> toBeStoppedInstances) {
Review Comment:
You also need to take into consideration
`InstanceValidationUtil.perPartitionHealthCheck` method. You should be passing
`toBeStopped` to exclude the replicas on that instance from being checked for
health and being in initState.
This is because the toBeStopped instance could already be undergoing
operation. If that is the case, the status will likely not be in
`globalPartitionHealthStatus` or it could be `false`. Also, it will be in the
OFFLINE state. This case will fail the instances StoppableCheck, if we don't
consider `toBeStopped` instances here.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]