zpinto commented on code in PR #2680:
URL: https://github.com/apache/helix/pull/2680#discussion_r1373474902


##########
helix-core/src/main/java/org/apache/helix/util/InstanceValidationUtil.java:
##########
@@ -370,7 +370,8 @@ public static boolean isInstanceStable(HelixDataAccessor 
dataAccessor, String in
    * @param instanceName
    * @return
    */
-  public static boolean siblingNodesActiveReplicaCheck(HelixDataAccessor 
dataAccessor, String instanceName) {
+  public static boolean siblingNodesActiveReplicaCheck(HelixDataAccessor 
dataAccessor,
+      String instanceName, Set<String> toBeStoppedInstances) {

Review Comment:
   It is backwards incompatible to change the signature of a public method.
   
   We should keep a `siblingNodesActiveReplicaCheck(HelixDataAccessor 
dataAccessor, String instanceName)` and have it call 
`siblingNodesActiveReplicaCheck(HelixDataAccessor dataAccessor,
         String instanceName, Set<String> toBeStoppedInstances)` passing empty 
map for toBeStoppedInstances.



##########
helix-core/src/test/java/org/apache/helix/util/TestInstanceValidationUtil.java:
##########
@@ -428,7 +502,7 @@ public void TestSiblingNodesActiveReplicaCheck_fail() {
         .getProperty(argThat(new 
PropertyKeyArgument(PropertyType.STATEMODELDEFS)));
 
     boolean result =
-        
InstanceValidationUtil.siblingNodesActiveReplicaCheck(mock.dataAccessor, 
TEST_INSTANCE);
+        
InstanceValidationUtil.siblingNodesActiveReplicaCheck(mock.dataAccessor, 
TEST_INSTANCE, Collections.emptySet());

Review Comment:
   This shouldn't have to be changed if backwards compatibility is preserved. 
Refer to previous comment.



##########
helix-rest/src/main/java/org/apache/helix/rest/clusterMaintenanceService/MaintenanceManagementService.java:
##########
@@ -339,17 +339,17 @@ private List<OperationInterface> 
getAllOperationClasses(List<String> operations)
    */
   public StoppableCheck getInstanceStoppableCheck(String clusterId, String 
instanceName,
       String jsonContent) throws IOException {
-    return batchGetInstancesStoppableChecks(clusterId, 
ImmutableList.of(instanceName), jsonContent)
+    return batchGetInstancesStoppableChecks(clusterId, 
ImmutableList.of(instanceName), jsonContent, Collections.emptySet())
         .get(instanceName);
   }
 
 
   public Map<String, StoppableCheck> batchGetInstancesStoppableChecks(String 
clusterId,
-      List<String> instances, String jsonContent) throws IOException {
+      List<String> instances, String jsonContent, Set<String> 
toBeStoppedInstances) throws IOException {

Review Comment:
   Changing method signature of public method breaks backwards compatibility.



##########
helix-rest/src/main/java/org/apache/helix/rest/clusterMaintenanceService/StoppableInstancesSelector.java:
##########
@@ -87,8 +120,14 @@ public void getStoppableInstancesInSingleZone(List<String> 
instances) throws IOE
         }
       } else {
         _stoppableInstances.add(instance);

Review Comment:
   I think that the way the selector works can be confusing to another user. My 
intuition would be, I use the builder to set the configuration for how the 
selector will select instances. Once I do this, all the attributes of the 
selector object are immutable.
   
   I should then be able to call something like, getStoppableResult which can 
return an object of type StoppableResult. Then my selector instance can be 
reused over an over again. It seems very weird to pass a reference to an empty 
ArrayList and then call methods which have side effects to modify that 
ArrayList by reference.
   
   Just my personal opinion, curious for input from others.



##########
helix-rest/src/main/java/org/apache/helix/rest/clusterMaintenanceService/MaintenanceManagementService.java:
##########
@@ -745,7 +745,7 @@ protected Map<String, Boolean> 
getInstanceHealthStatus(String clusterId, String
           break;
         case MIN_ACTIVE_REPLICA_CHECK_FAILED:
           healthStatus.put(HealthCheck.MIN_ACTIVE_REPLICA_CHECK_FAILED.name(),
-              
InstanceValidationUtil.siblingNodesActiveReplicaCheck(_dataAccessor, 
instanceName));
+              
InstanceValidationUtil.siblingNodesActiveReplicaCheck(_dataAccessor, 
instanceName, toBeStoppedInstances));

Review Comment:
   This is not necessarily a part of the change, but I think we should 
re-evaluate whether the `EMPTY_RESOURCE_ASSIGNMENT` should keep an instance 
from being stoppable. There could be legitimate reason there is no resource 
assignment. Why should that block a deployment or other operation?



##########
helix-rest/src/main/java/org/apache/helix/rest/clusterMaintenanceService/StoppableInstancesSelector.java:
##########
@@ -87,8 +120,14 @@ public void getStoppableInstancesInSingleZone(List<String> 
instances) throws IOE
         }
       } else {
         _stoppableInstances.add(instance);

Review Comment:
   Is there a reason to not return _stoppableInstances and not store the state 
in the object?
   
   If you instantiate this class once and keep calling `getStoppableInstances` 
you will keep adding to `_stoppableInstances` but you will lose the state of 
`toBeStoppedInstances` in between calls unless the caller remembers to pass it 
on the next call.
   
   If this is used incorrectly without thinking of this, it could lead to 
unexpected results.



##########
helix-core/src/main/java/org/apache/helix/util/InstanceValidationUtil.java:
##########
@@ -370,7 +370,8 @@ public static boolean isInstanceStable(HelixDataAccessor 
dataAccessor, String in
    * @param instanceName
    * @return
    */
-  public static boolean siblingNodesActiveReplicaCheck(HelixDataAccessor 
dataAccessor, String instanceName) {
+  public static boolean siblingNodesActiveReplicaCheck(HelixDataAccessor 
dataAccessor,
+      String instanceName, Set<String> toBeStoppedInstances) {

Review Comment:
   You also need to take into consideration 
`InstanceValidationUtil.perPartitionHealthCheck` method. You should be passing 
`toBeStopped` to exclude the replicas on that instance from being checked for 
health and being in initState.
   
   This is because the toBeStopped instance could already be undergoing 
operation. If that is the case, the status will likely not be in 
`globalPartitionHealthStatus` or it could be `false`. Also, it will be in the 
OFFLINE state. This case will fail the instances StoppableCheck, if we don't 
consider `toBeStopped` instances here.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to