zpinto commented on code in PR #2635:
URL: https://github.com/apache/helix/pull/2635#discussion_r1337902721


##########
helix-core/src/main/java/org/apache/helix/controller/rebalancer/waged/WagedInstanceCapacity.java:
##########
@@ -54,7 +55,15 @@ public WagedInstanceCapacity(ResourceControllerDataProvider 
clusterData) {
       return;
     }
     for (InstanceConfig instanceConfig : 
clusterData.getInstanceConfigMap().values()) {
-      Map<String, Integer> instanceCapacity = 
WagedValidationUtil.validateAndGetInstanceCapacity(clusterConfig, 
instanceConfig);
+      Map<String, Integer> instanceCapacity;
+      try {
+        instanceCapacity = 
WagedValidationUtil.validateAndGetInstanceCapacity(clusterConfig, 
instanceConfig);
+      } catch (HelixException ex) {
+        // We don't want to throw exception here, it would be OK if no 
resource is using Waged.
+        // Waged rebalancer will fail in later pipeline stage only for waged 
resource. So it won't block other resources.
+        LOG.error("Failed to initialize instance capacity map. Instance 
capacity map is not property set up", ex);
+        return;

Review Comment:
   I don't think that it is safe to return here. We can catch the exception but 
we need to add empty InstanceCapacity map to  _instanceCapacityMap for each 
instance.
   
   Consider the following case:
   - Cluster does not use DEFAULT_INSTANCE_CAPACITY_MAP but they populate the 
INSTANCE_CONFIG_WEIGHTS at the InstanceConfig level.
   - Let's assume they also have WAGED resources
   - Everything is working as expected
   - Now lets assume that somehow the INSTANCE_CAPACITY_MAP gets deleted from 
one InstanceConfig
   - If we exit this method early without adding all of the instances to 
_instanceCapacityMap, we will get null pointer exception 
[here](https://github.com/apache/helix/blob/master/helix-core/src/main/java/org/apache/helix/controller/rebalancer/waged/WagedInstanceCapacity.java#L192)
 because we are trying to charge capacity 
[here](https://github.com/apache/helix/blob/master/helix-core/src/main/java/org/apache/helix/controller/rebalancer/DelayedAutoRebalancer.java#L382)
 for instances that may have not been added before returning.
   - I don't think it will be an issue for the instance that is missing 
INSTANCE_CAPACITY_MAP because all assignments to that instance will be moved 
since there is no capacity; therefore, it will never be charged.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to