GrantPSpencer commented on code in PR #3010:
URL: https://github.com/apache/helix/pull/3010#discussion_r2027712227


##########
helix-core/src/main/java/org/apache/helix/controller/rebalancer/waged/WagedInstanceCapacity.java:
##########
@@ -201,6 +201,12 @@ public synchronized boolean checkAndReduceInstanceCapacity(String instance, Stri
       return true;
     }
 
+    if (!_instanceCapacityMap.containsKey(instance)) {
+      LOG.error("Instance: " + instance + " not found in instance capacity map. Cluster may be using previous "
+          + "idealState that includes an instance that is no longer part of the cluster.");
+      return false;
+    }
+

Review Comment:
   Throwing a `HelixException` from `computeBestPossibleStates` just makes the 
rebalancer fall back to the previous best possible assignment. Throwing one 
here, however, would prevent `computeBestPossibleStateForPartition` from 
assigning the correct states based on the previous best possible assignment we 
fell back to. Stack trace below.
   
   We'd need to add error handling around the 
`computeBestPossiblePartitionState` call and have some fallback mechanism.
   
   ```
   3732 [HelixController-pipeline-default-TestWagedNPE_cluster-(70412709_DEFAULT)] ERROR org.apache.helix.controller.GenericHelixController [] - Exception while executing DEFAULT pipeline for cluster TestWagedNPE_cluster. Will not continue to next pipeline
   org.apache.helix.HelixException: Instance: localhost_0 not found in instance capacity map. Cluster may be using previous idealState that includes an instance that is no longer part of the cluster.
        at org.apache.helix.controller.rebalancer.waged.WagedInstanceCapacity.checkAndReduceInstanceCapacity(WagedInstanceCapacity.java:209) ~[classes/:?]
        at org.apache.helix.controller.dataproviders.ResourceControllerDataProvider.checkAndReduceCapacity(ResourceControllerDataProvider.java:543) ~[classes/:?]
        at org.apache.helix.controller.rebalancer.DelayedAutoRebalancer.computeBestPossibleStateForPartition(DelayedAutoRebalancer.java:378) ~[classes/:?]
        at org.apache.helix.controller.rebalancer.DelayedAutoRebalancer.computeBestPossiblePartitionState(DelayedAutoRebalancer.java:271) ~[classes/:?]
        at org.apache.helix.controller.rebalancer.DelayedAutoRebalancer.computeBestPossiblePartitionState(DelayedAutoRebalancer.java:54) ~[classes/:?]
        at org.apache.helix.controller.rebalancer.waged.WagedRebalancer.lambda$computeNewIdealStates$0(WagedRebalancer.java:281) ~[classes/:?]
        at java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:183) ~[?:?]
        at java.util.HashMap$ValueSpliterator.forEachRemaining(HashMap.java:1693) ~[?:?]
        at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:484) ~[?:?]
        at java.util.stream.ForEachOps$ForEachTask.compute(ForEachOps.java:290) ~[?:?]
        at java.util.concurrent.CountedCompleter.exec(CountedCompleter.java:746) ~[?:?]
        at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:290) ~[?:?]
        at java.util.concurrent.ForkJoinTask.doInvoke(ForkJoinTask.java:408) ~[?:?]
        at java.util.concurrent.ForkJoinTask.invoke(ForkJoinTask.java:736) ~[?:?]
        at java.util.stream.ForEachOps$ForEachOp.evaluateParallel(ForEachOps.java:159) ~[?:?]
        at java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateParallel(ForEachOps.java:173) ~[?:?]
        at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:233) ~[?:?]
        at java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:497) ~[?:?]
        at java.util.stream.ReferencePipeline$Head.forEach(ReferencePipeline.java:661) ~[?:?]
        at org.apache.helix.controller.rebalancer.waged.WagedRebalancer.computeNewIdealStates(WagedRebalancer.java:277) ~[classes/:?]
        at org.apache.helix.controller.stages.BestPossibleStateCalcStage.computeResourceBestPossibleStateWithWagedRebalancer(BestPossibleStateCalcStage.java:445) ~[classes/:?]
        at org.apache.helix.controller.stages.BestPossibleStateCalcStage.compute(BestPossibleStateCalcStage.java:289) ~[classes/:?]
        at org.apache.helix.controller.stages.BestPossibleStateCalcStage.process(BestPossibleStateCalcStage.java:94) ~[classes/:?]
        at org.apache.helix.controller.pipeline.Pipeline.handle(Pipeline.java:75) ~[classes/:?]
        at org.apache.helix.controller.GenericHelixController.handleEvent(GenericHelixController.java:905) [classes/:?]
        at org.apache.helix.controller.GenericHelixController$ClusterEventProcessor.run(GenericHelixController.java:1556) [classes/:?]
   ```
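
For illustration only, here is a rough sketch of the kind of error handling and fallback mentioned above. It is not Helix code: `PartitionStateFallbackSketch`, `computeWithFallback`, and the plain `Map` types are hypothetical stand-ins, and the `compute` function stands in for the per-resource `computeBestPossiblePartitionState` call. The trace shows that call running inside a parallel-stream lambda in `computeNewIdealStates`, so a catch would probably need to sit inside that lambda; the sketch assumes the previously computed states for the resource are available to fall back on.

```java
import java.util.Collections;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch, not part of Helix: wraps a per-resource computation so that
// one failing resource does not abort the whole pipeline run.
class PartitionStateFallbackSketch {

  /**
   * Runs {@code compute} for one resource. If it throws an unchecked exception
   * (the HelixException in the trace propagates unchecked through the stream lambda),
   * logs the failure and returns the previously known states for that resource,
   * or an empty map when none are known.
   */
  static Map<String, String> computeWithFallback(
      String resource,
      Function<String, Map<String, String>> compute,
      Map<String, Map<String, String>> previousStates) {
    try {
      return compute.apply(resource);
    } catch (RuntimeException e) {
      // Assumption: falling back to the previous states is acceptable for this resource.
      System.err.println("Failed to compute partition states for resource " + resource
          + ", falling back to previous states: " + e.getMessage());
      return previousStates.getOrDefault(resource, Collections.emptyMap());
    }
  }
}
```

Whether the previous states are the right fallback, or whether the resource should be skipped or marked as failed instead, is a design decision this sketch does not settle.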



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

