GrantPSpencer commented on code in PR #3010:
URL: https://github.com/apache/helix/pull/3010#discussion_r2027712227
##########
helix-core/src/main/java/org/apache/helix/controller/rebalancer/waged/WagedInstanceCapacity.java:
##########
@@ -201,6 +201,12 @@ public synchronized boolean checkAndReduceInstanceCapacity(String instance, Stri
return true;
}
+ if (!_instanceCapacityMap.containsKey(instance)) {
+ LOG.error("Instance: " + instance + " not found in instance capacity map. Cluster may be using previous "
+ + "idealState that includes an instance that is no longer part of the cluster.");
+ return false;
+ }
+
Review Comment:
Throwing a HelixException in `computeBestPossibleStates` just leads to
falling back to the previous best possible state. However, throwing an
exception here would prevent `computeBestPossibleStateForPartition` from
assigning correct states based on the previous best possible state we fell
back to; instead, the exception propagates up and the controller stops the
DEFAULT pipeline for the cluster. We'd need to add error handling around the
`computeBestPossiblePartitionState` call and have some fallback mechanism.

Stack trace below:
```
3732 [HelixController-pipeline-default-TestWagedNPE_cluster-(70412709_DEFAULT)] ERROR org.apache.helix.controller.GenericHelixController [] - Exception while executing DEFAULT pipeline for cluster TestWagedNPE_cluster. Will not continue to next pipeline
org.apache.helix.HelixException: Instance: localhost_0 not found in instance capacity map. Cluster may be using previous idealState that includes an instance that is no longer part of the cluster.
	at org.apache.helix.controller.rebalancer.waged.WagedInstanceCapacity.checkAndReduceInstanceCapacity(WagedInstanceCapacity.java:209) ~[classes/:?]
	at org.apache.helix.controller.dataproviders.ResourceControllerDataProvider.checkAndReduceCapacity(ResourceControllerDataProvider.java:543) ~[classes/:?]
	at org.apache.helix.controller.rebalancer.DelayedAutoRebalancer.computeBestPossibleStateForPartition(DelayedAutoRebalancer.java:378) ~[classes/:?]
	at org.apache.helix.controller.rebalancer.DelayedAutoRebalancer.computeBestPossiblePartitionState(DelayedAutoRebalancer.java:271) ~[classes/:?]
	at org.apache.helix.controller.rebalancer.DelayedAutoRebalancer.computeBestPossiblePartitionState(DelayedAutoRebalancer.java:54) ~[classes/:?]
	at org.apache.helix.controller.rebalancer.waged.WagedRebalancer.lambda$computeNewIdealStates$0(WagedRebalancer.java:281) ~[classes/:?]
	at java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:183) ~[?:?]
	at java.util.HashMap$ValueSpliterator.forEachRemaining(HashMap.java:1693) ~[?:?]
	at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:484) ~[?:?]
	at java.util.stream.ForEachOps$ForEachTask.compute(ForEachOps.java:290) ~[?:?]
	at java.util.concurrent.CountedCompleter.exec(CountedCompleter.java:746) ~[?:?]
	at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:290) ~[?:?]
	at java.util.concurrent.ForkJoinTask.doInvoke(ForkJoinTask.java:408) ~[?:?]
	at java.util.concurrent.ForkJoinTask.invoke(ForkJoinTask.java:736) ~[?:?]
	at java.util.stream.ForEachOps$ForEachOp.evaluateParallel(ForEachOps.java:159) ~[?:?]
	at java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateParallel(ForEachOps.java:173) ~[?:?]
	at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:233) ~[?:?]
	at java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:497) ~[?:?]
	at java.util.stream.ReferencePipeline$Head.forEach(ReferencePipeline.java:661) ~[?:?]
	at org.apache.helix.controller.rebalancer.waged.WagedRebalancer.computeNewIdealStates(WagedRebalancer.java:277) ~[classes/:?]
	at org.apache.helix.controller.stages.BestPossibleStateCalcStage.computeResourceBestPossibleStateWithWagedRebalancer(BestPossibleStateCalcStage.java:445) ~[classes/:?]
	at org.apache.helix.controller.stages.BestPossibleStateCalcStage.compute(BestPossibleStateCalcStage.java:289) ~[classes/:?]
	at org.apache.helix.controller.stages.BestPossibleStateCalcStage.process(BestPossibleStateCalcStage.java:94) ~[classes/:?]
	at org.apache.helix.controller.pipeline.Pipeline.handle(Pipeline.java:75) ~[classes/:?]
	at org.apache.helix.controller.GenericHelixController.handleEvent(GenericHelixController.java:905) [classes/:?]
	at org.apache.helix.controller.GenericHelixController$ClusterEventProcessor.run(GenericHelixController.java:1556) [classes/:?]
```
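A minimal sketch of the kind of fallback described above, assuming the error handling sits at the per-resource `computeBestPossiblePartitionState` call site (the frame invoked from the `computeNewIdealStates` lambda in the trace). Everything here is hypothetical and simplified, not Helix API: `MissingInstanceException` stands in for the HelixException, `CapacityTracker` for the instance capacity map, and `computeState` / `computeWithFallback` for the rebalancer calls. Each resource is reduced to a single instance-to-state map, and resources are processed sequentially rather than in the parallel stream the real code uses.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class PartitionStateFallbackSketch {

  // Hypothetical stand-in for the HelixException thrown when an instance is
  // missing from the instance capacity map.
  static class MissingInstanceException extends RuntimeException {
    MissingInstanceException(String message) {
      super(message);
    }
  }

  // Hypothetical, simplified tracker: instance name -> remaining capacity units.
  static class CapacityTracker {
    private final Map<String, Integer> capacity = new HashMap<>();

    CapacityTracker(Map<String, Integer> initialCapacity) {
      capacity.putAll(initialCapacity);
    }

    // Deducts weight if the instance has room; throws if the instance is unknown.
    synchronized boolean checkAndReduce(String instance, int weight) {
      Integer remaining = capacity.get(instance);
      if (remaining == null) {
        throw new MissingInstanceException(
            "Instance: " + instance + " not found in instance capacity map.");
      }
      if (remaining < weight) {
        return false;
      }
      capacity.put(instance, remaining - weight);
      return true;
    }
  }

  // Stand-in for computeBestPossiblePartitionState: marks each preferred
  // instance ONLINE if it still has one unit of capacity. Capacity deducted
  // before a failure is NOT rolled back in this sketch.
  static Map<String, String> computeState(List<String> preferenceList, CapacityTracker tracker) {
    Map<String, String> instanceStates = new HashMap<>();
    for (String instance : preferenceList) {
      if (tracker.checkAndReduce(instance, 1)) {
        instanceStates.put(instance, "ONLINE");
      }
    }
    return instanceStates;
  }

  // Computes every resource; a missing-instance failure in one resource falls
  // back to that resource's previous best possible state instead of aborting.
  static Map<String, Map<String, String>> computeWithFallback(
      Map<String, List<String>> preferenceLists,
      Map<String, Map<String, String>> previousBestPossible,
      CapacityTracker tracker) {
    Map<String, Map<String, String>> result = new HashMap<>();
    for (Map.Entry<String, List<String>> entry : preferenceLists.entrySet()) {
      String resource = entry.getKey();
      try {
        result.put(resource, computeState(entry.getValue(), tracker));
      } catch (MissingInstanceException e) {
        // A real controller would use its logger; stderr keeps this self-contained.
        System.err.println("Falling back for " + resource + ": " + e.getMessage());
        result.put(resource, previousBestPossible.getOrDefault(resource, Map.of()));
      }
    }
    return result;
  }

  public static void main(String[] args) {
    // localhost_0 has left the cluster, so it is absent from the capacity map.
    CapacityTracker tracker = new CapacityTracker(Map.of("localhost_1", 2, "localhost_2", 2));

    Map<String, List<String>> preferenceLists = new TreeMap<>();
    preferenceLists.put("resourceA", List.of("localhost_1", "localhost_2"));
    // Stale preference list that still references the removed instance.
    preferenceLists.put("resourceB", List.of("localhost_0", "localhost_1"));

    Map<String, Map<String, String>> previousBestPossible =
        Map.of("resourceB", Map.of("localhost_1", "ONLINE"));

    // resourceA is computed normally; resourceB reuses its previous state.
    System.out.println(computeWithFallback(preferenceLists, previousBestPossible, tracker));
  }
}
```

Catching per resource would keep one stale preference list (here `resourceB`, still pointing at the removed `localhost_0`) from stopping the whole pipeline while other resources are still computed. One question the sketch deliberately leaves open is whether capacity already deducted before the failure should be restored.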
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]