GrantPSpencer opened a new issue, #2891:
URL: https://github.com/apache/helix/issues/2891
### Describe the bug
Waged pipeline will fail due to NPE during `BestPossibleStateCalcStage` as
it will call `checkAndReduceInstanceCapacity` on an instance that is not in the
`WagedInstanceCapacity`'s `_instanceCapacityMap`. This will occur when the
`WagedInstanceCapacity` is calculated at point A, a new instance is added at
point B, and then at at point C the `WagedInstanceCapacity` is not refreshed to
include this instance during the `CurrentStateComputationStage` The specific
circumstances are detailed below
### To Reproduce
1. Add at least 1 waged enabled resource to a cluster and rebalance so
assignments are made.
2. Drop all resources from the cluster.
3. Add a new instance (`"new_instance"`)to the cluster.
4. Add 1 waged enabled resource to the cluster
5. NPE will occur
This occurs because `"new_instance"` is an assignable instance and is in the
newly calculated preference list. So `checkAndReduceInstanceCapacity` is called
on the instance. However, `WagedInstanceCapacity`'s `_instanceCapacityMap` has
not been updated and therefore has a stale view that does not include
`"new_instance"`
This is because the `skipCapacityCalculation` method (a very effective
optimization) causes the `CurrentStateComputationStage` to not refresh the
cache if there are no resources in the resourceMap. However, the resourceMap
is constructed based off the idealStates in the cluster which does not exist at
this point. When a resource is added, a `ResourceConfigChange` event is first
fired. Afterwards, an `IdealStateChange` will fire. In this case of a new
resource being added, the `CurrentStateComputationStage` will not recalculate
the `WagedInstanceCapacity` as the resourceMap is empty when we encounter a
`ResourceConfigChange` and then we do not recalculate on subsequent
`IdealStateChange`
Adding a WAGED resource to a new cluster does not trigger this NPE because
there is no WagedInstanceCapacity so
```
if (Objects.isNull(cache.getWagedInstanceCapacity())) {
return false;
}
```
will force it to be refreshed.
https://github.com/GrantPSpencer/helix/pull/32
The testcase in this draft PR will fail on master and follows the steps
outlined above.
### Expected behavior
WagedInstanceCapacity should be recalculated in the case of a new resource
being added prior to the BestPossibleStateCalcStage.
### Additional context
```
10539
[HelixController-pipeline-default-TestWagedNPE_cluster-(45df0f8d_DEFAULT)]
ERROR org.apache.helix.controller.GenericHelixController [] - Exception while
executing DEFAULT pipeline for cluster TestWagedNPE_cluster. Will not continue
to next pipeline
java.lang.NullPointerException: null
at
org.apache.helix.controller.rebalancer.waged.WagedInstanceCapacity.checkAndReduceInstanceCapacity(WagedInstanceCapacity.java:206)
~[classes/:?]
at
org.apache.helix.controller.dataproviders.ResourceControllerDataProvider.checkAndReduceCapacity(ResourceControllerDataProvider.java:535)
~[classes/:?]
at
org.apache.helix.controller.rebalancer.DelayedAutoRebalancer.computeBestPossibleStateForPartition(DelayedAutoRebalancer.java:377)
~[classes/:?]
at
org.apache.helix.controller.rebalancer.DelayedAutoRebalancer.computeBestPossiblePartitionState(DelayedAutoRebalancer.java:271)
~[classes/:?]
at
org.apache.helix.controller.rebalancer.DelayedAutoRebalancer.computeBestPossiblePartitionState(DelayedAutoRebalancer.java:54)
~[classes/:?]
at
org.apache.helix.controller.rebalancer.waged.WagedRebalancer.lambda$computeNewIdealStates$0(WagedRebalancer.java:281)
~[classes/:?]
at
java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:183) ~[?:?]
at
java.util.HashMap$ValueSpliterator.forEachRemaining(HashMap.java:1692) ~[?:?]
at
java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:484) ~[?:?]
at java.util.stream.ForEachOps$ForEachTask.compute(ForEachOps.java:290)
~[?:?]
at
java.util.concurrent.CountedCompleter.exec(CountedCompleter.java:746) ~[?:?]
at
java.util.concurrent.ForkJoinTask.doExec$$$capture(ForkJoinTask.java:290) ~[?:?]
at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java) ~[?:?]
at java.util.concurrent.ForkJoinTask.doInvoke(ForkJoinTask.java:408)
~[?:?]
at java.util.concurrent.ForkJoinTask.invoke(ForkJoinTask.java:736)
~[?:?]
at
java.util.stream.ForEachOps$ForEachOp.evaluateParallel(ForEachOps.java:159)
~[?:?]
at
java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateParallel(ForEachOps.java:173)
~[?:?]
at
java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:233) ~[?:?]
at
java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:497) ~[?:?]
at
java.util.stream.ReferencePipeline$Head.forEach(ReferencePipeline.java:661)
~[?:?]
at
org.apache.helix.controller.rebalancer.waged.WagedRebalancer.computeNewIdealStates(WagedRebalancer.java:277)
~[classes/:?]
at
org.apache.helix.controller.stages.BestPossibleStateCalcStage.computeResourceBestPossibleStateWithWagedRebalancer(BestPossibleStateCalcStage.java:445)
~[classes/:?]
at
org.apache.helix.controller.stages.BestPossibleStateCalcStage.compute(BestPossibleStateCalcStage.java:289)
~[classes/:?]
at
org.apache.helix.controller.stages.BestPossibleStateCalcStage.process(BestPossibleStateCalcStage.java:94)
~[classes/:?]
at
org.apache.helix.controller.pipeline.Pipeline.handle(Pipeline.java:75)
~[classes/:?]
at
org.apache.helix.controller.GenericHelixController.handleEvent(GenericHelixController.java:903)
[classes/:?]
at
org.apache.helix.controller.GenericHelixController$ClusterEventProcessor.run(GenericHelixController.java:1554)
[classes/:?]
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]