GrantPSpencer opened a new issue, #2973:
URL: https://github.com/apache/helix/issues/2973

   ### Describe the bug
   A `NullPointerException` can occur in `IntermediateStateCalcStage` when pending messages 
are applied to the `intermediateStateMap`. Specifically, when the stage applies a message 
whose toState is DROPPED, it calls `.remove(..)` on a map that is null.
   
   ```
   2024/10/29 01:48:13.046 ERROR [GenericHelixController] [HelixController-pipeline-default-CLUSTERNAME-(70ae9461_DEFAULT)] [helix] [] Exception while executing DEFAULT pipeline for cluster CLUSTERNAME. Will not continue to next pipeline
   java.lang.NullPointerException: null
           at org.apache.helix.controller.stages.IntermediateStateCalcStage.lambda$computeIntermediateMap$2(IntermediateStateCalcStage.java:868) ~[org.apache.helix.helix-core-1.3.2-dev-202404301535-hotfix.jar:1.3.2-dev-202404301535-hotfix]
           at java.util.HashMap.forEach(HashMap.java:1337) ~[?:?]
           at org.apache.helix.controller.stages.IntermediateStateCalcStage.computeIntermediateMap(IntermediateStateCalcStage.java:864) ~[org.apache.helix.helix-core-1.3.2-dev-202404301535-hotfix.jar:1.3.2-dev-202404301535-hotfix]
           at org.apache.helix.controller.stages.IntermediateStateCalcStage.computeIntermediatePartitionState(IntermediateStateCalcStage.java:402) ~[org.apache.helix.helix-core-1.3.2-dev-202404301535-hotfix.jar:1.3.2-dev-202404301535-hotfix]
           at org.apache.helix.controller.stages.IntermediateStateCalcStage.compute(IntermediateStateCalcStage.java:180) ~[org.apache.helix.helix-core-1.3.2-dev-202404301535-hotfix.jar:1.3.2-dev-202404301535-hotfix]
           at org.apache.helix.controller.stages.IntermediateStateCalcStage.process(IntermediateStateCalcStage.java:85) ~[org.apache.helix.helix-core-1.3.2-dev-202404301535-hotfix.jar:1.3.2-dev-202404301535-hotfix]
           at org.apache.helix.controller.pipeline.Pipeline.handle(Pipeline.java:75) ~[org.apache.helix.helix-core-1.3.2-dev-202404301535-hotfix.jar:1.3.2-dev-202404301535-hotfix]
           at org.apache.helix.controller.GenericHelixController.handleEvent(GenericHelixController.java:903) [org.apache.helix.helix-core-1.3.2-dev-202404301535-hotfix.jar:1.3.2-dev-202404301535-hotfix]
           at org.apache.helix.controller.GenericHelixController$ClusterEventProcessor.run(GenericHelixController.java:1554) [org.apache.helix.helix-core-1.3.2-dev-202404301535-hotfix.jar:1.3.2-dev-202404301535-hotfix]
   ```
   
   
   ```
       for (Map.Entry<Partition, Map<String, Message>> entry : pendingMessageMap.entrySet()) {
         entry.getValue().forEach((key, value) -> {
           if (!value.getToState().equals(HelixDefinedState.DROPPED.name())) {
             intermediateStateMap.setState(entry.getKey(), value.getTgtName(), value.getToState());
           } else {
             // NPE is thrown here when get(entry.getKey()) returns null
             intermediateStateMap.getStateMap().get(entry.getKey()).remove(value.getTgtName());
           }
         });
       }
   ```
   
   ### To Reproduce
   I have been unable to reproduce this outside of unit tests. I currently think the 
behavior occurs when:
   
   1. A resource has a partition with 1 replica.
   2. A message is sent to instance A to drop the replica, but the replica no longer 
exists in the instance's current state.
   3. The controller snapshots the cluster and runs the pipeline.
   4. `IntermediateStateCalcStage` attempts to call `.remove()` on a map that does 
not exist.
   
   I think the above state can be reached when:
   
   1. A race condition in which the node reads the message and drops the current state 
but has not yet deleted the message, so it is still seen as a pending message.
   2. The node goes offline, so there is no current state.
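
   The failure mode described above can be sketched without Helix at all. The example below is only an illustration under stated assumptions: it uses plain `String` keys and a nested `HashMap` as hypothetical stand-ins for `Partition`, the instance name, and the per-partition state map, and the class and method names are made up for this sketch.

```java
import java.util.HashMap;
import java.util.Map;

public class DropNpeSketch {
  // Hypothetical stand-in for the state map: partition -> (instance -> state).
  // Returns true if removing the instance's state throws a NullPointerException.
  static boolean dropThrowsNpe(Map<String, Map<String, String>> stateMap,
                               String partition, String instance) {
    try {
      // Mirrors the shape of the failing call: get(...) returns null when the
      // partition has no entry, and remove(...) is then invoked on null.
      stateMap.get(partition).remove(instance);
      return false;
    } catch (NullPointerException e) {
      return true;
    }
  }

  public static void main(String[] args) {
    Map<String, Map<String, String>> stateMap = new HashMap<>();
    // No state is recorded for "partition_0", yet a drop is still applied
    // for "instanceA", as with a stale pending DROPPED message.
    System.out.println(dropThrowsNpe(stateMap, "partition_0", "instanceA"));
  }
}
```

   With an empty state map the call reports an NPE; once the partition has any entry, even an empty inner map, the same call succeeds.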
   
   ### Expected behavior
   In my opinion, a failed removal caused by a null map should not raise an error. 
We can add a null check, or use `getOrDefault` to return an empty map.
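
   A minimal sketch of the null-check option follows. It is not the actual Helix patch: plain strings and a nested `HashMap` stand in for `Partition`, `Message`, and the intermediate state map, and `applyPendingMessage` is a hypothetical helper introduced only for this illustration.

```java
import java.util.HashMap;
import java.util.Map;

public class NullSafeDropSketch {
  static final String DROPPED = "DROPPED";

  // Applies one pending message to a partition -> (instance -> state) map.
  // All names here are assumptions for the sketch, not Helix APIs.
  static void applyPendingMessage(Map<String, Map<String, String>> stateMap,
                                  String partition, String instance, String toState) {
    if (!DROPPED.equals(toState)) {
      // Non-drop messages record the target state, creating the inner map if needed.
      stateMap.computeIfAbsent(partition, k -> new HashMap<>()).put(instance, toState);
      return;
    }
    // Drop messages remove the replica only if the partition has an entry;
    // a missing entry means there is nothing to remove, so it is a no-op.
    Map<String, String> instanceStates = stateMap.get(partition);
    if (instanceStates != null) {
      instanceStates.remove(instance);
    }
  }

  public static void main(String[] args) {
    Map<String, Map<String, String>> stateMap = new HashMap<>();
    // A stale drop for a partition with no recorded state no longer throws.
    applyPendingMessage(stateMap, "partition_0", "instanceA", DROPPED);
    applyPendingMessage(stateMap, "partition_1", "instanceA", "ONLINE");
    applyPendingMessage(stateMap, "partition_1", "instanceA", DROPPED);
    System.out.println(stateMap);
  }
}
```

   Treating the missing entry as a no-op keeps the pipeline running, which matches the expectation that a stale drop message should not abort the stage.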
   
   



