[ 
https://issues.apache.org/jira/browse/YARN-11732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17885650#comment-17885650
 ] 

ASF GitHub Bot commented on YARN-11732:
---------------------------------------

TaoYang526 commented on code in PR #7065:
URL: https://github.com/apache/hadoop/pull/7065#discussion_r1779890901


##########
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/AbstractYarnScheduler.java:
##########
@@ -757,10 +757,9 @@ private void completeOustandingUpdatesWhichAreReserved(
       RMContainer rmContainer, ContainerStatus containerStatus,
       RMContainerEventType event) {
     N schedulerNode = getSchedulerNode(rmContainer.getNodeId());
-    if (schedulerNode != null &&
-        schedulerNode.getReservedContainer() != null) {
+    if (schedulerNode != null) {
       RMContainer resContainer = schedulerNode.getReservedContainer();
-      if (resContainer.getReservedSchedulerKey() != null) {
+      if (resContainer != null && resContainer.getReservedSchedulerKey() != 
null) {

Review Comment:
   Thanks @zeekling for the review.
   I'm not sure why your environment still reported NPE after changing like 
that, `resContainer.getReservedSchedulerKey()` won't throw NPE any more with 
the previous not-null check: `if resContainer is null`. Could you please attach 
some details of the NPE and your changes?
   For this change,  I think NPE should be fixed instead of be caught in 
general.





> Potential NPE when calling SchedulerNode#reservedContainer for 
> CapacityScheduler
> --------------------------------------------------------------------------------
>
>                 Key: YARN-11732
>                 URL: https://issues.apache.org/jira/browse/YARN-11732
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: capacityscheduler
>    Affects Versions: 3.4.0, 3.2.4, 3.3.6, 3.5.0
>            Reporter: Tao Yang
>            Assignee: Tao Yang
>            Priority: Major
>              Labels: pull-request-available
>
> I found some places calling *SchedulerNode#getReservedContainer* to get 
> reservedContainer (returned value) but not do sanity(not-null) check before 
> calling internal methods of it, which can have a risk to raise 
> NullPointerException if it's null.
> Most of these places have a premise that node has reserved container a few 
> moments ago, but may getting null by calling 
> *SchedulerNode#getReservedContainer* in the next moment, since the 
> reservedContainer can be updated to null concurrently in scheduling and 
> monitoring(preemption) thread. So that not-null check should be done before 
> calling internal methods of the reservedContainer.
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to