[ 
https://issues.apache.org/jira/browse/YARN-9809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17201782#comment-17201782
 ] 

Eric Badger commented on YARN-9809:
-----------------------------------

{noformat}
RMNodeImpl#AddNodeTransition#transition
        RMNodeStatusEvent rmNodeStatusEvent =
            new RMNodeStatusEvent(nodeId, nodeStatus);

        NodeHealthStatus nodeHealthStatus =
            updateRMNodeFromStatusEvents(rmNode, rmNodeStatusEvent);

        if (nodeHealthStatus.getIsNodeHealthy()) {
{noformat}
bq. Do we run the risk of nodeHealthStatus being null?

[~epayne], nope, we should be fine here. {{nodeHealthStatus}} is the return 
value of {{updateRMNodeFromStatusEvents}}, which in turn returns 
{{statusEvent.getNodeHealthStatus()}}. {{statusEvent}} is a parameter of that 
method; on the caller side the argument is {{rmNodeStatusEvent}}, which is 
created a few lines up via the {{RMNodeStatusEvent}} constructor. That 
constructor is passed {{nodeStatus}}, and we know {{nodeStatus}} won't be null 
because we are in the "else" branch of the "if" statement that checks 
{{nodeStatus}} for null.
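
To make the chain easier to follow, here is a minimal, self-contained sketch. It is not the actual patch: the classes are simplified stand-ins that reuse the real names, the assumption that the event delegates to {{nodeStatus}} is mine, and the {{transition}} helper and its string results are hypothetical. It only models the null check on {{nodeStatus}} discussed above.

```java
// Simplified stand-ins for the YARN types; NOT the real Hadoop classes.
class NullChainSketch {

    static final class NodeHealthStatus {
        private final boolean healthy;
        NodeHealthStatus(boolean healthy) { this.healthy = healthy; }
        boolean getIsNodeHealthy() { return healthy; }
    }

    static final class NodeStatus {
        private final NodeHealthStatus health;
        NodeStatus(NodeHealthStatus health) { this.health = health; }
        NodeHealthStatus getNodeHealthStatus() { return health; }
    }

    static final class RMNodeStatusEvent {
        private final NodeStatus nodeStatus;
        // nodeStatus is set here, via the constructor.
        RMNodeStatusEvent(NodeStatus nodeStatus) { this.nodeStatus = nodeStatus; }
        // Assumption: the event delegates to the wrapped nodeStatus.
        NodeHealthStatus getNodeHealthStatus() {
            return nodeStatus.getNodeHealthStatus();
        }
    }

    // Returns whatever the event reports, mirroring the chain described above.
    static NodeHealthStatus updateRMNodeFromStatusEvents(RMNodeStatusEvent statusEvent) {
        return statusEvent.getNodeHealthStatus();
    }

    // Hypothetical helper; the returned strings are illustrative labels only.
    static String transition(NodeStatus nodeStatus) {
        if (nodeStatus == null) {
            // Registration arrived without a status; the event is never built.
            return "NO_STATUS";
        } else {
            // nodeStatus is known to be non-null in this branch.
            RMNodeStatusEvent rmNodeStatusEvent = new RMNodeStatusEvent(nodeStatus);
            NodeHealthStatus nodeHealthStatus =
                updateRMNodeFromStatusEvents(rmNodeStatusEvent);
            return nodeHealthStatus.getIsNodeHealthy() ? "RUNNING" : "UNHEALTHY";
        }
    }
}
```

In this sketch, the event is only ever constructed inside the "else" branch, so the dereference of {{nodeStatus}} inside the event cannot hit a null.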

> NMs should supply a health status when registering with RM
> ----------------------------------------------------------
>
>                 Key: YARN-9809
>                 URL: https://issues.apache.org/jira/browse/YARN-9809
>             Project: Hadoop YARN
>          Issue Type: Bug
>            Reporter: Eric Badger
>            Assignee: Eric Badger
>            Priority: Major
>             Fix For: 3.4.0
>
>         Attachments: YARN-9809-branch-3.2.007.patch, YARN-9809.001.patch, 
> YARN-9809.002.patch, YARN-9809.003.patch, YARN-9809.004.patch, 
> YARN-9809.005.patch, YARN-9809.006.patch, YARN-9809.007.patch
>
>
> Currently if the NM registers with the RM and it is unhealthy, it can be 
> scheduled many containers before the first heartbeat. After the first 
> heartbeat, the RM will mark the NM as unhealthy and kill all of the 
> containers.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
