[jira] [Commented] (YARN-9809) NMs should supply a health status when registering with RM

Jim Brennan (Jira) Thu, 25 Jun 2020 14:42:54 -0700


    [ 
https://issues.apache.org/jira/browse/YARN-9809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17145869#comment-17145869
 ]


Jim Brennan commented on YARN-9809:
-----------------------------------

Thanks for the updates [~ebadger]! I have one comment on the new patch:

RMNodeImpl
* I think moving the call to 
{{ClusterMetrics.getMetrics().incrNumActiveNodes()}} introduces a bug. If 
previousRMNode != null (in the first check), we call 
{{rmNode.updateMetricsForRejoinedNode()}}, which decrements the counter for 
the previous state and increments num active nodes. With your change, we then 
increment active nodes a second time when we call reportNodeRunning, so a 
rejoining node is counted twice.
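
To make the concern concrete, here is a minimal, self-contained sketch of the double increment. It is not the actual RMNodeImpl or ClusterMetrics code: the Metrics class, the boolean flag, and the assumption that the rejoining node was previously unhealthy are all hypothetical stand-ins chosen for illustration. Only the method names updateMetricsForRejoinedNode and reportNodeRunning are borrowed from the comment above.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class DoubleCountSketch {
    // Hypothetical stand-in for the cluster-wide counters; not the real ClusterMetrics.
    static final class Metrics {
        final AtomicInteger activeNodes = new AtomicInteger();
        final AtomicInteger unhealthyNodes = new AtomicInteger();
    }

    // Models the rejoin path: undo the previous state's counter, then count the node as active.
    static void updateMetricsForRejoinedNode(Metrics m) {
        m.unhealthyNodes.decrementAndGet(); // assumption: previous state was "unhealthy"
        m.activeNodes.incrementAndGet();
    }

    // Models reportNodeRunning; the flag says whether the active-node increment was moved into it.
    static void reportNodeRunning(Metrics m, boolean incrementsActive) {
        if (incrementsActive) {
            m.activeNodes.incrementAndGet();
        }
    }

    // Returns the active-node count after a single node registers.
    static int activeAfterRegistration(boolean rejoining, boolean incrementInReport) {
        Metrics m = new Metrics();
        if (rejoining) {
            m.unhealthyNodes.set(1); // the node was already known to the RM
            updateMetricsForRejoinedNode(m);
        }
        reportNodeRunning(m, incrementInReport);
        return m.activeNodes.get();
    }

    public static void main(String[] args) {
        System.out.println(activeAfterRegistration(false, true)); // new node: 1
        System.out.println(activeAfterRegistration(true, true));  // rejoined node: 2
    }
}
```

Under these assumptions, a brand-new node is counted once, but a rejoining node ends up counted twice, because both the rejoin path and reportNodeRunning increment the active-node counter.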

> NMs should supply a health status when registering with RM
> ----------------------------------------------------------
>
>                 Key: YARN-9809
>                 URL: https://issues.apache.org/jira/browse/YARN-9809
>             Project: Hadoop YARN
>          Issue Type: Bug
>            Reporter: Eric Badger
>            Assignee: Eric Badger
>            Priority: Major
>         Attachments: YARN-9809.001.patch, YARN-9809.002.patch, 
> YARN-9809.003.patch, YARN-9809.004.patch, YARN-9809.005.patch, 
> YARN-9809.006.patch
>
>
> Currently if the NM registers with the RM and it is unhealthy, it can be 
> scheduled many containers before the first heartbeat. After the first 
> heartbeat, the RM will mark the NM as unhealthy and kill all of the 
> containers.




