[ 
https://issues.apache.org/jira/browse/YARN-9809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16922542#comment-16922542
 ] 

Eric Badger commented on YARN-9809:
-----------------------------------

bq. Although it is good to have a way to prevent scheduling containers on a 
node manager that is going through the registration process, to save network 
round trips and compute resources, the existing async design allows the node 
to show up in the Resource Manager as quickly as possible to improve the 
system admin user experience.

But if that node is bad, then registering with the RM just adds unnecessary 
work. The NM health check script can check for many things that are knowable 
without running a container. For example, docker might not be installed, or 
nscd might not be running (causing a user lookup for every new container). 
Depending on the specific health check script, either could be a reason for 
the node to declare itself unhealthy. If we register with the RM and only 
declare the node unhealthy afterwards, then we have to kill every container 
that was scheduled in the period between registration and the first heartbeat.
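
To make the kind of check described above concrete, here is a minimal sketch 
of a health check script covering the two examples (docker missing, nscd not 
running). The function names and messages are made up for illustration. It 
relies on the NodeManager convention, as I understand it from the Hadoop docs, 
that a line on stdout beginning with ERROR marks the node unhealthy, and it 
assumes pgrep is available on the node.

```python
#!/usr/bin/env python3
"""Sketch of a NodeManager health check script (hypothetical checks)."""
import shutil
import subprocess


def docker_installed(which=shutil.which):
    # True when a `docker` executable is found on PATH.
    return which("docker") is not None


def nscd_running(run=subprocess.run):
    # `pgrep -x nscd` exits 0 when a process named exactly "nscd" exists.
    try:
        result = run(
            ["pgrep", "-x", "nscd"],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
    except OSError:
        # pgrep itself could not be executed; treat nscd as not running.
        return False
    return result.returncode == 0


def collect_errors(checks):
    # Build one "ERROR ..." line for every check that returns False.
    return [f"ERROR {message}" for check, message in checks if not check()]


def main():
    checks = [
        (docker_installed, "docker is not installed"),
        (nscd_running, "nscd is not running"),
    ]
    # Assumption: the NM scans stdout for lines starting with ERROR.
    for line in collect_errors(checks):
        print(line)


if __name__ == "__main__":
    main()
```

Neither check needs a container to be launched, so the result is available 
before the NM registers.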

> NMs should supply a health status when registering with RM
> ----------------------------------------------------------
>
>                 Key: YARN-9809
>                 URL: https://issues.apache.org/jira/browse/YARN-9809
>             Project: Hadoop YARN
>          Issue Type: Bug
>            Reporter: Eric Badger
>            Assignee: Eric Badger
>            Priority: Major
>
> Currently, if an unhealthy NM registers with the RM, the RM can schedule 
> many containers on it before the first heartbeat. After the first 
> heartbeat, the RM marks the NM as unhealthy and kills all of those 
> containers.
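
The window described in the issue can be illustrated with a toy model. This 
is only a sketch: ToyResourceManager, Node, and run are hypothetical names and 
do not correspond to the YARN ResourceManager API. It compares the current 
behavior (health known only at the first heartbeat) with the proposed one 
(health supplied at registration).

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    healthy: bool  # what the health check script would report
    containers: list = field(default_factory=list)


class ToyResourceManager:
    """Hypothetical model, not the YARN ResourceManager API."""

    def __init__(self, health_at_registration):
        self.health_at_registration = health_at_registration
        self.schedulable = {}  # node name -> Node
        self.killed = []       # ids of containers killed by the RM

    def register(self, node):
        # Proposed behavior: an unhealthy node never becomes schedulable.
        if self.health_at_registration and not node.healthy:
            return
        self.schedulable[node.name] = node

    def schedule(self, container_id):
        # Place the container on the first schedulable node, if there is one.
        for node in self.schedulable.values():
            node.containers.append(container_id)
            return node.name
        return None

    def heartbeat(self, node):
        # The heartbeat carries the health report; an unhealthy node loses
        # every container scheduled on it so far.
        if not node.healthy and node.name in self.schedulable:
            self.killed.extend(node.containers)
            node.containers.clear()
            del self.schedulable[node.name]


def run(health_at_registration, pending=3):
    # Register one unhealthy node, try to schedule `pending` containers,
    # then deliver the first heartbeat. Returns the number of kills.
    rm = ToyResourceManager(health_at_registration)
    node = Node("nm1", healthy=False)
    rm.register(node)
    for i in range(pending):
        rm.schedule(f"container_{i}")
    rm.heartbeat(node)
    return len(rm.killed)
```

In this model, run(False) kills all three containers scheduled before the 
first heartbeat, while run(True) kills none because the unhealthy node is 
never offered to the scheduler.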



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

