[ 
https://issues.apache.org/jira/browse/YARN-9809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16921761#comment-16921761
 ] 

Eric Yang commented on YARN-9809:
---------------------------------

[~ebadger] LocalDirsHandlerService checkDir runs as a timer task, so it is 
unlikely that the scheduled task completes before the registration call 
happens.  It would be good to prevent containers from being scheduled to a 
node manager that is still going through the registration process, since that 
saves network round trips and compute resources.  However, the existing async 
design lets the node show up in the Resource Manager as quickly as possible, 
which improves the experience for system admins.  I think there are merits in 
both approaches, but they seem mutually exclusive.  Please shed some light on 
how you plan to keep the existing responsiveness of registration while 
preventing containers from leaking to a bad node.  
Thanks.
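
To make the trade-off concrete, here is a minimal sketch of one possible 
middle ground: wait a bounded time for the first health check, and register 
with an "unknown" status if it has not finished.  This is not the YARN 
implementation.  The class RegistrationHealthSketch, the Health enum, and both 
methods are hypothetical names made up for illustration, and the sketch uses 
only the JDK.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical illustration only; none of these names exist in Hadoop.
public class RegistrationHealthSketch {

    /** Health states a node could report at registration (assumed). */
    public enum Health { HEALTHY, UNHEALTHY, UNKNOWN }

    /**
     * Waits at most timeoutMillis for the first health check result.
     * Returns UNKNOWN if the check has not finished in time, so that
     * registration is delayed by no more than the timeout.
     */
    public static Health healthForRegistration(
            CompletableFuture<Boolean> firstCheck, long timeoutMillis) {
        try {
            boolean ok = firstCheck.get(timeoutMillis, TimeUnit.MILLISECONDS);
            return ok ? Health.HEALTHY : Health.UNHEALTHY;
        } catch (TimeoutException e) {
            // The timer task has not run yet: register anyway, status unknown.
            return Health.UNKNOWN;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return Health.UNKNOWN;
        } catch (ExecutionException e) {
            // A check that threw is treated as unhealthy in this sketch.
            return Health.UNHEALTHY;
        }
    }

    /** A scheduler would place containers only on nodes known to be healthy. */
    public static boolean schedulable(Health health) {
        return health == Health.HEALTHY;
    }

    public static void main(String[] args) {
        // First check still pending at registration time.
        CompletableFuture<Boolean> pending = new CompletableFuture<>();
        Health atRegistration = healthForRegistration(pending, 50);
        System.out.println(atRegistration + " schedulable="
                + schedulable(atRegistration));

        // First check already finished and found bad local dirs.
        Health bad = healthForRegistration(
                CompletableFuture.completedFuture(false), 50);
        System.out.println(bad + " schedulable=" + schedulable(bad));
    }
}
```

Under these assumptions the node still appears in the Resource Manager 
promptly, but receives no containers until a heartbeat reports it healthy.  
Whether a short bounded wait is acceptable for registration latency is 
exactly the open question above.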

> NMs should supply a health status when registering with RM
> ----------------------------------------------------------
>
>                 Key: YARN-9809
>                 URL: https://issues.apache.org/jira/browse/YARN-9809
>             Project: Hadoop YARN
>          Issue Type: Bug
>            Reporter: Eric Badger
>            Assignee: Eric Badger
>            Priority: Major
>
> Currently if the NM registers with the RM and it is unhealthy, it can be 
> scheduled many containers before the first heartbeat. After the first 
> heartbeat, the RM will mark the NM as unhealthy and kill all of the 
> containers.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
