[ https://issues.apache.org/jira/browse/YARN-2567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14139274#comment-14139274 ]

Vinod Kumar Vavilapalli commented on YARN-2567:
-----------------------------------------------

This is a fallout of YARN-2001. We need to (maybe lazily) persist the list of
active/lost/decommissioned nodes so that the RM has some idea of the cluster
state after a restart.

YARN-2047 can likely be fixed together with this.

YARN-1071 originally highlighted the problem, but we took a shortcut there.

> Add a percentage-node threshold for RM to wait for new allocations after 
> restart/failover
> -----------------------------------------------------------------------------------------
>
>                 Key: YARN-2567
>                 URL: https://issues.apache.org/jira/browse/YARN-2567
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: resourcemanager
>            Reporter: Vinod Kumar Vavilapalli
>            Assignee: Vinod Kumar Vavilapalli
>
> This is the remaining part of YARN-2001 - to halt allocations after restart 
> till x% of nodes sync back with the RM. This is useful for avoiding bad 
> scheduling during the time the nodes are still joining back after a 
> restart/failover.
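A minimal sketch of the percentage-node threshold described above, written in Java because the RM is. Everything in it is hypothetical: `AllocationGate`, `onNodeResync`, `allowAllocations`, and the assumption that `expectedNodes` comes from a persisted list of previously active nodes are illustrative names and inputs, not the actual YARN-2567 change or any existing ResourceManager API.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch: hold back new allocations after an RM restart/failover
// until at least thresholdPercent of the previously known nodes have synced back.
class AllocationGate {
    // Number of nodes the RM knew about before the restart.
    // Assumption: read from a persisted list of active nodes.
    private final int expectedNodes;
    // The "x%" from the issue description, as an integer percentage (0..100).
    private final int thresholdPercent;
    // IDs of nodes that have re-registered since the restart.
    private final Set<String> resynced = new HashSet<>();

    AllocationGate(int expectedNodes, int thresholdPercent) {
        if (expectedNodes < 0 || thresholdPercent < 0 || thresholdPercent > 100) {
            throw new IllegalArgumentException("invalid gate parameters");
        }
        this.expectedNodes = expectedNodes;
        this.thresholdPercent = thresholdPercent;
    }

    // Record a node that has synced back with the RM; repeated syncs are ignored.
    synchronized void onNodeResync(String nodeId) {
        resynced.add(nodeId);
    }

    // True once resynced / expected >= thresholdPercent / 100.
    // Cross-multiplied in long arithmetic to avoid floating-point rounding.
    synchronized boolean allowAllocations() {
        if (expectedNodes == 0) {
            return true; // no prior cluster state, nothing to wait for
        }
        return (long) resynced.size() * 100 >= (long) thresholdPercent * expectedNodes;
    }
}
```

With 10 expected nodes and an 80% threshold, allocations stay blocked while 7 distinct nodes have resynced and open once the 8th arrives. Comparing integers rather than a computed ratio keeps the cut-off exact at precisely x%.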



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
