[ https://issues.apache.org/jira/browse/YARN-2567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14139274#comment-14139274 ]
Vinod Kumar Vavilapalli commented on YARN-2567:
-----------------------------------------------

This is a fallout of YARN-2001. We need to persist (maybe lazily) the list of
active/lost/decommissioned nodes so that the RM has some idea of the cluster
state beyond a restart. YARN-2047 can likely be fixed together with this.
YARN-1071 originally highlighted the problem, but we took a shortcut there.

> Add a percentage-node threshold for RM to wait for new allocations after
> restart/failover
> -----------------------------------------------------------------------------------------
>
>                 Key: YARN-2567
>                 URL: https://issues.apache.org/jira/browse/YARN-2567
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: resourcemanager
>            Reporter: Vinod Kumar Vavilapalli
>            Assignee: Vinod Kumar Vavilapalli
>
> This is the remaining part of YARN-2001 - to halt allocations after restart
> till x% of nodes sync back with the RM. This is useful for avoiding bad
> scheduling during the time the nodes are still joining back after a
> restart/failover.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
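
For illustration only, the threshold idea in the description (halt allocations until x% of nodes sync back) could be sketched roughly as below. This is not YARN code: the class `NodeSyncGate`, its methods, and the assumption that the pre-restart node count is available (for example from a persisted node list) are all hypothetical.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper, NOT part of YARN: gates new allocations after an RM
// restart until enough previously known nodes have re-registered.
final class NodeSyncGate {
    private final int expectedNodes;     // nodes known before the restart (assumed persisted)
    private final int thresholdPercent;  // the "x%" that must sync back, 0..100
    private final Set<String> resynced = new HashSet<>();

    NodeSyncGate(int expectedNodes, int thresholdPercent) {
        if (expectedNodes < 0 || thresholdPercent < 0 || thresholdPercent > 100) {
            throw new IllegalArgumentException("invalid expectedNodes or thresholdPercent");
        }
        this.expectedNodes = expectedNodes;
        this.thresholdPercent = thresholdPercent;
    }

    // Record that a node has re-registered; the set ignores duplicates.
    synchronized void nodeResynced(String nodeId) {
        resynced.add(nodeId);
    }

    // True once the re-synced share reaches the threshold. Integer arithmetic
    // avoids floating-point rounding. With no known nodes there is nothing to
    // wait for, so allocations are allowed immediately.
    synchronized boolean allocationsAllowed() {
        if (expectedNodes == 0) {
            return true;
        }
        return resynced.size() * 100L >= (long) thresholdPercent * expectedNodes;
    }
}
```

With 10 expected nodes and a 50% threshold, `allocationsAllowed()` stays false until five distinct nodes have called `nodeResynced`. A real implementation would presumably also need a timeout, so that nodes lost during the restart cannot block scheduling indefinitely.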