[
https://issues.apache.org/jira/browse/YARN-2567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15236701#comment-15236701
]
sandflee commented on YARN-2567:
--------------------------------
The main idea is to store NM status lazily and, if the RM fails over, to
recover each NM's status as follows.
For nodes in the RUNNING/UNHEALTHY/DECOMMISSIONING state, recover the RMNode
to the NEW state, register a timer, and wait for the node to register and
become active.
For nodes in the LOST/DECOMMISSIONED/SHUTDOWN state, recover the RMNode to
the corresponding finished state.
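
A rough sketch of the recovery mapping described above, to make the two cases
concrete. This is not the patch itself: the class, the State enum and the
Recovered holder are hypothetical stand-ins, not the real RMNode or NodeState
types, and the timer is reduced to a boolean flag. Treating a stored NEW
state as an error is also an assumption, since the comment does not cover it.

```java
// Hypothetical sketch; names here are illustrative and not part of YARN.
class NodeRecoverySketch {

    /** Stand-in for the node states named in the comment. */
    enum State { NEW, RUNNING, UNHEALTHY, DECOMMISSIONING, LOST, DECOMMISSIONED, SHUTDOWN }

    /** Outcome of recovering one stored node record. */
    static final class Recovered {
        final State state;               // state the node is recovered into
        final boolean awaitRegistration; // true if a timer should wait for the node to register

        Recovered(State state, boolean awaitRegistration) {
            this.state = state;
            this.awaitRegistration = awaitRegistration;
        }
    }

    /** Maps the state stored before failover to the state used after recovery. */
    static Recovered recover(State stored) {
        switch (stored) {
            case RUNNING:
            case UNHEALTHY:
            case DECOMMISSIONING:
                // Live nodes restart from NEW and must register again.
                return new Recovered(State.NEW, true);
            case LOST:
            case DECOMMISSIONED:
            case SHUTDOWN:
                // Finished nodes keep their finished state; nothing to wait for.
                return new Recovered(stored, false);
            default:
                // Assumption: a NEW node is never stored, so it cannot be recovered.
                throw new IllegalArgumentException("No recovery rule for " + stored);
        }
    }

    public static void main(String[] args) {
        for (State s : State.values()) {
            if (s == State.NEW) {
                continue; // no recovery rule for NEW in this sketch
            }
            Recovered r = recover(s);
            System.out.println(s + " -> " + r.state + " (await registration: " + r.awaitRegistration + ")");
        }
    }
}
```

In a real implementation the awaitRegistration flag would presumably drive
the timer mentioned above, with the node moved to a finished state if it does
not register before the timer expires; that expiry behaviour is a guess and
is not stated in this comment.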
> Add a percentage-node threshold for RM to wait for new allocations after
> restart/failover
> -----------------------------------------------------------------------------------------
>
> Key: YARN-2567
> URL: https://issues.apache.org/jira/browse/YARN-2567
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: resourcemanager
> Reporter: Vinod Kumar Vavilapalli
> Assignee: Vinod Kumar Vavilapalli
>
> This is the remaining part of YARN-2001 - to halt allocations after restart
> till x% of nodes sync back with the RM. This is useful for avoiding bad
> scheduling during the time the nodes are still joining back after a
> restart/failover.