[ 
https://issues.apache.org/jira/browse/YARN-3025?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14294461#comment-14294461
 ] 

Zhijie Shen commented on YARN-3025:
-----------------------------------

IMHO, we've mixed two things together in the prior discussion:

1. First, the RM should provide an API that lets the AM retrieve the 
blacklisted nodes. Then, when the AM crashes and restarts, it can sync with 
the RM to get the last state of the blacklisted nodes from before the restart. 
This is feasible whether or not the RM persists the blacklisted nodes into the 
state store, given that they're kept in memory in the scheduler.

2. Second, writing the blacklisted nodes into the state store is necessary only 
when we also want the blacklisted nodes to be recoverable across an RM 
restart. This can be further divided into two cases: 1) If we want the 
blacklisted nodes to be recoverable after an ordinary RM restart, we can 
just write the latest blacklisted nodes of running apps into the state store 
once, when the RM stops. 2) If we want the blacklisted nodes to be 
recoverable after an RM crash, we can update the latest blacklisted nodes in 
the state store upon each change, as suggested by Tsuyoshi.
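
To make the distinction concrete, here is a rough sketch of the two points, 
not actual RM code. Only updateBlacklist and getBlacklistedNodes come from 
the issue description. BlacklistStore, InMemoryBlacklistStore, 
BlacklistTracker, the persistOnChange flag and stop() are hypothetical names 
made up for illustration, and the in-memory store merely stands in for a real 
state store.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical persistence hook; NOT an actual YARN state-store interface.
interface BlacklistStore {
    void save(List<String> blacklistedNodes);
    List<String> load();
}

// Toy store that only keeps the last saved list in memory (assumption for the sketch).
class InMemoryBlacklistStore implements BlacklistStore {
    private List<String> saved = new ArrayList<>();

    public void save(List<String> blacklistedNodes) {
        saved = new ArrayList<>(blacklistedNodes);
    }

    public List<String> load() {
        return new ArrayList<>(saved);
    }
}

// Hypothetical per-application holder of the blacklist kept in scheduler memory.
class BlacklistTracker {
    private final Set<String> blacklist = new LinkedHashSet<>();
    private final BlacklistStore store;
    private final boolean persistOnChange;

    BlacklistTracker(BlacklistStore store, boolean persistOnChange) {
        this.store = store;
        this.persistOnChange = persistOnChange;
        // Recover whatever was persisted before this instance started.
        blacklist.addAll(store.load());
    }

    public synchronized void updateBlacklist(List<String> blacklistAdditions,
            List<String> blacklistRemovals) {
        if (blacklistAdditions != null) {
            blacklist.addAll(blacklistAdditions);
        }
        if (blacklistRemovals != null) {
            blacklist.removeAll(blacklistRemovals);
        }
        // Case 2.2: write on every change, so the state survives a crash.
        if (persistOnChange) {
            store.save(new ArrayList<>(blacklist));
        }
    }

    // Point 1: served from memory, no state store needed for AM failover.
    public synchronized List<String> getBlacklistedNodes() {
        return new ArrayList<>(blacklist);
    }

    // Case 2.1: write once on an orderly stop.
    public synchronized void stop() {
        store.save(new ArrayList<>(blacklist));
    }
}
```

With persistOnChange set to false, the store is only written in stop(), so a 
crash between updates and stop() would lose the blacklist. With it set to 
true, every update pays a store write but nothing is lost on a crash.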



> Provide API for retrieving blacklisted nodes
> --------------------------------------------
>
>                 Key: YARN-3025
>                 URL: https://issues.apache.org/jira/browse/YARN-3025
>             Project: Hadoop YARN
>          Issue Type: Improvement
>            Reporter: Ted Yu
>
> We have the following method which updates blacklist:
> {code}
>   public synchronized void updateBlacklist(List<String> blacklistAdditions,
>       List<String> blacklistRemovals) {
> {code}
> Upon AM failover, there should be an API which returns the blacklisted nodes 
> so that the new AM can make consistent decisions.
> The new API can be:
> {code}
>   public synchronized List<String> getBlacklistedNodes()
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
