Zhijie Shen commented on YARN-3025:

bq. The above is desirable.

My suggestion is to work on it step by step. First, we can make an API
available to get the blacklisted nodes from the RM. Then we can look for an
efficient way to persist them into the state store so that they survive an RM restart.
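
To make the first step concrete, here is a minimal sketch of what such an API could look like. BlacklistManager is a hypothetical name, not an existing YARN class; it just pairs the existing updateBlacklist signature with the proposed getBlacklistedNodes(), and the order in which additions and removals are applied is an assumption of the sketch:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical holder for an application's blacklist (not a real YARN class).
class BlacklistManager {
  // LinkedHashSet keeps nodes in the order they were first blacklisted.
  private final Set<String> blacklistedNodes = new LinkedHashSet<>();

  // Removals are applied after additions, so a node listed in both
  // ends up not blacklisted (an assumption of this sketch).
  public synchronized void updateBlacklist(List<String> blacklistAdditions,
      List<String> blacklistRemovals) {
    if (blacklistAdditions != null) {
      blacklistedNodes.addAll(blacklistAdditions);
    }
    if (blacklistRemovals != null) {
      blacklistedNodes.removeAll(blacklistRemovals);
    }
  }

  // Returns a read-only snapshot, so callers cannot modify the internal set.
  public synchronized List<String> getBlacklistedNodes() {
    return Collections.unmodifiableList(new ArrayList<>(blacklistedNodes));
  }
}
```

Returning a snapshot rather than the live set keeps updateBlacklist the only path that can change the list.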

I consider these two pieces disjoint. Even if we write to the store every time the
blacklist is updated, we still cannot make it 100% reliable, because a crash can happen
after the blacklist is updated but before the state store is written. It comes down to
how fault-tolerant we want this feature to be.
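
A small sketch of why writing on every update does not close the gap. BlacklistStore, InMemoryBlacklistStore and PersistingBlacklist are hypothetical names standing in for the RM state store, not real YARN classes; the point is only that the in-memory update and the store write are two separate steps:

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical stand-in for the RM state store (not a real YARN interface).
interface BlacklistStore {
  void save(Set<String> nodes);
  Set<String> load();
}

// In-memory implementation, used only to illustrate the failure window.
class InMemoryBlacklistStore implements BlacklistStore {
  private Set<String> saved = new HashSet<>();

  public synchronized void save(Set<String> nodes) {
    saved = new HashSet<>(nodes);
  }

  public synchronized Set<String> load() {
    return new HashSet<>(saved);
  }
}

class PersistingBlacklist {
  private final Set<String> nodes = new HashSet<>();
  private final BlacklistStore store;

  // On (re)start, recover whatever was last written to the store.
  PersistingBlacklist(BlacklistStore store) {
    this.store = store;
    nodes.addAll(store.load());
  }

  // Step 1: update the in-memory blacklist.
  synchronized void addInMemory(String node) {
    nodes.add(node);
  }

  // Step 2: write the current blacklist to the store.
  synchronized void persist() {
    store.save(nodes);
  }

  // "Write on every update": both steps back to back. A crash after
  // addInMemory() but before persist() still loses the update.
  synchronized void add(String node) {
    addInMemory(node);
    persist();
  }

  synchronized Set<String> current() {
    return new HashSet<>(nodes);
  }
}
```

If addInMemory("n2") runs but the process dies before persist(), a PersistingBlacklist rebuilt from the same store only sees the nodes written before the crash.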

bq. Could we write to a memory store during every heartbeat only if a new 
blacklist is published by the AM?

I think the list is already cached in memory.
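
On writing only when the AM publishes a change: a minimal sketch, assuming the blacklist delta arrives with each AM heartbeat. RecordingStore and ChangeAwareBlacklist are hypothetical names; the store is written only when an addition or removal actually modifies the set, so heartbeats that carry no change, or only repeat nodes already listed, trigger no write:

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical store that records each write, so writes can be counted.
class RecordingStore {
  final List<Set<String>> writes = new ArrayList<>();

  void save(Set<String> nodes) {
    writes.add(new HashSet<>(nodes));
  }
}

class ChangeAwareBlacklist {
  private final Set<String> nodes = new HashSet<>();
  private final RecordingStore store;

  ChangeAwareBlacklist(RecordingStore store) {
    this.store = store;
  }

  // Applies one heartbeat's blacklist delta. Writes to the store only if
  // some addition or removal modified the set; returns true if it wrote.
  synchronized boolean onHeartbeat(List<String> additions, List<String> removals) {
    boolean changed = false;
    if (additions != null) {
      changed |= nodes.addAll(additions);
    }
    if (removals != null) {
      changed |= nodes.removeAll(removals);
    }
    if (changed) {
      store.save(nodes);
    }
    return changed;
  }
}
```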

> Provide API for retrieving blacklisted nodes
> --------------------------------------------
>                 Key: YARN-3025
>                 URL: https://issues.apache.org/jira/browse/YARN-3025
>             Project: Hadoop YARN
>          Issue Type: Improvement
>            Reporter: Ted Yu
> We have the following method which updates blacklist:
> {code}
>   public synchronized void updateBlacklist(List<String> blacklistAdditions,
>       List<String> blacklistRemovals) {
>     // ...
>   }
> {code}
> Upon AM failover, there should be an API which returns the blacklisted nodes 
> so that the new AM can make consistent decisions.
> The new API can be:
> {code}
>   public synchronized List<String> getBlacklistedNodes() {
>     // returns the currently blacklisted nodes
>   }
> {code}

This message was sent by Atlassian JIRA
