[ 
https://issues.apache.org/jira/browse/YARN-3025?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14339012#comment-14339012
 ] 

Vinod Kumar Vavilapalli commented on YARN-3025:
-----------------------------------------------

Coming in very late, apologies.

Some comments:
 - Echoing Bikas's first comment: today the AMs are expected to maintain their 
own scheduling state. With this you are changing that: part of the scheduling 
state will be remembered but the rest won't be. We should clearly draw a line 
somewhere; where is it?
 - [~zjshen] did a very good job of dividing the persistence concerns, but what 
guarantee is given to the app writers? "I'll return the list of 
blacklisted nodes whenever I can, but shoot I died, so I can't help you much" 
is not going to cut it. If we want reliable notifications, we should build a 
protocol between AM and RM about the persistence of the blacklisted node list, 
which is too much complexity if you ask me. Why not leave it to the apps?
 - The blacklist information is per application-attempt, and today the 
scheduler forgets previous application-attempts. So as I understand it, the 
patch doesn't work for the AM failover case it is meant to address.
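
To make "leave it to the apps" concrete, here is a rough sketch of an AM that 
owns its blacklist state. AmBlacklistTracker and all of its methods are 
hypothetical and not part of YARN; only the shape of 
updateBlacklist(additions, removals) comes from the issue description. Where 
the snapshot is persisted (HDFS, ZooKeeper, ...) is left to the app, and the 
resend-on-restore behaviour assumes, as noted above, that the scheduler 
forgets previous attempts.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.TreeSet;

// Hypothetical helper, NOT part of YARN: the AM keeps its own blacklist state
// and computes the deltas to pass to updateBlacklist(additions, removals).
class AmBlacklistTracker {
    // Nodes the AM currently considers blacklisted (sorted for stable output).
    private final TreeSet<String> blacklisted = new TreeSet<>();
    // Changes not yet reported to the RM.
    private final TreeSet<String> pendingAdditions = new TreeSet<>();
    private final TreeSet<String> pendingRemovals = new TreeSet<>();

    synchronized void blacklist(String node) {
        if (blacklisted.add(node)) {
            // If a removal was queued but never sent, just cancel it;
            // otherwise queue an addition for the next update.
            if (!pendingRemovals.remove(node)) {
                pendingAdditions.add(node);
            }
        }
    }

    synchronized void unblacklist(String node) {
        if (blacklisted.remove(node)) {
            // Mirror image of blacklist(): cancel an unsent addition,
            // otherwise queue a removal.
            if (!pendingAdditions.remove(node)) {
                pendingRemovals.add(node);
            }
        }
    }

    // Returns and clears the additions to send with the next update.
    synchronized List<String> drainAdditions() {
        List<String> out = new ArrayList<>(pendingAdditions);
        pendingAdditions.clear();
        return out;
    }

    // Returns and clears the removals to send with the next update.
    synchronized List<String> drainRemovals() {
        List<String> out = new ArrayList<>(pendingRemovals);
        pendingRemovals.clear();
        return out;
    }

    // Full state, for the app to persist wherever it keeps its other state.
    synchronized List<String> snapshot() {
        return new ArrayList<>(blacklisted);
    }

    // Called by a new AM attempt with the previously persisted snapshot.
    // Assumption: the scheduler has forgotten the old attempt's blacklist,
    // so every restored node is queued to be sent again.
    synchronized void restore(Collection<String> saved) {
        blacklisted.clear();
        pendingAdditions.clear();
        pendingRemovals.clear();
        blacklisted.addAll(saved);
        pendingAdditions.addAll(saved);
    }
}
```

On each heartbeat the AM would pass drainAdditions() and drainRemovals() to 
updateBlacklist; after a failover the new attempt calls restore() with 
whatever it persisted, and the next update re-establishes the blacklist 
without any new RM API.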

> Provide API for retrieving blacklisted nodes
> --------------------------------------------
>
>                 Key: YARN-3025
>                 URL: https://issues.apache.org/jira/browse/YARN-3025
>             Project: Hadoop YARN
>          Issue Type: Improvement
>            Reporter: Ted Yu
>            Assignee: Ted Yu
>         Attachments: yarn-3025-v1.txt, yarn-3025-v2.txt, yarn-3025-v3.txt
>
>
> We have the following method which updates blacklist:
> {code}
>   public synchronized void updateBlacklist(List<String> blacklistAdditions,
>       List<String> blacklistRemovals) {
> {code}
> Upon AM failover, there should be an API which returns the blacklisted nodes 
> so that the new AM can make consistent decisions.
> The new API can be:
> {code}
>   public synchronized List<String> getBlacklistedNodes()
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
