[ 
https://issues.apache.org/jira/browse/YARN-2005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14732429#comment-14732429
 ] 

Karthik Kambatla commented on YARN-2005:
----------------------------------------

Thanks for working on this, Anubhav. The approach looks good to me. Minor 
comments/nits on the patch itself:
# Spurious changes/imports in AbstractYarnScheduler, FifoScheduler, RMAppImpl, 
TestAbstractYarnScheduler, TestAMRestart, YarnScheduler.
# In AppSchedulingInfo
## synchronize *only* the common updateBlacklist method? 
## Also, should we just synchronize on the list in question and not all of 
AppSchedulingInfo? I am fine with leaving it as is, if that is a lot of work 
and won't yield any big gains.
## The comments in the common updateBlaclist method refer to userBlacklist 
while the method operates on both system and user blacklists.
## Should transferStateFromPreviousAppSchedulingInfo update the systemBlacklist 
as well? Or, is the decision here to have the system blacklist per app-attempt?
# BlacklistAdditionsRemovals:
## Mark it Private
## Rename to BlacklistUpdates
## Rename members blacklistAdditions and blacklistRemovals to additions and 
removals, and update the getters accordingly? 
# BlackListManager
## Mark it Private
## Rename addNodeContainerFailure to addNode?
## Rename getter?
# In DisabledBlacklistManager, define a static EMPTY_LIST similar to 
SimpleBlacklistManager and use that to avoid creating two ArrayLists for each 
AppAttempt.
# RMAppImpl: If am blacklisting is not enabled, we don't need to read the 
disable threshold. 
# RMAppAttempt: Instead of getAMBlacklist, should we call it getSystemBlacklist 
to be consistent with the way we refer to it in the scheduler? 
# RMAppAttemptImpl
## EMPTY_SYSTEM_BLACKLIST is unused
## Update any variables based on the method name - getAMBlacklist vs 
getSystemBlacklist
## Shouldn't we blacklist nodes on LaunchFailedTransition - may be in a 
follow-up JIRA? Can we file one if you agree.
# The MockRM change is unrelated? I like the change, but may be we should do it 
in a separate clean-up JIRA. It might have a few other things to clean-up :)
# TestAMRestart: A couple of unused variables. 
# Why are the changes to TestCapacityScheduler needed? They don't look related 
to this patch.
# TestRMAppLogAggregationStatus change is unrelated. 

> Blacklisting support for scheduling AMs
> ---------------------------------------
>
>                 Key: YARN-2005
>                 URL: https://issues.apache.org/jira/browse/YARN-2005
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: resourcemanager
>    Affects Versions: 0.23.10, 2.4.0
>            Reporter: Jason Lowe
>            Assignee: Anubhav Dhoot
>         Attachments: YARN-2005.001.patch, YARN-2005.002.patch, 
> YARN-2005.003.patch, YARN-2005.004.patch, YARN-2005.005.patch, 
> YARN-2005.006.patch, YARN-2005.006.patch, YARN-2005.007.patch, 
> YARN-2005.008.patch
>
>
> It would be nice if the RM supported blacklisting a node for an AM launch 
> after the same node fails a configurable number of AM attempts.  This would 
> be similar to the blacklisting support for scheduling task attempts in the 
> MapReduce AM but for scheduling AM attempts on the RM side.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to