[
https://issues.apache.org/jira/browse/YARN-2005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14732429#comment-14732429
]
Karthik Kambatla commented on YARN-2005:
----------------------------------------
Thanks for working on this, Anubhav. The approach looks good to me. Minor
comments/nits on the patch itself:
# Spurious changes/imports in AbstractYarnScheduler, FifoScheduler, RMAppImpl,
TestAbstractYarnScheduler, TestAMRestart, YarnScheduler.
# In AppSchedulingInfo
## synchronize *only* the common updateBlacklist method?
## Also, should we just synchronize on the list in question and not all of
AppSchedulingInfo? I am fine with leaving it as is, if that is a lot of work
and won't yield any big gains.
## The comments in the common updateBlaclist method refer to userBlacklist
while the method operates on both system and user blacklists.
## Should transferStateFromPreviousAppSchedulingInfo update the systemBlacklist
as well? Or, is the decision here to have the system blacklist per app-attempt?
# BlacklistAdditionsRemovals:
## Mark it Private
## Rename to BlacklistUpdates
## Rename members blacklistAdditions and blacklistRemovals to additions and
removals, and update the getters accordingly?
# BlackListManager
## Mark it Private
## Rename addNodeContainerFailure to addNode?
## Rename getter?
# In DisabledBlacklistManager, define a static EMPTY_LIST similar to
SimpleBlacklistManager and use that to avoid creating two ArrayLists for each
AppAttempt.
# RMAppImpl: If am blacklisting is not enabled, we don't need to read the
disable threshold.
# RMAppAttempt: Instead of getAMBlacklist, should we call it getSystemBlacklist
to be consistent with the way we refer to it in the scheduler?
# RMAppAttemptImpl
## EMPTY_SYSTEM_BLACKLIST is unused
## Update any variables based on the method name - getAMBlacklist vs
getSystemBlacklist
## Shouldn't we blacklist nodes on LaunchFailedTransition - may be in a
follow-up JIRA? Can we file one if you agree.
# The MockRM change is unrelated? I like the change, but may be we should do it
in a separate clean-up JIRA. It might have a few other things to clean-up :)
# TestAMRestart: A couple of unused variables.
# Why are the changes to TestCapacityScheduler needed? They don't look related
to this patch.
# TestRMAppLogAggregationStatus change is unrelated.
> Blacklisting support for scheduling AMs
> ---------------------------------------
>
> Key: YARN-2005
> URL: https://issues.apache.org/jira/browse/YARN-2005
> Project: Hadoop YARN
> Issue Type: Improvement
> Components: resourcemanager
> Affects Versions: 0.23.10, 2.4.0
> Reporter: Jason Lowe
> Assignee: Anubhav Dhoot
> Attachments: YARN-2005.001.patch, YARN-2005.002.patch,
> YARN-2005.003.patch, YARN-2005.004.patch, YARN-2005.005.patch,
> YARN-2005.006.patch, YARN-2005.006.patch, YARN-2005.007.patch,
> YARN-2005.008.patch
>
>
> It would be nice if the RM supported blacklisting a node for an AM launch
> after the same node fails a configurable number of AM attempts. This would
> be similar to the blacklisting support for scheduling task attempts in the
> MapReduce AM but for scheduling AM attempts on the RM side.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)