[
https://issues.apache.org/jira/browse/YARN-4837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15203522#comment-15203522
]
Vinod Kumar Vavilapalli commented on YARN-4837:
-----------------------------------------------
bq. "DISKS_FAILED" shouldn't be skipped for the reason I mentioned in
YARN-4576. Also, we cannot simply judge system innocent when hitting memory
issues.
As [~vvasudev] pointed out [here on
YARN-4576|https://issues.apache.org/jira/browse/YARN-4576?focusedCommentId=15202664&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15202664],
the right solution is to have the RM detect bouncing nodes and then not
allocate new containers to them until they stabilize.
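To make the idea concrete, here is a rough sketch of what "detect bouncing nodes and hold off allocation until they stabilize" could look like. It is illustration only and not YARN code: the class name, the method names, the time window and the transition threshold are all hypothetical, and a real implementation would live in the RM's node-tracking logic.

```python
from collections import defaultdict, deque


class BouncingNodeTracker:
    """Hypothetical helper (not a YARN class): flags nodes whose health
    flips too often within a sliding time window."""

    def __init__(self, window_secs=600.0, max_transitions=3):
        # Both values are assumed defaults, chosen only for illustration.
        self.window_secs = window_secs
        self.max_transitions = max_transitions
        self._transitions = defaultdict(deque)  # node -> times of state flips
        self._last_state = {}                   # node -> last reported health

    def report(self, node, healthy, now):
        """Record a health report; only a change of state counts as a flip."""
        prev = self._last_state.get(node)
        self._last_state[node] = healthy
        if prev is not None and prev != healthy:
            self._transitions[node].append(now)

    def is_bouncing(self, node, now):
        """True if the node flipped state too many times inside the window."""
        times = self._transitions[node]
        # Drop flips that have aged out of the window.
        while times and now - times[0] > self.window_secs:
            times.popleft()
        return len(times) >= self.max_transitions

    def can_allocate(self, node, now):
        """Allocate only to nodes that are currently healthy and stable."""
        return self._last_state.get(node, False) and not self.is_bouncing(node, now)
```

With this shape, a node that keeps flipping is skipped even while it reports healthy, and it becomes eligible again on its own once no flips remain inside the window, without any user-visible blacklist.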
bq. Also, hide all AM scheduling info/preference from application doesn't make
sense in long time: AM can ask for resources for its running containers in the
beginning, but application cannot ask how to place its AM even today which is
sad to me.
My earlier comment was a little inaccurate when I talked about "hiding
AM-container-scheduling inside the RM". What I really meant is that any
automatic scheduling decision coming out of system failures/events should be
hidden from end-users - just like preemption-handling! We already have
ResourceRequest as part of the AM-launch-context, and there is no reason why we
cannot have more such things. However, this is different from the RM
automatically ruling out nodes, as was done in YARN-2005 and related JIRAs.
bq. YARN-4685 is something fixable and much better than the age without
blacklist (we do see AM keep launching on bad nodes repeatedly and get stuck in
many cases). We just need to go ahead to fix YARN-4685.
YARN-4685 happened because of an inappropriate solution to a real problem - we
should pause going down this route till we figure out the right solution.
> User facing aspects of 'AM blacklisting' feature need fixing
> ------------------------------------------------------------
>
> Key: YARN-4837
> URL: https://issues.apache.org/jira/browse/YARN-4837
> Project: Hadoop YARN
> Issue Type: Bug
> Reporter: Vinod Kumar Vavilapalli
> Assignee: Vinod Kumar Vavilapalli
>
> Was reviewing the user-facing aspects that we are releasing as part of 2.8.0.
> Looking at the 'AM blacklisting feature', I see several things to be fixed
> before we release it in 2.8.0.