[
https://issues.apache.org/jira/browse/YARN-4837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15200490#comment-15200490
]
Vinod Kumar Vavilapalli commented on YARN-4837:
-----------------------------------------------
Here are my concerns:
- First up, the feature isn't 'AM blacklisting' - we are not blacklisting AMs.
The goal is for the system to not schedule AMs on faulty nodes. The right
solution is to identify why we keep launching on bad nodes instead of marking
them unhealthy - but I can see why a blacklist threshold is useful when we
*simply don't know*.
- The configurations are all named yarn.am.blacklisting even though they
should be under a yarn.resourcemanager hierarchy.
- We blindly add a node to the app's blacklist after hitting just *one* AM
failure, and the error / exit code isn't taken into account at all.
- Irrespective of all that, I actually don't see why we should already expose
this to end-users, i.e. the whole premise of YARN-4389. Why should an app
specifically care about "the number of nodes YARN blacklists for my AM
container launch"?
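
To make the third concern concrete, here is a rough sketch of what a
threshold- and exit-code-aware policy could look like. This is not the
ResourceManager code: the class name, the method names and the exit-code
values are all hypothetical and only illustrate the idea of counting
node-attributable AM failures before avoiding a node.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/**
 * Hypothetical sketch only; none of these names exist in YARN.
 * Tracks AM failures per node and avoids a node for AM placement only
 * after a configurable number of node-attributable failures.
 */
final class AmNodeFailureTracker {
    private final int failureThreshold;
    private final Set<Integer> nodeRelatedExitCodes;
    private final Map<String, Integer> failuresByNode = new HashMap<>();

    AmNodeFailureTracker(int failureThreshold, Set<Integer> nodeRelatedExitCodes) {
        if (failureThreshold < 1) {
            throw new IllegalArgumentException("failureThreshold must be >= 1");
        }
        this.failureThreshold = failureThreshold;
        this.nodeRelatedExitCodes = new HashSet<>(nodeRelatedExitCodes);
    }

    /** Records an AM failure, ignoring exit codes not attributed to the node. */
    void recordAmFailure(String nodeId, int exitCode) {
        if (!nodeRelatedExitCodes.contains(exitCode)) {
            return; // e.g. an application bug; the node is not at fault
        }
        failuresByNode.merge(nodeId, 1, Integer::sum);
    }

    /** True once the node has reached the failure threshold. */
    boolean shouldAvoidForAm(String nodeId) {
        return failuresByNode.getOrDefault(nodeId, 0) >= failureThreshold;
    }
}
```

With a threshold of 2, a single failure on a node would not be enough to
avoid it, and a failure with an exit code outside the configured set would
not count at all. Which exit codes should count as node-related is exactly
the open question here.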
I'm digging into the feature more for a careful look.
/cc
- [~adhoot], [~jlowe], [~kasha] who were involved with YARN-2005 for the
naming changes
- [~sunilg] / [~djp] who worked on YARN-4389.
While we discuss this, I think we should make the feature private before 2.8.0
goes out.
> User facing aspects of 'AM blacklisting' feature need fixing
> ------------------------------------------------------------
>
> Key: YARN-4837
> URL: https://issues.apache.org/jira/browse/YARN-4837
> Project: Hadoop YARN
> Issue Type: Bug
> Reporter: Vinod Kumar Vavilapalli
> Assignee: Vinod Kumar Vavilapalli
>
> Was reviewing the user-facing aspects that we are releasing as part of 2.8.0.
> Looking at the 'AM blacklisting feature', I see several things to be fixed
> before we release it in 2.8.0.