[
https://issues.apache.org/jira/browse/YARN-4837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15201546#comment-15201546
]
Sunil G commented on YARN-4837:
-------------------------------
Thanks [~vinodkv] for pitching in.
YARN-2005 blacklists a node if an AM container launch fails there due to
DISK_FAILED. After YARN-4284, AM blacklisting is applied for every container
failure except PREEMPTED. There was some discussion about the use cases for
this change.
If the blacklisting (AM container failure) feature is enabled at the cluster
level, all applications are forced to comply with the blacklisting rule.
YARN-4389 also added an option to disable this feature from the application
side, and to adjust the threshold if it is too strict (or vice versa). Yes, I
agree with your point that it is too early for users to make blacklisting
decisions without the necessary information. But given how aggressive the
current behavior is, that change helped applications skip this feature.
I agree that this has to be a controllable feature that does not cause
problems in a busy cluster. I think a time-based purging solution may be
ideal, so that the same app can use the node again.
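
To make the time-based purging idea concrete, here is a minimal sketch in
Java. It is not taken from any YARN patch: the class ExpiringAmBlacklist, its
methods, and the ttlMillis parameter are hypothetical names used only for
illustration, and the caller passes in the current time so the behavior is
easy to check.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of a per-application AM blacklist with expiry; not YARN code. */
class ExpiringAmBlacklist {
  // How long a node stays blacklisted, in milliseconds (assumed parameter).
  private final long ttlMillis;
  // Node name -> time (ms) at which the node was blacklisted.
  private final Map<String, Long> blacklistedAt = new HashMap<>();

  ExpiringAmBlacklist(long ttlMillis) {
    this.ttlMillis = ttlMillis;
  }

  /** Records that an AM container failed on this node at the given time. */
  void addNode(String node, long nowMillis) {
    blacklistedAt.put(node, nowMillis);
  }

  /** Drops every entry whose age has reached the TTL. */
  void purgeExpired(long nowMillis) {
    blacklistedAt.values().removeIf(addedAt -> nowMillis - addedAt >= ttlMillis);
  }

  /** Purges stale entries first, then reports whether the node is still blacklisted. */
  boolean isBlacklisted(String node, long nowMillis) {
    purgeExpired(nowMillis);
    return blacklistedAt.containsKey(node);
  }
}
```

With a TTL of, say, 10 minutes, a node that failed one AM attempt would
become eligible for the same app again once the entry ages out, rather than
staying blacklisted for the lifetime of the application.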
> User facing aspects of 'AM blacklisting' feature need fixing
> ------------------------------------------------------------
>
> Key: YARN-4837
> URL: https://issues.apache.org/jira/browse/YARN-4837
> Project: Hadoop YARN
> Issue Type: Bug
> Reporter: Vinod Kumar Vavilapalli
> Assignee: Vinod Kumar Vavilapalli
>
> Was reviewing the user-facing aspects that we are releasing as part of 2.8.0.
> Looking at the 'AM blacklisting feature', I see several things to be fixed
> before we release it in 2.8.0.