[ 
https://issues.apache.org/jira/browse/YARN-4837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15201546#comment-15201546
 ] 

Sunil G commented on YARN-4837:
-------------------------------

Thanks [~vinodkv] for pitching in.

YARN-2005 blacklists a node if the AM container launch failed due to
DISK_FAILED. After YARN-4284, AM-container-failure blacklisting applies to all
container failures except PREEMPTED. There were a few discussions on the
use-case aspects of this change.

If the blacklisting (AM container failure) feature is enabled at the cluster
level, all applications are forced to comply with the blacklisting rule.
YARN-4389 also added an option to disable this feature from the application
end, and to control the threshold if it is too strict (or too lenient). Yes, I
agree with your point that it is early for users to make blacklisting decisions
without having much of the needed/useful information. But given the current
aggressive nature of blacklisting, this change helped applications skip the
feature.

I agree that this has to be a controllable feature that does not cause problems
in a busy cluster. I think a time-based purging solution may be ideal, allowing
the same app to use the node again.
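
A rough sketch of what time-based purging could look like, purely for
illustration. TimedNodeBlacklist, its methods, and the expiry window are
hypothetical names and parameters, not existing YARN classes or configuration;
the caller passes in the current time so the behaviour is easy to follow.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical helper, not a YARN class: a per-application node blacklist
// whose entries are purged once they are older than a fixed window.
class TimedNodeBlacklist {
  private final long expiryMillis;
  // node name -> time (ms) at which the node was last blacklisted
  private final Map<String, Long> blacklistedAt = new HashMap<>();

  TimedNodeBlacklist(long expiryMillis) {
    this.expiryMillis = expiryMillis;
  }

  // Record an AM container failure on the given node.
  void addNode(String node, long nowMillis) {
    blacklistedAt.put(node, nowMillis);
  }

  // Purge entries that have reached the expiry window, then return a copy
  // of the nodes that are still blacklisted.
  Set<String> getBlacklist(long nowMillis) {
    blacklistedAt.values().removeIf(since -> nowMillis - since >= expiryMillis);
    return new HashSet<>(blacklistedAt.keySet());
  }
}
```

With a window of, say, 1000 ms, a node added at time 0 would still be reported
at time 500 and would be eligible for the same app again from time 1000 on.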

> User facing aspects of 'AM blacklisting' feature need fixing
> ------------------------------------------------------------
>
>                 Key: YARN-4837
>                 URL: https://issues.apache.org/jira/browse/YARN-4837
>             Project: Hadoop YARN
>          Issue Type: Bug
>            Reporter: Vinod Kumar Vavilapalli
>            Assignee: Vinod Kumar Vavilapalli
>
> Was reviewing the user-facing aspects that we are releasing as part of 2.8.0.
> Looking at the 'AM blacklisting feature', I see several things to be fixed 
> before we release it in 2.8.0.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
