[ 
https://issues.apache.org/jira/browse/YARN-4635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15128628#comment-15128628
 ] 

Sunil G commented on YARN-4635:
-------------------------------

bq. it is possible that two lists together could unexpectedly blacklist all 
nodes. 
Hi [~djp]. Is this the case where node1 to node6 are blacklisted by the app and 
node7 to node10 are blacklisted by the global manager (assuming we have node1 to 
node10 and disableThreshold is 0.8)?

Could we also check {{disableThreshold}} against the combined Set that we now 
create? If it crosses the limit, we could clear the app-based / global 
blacklist entries from this set. Would this solve the scenario above?
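
A minimal sketch of such a check on the combined set, under stated assumptions: the class and method names ({{MergedBlacklistSketch}}, {{mergeWithThreshold}}) are hypothetical and not taken from the patch, the threshold is treated as a fraction of the total node count, and the merged set is cleared entirely once it reaches that fraction. Which entries to drop (app-based, global, or both) is the open question here, so clearing everything is only one possible choice.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper, not part of the YARN-4635 patch.
final class MergedBlacklistSketch {

    // Returns the union of both blacklists, or an empty set if the union
    // reaches disableThreshold * totalNodes (assumed semantics).
    public static Set<String> mergeWithThreshold(Set<String> appBlacklist,
                                                 Set<String> globalBlacklist,
                                                 int totalNodes,
                                                 double disableThreshold) {
        Set<String> merged = new HashSet<>(appBlacklist);
        merged.addAll(globalBlacklist);

        // Each list may be under the limit on its own while the union is not,
        // so the check is applied to the merged set.
        if (totalNodes > 0 && merged.size() >= disableThreshold * totalNodes) {
            merged.clear(); // assumption: disable blacklisting for this request
        }
        return merged;
    }
}
```

With node1 to node6 from the app, node7 to node10 from the global manager, 10 nodes in total, and a threshold of 0.8, the union covers all 10 nodes, so the sketch returns an empty set instead of blacklisting every node.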

> Add global blacklist tracking for AM container failure.
> -------------------------------------------------------
>
>                 Key: YARN-4635
>                 URL: https://issues.apache.org/jira/browse/YARN-4635
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: resourcemanager
>            Reporter: Junping Du
>            Assignee: Junping Du
>            Priority: Critical
>         Attachments: YARN-4635-v2.patch, YARN-4635.patch
>
>
> We need a global blacklist in addition to each app’s blacklist to track AM 
> container failures that have a global 
> effect. That means we need to differentiate whether a non-successful 
> ContainerExitStatus is caused by the 
> NM or is more related to the app. 
> For more details, please refer to the document in YARN-4576.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
