[ 
https://issues.apache.org/jira/browse/YARN-4635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15128664#comment-15128664
 ] 

Junping Du commented on YARN-4635:
----------------------------------

Thanks for the comments, Sunil.
bq. Is this the case where node1 to node6 are blacklisted by the app and node7 to 
node10 are blacklisted by the global manager?
Yes. This is correct.

bq. Could we also check disableThreshold on the total Set which we created now? 
And if we cross the limit, clear app based / global based blacklists from 
this list. Could this solve the above mentioned scenario?
Things could be slightly more complicated than this. Several things to consider:
- The threshold can differ between the global and app blacklists, since we 
already give apps that flexibility in YARN-4389, so we should choose one bar 
(the upper one, the lower one, or always the app bar). 
- When the two lists together go over the bar we chose above, should we flip 
both lists or only one of them? Also, the flip mechanism is worth discussing 
further, as I think other mechanisms, such as LRU, could be better.
- If one list gets flipped, how shall we merge it with the other, unflipped one? 
The removed items could overlap with the added items even though they belong to 
different affected scopes, etc.
I would suggest having a further discussion in a separate JIRA.
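
To make the scenario concrete, here is a rough sketch of checking the threshold on the merged set. It is only an illustration: the class BlacklistSketch, its methods, the 10-node cluster and the 0.8 threshold value are all assumptions for this example, not the actual RM code or the patch.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical illustration only; none of these names come from YARN.
final class BlacklistSketch {

    // Builds a set of host names "node<from>" .. "node<to>" (inclusive).
    static Set<String> nodes(int from, int to) {
        Set<String> result = new HashSet<>();
        for (int i = from; i <= to; i++) {
            result.add("node" + i);
        }
        return result;
    }

    // Union of the app-level and global blacklists.
    static Set<String> merge(Set<String> appList, Set<String> globalList) {
        Set<String> merged = new HashSet<>(appList);
        merged.addAll(globalList);
        return merged;
    }

    // True when the blacklisted fraction of the cluster reaches the threshold.
    static boolean overThreshold(Set<String> blacklist, int clusterSize,
                                 double disableThreshold) {
        return clusterSize > 0
            && (double) blacklist.size() / clusterSize >= disableThreshold;
    }

    public static void main(String[] args) {
        int clusterSize = 10;          // assumed cluster size
        double threshold = 0.8;        // assumed single bar for both lists
        Set<String> app = nodes(1, 6);       // blacklisted by the app
        Set<String> global = nodes(7, 10);   // blacklisted globally

        System.out.println("app over bar:    " + overThreshold(app, clusterSize, threshold));
        System.out.println("global over bar: " + overThreshold(global, clusterSize, threshold));
        System.out.println("merged over bar: "
            + overThreshold(merge(app, global), clusterSize, threshold));
    }
}
```

With these assumed numbers, neither list reaches the bar on its own, but the merged set covers every node. The sketch only detects that condition; which list to flip and how to merge afterwards are the open questions listed above.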

> Add global blacklist tracking for AM container failure.
> -------------------------------------------------------
>
>                 Key: YARN-4635
>                 URL: https://issues.apache.org/jira/browse/YARN-4635
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: resourcemanager
>            Reporter: Junping Du
>            Assignee: Junping Du
>            Priority: Critical
>         Attachments: YARN-4635-v2.patch, YARN-4635.patch
>
>
> We need a global blacklist, in addition to each app's blacklist, to track AM 
> container failures that have a global effect. That means we need to 
> distinguish whether a non-successful ContainerExitStatus is caused by the 
> NM or is more related to the app. 
> For more details, please refer to the document in YARN-4576.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
