[ https://issues.apache.org/jira/browse/YARN-4635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15128599#comment-15128599 ]
Sunil G commented on YARN-4635:
-------------------------------
Thanks [~djp]
bq. We can discuss more about purging nodes from the global list, like: time
based, event (NM reconnect) based, etc. in a dedicated JIRA, YARN-4637
+1. Yes, we can cover the time-based and event-based cases in that JIRA. And as
you mentioned, the corner case will happen only if an AM is launched on a node
that is later blacklisted due to another app's failure.
> Add global blacklist tracking for AM container failure.
> -------------------------------------------------------
>
> Key: YARN-4635
> URL: https://issues.apache.org/jira/browse/YARN-4635
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: resourcemanager
> Reporter: Junping Du
> Assignee: Junping Du
> Priority: Critical
> Attachments: YARN-4635-v2.patch, YARN-4635.patch
>
>
> We need a global blacklist, in addition to each app’s blacklist, to track AM
> container failures that have a global effect. That means we need to
> distinguish whether a non-successful ContainerExitStatus is caused by the NM
> or is more related to the app.
> For more details, please refer to the document in YARN-4576.
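
The idea in the description above can be sketched roughly as follows. This is
only an illustration under stated assumptions, not the code in the attached
patches: the class name, the FailureCause enum, the failure threshold, and the
purge() hook are all hypothetical, and the real mapping from
ContainerExitStatus values to "node related" or "app related" is left out.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Illustrative sketch only; every name here is hypothetical. */
final class GlobalAmBlacklistSketch {

    /** Assumed classification of a failed AM container, not real ContainerExitStatus values. */
    enum FailureCause { NODE_RELATED, APP_RELATED }

    // Assumption: a node becomes globally blacklisted after this many node-related AM failures.
    private final int failureThreshold;
    private final Map<String, Integer> nodeFailureCounts = new HashMap<>();
    private final Map<String, Set<String>> perAppBlacklists = new HashMap<>();

    GlobalAmBlacklistSketch(int failureThreshold) {
        this.failureThreshold = failureThreshold;
    }

    /** Records an AM container failure of the given app on the given node. */
    void recordAmFailure(String appId, String node, FailureCause cause) {
        // Assumption: the failing app always avoids the node on its next attempt.
        perAppBlacklists.computeIfAbsent(appId, k -> new HashSet<>()).add(node);
        // Only failures attributed to the node count toward the global list.
        if (cause == FailureCause.NODE_RELATED) {
            nodeFailureCounts.merge(node, 1, Integer::sum);
        }
    }

    /** True once the node has reached the threshold of node-related AM failures. */
    boolean isGloballyBlacklisted(String node) {
        return nodeFailureCounts.getOrDefault(node, 0) >= failureThreshold;
    }

    /** True if the node is on the global list or on this app's own list. */
    boolean isBlacklistedFor(String appId, String node) {
        return isGloballyBlacklisted(node)
            || perAppBlacklists.getOrDefault(appId, Collections.emptySet()).contains(node);
    }

    /** Hypothetical purge hook, e.g. for a time-based or NM-reconnect-based policy. */
    void purge(String node) {
        nodeFailureCounts.remove(node);
    }
}
```

In this sketch an app-related failure only affects the failing app's own list,
while repeated node-related failures from different apps eventually hide the
node from every app's AM placement until it is purged.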
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)