[ https://issues.apache.org/jira/browse/YARN-4636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15203367#comment-15203367 ]
Sunil G commented on YARN-4636:
-------------------------------
As YARN improves its node blacklist/whitelist functionality, one of the major
use cases from our end is to avoid launching the second and subsequent AM
container attempts on the same node where an earlier attempt failed (when that
failure was caused by external environment or memory issues on the node). This
can really help us. YARN-2005 already gives us a mechanism for this, though
there were concerns about its strict behavior. The proposal in YARN-4837 helps
straighten things out for the immediate 2.8 release.
I think YARN-4576 was trying to improve on the current YARN-2005 behavior and
generalize it. Going forward, if we plan global blacklisting based on various
types of container exit codes, a pluggable policy could be helpful, given that
we may have different types of apps. For this scenario, we do not have use cases
from our end. I also checked with [~rohithsharma] and [~Naganarasimha Garla] on
this. It would be good to discuss and reflect more on *global blacklisting* and
its advantages and limitations, based on the information currently available
from container exit codes.
> Make blacklist tracking policy pluggable for more extensions.
> -------------------------------------------------------------
>
> Key: YARN-4636
> URL: https://issues.apache.org/jira/browse/YARN-4636
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: resourcemanager
> Reporter: Junping Du
> Assignee: Sunil G
>