Hou Song created YARN-1380:
------------------------------
Summary: Enable NM to automatically reuse failed local dirs after
they are available again
Key: YARN-1380
URL: https://issues.apache.org/jira/browse/YARN-1380
Project: Hadoop YARN
Issue Type: New Feature
Components: nodemanager
Reporter: Hou Song
Currently the NM can take bad local directories out of use when they fail, but
it cannot start using them again once they are fixed. This is inconvenient in
large production clusters.
In this JIRA I propose a patch that I am using in my organization.
It also adds a new metric for the number of failed directories, so that people
have a clearer view from outside.
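
To illustrate the idea, here is a minimal sketch of a tracker that re-checks
failed directories on every pass and moves them back to the good list once they
pass the check again. It is not the attached patch and not the NodeManager's
actual code: the class RecoveringDirTracker, its methods, and the isUsable
probe are hypothetical names chosen for this example, and the probe (directory
exists and is readable, writable, and executable) is only an assumption about
what "available again" could mean.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Predicate;

/** Hypothetical sketch: tracks good and failed local dirs and lets failed ones recover. */
final class RecoveringDirTracker {
    private final Set<String> goodDirs = new LinkedHashSet<>();
    private final Set<String> failedDirs = new LinkedHashSet<>();
    private final Predicate<String> healthCheck;

    RecoveringDirTracker(List<String> dirs, Predicate<String> healthCheck) {
        this.goodDirs.addAll(dirs);
        this.healthCheck = healthCheck;
    }

    /** Assumed default probe: the path is a directory we can read, write, and list. */
    static boolean isUsable(String path) {
        File f = new File(path);
        return f.isDirectory() && f.canRead() && f.canWrite() && f.canExecute();
    }

    /**
     * Runs one check pass over all known dirs. Intended to be called
     * periodically. Returns true if any dir changed state.
     */
    synchronized boolean checkDirs() {
        // Snapshot both sets so each dir is probed exactly once per pass.
        List<String> wasGood = new ArrayList<>(goodDirs);
        List<String> wasFailed = new ArrayList<>(failedDirs);
        boolean changed = false;

        for (String dir : wasGood) {
            if (!healthCheck.test(dir)) {   // dir went bad: stop using it
                goodDirs.remove(dir);
                failedDirs.add(dir);
                changed = true;
            }
        }
        for (String dir : wasFailed) {
            if (healthCheck.test(dir)) {    // dir was fixed: reuse it
                failedDirs.remove(dir);
                goodDirs.add(dir);
                changed = true;
            }
        }
        return changed;
    }

    synchronized List<String> getGoodDirs() {
        return new ArrayList<>(goodDirs);
    }

    /** Value that a "number of failed directories" metric could report. */
    synchronized int getNumFailedDirs() {
        return failedDirs.size();
    }
}
```

A caller could construct it with RecoveringDirTracker::isUsable as the health
check and invoke checkDirs() from a timer; passing the probe in as a Predicate
also makes it easy to test with a fake check instead of real disks.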