[jira] [Commented] (YARN-90) NodeManager should identify failed disks becoming good back again

Ming Ma (JIRA) Fri, 03 Oct 2014 15:59:57 -0700

    [ https://issues.apache.org/jira/browse/YARN-90?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14158643#comment-14158643 ]


Ming Ma commented on YARN-90:
-----------------------------

Thanks, Varun.

The main question about the UNHEALTHY state is whether this patch makes it 
more likely for a node to become unhealthy, now that "full disk" has been 
added as one of the conditions. Since [~jira.shegalov]'s YARN-1996 and 
[~sjlee0]'s MAPREDUCE-5817 suggest ways to mitigate the impact of UNHEALTHY 
nodes on existing containers and MR task scheduling, this might not be an issue.

Nit: "Set<String> postCheckFullDirs = new HashSet<String>(fullDirs);" doesn't 
need to create postCheckFullDirs. The later code can refer to fullDirs 
directly.
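
To illustrate the nit, here is a minimal sketch, not the patch's actual code. 
The class FullDirsSketch and the method recheck are hypothetical names, and the 
sketch assumes that fullDirs is only read after the check has repopulated it. 
Under that assumption a pre-check copy is still needed, but a post-check copy 
is not.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical stand-in for the class that tracks full directories.
class FullDirsSketch {
    private final Set<String> fullDirs = new HashSet<String>();

    // Hypothetical check: replaces fullDirs with the directories that are
    // full now and reports whether the set of full directories changed.
    boolean recheck(Set<String> nowFull) {
        // This copy is needed because fullDirs is cleared and refilled below.
        Set<String> preCheckFullDirs = new HashSet<String>(fullDirs);

        fullDirs.clear();
        fullDirs.addAll(nowFull);

        // No postCheckFullDirs copy: fullDirs is only read from here on,
        // so it can be compared against the pre-check snapshot directly.
        return !preCheckFullDirs.equals(fullDirs);
    }

    public static void main(String[] args) {
        FullDirsSketch sketch = new FullDirsSketch();
        Set<String> full = new HashSet<String>();
        full.add("/data1");
        System.out.println(sketch.recheck(full)); // true: /data1 became full
        System.out.println(sketch.recheck(full)); // false: nothing changed
    }
}
```

If later code did modify fullDirs before the comparison, the extra copy would 
be justified, so the nit only holds when fullDirs stays untouched after the 
check.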

> NodeManager should identify failed disks becoming good back again
> -----------------------------------------------------------------
>
>                 Key: YARN-90
>                 URL: https://issues.apache.org/jira/browse/YARN-90
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: nodemanager
>            Reporter: Ravi Gummadi
>            Assignee: Varun Vasudev
>         Attachments: YARN-90.1.patch, YARN-90.patch, YARN-90.patch, 
> YARN-90.patch, YARN-90.patch, apache-yarn-90.0.patch, apache-yarn-90.1.patch, 
> apache-yarn-90.2.patch, apache-yarn-90.3.patch, apache-yarn-90.4.patch, 
> apache-yarn-90.5.patch, apache-yarn-90.6.patch, apache-yarn-90.7.patch, 
> apache-yarn-90.8.patch
>
>
> MAPREDUCE-3121 makes NodeManager identify disk failures. But once a disk goes 
> down, it is marked as failed forever. To reuse that disk (after it becomes 
> good), NodeManager needs a restart. This JIRA is to improve NodeManager to 
> reuse good disks (which could have been bad some time back).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
