[jira] [Commented] (YARN-90) NodeManager should identify failed disks becoming good back again

zhihai xu (JIRA) Wed, 08 Oct 2014 14:39:49 -0700

    [ https://issues.apache.org/jira/browse/YARN-90?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14164214#comment-14164214 ]


zhihai xu commented on YARN-90:
-------------------------------

I looked at the patch and found some nits:
1. We can change
    if (!postCheckFullDirs.contains(dir) && postCheckOtherDirs.contains(dir)) {
to
    if (postCheckOtherDirs.contains(dir)) {
because postCheckFullDirs and postCheckOtherDirs are mutually exclusive sets.

2. Same as item 1: change
    if (!postCheckOtherDirs.contains(dir) && postCheckFullDirs.contains(dir)) {
to
    if (postCheckFullDirs.contains(dir)) {

3. In verifyDirUsingMkdir:
Can we append a counter to the file name to avoid looping forever (although
the chance is very small), like the following?
    long i = 0L;
    while (target.exists()) {
      randomDirName = RandomStringUtils.randomAlphanumeric(5) + i++;
      target = new File(dir, randomDirName);
    }

4. In disksTurnedBad:
Can we add a break to the loop once disksFailed is true, so we exit the loop
early?
    if (!preCheckDirs.contains(dir)) {
      disksFailed = true;
      break;
    }

5. In disksTurnedGood, same as item 4:
Can we add a break to the loop once disksTurnedGood is true?
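To make items 1, 2, 4 and 5 concrete, here is a minimal Java sketch. It assumes the failed directories before and after a check are held as sets of path strings, that disksTurnedBad iterates over the post-check failed dirs and disksTurnedGood over the pre-check ones. The class DirSetChecks, the parameter names and fullDirsNowOtherError are hypothetical and are not the patch's actual code.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical sketch of review items 1, 2, 4 and 5; names and parameters
// are illustrative assumptions, not the patch's actual code.
final class DirSetChecks {

  // Item 4: a dir that is failed after the check but was not failed before
  // it means a disk turned bad. Stop at the first such dir.
  static boolean disksTurnedBad(Set<String> preCheckFailedDirs,
      Set<String> postCheckFailedDirs) {
    boolean disksFailed = false;
    for (String dir : postCheckFailedDirs) {
      if (!preCheckFailedDirs.contains(dir)) {
        disksFailed = true;
        break; // the answer cannot change, so skip the remaining dirs
      }
    }
    return disksFailed;
  }

  // Item 5: the symmetric check, a previously failed dir that now passes.
  static boolean disksTurnedGood(Set<String> preCheckFailedDirs,
      Set<String> postCheckFailedDirs) {
    boolean disksTurnedGood = false;
    for (String dir : preCheckFailedDirs) {
      if (!postCheckFailedDirs.contains(dir)) {
        disksTurnedGood = true;
        break;
      }
    }
    return disksTurnedGood;
  }

  // Items 1 and 2: a failed dir is classified as either "full" or "other
  // error", never both, so postCheckFullDirs and postCheckOtherDirs are
  // disjoint. postCheckOtherDirs.contains(dir) therefore already implies
  // !postCheckFullDirs.contains(dir), and only one test is needed.
  static Set<String> fullDirsNowOtherError(Set<String> preCheckFullDirs,
      Set<String> postCheckOtherDirs) {
    Set<String> moved = new LinkedHashSet<>();
    for (String dir : preCheckFullDirs) {
      if (postCheckOtherDirs.contains(dir)) {
        moved.add(dir);
      }
    }
    return moved;
  }

  public static void main(String[] args) {
    Set<String> before = Set.of("/disk1");
    Set<String> after = Set.of("/disk2");
    System.out.println(disksTurnedBad(before, after));  // true: /disk2 newly failed
    System.out.println(disksTurnedGood(before, after)); // true: /disk1 recovered
  }
}
```

The same early exit could also be written as `postCheckFailedDirs.stream().anyMatch(dir -> !preCheckFailedDirs.contains(dir))`, since `anyMatch` short-circuits; the explicit loop just keeps the shape of the patch.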

thanks

> NodeManager should identify failed disks becoming good back again
> -----------------------------------------------------------------
>
>                 Key: YARN-90
>                 URL: https://issues.apache.org/jira/browse/YARN-90
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: nodemanager
>            Reporter: Ravi Gummadi
>            Assignee: Varun Vasudev
>         Attachments: YARN-90.1.patch, YARN-90.patch, YARN-90.patch, 
> YARN-90.patch, YARN-90.patch, apache-yarn-90.0.patch, apache-yarn-90.1.patch, 
> apache-yarn-90.2.patch, apache-yarn-90.3.patch, apache-yarn-90.4.patch, 
> apache-yarn-90.5.patch, apache-yarn-90.6.patch, apache-yarn-90.7.patch, 
> apache-yarn-90.8.patch
>
>
> MAPREDUCE-3121 makes NodeManager identify disk failures. But once a disk goes
> down, it is marked as failed forever. To reuse that disk (after it becomes
> good), NodeManager needs a restart. This JIRA is to improve NodeManager to
> reuse good disks (which could have been bad some time back).
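The improvement described above can be sketched as a check that re-tests every configured directory on each round, including ones previously marked failed, so a repaired disk simply passes again and returns to the good list. This is a hypothetical illustration, not NodeManager code: DiskRecheckSketch, checkAll and the mkdir-then-delete probe are assumptions, and randomAlphanumeric is a local stand-in for commons-lang RandomStringUtils so the example has no dependencies. The probe uses the counter suggested in item 3.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ThreadLocalRandom;

// Hypothetical sketch, not NodeManager code: each round re-tests every
// configured dir, including ones that failed earlier, so a repaired disk
// passes again and is reused without a restart.
final class DiskRecheckSketch {

  private static final String ALPHANUMERIC =
      "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789";

  // Local stand-in for commons-lang RandomStringUtils.randomAlphanumeric.
  static String randomAlphanumeric(int count) {
    StringBuilder sb = new StringBuilder(count);
    for (int k = 0; k < count; k++) {
      sb.append(ALPHANUMERIC.charAt(
          ThreadLocalRandom.current().nextInt(ALPHANUMERIC.length())));
    }
    return sb.toString();
  }

  // Probe a dir by creating and removing a fresh subdirectory. The counter
  // from item 3 makes each retried name distinct, so the loop cannot keep
  // hitting the same colliding names forever.
  static boolean verifyDirUsingMkdir(File dir) {
    long i = 0L;
    File target = new File(dir, randomAlphanumeric(5));
    while (target.exists()) {
      target = new File(dir, randomAlphanumeric(5) + i++);
    }
    return target.mkdir() && target.delete();
  }

  // Return the dirs that pass this round, whatever their previous state.
  static List<File> checkAll(List<File> configuredDirs) {
    List<File> good = new ArrayList<>();
    for (File dir : configuredDirs) {
      if (verifyDirUsingMkdir(dir)) {
        good.add(dir);
      }
    }
    return good;
  }

  public static void main(String[] args) throws IOException {
    File ok = Files.createTempDirectory("nm-local").toFile();
    File bad = new File(ok, "not-created-yet"); // fails: does not exist yet
    List<File> dirs = List.of(ok, bad);
    System.out.println(checkAll(dirs).size()); // 1
    bad.mkdir();                               // simulate the disk being repaired
    System.out.println(checkAll(dirs).size()); // 2
    bad.delete();
    ok.delete();
  }
}
```

Because nothing is excluded from the re-check once it fails, "becoming good again" needs no special state transition: the dir is just in the passing list on the next round.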



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
