Varun Vasudev updated YARN-90:
    Attachment: apache-yarn-90.9.patch

Uploaded a new patch to address comments by [~mingma] and [~zxu].

bq. Nit: For "Set<String> postCheckFullDirs = new HashSet<String>(fullDirs);". It doesn't have to create postCheckFullDirs. It can directly refer to fullDirs.

It was just to ease lookups - a HashSet gives constant-time contains() instead of a linear scan of the list. If you feel strongly about it, I can change it.
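To illustrate the trade-off being discussed (a minimal sketch; the class and method names here are hypothetical, not from the patch):

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class DirLookup {
    // Copy the list into a HashSet once, so each subsequent contains()
    // check is O(1) instead of an O(n) scan of the list.
    public static Set<String> toLookupSet(List<String> fullDirs) {
        return new HashSet<String>(fullDirs);
    }

    public static void main(String[] args) {
        List<String> fullDirs = Arrays.asList("/disk1/local", "/disk2/local");
        Set<String> postCheckFullDirs = toLookupSet(fullDirs);
        System.out.println(postCheckFullDirs.contains("/disk1/local"));
    }
}
```

The one-time copy costs O(n), so it only pays off when the set is probed more than once, which is the case when iterating over the other dir list.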

bq. can change
if (!postCheckFullDirs.contains(dir) && postCheckOtherDirs.contains(dir)) {
to
if (postCheckOtherDirs.contains(dir)) {
and
if (!postCheckOtherDirs.contains(dir) && postCheckFullDirs.contains(dir)) {
to
if (postCheckFullDirs.contains(dir)) {


bq. 3. in verifyDirUsingMkdir:
Can we add an int variable to the file name to avoid looping forever (although it is a very small chance), like the following?
long i = 0L;
while (target.exists()) {
  randomDirName = RandomStringUtils.randomAlphanumeric(5) + i++;
  target = new File(dir, randomDirName);
}
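Sketching the suggestion above in context (a minimal, self-contained version; the helper below stands in for Apache Commons Lang's RandomStringUtils.randomAlphanumeric, and pickFreshTarget is a hypothetical name, not a method in the patch):

```java
import java.io.File;
import java.util.Random;

public class UniqueDirName {
    private static final String ALPHANUM =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789";
    private static final Random RANDOM = new Random();

    // Stand-in for RandomStringUtils.randomAlphanumeric(n).
    static String randomAlphanumeric(int n) {
        StringBuilder sb = new StringBuilder(n);
        for (int i = 0; i < n; i++) {
            sb.append(ALPHANUM.charAt(RANDOM.nextInt(ALPHANUM.length())));
        }
        return sb.toString();
    }

    // Append a monotonically increasing counter to the random part, so the
    // loop is guaranteed to terminate even if the random names keep
    // colliding with existing directories.
    public static File pickFreshTarget(File dir) {
        String randomDirName = randomAlphanumeric(5);
        File target = new File(dir, randomDirName);
        long i = 0L;
        while (target.exists()) {
            randomDirName = randomAlphanumeric(5) + i++;
            target = new File(dir, randomDirName);
        }
        return target;
    }
}
```

Since the counter never repeats, each iteration tries a name that was not tried before, which bounds the loop by the number of existing entries in the directory.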


bq. 4. in disksTurnedBad:
Can we add a break in the loop when disksFailed is true, so we exit the loop early?
if (!preCheckDirs.contains(dir)) {
  disksFailed = true;
  break;
}


bq. 5. in disksTurnedGood, same as item 4:
Can we add a break in the loop when disksTurnedGood is true?
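The early-exit pattern from items 4 and 5 can be sketched as follows (a minimal illustration; the method signature and collection types are assumptions, not the actual patch code):

```java
import java.util.List;
import java.util.Set;

public class DiskStateCheck {
    // Returns true as soon as one directory in the post-check list is
    // absent from the pre-check set; the break skips the remaining
    // iterations once the answer is known.
    public static boolean disksTurnedBad(Set<String> preCheckDirs,
                                         List<String> postCheckDirs) {
        boolean disksFailed = false;
        for (String dir : postCheckDirs) {
            if (!preCheckDirs.contains(dir)) {
                disksFailed = true;
                break;  // no need to scan the rest of the dirs
            }
        }
        return disksFailed;
    }
}
```

The same shape applies to disksTurnedGood: once the boolean flips to true, further iterations cannot change the result.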


bq. In verifyDirUsingMkdir, target.exists(), target.mkdir() and FileUtils.deleteQuietly(target) are not atomic. What happens if another thread tries to create the same directory (target)?

verifyDirUsingMkdir is called by testDirs, which is called by checkDirs(), which is synchronized - so only one thread can be running the exists()/mkdir()/delete sequence at a time.
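The locking argument above can be sketched like this (a simplified illustration of the call chain; the probe-directory name and the use of File.delete() in place of FileUtils.deleteQuietly are assumptions for self-containment):

```java
import java.io.File;

public class DirsChecker {
    // Because checkDirs() is synchronized, two threads on the same
    // DirsChecker instance can never interleave the exists()/mkdir()/
    // delete() sequence inside verifyDirUsingMkdir().
    public synchronized boolean checkDirs(File dir) {
        return verifyDirUsingMkdir(dir);
    }

    // Probe writability by creating and removing a scratch directory.
    private boolean verifyDirUsingMkdir(File dir) {
        // Hypothetical probe name; the real code picks a random one.
        File target = new File(dir, "yarn90-probe-" + System.nanoTime());
        if (target.exists()) {
            return false;
        }
        boolean created = target.mkdir();
        target.delete();  // quiet cleanup; failure here is ignored
        return created;
    }
}
```

Note the lock only serializes callers going through checkDirs() on the same object; a thread creating the same directory through some other path would still race, which is why the random name matters.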

> NodeManager should identify failed disks becoming good back again
> -----------------------------------------------------------------
>                 Key: YARN-90
>                 URL: https://issues.apache.org/jira/browse/YARN-90
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: nodemanager
>            Reporter: Ravi Gummadi
>            Assignee: Varun Vasudev
>         Attachments: YARN-90.1.patch, YARN-90.patch, YARN-90.patch, 
> YARN-90.patch, YARN-90.patch, apache-yarn-90.0.patch, apache-yarn-90.1.patch, 
> apache-yarn-90.2.patch, apache-yarn-90.3.patch, apache-yarn-90.4.patch, 
> apache-yarn-90.5.patch, apache-yarn-90.6.patch, apache-yarn-90.7.patch, 
> apache-yarn-90.8.patch, apache-yarn-90.9.patch
> MAPREDUCE-3121 makes NodeManager identify disk failures. But once a disk goes 
> down, it is marked as failed forever. To reuse that disk (after it becomes 
> good), NodeManager needs restart. This JIRA is to improve NodeManager to 
> reuse good disks(which could be bad some time back).
