[jira] [Updated] (YARN-90) NodeManager should identify failed disks becoming good back again

Varun Vasudev (JIRA) Tue, 14 Oct 2014 01:36:56 -0700

     [ https://issues.apache.org/jira/browse/YARN-90?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Varun Vasudev updated YARN-90:
------------------------------
    Attachment: apache-yarn-90.9.patch

Uploaded a new patch to address comments by [~mingma] and [~zxu].

bq. Nit: For "Set<String> postCheckFullDirs = new HashSet<String>(fullDirs);". 
It doesn't have to create postCheckFullDirs. It can directly refer to fullDirs 
later.

The copy was only there to make lookups easier: a membership check against a 
set instead of a search through a list. If you feel strongly about it, I can 
change it.

{quote}
can change
if (!postCheckFullDirs.contains(dir) && postCheckOtherDirs.contains(dir)) {
to
if (postCheckOtherDirs.contains(dir)) {
{quote}

Fixed.

{quote}
change
if (!postCheckOtherDirs.contains(dir) && postCheckFullDirs.contains(dir)) {
to
if (postCheckFullDirs.contains(dir)) {
{quote}

Fixed.

{quote}
3. in verifyDirUsingMkdir:
Can we add an integer counter to the file name to avoid looping forever 
(although the chance is very small), like the following?
long i = 0L;
while (target.exists()) {
  randomDirName = RandomStringUtils.randomAlphanumeric(5) + i++;
  target = new File(dir, randomDirName);
}
{quote}

Fixed.
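
For readers without the patch at hand, the counter idea from item 3 can be sketched roughly as below. This is only an illustration, not the code in apache-yarn-90.9.patch: the class name, pickUnusedName and randomName are hypothetical, and java.util.UUID stands in for RandomStringUtils so the sketch needs nothing beyond the JDK.

```java
import java.io.File;
import java.io.IOException;
import java.util.UUID;

class MkdirProbeSketch {

  // Hypothetical stand-in for RandomStringUtils.randomAlphanumeric(5).
  static String randomName() {
    return UUID.randomUUID().toString().substring(0, 5);
  }

  // Picks a name that does not currently exist under dir. The counter is
  // appended on every retry, so a retry never reuses an earlier name even
  // if the random part happens to repeat.
  static File pickUnusedName(File dir) {
    long i = 0L;
    File target = new File(dir, randomName());
    while (target.exists()) {
      target = new File(dir, randomName() + i++);
    }
    return target;
  }

  // Probes dir by creating and then removing a scratch directory.
  static void verifyDirUsingMkdir(File dir) throws IOException {
    File target = pickUnusedName(dir);
    if (!target.mkdir()) {
      throw new IOException("Unable to create directory " + target);
    }
    if (!target.delete()) {
      throw new IOException("Unable to delete directory " + target);
    }
  }
}
```

The exists() check and the mkdir() call are still two separate steps here, so the sketch relies on the caller to serialize probes of the same directory.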

{quote}
4. in disksTurnedBad:
Can we add a break in the loop when disksFailed is true so we exit the loop 
early?
if (!preCheckDirs.contains(dir)) {
  disksFailed = true;
  break;
}
{quote}

Fixed.

{quote}
5. in disksTurnedGood, same as item 4:
Can we add a break in the loop when disksTurnedGood is true?
{quote}

Fixed.
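
A rough sketch of the early exit discussed in items 4 and 5, for illustration only. The method names follow the review comments, but the class name, the parameter names, and the assumption that both methods compare the failed-directory lists from before and after a check are mine, not taken from the patch.

```java
import java.util.List;

class DiskStateChangeSketch {

  // True if some directory is failed after the check but was not failed
  // before it. The loop stops at the first such directory.
  static boolean disksTurnedBad(List<String> preCheckFailedDirs,
                                List<String> postCheckFailedDirs) {
    boolean disksFailed = false;
    for (String dir : postCheckFailedDirs) {
      if (!preCheckFailedDirs.contains(dir)) {
        disksFailed = true;
        break; // one newly failed directory is enough
      }
    }
    return disksFailed;
  }

  // True if some directory was failed before the check but is no longer
  // failed after it. Same early exit as above.
  static boolean disksTurnedGood(List<String> preCheckFailedDirs,
                                 List<String> postCheckFailedDirs) {
    boolean disksTurnedGood = false;
    for (String dir : preCheckFailedDirs) {
      if (!postCheckFailedDirs.contains(dir)) {
        disksTurnedGood = true;
        break; // one recovered directory is enough
      }
    }
    return disksTurnedGood;
  }
}
```

The break does not change the result; it only avoids scanning the rest of the list once the answer is known.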

{quote}
In function verifyDirUsingMkdir, the sequence target.exists(), target.mkdir() 
and FileUtils.deleteQuietly(target) is not atomic.
What happens if another thread tries to create the same directory (target)?
{quote}

verifyDirUsingMkdir is called by testDirs, which is called by checkDirs(), 
which is synchronized, so calls that come in through checkDirs() on the same 
object are serialized.
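
As a minimal sketch of that call chain (the class name and the lock-tracking field are hypothetical and exist only to make the point observable; the method bodies are placeholders, not the patch's code):

```java
class SynchronizedCheckSketch {

  // Records whether the probe ran while this object's monitor was held.
  private boolean lockHeldDuringProbe = false;

  // Entry point: synchronized, so two threads calling checkDirs() on the
  // same instance cannot be inside the probe at the same time.
  synchronized void checkDirs() {
    testDirs();
  }

  private void testDirs() {
    verifyDirUsingMkdir();
  }

  // Placeholder for the exists()/mkdir()/delete() sequence. It is not
  // synchronized itself; it inherits the lock taken by checkDirs().
  private void verifyDirUsingMkdir() {
    lockHeldDuringProbe = Thread.holdsLock(this);
  }

  boolean wasLockHeldDuringProbe() {
    return lockHeldDuringProbe;
  }
}
```

This only covers callers that reach the probe through checkDirs() on the same instance. A different instance, or another process writing to the same disk, is not excluded by the monitor.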

> NodeManager should identify failed disks becoming good back again
> -----------------------------------------------------------------
>
>                 Key: YARN-90
>                 URL: https://issues.apache.org/jira/browse/YARN-90
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: nodemanager
>            Reporter: Ravi Gummadi
>            Assignee: Varun Vasudev
>         Attachments: YARN-90.1.patch, YARN-90.patch, YARN-90.patch, 
> YARN-90.patch, YARN-90.patch, apache-yarn-90.0.patch, apache-yarn-90.1.patch, 
> apache-yarn-90.2.patch, apache-yarn-90.3.patch, apache-yarn-90.4.patch, 
> apache-yarn-90.5.patch, apache-yarn-90.6.patch, apache-yarn-90.7.patch, 
> apache-yarn-90.8.patch, apache-yarn-90.9.patch
>
>
> MAPREDUCE-3121 makes NodeManager identify disk failures. But once a disk goes 
> down, it is marked as failed forever. To reuse that disk (after it becomes 
> good again), NodeManager needs a restart. This JIRA is to improve NodeManager 
> so that it reuses disks that were bad earlier but have become good again.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
