[
https://issues.apache.org/jira/browse/YARN-90?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14164214#comment-14164214
]
zhihai xu commented on YARN-90:
-------------------------------
I looked at the patch and found a few nits:
1. We can change
if (!postCheckFullDirs.contains(dir) && postCheckOtherDirs.contains(dir)) {
to
if (postCheckOtherDirs.contains(dir)) {
because postCheckFullDirs and postCheckOtherDirs are mutually exclusive sets.
2. Same as item 1: change
if (!postCheckOtherDirs.contains(dir) && postCheckFullDirs.contains(dir)) {
to
if (postCheckFullDirs.contains(dir)) {
3. In verifyDirUsingMkdir:
Can we append a counter to the file name so the loop cannot run forever
(although the chance of that is very small), like the following?
long i = 0L;
while (target.exists()) {
  randomDirName = RandomStringUtils.randomAlphanumeric(5) + i++;
  target = new File(dir, randomDirName);
}
4. In disksTurnedBad:
Can we add a break in the loop once disksFailed is true, so we exit the loop
earlier?
if (!preCheckDirs.contains(dir)) {
  disksFailed = true;
  break;
}
5. In disksTurnedGood, same as item 4:
Can we add a break in the loop once disksTurnedGood is true?
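
To make the suggestions above concrete, here is a minimal standalone sketch. It is
not the patch code: the class name DiskCheckSketch and the helper names
inOtherOnly, pickUnusedName and anyMissing are hypothetical, and UUID is used
as a stand-in for RandomStringUtils so the example needs only the JDK.

```java
import java.io.File;
import java.util.List;
import java.util.Set;
import java.util.UUID;

// Hypothetical sketch; none of these names come from the YARN-90 patch.
class DiskCheckSketch {

    // Items 1 and 2: if the two sets are disjoint, membership in one already
    // implies absence from the other, so the negated check adds nothing.
    static boolean inOtherOnly(Set<String> otherDirs, String dir) {
        return otherDirs.contains(dir);
    }

    // Item 3: the counter suffix changes on every pass, so the loop never
    // retries a name it has already tried. UUID replaces RandomStringUtils
    // here (an assumption made only to avoid the commons-lang dependency).
    static File pickUnusedName(File dir) {
        long i = 0L;
        File target;
        do {
            String name = UUID.randomUUID().toString().substring(0, 5) + i++;
            target = new File(dir, name);
        } while (target.exists());
        return target;
    }

    // Items 4 and 5: stop scanning at the first dir that is missing from the
    // reference list; later elements cannot change the answer.
    static boolean anyMissing(List<String> dirs, List<String> reference) {
        boolean missing = false;
        for (String dir : dirs) {
            if (!reference.contains(dir)) {
                missing = true;
                break;
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        File tmp = new File(System.getProperty("java.io.tmpdir"));
        System.out.println("unused name: " + pickUnusedName(tmp).getName());
        System.out.println("missing: "
            + anyMissing(List.of("/a", "/b"), List.of("/a")));
    }
}
```

The simplification in items 1 and 2 is only safe as long as the two sets really
stay disjoint; if that invariant could ever be broken, the longer condition
would behave differently.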
Thanks
> NodeManager should identify failed disks becoming good back again
> -----------------------------------------------------------------
>
> Key: YARN-90
> URL: https://issues.apache.org/jira/browse/YARN-90
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: nodemanager
> Reporter: Ravi Gummadi
> Assignee: Varun Vasudev
> Attachments: YARN-90.1.patch, YARN-90.patch, YARN-90.patch,
> YARN-90.patch, YARN-90.patch, apache-yarn-90.0.patch, apache-yarn-90.1.patch,
> apache-yarn-90.2.patch, apache-yarn-90.3.patch, apache-yarn-90.4.patch,
> apache-yarn-90.5.patch, apache-yarn-90.6.patch, apache-yarn-90.7.patch,
> apache-yarn-90.8.patch
>
>
> MAPREDUCE-3121 makes NodeManager identify disk failures. But once a disk goes
> down, it is marked as failed forever. To reuse that disk (after it becomes
> good), NodeManager needs a restart. This JIRA is to improve NodeManager to
> reuse good disks (which could have been bad some time back).