Varun Vasudev updated YARN-90:
    Attachment: apache-yarn-90.9.patch

Uploaded a new patch to address comments by [~mingma] and [~zxu].

bq. Nit: For "Set<String> postCheckFullDirs = new HashSet<String>(fullDirs);". It doesn't have to create postCheckFullDirs. It can directly refer to fullDirs.

It was just to ease lookups - a HashSet gives constant-time contains() instead of a linear scan of the list. If you feel strongly about it, I can change it.
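To illustrate the trade-off being discussed (a minimal sketch; the class and method names here are hypothetical, not from the patch):

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class DirLookup {
    // Copy the list into a HashSet once, so each subsequent contains()
    // check is O(1) instead of an O(n) scan of the list.
    public static Set<String> toLookupSet(List<String> fullDirs) {
        return new HashSet<String>(fullDirs);
    }

    public static void main(String[] args) {
        List<String> fullDirs = Arrays.asList("/disk1/local", "/disk2/local");
        Set<String> postCheckFullDirs = toLookupSet(fullDirs);
        System.out.println(postCheckFullDirs.contains("/disk1/local"));
    }
}
```

The one-time copy costs O(n), so it only pays off when the set is probed more than once, which is the case when iterating over the other dir list.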

bq. can change
if (!postCheckFullDirs.contains(dir) && postCheckOtherDirs.contains(dir)) {
to
if (postCheckOtherDirs.contains(dir)) {
and
if (!postCheckOtherDirs.contains(dir) && postCheckFullDirs.contains(dir)) {
to
if (postCheckFullDirs.contains(dir)) {


bq. 3. in verifyDirUsingMkdir:
Can we add an int variable to the file name to avoid looping forever (although it is a very small chance), like the following?
long i = 0L;
while (target.exists()) {
  randomDirName = RandomStringUtils.randomAlphanumeric(5) + i++;
  target = new File(dir, randomDirName);
}
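Sketching the suggestion above in context (a minimal, self-contained version; the helper below stands in for Apache Commons Lang's RandomStringUtils.randomAlphanumeric, and pickFreshTarget is a hypothetical name, not a method in the patch):

```java
import java.io.File;
import java.util.Random;

public class UniqueDirName {
    private static final String ALPHANUM =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789";
    private static final Random RANDOM = new Random();

    // Stand-in for RandomStringUtils.randomAlphanumeric(n).
    static String randomAlphanumeric(int n) {
        StringBuilder sb = new StringBuilder(n);
        for (int i = 0; i < n; i++) {
            sb.append(ALPHANUM.charAt(RANDOM.nextInt(ALPHANUM.length())));
        }
        return sb.toString();
    }

    // Append a monotonically increasing counter to the random part, so the
    // loop is guaranteed to terminate even if the random names keep
    // colliding with existing directories.
    public static File pickFreshTarget(File dir) {
        String randomDirName = randomAlphanumeric(5);
        File target = new File(dir, randomDirName);
        long i = 0L;
        while (target.exists()) {
            randomDirName = randomAlphanumeric(5) + i++;
            target = new File(dir, randomDirName);
        }
        return target;
    }
}
```

Since the counter never repeats, each iteration tries a name that was not tried before, which bounds the loop by the number of existing entries in the directory.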


bq. 4. in disksTurnedBad:
Can we add a break in the loop when disksFailed is true, so we exit the loop early?
if (!preCheckDirs.contains(dir)) {
  disksFailed = true;
  break;
}


bq. 5. in disksTurnedGood, same as item 4:
Can we add a break in the loop when disksTurnedGood is true?
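The early-exit pattern from items 4 and 5 can be sketched as follows (a minimal illustration; the method signature and collection types are assumptions, not the actual patch code):

```java
import java.util.List;
import java.util.Set;

public class DiskStateCheck {
    // Returns true as soon as one directory in the post-check list is
    // absent from the pre-check set; the break skips the remaining
    // iterations once the answer is known.
    public static boolean disksTurnedBad(Set<String> preCheckDirs,
                                         List<String> postCheckDirs) {
        boolean disksFailed = false;
        for (String dir : postCheckDirs) {
            if (!preCheckDirs.contains(dir)) {
                disksFailed = true;
                break;  // no need to scan the rest of the dirs
            }
        }
        return disksFailed;
    }
}
```

The same shape applies to disksTurnedGood: once the boolean flips to true, further iterations cannot change the result.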


bq. In verifyDirUsingMkdir, target.exists(), target.mkdir() and FileUtils.deleteQuietly(target) are not atomic. What happens if another thread tries to create the same directory (target)?

verifyDirUsingMkdir is called by testDirs, which is called by checkDirs(), which is synchronized - so only one thread can be running the exists()/mkdir()/delete sequence at a time.
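The locking argument above can be sketched like this (a simplified illustration of the call chain; the probe-directory name and the use of File.delete() in place of FileUtils.deleteQuietly are assumptions for self-containment):

```java
import java.io.File;

public class DirsChecker {
    // Because checkDirs() is synchronized, two threads on the same
    // DirsChecker instance can never interleave the exists()/mkdir()/
    // delete() sequence inside verifyDirUsingMkdir().
    public synchronized boolean checkDirs(File dir) {
        return verifyDirUsingMkdir(dir);
    }

    // Probe writability by creating and removing a scratch directory.
    private boolean verifyDirUsingMkdir(File dir) {
        // Hypothetical probe name; the real code picks a random one.
        File target = new File(dir, "yarn90-probe-" + System.nanoTime());
        if (target.exists()) {
            return false;
        }
        boolean created = target.mkdir();
        target.delete();  // quiet cleanup; failure here is ignored
        return created;
    }
}
```

Note the lock only serializes callers going through checkDirs() on the same object; a thread creating the same directory through some other path would still race, which is why the random name matters.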

> NodeManager should identify failed disks becoming good back again
> -----------------------------------------------------------------
>                 Key: YARN-90
>                 URL: https://issues.apache.org/jira/browse/YARN-90
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: nodemanager
>            Reporter: Ravi Gummadi
>            Assignee: Varun Vasudev
>         Attachments: YARN-90.1.patch, YARN-90.patch, YARN-90.patch, 
> YARN-90.patch, YARN-90.patch, apache-yarn-90.0.patch, apache-yarn-90.1.patch, 
> apache-yarn-90.2.patch, apache-yarn-90.3.patch, apache-yarn-90.4.patch, 
> apache-yarn-90.5.patch, apache-yarn-90.6.patch, apache-yarn-90.7.patch, 
> apache-yarn-90.8.patch, apache-yarn-90.9.patch
> MAPREDUCE-3121 makes NodeManager identify disk failures. But once a disk goes 
> down, it is marked as failed forever. To reuse that disk (after it becomes 
> good), NodeManager needs restart. This JIRA is to improve NodeManager to 
> reuse good disks(which could be bad some time back).
