[jira] [Commented] (YARN-1781) NM should allow users to specify max disk utilization for local disks
[ https://issues.apache.org/jira/browse/YARN-1781?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14115534#comment-14115534 ]

Jason Lowe commented on YARN-1781:
----------------------------------

We've run into situations where this new behavior leaves disks that were filled by containers full, so they never recover. See YARN-2473. YARN-90 won't help much in this case because the files that filled the disk won't be deleted. Prior to this change the disks would auto-recover when the container completed, so this is a significant regression.

NM should allow users to specify max disk utilization for local disks
---------------------------------------------------------------------

Key: YARN-1781
URL: https://issues.apache.org/jira/browse/YARN-1781
Project: Hadoop YARN
Issue Type: Sub-task
Components: nodemanager
Reporter: Varun Vasudev
Assignee: Varun Vasudev
Fix For: 2.4.0
Attachments: apache-yarn-1781.0.patch, apache-yarn-1781.1.patch, apache-yarn-1781.2.patch, apache-yarn-1781.3.patch, apache-yarn-1781.4.patch

This is related to YARN-257 (it's probably a sub-task?). Currently, the NM does not detect full disks and allows full disks to be used by containers, leading to repeated failures. YARN-257 deals with graceful handling of full disks. This ticket is only about detection of full disks by the disk health checkers. The NM should allow users to set a maximum disk utilization for local disks and mark disks as bad once they exceed that utilization. At the very least, the NM should detect full disks.

--
This message was sent by Atlassian JIRA
(v6.2#6252)
[jira] [Commented] (YARN-1781) NM should allow users to specify max disk utilization for local disks
[ https://issues.apache.org/jira/browse/YARN-1781?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13924796#comment-13924796 ]

Hudson commented on YARN-1781:
------------------------------

SUCCESS: Integrated in Hadoop-Yarn-trunk #503 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/503/])
YARN-1781. Modified NodeManagers to allow admins to specify max disk utilization for local disks so as to be able to offline full disks. Contributed by Varun Vasudev. (vinodkv: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1575463)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/DirectoryCollection.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/LocalDirsHandlerService.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestDirectoryCollection.java
[jira] [Commented] (YARN-1781) NM should allow users to specify max disk utilization for local disks
[ https://issues.apache.org/jira/browse/YARN-1781?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13924880#comment-13924880 ]

Hudson commented on YARN-1781:
------------------------------

SUCCESS: Integrated in Hadoop-Mapreduce-trunk #1720 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1720/])
YARN-1781. Modified NodeManagers to allow admins to specify max disk utilization for local disks so as to be able to offline full disks. Contributed by Varun Vasudev. (vinodkv: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1575463)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/DirectoryCollection.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/LocalDirsHandlerService.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestDirectoryCollection.java
[jira] [Commented] (YARN-1781) NM should allow users to specify max disk utilization for local disks
[ https://issues.apache.org/jira/browse/YARN-1781?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13923760#comment-13923760 ]

Varun Vasudev commented on YARN-1781:
-------------------------------------

{quote}
Styling
Some of the patch has non-standard formatting. Please fix them. For e.g., we say
if (condition) {
  //
} else {
  //
}
{quote}
My apologies for this. Fixed.

{quote}
YarnConfiguration
Create a prefix for .disk-health-checker. and use it for all Disk-check related configs.
max-space-utilization-perc -> max-disk-utilization-per-disk-percentage?
min-free-space-mb -> min-free-space-per-disk-percentage?
{quote}
Fixed. (I'm presuming min-free-space-per-disk-percentage meant min-free-space-per-disk-mb.)

{quote}
DirectoryCollection
checkDiskPercentageLimit() -> isDiskUsageUnderPercentageLimit() and checkDiskSpaceLimit() -> isDiskFreeSpaceWithinLimit()
Please replace 'perc/Perc' etc with the more verbose 'percentage/Percentage' everywhere
{quote}
Fixed.

{quote}
LocalDirsHandlerService
maxUsedSpacePerc -> maxUsableSpacePercentagePerDisk and minFreeSpaceMB -> minFreeSpacePerDiskMB
{quote}
Fixed.

{quote}
TestDirectoryCollection
The test using testDir.getTotalSpace() can be flaky depending on what other things are running on the testing machine. Can we just mock the disk-usage here?
testCutoffSetters -> testDiskLimitsCutoffSetters,
{quote}
getTotalSpace() should be consistent, no? According to the Java docs, it returns the size of the partition, which should be the same on every run. The idea of the test is to simulate a case where we don't allow any of the disk to be used.
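One way to sidestep the question of whether testDir.getTotalSpace() is stable across machines is to keep the limit checks independent of the real file system. The following is only an illustrative sketch, not the code in the patch: the DiskSpaceSource interface, the FixedDisk test double, and the DiskLimitChecker class are hypothetical, and only the two predicate names come from the review comments above. Treating an unknown partition size as over the limit is also an assumption.

```java
// Hypothetical sketch: limit checks that read disk sizes from an injectable
// source, so a test can supply fixed numbers instead of a real partition.
interface DiskSpaceSource {
    long totalBytes();
    long usableBytes();
}

// Test double that returns fixed sizes (hypothetical helper).
final class FixedDisk implements DiskSpaceSource {
    private final long total;
    private final long usable;

    FixedDisk(long total, long usable) {
        this.total = total;
        this.usable = usable;
    }

    @Override public long totalBytes() { return total; }
    @Override public long usableBytes() { return usable; }
}

final class DiskLimitChecker {
    private static final long BYTES_PER_MB = 1024L * 1024L;

    private final float maxUsedPercentage; // expected range 0..100
    private final long minFreeSpaceMb;

    DiskLimitChecker(float maxUsedPercentage, long minFreeSpaceMb) {
        this.maxUsedPercentage = maxUsedPercentage;
        this.minFreeSpaceMb = minFreeSpaceMb;
    }

    // True while the used share of the disk is at or below the configured limit.
    boolean isDiskUsageUnderPercentageLimit(DiskSpaceSource disk) {
        long total = disk.totalBytes();
        if (total <= 0) {
            return false; // assumption: unknown size counts as over the limit
        }
        double usedPercentage = 100.0 * (total - disk.usableBytes()) / total;
        return usedPercentage <= maxUsedPercentage;
    }

    // True while at least minFreeSpaceMb megabytes remain usable.
    boolean isDiskFreeSpaceWithinLimit(DiskSpaceSource disk) {
        return disk.usableBytes() / BYTES_PER_MB >= minFreeSpaceMb;
    }
}
```

With a checker built as new DiskLimitChecker(0.0f, 0L), any disk with some used space fails the percentage check, which mirrors the "don't allow any of the disk to be used" case the test is meant to simulate.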
[jira] [Commented] (YARN-1781) NM should allow users to specify max disk utilization for local disks
[ https://issues.apache.org/jira/browse/YARN-1781?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13923795#comment-13923795 ]

Hadoop QA commented on YARN-1781:
---------------------------------

{color:green}+1 overall{color}. Here are the results of testing the latest attachment
  http://issues.apache.org/jira/secure/attachment/12633348/apache-yarn-1781.3.patch
  against trunk revision .

    {color:green}+1 @author{color}. The patch does not contain any @author tags.
    {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files.
    {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
    {color:green}+1 javadoc{color}. There were no new javadoc warning messages.
    {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.
    {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
    {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
    {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager.
    {color:green}+1 contrib tests{color}. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-YARN-Build/3296//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3296//console

This message is automatically generated.
[jira] [Commented] (YARN-1781) NM should allow users to specify max disk utilization for local disks
[ https://issues.apache.org/jira/browse/YARN-1781?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13924203#comment-13924203 ]

Vinod Kumar Vavilapalli commented on YARN-1781:
-----------------------------------------------

Thanks, this looks better now.

One other thing I was thinking about was our defaults. From what I see, even with the current defaults in your patch, the NM will detect full disks and offline them. Earlier I was wondering whether we should set 95% and 50 MB as defaults, but given that the NM detects full disks even with the current defaults, I am good.

+1, checking this in.
[jira] [Commented] (YARN-1781) NM should allow users to specify max disk utilization for local disks
[ https://issues.apache.org/jira/browse/YARN-1781?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13924279#comment-13924279 ]

Hadoop QA commented on YARN-1781:
---------------------------------

{color:green}+1 overall{color}. Here are the results of testing the latest attachment
  http://issues.apache.org/jira/secure/attachment/12633428/apache-yarn-1781.4.patch
  against trunk revision .

    {color:green}+1 @author{color}. The patch does not contain any @author tags.
    {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files.
    {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
    {color:green}+1 javadoc{color}. There were no new javadoc warning messages.
    {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.
    {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
    {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
    {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager.
    {color:green}+1 contrib tests{color}. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-YARN-Build/3300//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3300//console

This message is automatically generated.
[jira] [Commented] (YARN-1781) NM should allow users to specify max disk utilization for local disks
[ https://issues.apache.org/jira/browse/YARN-1781?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13924583#comment-13924583 ]

Vinod Kumar Vavilapalli commented on YARN-1781:
-----------------------------------------------

Looks good. Checking this in.
[jira] [Commented] (YARN-1781) NM should allow users to specify max disk utilization for local disks
[ https://issues.apache.org/jira/browse/YARN-1781?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13922457#comment-13922457 ]

Hadoop QA commented on YARN-1781:
---------------------------------

{color:green}+1 overall{color}. Here are the results of testing the latest attachment
  http://issues.apache.org/jira/secure/attachment/12633107/apache-yarn-1781.2.patch
  against trunk revision .

    {color:green}+1 @author{color}. The patch does not contain any @author tags.
    {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files.
    {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
    {color:green}+1 javadoc{color}. There were no new javadoc warning messages.
    {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.
    {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
    {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
    {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager.
    {color:green}+1 contrib tests{color}. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-YARN-Build/3277//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3277//console

This message is automatically generated.
[jira] [Commented] (YARN-1781) NM should allow users to specify max disk utilization for local disks
[ https://issues.apache.org/jira/browse/YARN-1781?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13923563#comment-13923563 ]

Sunil G commented on YARN-1781:
-------------------------------

I have created a separate JIRA, YARN-1799, as per the comment from Vinod.
[jira] [Commented] (YARN-1781) NM should allow users to specify max disk utilization for local disks
[ https://issues.apache.org/jira/browse/YARN-1781?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13920735#comment-13920735 ]

Sunil G commented on YARN-1781:
-------------------------------

Thank you for the reply. I have another point in addition to what I mentioned as point 2.

3) To find the percentage of free space, the code below is used:
    float freePerc = testDir.getUsableSpace()/(float)testDir.getTotalSpace();
The percentage depends very much on the total size available in the different directory partitions. If multiple directories are configured on disks of different sizes, the same percentage corresponds to different amounts of free space. So the size remaining, in terms of a few GB or MB, may be a better measure. For example, require a minimum of 1 GB free before a directory can be assigned.

Adding more information about point 2): Currently the check in LocalDirAllocator is to see whether the requested size fits within the capacity of the last accessed directory. Consider a scenario where Dir1 is assigned to a task that will write 200 MB and, perhaps because of heavy usage, another task is also given Dir1 for 100 MB immediately afterwards. Together these tasks will write 300 MB. But when Dir1 is given to task 2, what was already assigned to task 1 is not taken into account; only the free space at that moment is checked. The last allotted space is not considered when assigning space for the next task, and it is hard to account for because disk write speed is hard to predict. So a free-space check here can also help avoid a full disk (the disk could otherwise fill up within the 2 sec health-check monitor interval).
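The point about percentages and partition sizes can be made concrete with a small sketch. Everything here is hypothetical and for illustration only: the FreeSpaceMeasures class, its method names, and the sample disk sizes are assumptions, not code or figures from the patch.

```java
// Hypothetical sketch comparing a percentage-based free-space measure
// with an absolute one on two partitions of very different sizes.
final class FreeSpaceMeasures {
    static final long BYTES_PER_MB = 1024L * 1024L;
    static final long BYTES_PER_GB = 1024L * BYTES_PER_MB;

    // Fraction of the partition that is still usable, in [0, 1].
    static double freeFraction(long totalBytes, long usableBytes) {
        return totalBytes <= 0 ? 0.0 : (double) usableBytes / totalBytes;
    }

    // True if at least minFreeMb megabytes are still usable.
    static boolean hasMinFreeSpace(long usableBytes, long minFreeMb) {
        return usableBytes >= minFreeMb * BYTES_PER_MB;
    }

    public static void main(String[] args) {
        // Both sample partitions are 5% free, but the absolute amounts differ.
        long smallTotal = 10L * BYTES_PER_GB;
        long smallFree = 512L * BYTES_PER_MB;
        long largeTotal = 1000L * BYTES_PER_GB;
        long largeFree = 50L * BYTES_PER_GB;

        System.out.println("small free fraction: " + freeFraction(smallTotal, smallFree));
        System.out.println("large free fraction: " + freeFraction(largeTotal, largeFree));

        // With a 1 GB minimum, only the large partition qualifies.
        System.out.println("small has 1 GB free: " + hasMinFreeSpace(smallFree, 1024L));
        System.out.println("large has 1 GB free: " + hasMinFreeSpace(largeFree, 1024L));
    }
}
```

A percentage threshold treats the two sample partitions identically, while the absolute threshold separates them, which is the distinction the comment argues for.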
[jira] [Commented] (YARN-1781) NM should allow users to specify max disk utilization for local disks
[ https://issues.apache.org/jira/browse/YARN-1781?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13921138#comment-13921138 ]

Hadoop QA commented on YARN-1781:
---------------------------------

{color:green}+1 overall{color}. Here are the results of testing the latest attachment
  http://issues.apache.org/jira/secure/attachment/12632851/apache-yarn-1781.1.patch
  against trunk revision .

    {color:green}+1 @author{color}. The patch does not contain any @author tags.
    {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files.
    {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
    {color:green}+1 javadoc{color}. There were no new javadoc warning messages.
    {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.
    {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
    {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
    {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager.
    {color:green}+1 contrib tests{color}. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-YARN-Build/3265//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3265//console

This message is automatically generated.
[jira] [Commented] (YARN-1781) NM should allow users to specify max disk utilization for local disks
[ https://issues.apache.org/jira/browse/YARN-1781?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13921804#comment-13921804 ]

Xuan Gong commented on YARN-1781:
---------------------------------

The patch looks fine to me. One nit:
{code}
+diskUtilizationPercCutoff = utilizationPercCutOff;
+diskUtilizationSpaceCutoff = utilizationSpaceCutOff;
+diskUtilizationPercCutoff =
+    utilizationPercCutOff < 0.0F ? 0.0F : utilizationPercCutOff;
+diskUtilizationPercCutoff =
+    utilizationPercCutOff > 100.0F ? 100.0F : diskUtilizationPercCutoff;
+diskUtilizationSpaceCutoff =
+    utilizationSpaceCutOff < 0 ? 0 : utilizationSpaceCutOff;
{code}
Maybe we can simplify a little bit.
{code}
diskUtilizationPercCutoff =
    utilizationPercCutOff < 0.0F ? 0.0F
        : utilizationPercCutOff > 100.0F ? 100.0F : utilizationPercCutOff;
diskUtilizationSpaceCutoff =
    utilizationSpaceCutOff < 0 ? 0 : utilizationSpaceCutOff;
{code}
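As an alternative to the nested ternary, the same clamping can be written with Math.max and Math.min. This is a minimal sketch rather than the code that was committed: the CutoffClamp class and its method names are hypothetical, and the 0 to 100 range is taken from the snippet in the comment above.

```java
// Hypothetical sketch: clamping the two cutoffs with Math.max/Math.min
// instead of chained conditional expressions.
final class CutoffClamp {
    // Restricts a utilization percentage to the range [0, 100].
    static float clampPercentage(float utilizationPercentageCutoff) {
        return Math.max(0.0F, Math.min(100.0F, utilizationPercentageCutoff));
    }

    // Restricts a free-space cutoff (in MB) to non-negative values.
    static long clampFreeSpaceMb(long utilizationSpaceCutoff) {
        return Math.max(0L, utilizationSpaceCutoff);
    }

    public static void main(String[] args) {
        System.out.println(clampPercentage(-5.0F));
        System.out.println(clampPercentage(150.0F));
        System.out.println(clampPercentage(42.5F));
        System.out.println(clampFreeSpaceMb(-1L));
    }
}
```

For ordinary values both forms give the same result; the difference is only in readability, since the bounds appear once each and the order of the checks no longer matters.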
[jira] [Commented] (YARN-1781) NM should allow users to specify max disk utilization for local disks
[ https://issues.apache.org/jira/browse/YARN-1781?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13919750#comment-13919750 ]

Jason Lowe commented on YARN-1781:
----------------------------------

Note that we may need to do more than just mark disks as unusable once they are full, for a specified definition of full. I suspect a disk being full is a more transient kind of failure than other failures, and it would be nice if full disks were added back into the list of good dirs once they fall below a threshold. Not necessarily a hard requirement for this JIRA, but I can see it being an immediate follow-up request if not implemented. The recovery from full may be covered by YARN-90 when that's implemented.
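The recovery behavior described here can be sketched as a tracker that re-evaluates every directory on each health check, so a directory that drops back below the threshold rejoins the good list. This is an illustration under stated assumptions, not the NodeManager's implementation and not what YARN-90 does: the RecoveringDirTracker class, its methods, and the use of a map of precomputed used percentages are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: directories are re-partitioned into "good" and "full"
// on every check, so a full directory recovers once its usage drops.
final class RecoveringDirTracker {
    private final double maxUsedPercentage;
    private final List<String> goodDirs = new ArrayList<>();
    private final List<String> fullDirs = new ArrayList<>();

    RecoveringDirTracker(double maxUsedPercentage) {
        this.maxUsedPercentage = maxUsedPercentage;
    }

    // Takes a fresh snapshot of used percentage per directory and rebuilds
    // both lists from scratch; no directory is permanently marked bad.
    void check(Map<String, Double> usedPercentageByDir) {
        goodDirs.clear();
        fullDirs.clear();
        for (Map.Entry<String, Double> entry : usedPercentageByDir.entrySet()) {
            if (entry.getValue() > maxUsedPercentage) {
                fullDirs.add(entry.getKey());
            } else {
                goodDirs.add(entry.getKey());
            }
        }
    }

    // Defensive copies so callers cannot modify the tracker's state.
    List<String> getGoodDirs() { return new ArrayList<>(goodDirs); }
    List<String> getFullDirs() { return new ArrayList<>(fullDirs); }
}
```

A single threshold is used for simplicity. A real implementation might prefer separate "mark full" and "mark recovered" thresholds to avoid flapping when usage hovers near the limit; that refinement is a design suggestion, not something stated in the thread.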
[jira] [Commented] (YARN-1781) NM should allow users to specify max disk utilization for local disks
[ https://issues.apache.org/jira/browse/YARN-1781?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13919766#comment-13919766 ]

Varun Vasudev commented on YARN-1781:
-------------------------------------

My plan is to work on YARN-90 once a patch for this gets checked in.
[jira] [Commented] (YARN-1781) NM should allow users to specify max disk utilization for local disks
[ https://issues.apache.org/jira/browse/YARN-1781?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13919868#comment-13919868 ]

Hadoop QA commented on YARN-1781:
---------------------------------

{color:green}+1 overall{color}. Here are the results of testing the latest attachment
  http://issues.apache.org/jira/secure/attachment/12632576/apache-yarn-1781.0.patch
  against trunk revision .

    {color:green}+1 @author{color}. The patch does not contain any @author tags.
    {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files.
    {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
    {color:green}+1 javadoc{color}. There were no new javadoc warning messages.
    {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.
    {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
    {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
    {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager.
    {color:green}+1 contrib tests{color}. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-YARN-Build/3249//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3249//console

This message is automatically generated.
[jira] [Commented] (YARN-1781) NM should allow users to specify max disk utilization for local disks
[ https://issues.apache.org/jira/browse/YARN-1781?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13919996#comment-13919996 ]

Varun Vasudev commented on YARN-1781:

The attached patch adds support for checking disk utilization as part of the checkDirs function in DirectoryCollection. The maximum disk utilization can be specified as a percentage via the YARN config, with the default value set to 1.0F (use the full disk).
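For readers following the thread, the utilization check described in this comment can be sketched roughly as follows. This is a minimal illustration and not the code from apache-yarn-1781.0.patch: the class name DiskUtilizationCheck and its methods are hypothetical, and the limit is assumed to be a fraction between 0.0 and 1.0, in line with the 1.0F default mentioned above. The only real JDK calls used are java.io.File#getTotalSpace and java.io.File#getUsableSpace.

```java
import java.io.File;

/** Hypothetical helper for illustration; not taken from the attached patch. */
final class DiskUtilizationCheck {

  /** Assumed default, per the comment above: 1.0F means the whole disk may be used. */
  static final float DEFAULT_MAX_UTILIZATION = 1.0F;

  private DiskUtilizationCheck() {}

  /** Returns the fraction of the disk that is in use, between 0.0 and 1.0. */
  static float utilization(long totalBytes, long usableBytes) {
    if (totalBytes <= 0) {
      // File#getTotalSpace() returns 0 when the path does not name a
      // partition; this sketch treats that case as a full disk.
      return 1.0F;
    }
    long usedBytes = Math.max(0L, totalBytes - usableBytes);
    return (float) ((double) usedBytes / (double) totalBytes);
  }

  /**
   * A disk counts as full when it exceeds the configured limit, or when it
   * is completely used (so full disks are detected even at the 1.0F default).
   */
  static boolean isDiskFull(long totalBytes, long usableBytes, float maxUtilization) {
    float used = utilization(totalBytes, usableBytes);
    return used > maxUtilization || used >= 1.0F;
  }

  /** Convenience overload that reads the sizes from the file system. */
  static boolean isDiskFull(File dir, float maxUtilization) {
    return isDiskFull(dir.getTotalSpace(), dir.getUsableSpace(), maxUtilization);
  }
}
```

Under these assumptions, the 1.0F default only reports a disk as full when no usable space is left, while a lower limit such as 0.9F would take a disk out of use once more than 90% of it is consumed.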
[jira] [Commented] (YARN-1781) NM should allow users to specify max disk utilization for local disks
[ https://issues.apache.org/jira/browse/YARN-1781?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13920488#comment-13920488 ]

Sunil G commented on YARN-1781:

A few comments from my side:
[jira] [Commented] (YARN-1781) NM should allow users to specify max disk utilization for local disks
[ https://issues.apache.org/jira/browse/YARN-1781?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13920489#comment-13920489 ]

Sunil G commented on YARN-1781:

1. Disk usage can come down when some tasks finish and the service deletes some data from the local directories. A disk-full condition is quite likely to be temporary, because many tasks access the local directories. So there could be counter-checks to ensure that such directories are added back to the list of good directories. Otherwise these directories may be lost.

2. As I have commented in https://issues.apache.org/jira/browse/YARN-257?focusedCommentId=13919295&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13919295, it is better for the LocalDirAllocator to check for a high percentage of disk used and not assign such a directory to a task.

These measures might help keep new tasks from failing because of an imminent disk-full scenario.
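Sunil's first point, re-checking full directories so that they can return to the good list, might look something like the sketch below. The class RecoverableDirList, its fields, and the utilization probe are hypothetical names chosen for illustration. This is not how DirectoryCollection is implemented, nor the approach taken in YARN-90; it only shows the idea of re-evaluating every directory on each health-check pass.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.ToDoubleFunction;

/** Hypothetical sketch: directories move between a good list and a full list. */
final class RecoverableDirList {
  private final List<String> goodDirs = new ArrayList<>();
  private final List<String> fullDirs = new ArrayList<>();
  private final double maxUtilization;
  // Returns the used fraction (0.0 to 1.0) for a directory; injected so the
  // sketch can run without touching a real file system.
  private final ToDoubleFunction<String> utilizationProbe;

  RecoverableDirList(List<String> dirs, double maxUtilization,
                     ToDoubleFunction<String> utilizationProbe) {
    this.goodDirs.addAll(dirs);
    this.maxUtilization = maxUtilization;
    this.utilizationProbe = utilizationProbe;
  }

  /** One health-check pass over every known directory, good or full. */
  synchronized void checkDirs() {
    List<String> all = new ArrayList<>(goodDirs);
    all.addAll(fullDirs);
    goodDirs.clear();
    fullDirs.clear();
    for (String dir : all) {
      if (utilizationProbe.applyAsDouble(dir) > maxUtilization) {
        fullDirs.add(dir);   // over the limit: do not hand out to new tasks
      } else {
        goodDirs.add(dir);   // within the limit, including recovered disks
      }
    }
  }

  synchronized List<String> getGoodDirs() {
    return new ArrayList<>(goodDirs);
  }

  synchronized List<String> getFullDirs() {
    return new ArrayList<>(fullDirs);
  }
}
```

Because each pass also re-evaluates the directories on the full list, a disk that filled up temporarily becomes usable again once its usage drops back under the limit. An allocator that only hands out directories from getGoodDirs() would also cover the second point.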
[jira] [Commented] (YARN-1781) NM should allow users to specify max disk utilization for local disks
[ https://issues.apache.org/jira/browse/YARN-1781?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13920503#comment-13920503 ]

Varun Vasudev commented on YARN-1781:

> 1. Disk usage can come down when some tasks finish and the service deletes some data from the local directories. A disk-full condition is quite likely to be temporary, because many tasks access the local directories. So there could be counter-checks to ensure that such directories are added back to the list of good directories. Otherwise these directories may be lost.

This will be taken care of when I fix YARN-90, which brings bad disks back once they've recovered.