[jira] [Updated] (YARN-2583) Modify the LogDeletionService to support Log aggregation for LRS
[ https://issues.apache.org/jira/browse/YARN-2583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen updated YARN-2583: -- Attachment: YARN-2583.6.patch +1 for the patch. I made some cosmetic changes based on it. I'll commit the new patch once Jenkins comes back. Modify the LogDeletionService to support Log aggregation for LRS Key: YARN-2583 URL: https://issues.apache.org/jira/browse/YARN-2583 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager, resourcemanager Reporter: Xuan Gong Assignee: Xuan Gong Attachments: YARN-2583.1.patch, YARN-2583.2.patch, YARN-2583.3.1.patch, YARN-2583.3.patch, YARN-2583.4.patch, YARN-2583.5.patch, YARN-2583.6.patch Currently, AggregatedLogDeletionService deletes old logs from HDFS. It checks the cut-off time: if all logs for an application are older than this cut-off time, the app-log-dir is deleted from HDFS. This does not work for LRS, since an LRS application is expected to keep running for a long time. There are two different scenarios: 1) If rollingIntervalSeconds is configured, new log files are continually uploaded to HDFS, so the number of log files for the application keeps growing and none of them are ever deleted. 2) If rollingIntervalSeconds is not configured, the log file can only be uploaded to HDFS after the application finishes. It is quite possible that the logs are uploaded after the cut-off time, which causes problems because by then the app-log-dir for the application in HDFS has already been deleted. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
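For context, a minimal sketch of the log-aggregation settings involved in the two scenarios above, assuming the standard YarnConfiguration retention keys; the rolling-interval property name used for scenario 1 is illustrative only and may differ from the one this patch introduces.
{code}
// Sketch: log-aggregation retention settings relevant to the scenarios above.
// The roll-monitoring-interval key below is illustrative (scenario 1); the two
// retention keys are the standard ones checked by AggregatedLogDeletionService.
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class LrsLogAggregationConfigSketch {
  public static void main(String[] args) {
    YarnConfiguration conf = new YarnConfiguration();
    // Aggregated logs older than this many seconds are past the "cut-off-time".
    conf.setLong(YarnConfiguration.LOG_AGGREGATION_RETAIN_SECONDS, 7 * 24 * 3600L);
    // How often the deletion service wakes up to look for expired logs.
    conf.setLong(YarnConfiguration.LOG_AGGREGATION_RETAIN_CHECK_INTERVAL_SECONDS, 3600L);
    // Scenario 1: roll (upload) logs periodically while the LRS app keeps running,
    // so the deletion service must prune individual old files per application.
    conf.setLong("yarn.nodemanager.log-aggregation.roll-monitoring-interval-seconds",
        3600L);
    System.out.println("retain-seconds = "
        + conf.getLong(YarnConfiguration.LOG_AGGREGATION_RETAIN_SECONDS, -1));
  }
}
{code}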
[jira] [Commented] (YARN-2501) [YARN-796] Changes in AMRMClient to support labels
[ https://issues.apache.org/jira/browse/YARN-2501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14166453#comment-14166453 ] Wangda Tan commented on YARN-2501: -- The test passes locally. [YARN-796] Changes in AMRMClient to support labels -- Key: YARN-2501 URL: https://issues.apache.org/jira/browse/YARN-2501 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Wangda Tan Assignee: Wangda Tan Attachments: YARN-2501-20141009.1.patch, YARN-2501-20141009.2.patch, YARN-2501.patch Changes in AMRMClient to support labels -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2583) Modify the LogDeletionService to support Log aggregation for LRS
[ https://issues.apache.org/jira/browse/YARN-2583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14166472#comment-14166472 ] Hadoop QA commented on YARN-2583: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12674117/YARN-2583.6.patch against trunk revision d3afd73. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:red}-1 release audit{color}. The applied patch generated 1 release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/5359//testReport/ Release audit warnings: https://builds.apache.org/job/PreCommit-YARN-Build/5359//artifact/patchprocess/patchReleaseAuditProblems.txt Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5359//console This message is automatically generated. Modify the LogDeletionService to support Log aggregation for LRS Key: YARN-2583 URL: https://issues.apache.org/jira/browse/YARN-2583 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager, resourcemanager Reporter: Xuan Gong Assignee: Xuan Gong Attachments: YARN-2583.1.patch, YARN-2583.2.patch, YARN-2583.3.1.patch, YARN-2583.3.patch, YARN-2583.4.patch, YARN-2583.5.patch, YARN-2583.6.patch Currently, AggregatedLogDeletionService deletes old logs from HDFS. It checks the cut-off time: if all logs for an application are older than this cut-off time, the app-log-dir is deleted from HDFS. This does not work for LRS, since an LRS application is expected to keep running for a long time. There are two different scenarios: 1) If rollingIntervalSeconds is configured, new log files are continually uploaded to HDFS, so the number of log files for the application keeps growing and none of them are ever deleted. 2) If rollingIntervalSeconds is not configured, the log file can only be uploaded to HDFS after the application finishes. It is quite possible that the logs are uploaded after the cut-off time, which causes problems because by then the app-log-dir for the application in HDFS has already been deleted. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2656) RM web services authentication filter should add support for proxy user
[ https://issues.apache.org/jira/browse/YARN-2656?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14166479#comment-14166479 ] Hadoop QA commented on YARN-2656: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12674111/YARN-2656.3.patch against trunk revision 684170d. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 2 new Findbugs (version 2.0.3) warnings. {color:red}-1 release audit{color}. The applied patch generated 1 release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-common-project/hadoop-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesDelegationTokenAuthentication {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/5358//testReport/ Release audit warnings: https://builds.apache.org/job/PreCommit-YARN-Build/5358//artifact/patchprocess/patchReleaseAuditProblems.txt Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/5358//artifact/patchprocess/newPatchFindbugsWarningshadoop-common.html Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5358//console This message is automatically generated. RM web services authentication filter should add support for proxy user --- Key: YARN-2656 URL: https://issues.apache.org/jira/browse/YARN-2656 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: Varun Vasudev Assignee: Varun Vasudev Attachments: YARN-2656.3.patch, apache-yarn-2656.0.patch, apache-yarn-2656.1.patch, apache-yarn-2656.2.patch The DelegationTokenAuthenticationFilter adds support for doAs functionality. The RMAuthenticationFilter should expose this as well. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
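To make the doAs behaviour concrete, here is a hedged sketch of what a REST caller gains once the filter supports proxy users: an authenticated superuser asks the RM to act on behalf of another user via a doAs query parameter. The host, port and user names below are illustrative, and the exact parameter handling is whatever DelegationTokenAuthenticationFilter implements.
{code}
// Sketch: calling an RM web-service endpoint on behalf of another user once
// proxy-user (doAs) support is exposed by the authentication filter.
// Host, port and user names are illustrative.
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;

public class RmDoAsRestSketch {
  public static void main(String[] args) throws Exception {
    // The superuser authenticates itself but requests that the call be treated
    // as if it came from "alice".
    URL url = new URL("http://rm.example.com:8088/ws/v1/cluster/apps?doAs=alice");
    HttpURLConnection conn = (HttpURLConnection) url.openConnection();
    conn.setRequestMethod("GET");
    // In a secure cluster the request would also carry the superuser's
    // SPNEGO/Kerberos credentials; omitted here for brevity.
    try (BufferedReader reader =
        new BufferedReader(new InputStreamReader(conn.getInputStream()))) {
      String line;
      while ((line = reader.readLine()) != null) {
        System.out.println(line);
      }
    }
  }
}
{code}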
[jira] [Commented] (YARN-2583) Modify the LogDeletionService to support Log aggregation for LRS
[ https://issues.apache.org/jira/browse/YARN-2583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14166510#comment-14166510 ] Hudson commented on YARN-2583: -- FAILURE: Integrated in Hadoop-trunk-Commit #6236 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/6236/]) YARN-2583. Modified AggregatedLogDeletionService to be able to delete rolling aggregated logs. Contributed by Xuan Gong. (zjshen: rev cb81bac0029fce3a9726df3523f0b692cd3375b8) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/logaggregation/TestAggregatedLogDeletionService.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/logaggregation/AggregatedLogDeletionService.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/logaggregation/TestLogAggregationService.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/logaggregation/AppLogAggregatorImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/logaggregation/LogAggregationService.java Modify the LogDeletionService to support Log aggregation for LRS Key: YARN-2583 URL: https://issues.apache.org/jira/browse/YARN-2583 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager, resourcemanager Reporter: Xuan Gong Assignee: Xuan Gong Fix For: 2.6.0 Attachments: YARN-2583.1.patch, YARN-2583.2.patch, YARN-2583.3.1.patch, YARN-2583.3.patch, YARN-2583.4.patch, YARN-2583.5.patch, YARN-2583.6.patch Currently, AggregatedLogDeletionService deletes old logs from HDFS. It checks the cut-off time: if all logs for an application are older than this cut-off time, the app-log-dir is deleted from HDFS. This does not work for LRS, since an LRS application is expected to keep running for a long time. There are two different scenarios: 1) If rollingIntervalSeconds is configured, new log files are continually uploaded to HDFS, so the number of log files for the application keeps growing and none of them are ever deleted. 2) If rollingIntervalSeconds is not configured, the log file can only be uploaded to HDFS after the application finishes. It is quite possible that the logs are uploaded after the cut-off time, which causes problems because by then the app-log-dir for the application in HDFS has already been deleted. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2312) Marking ContainerId#getId as deprecated
[ https://issues.apache.org/jira/browse/YARN-2312?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsuyoshi OZAWA updated YARN-2312: - Attachment: YARN-2312.7.patch Thanks Jian, you're right. Updating it to use Long.parseLong instead of Integer.parseInt in YarnChild.java. Marking ContainerId#getId as deprecated --- Key: YARN-2312 URL: https://issues.apache.org/jira/browse/YARN-2312 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Tsuyoshi OZAWA Assignee: Tsuyoshi OZAWA Attachments: YARN-2312-wip.patch, YARN-2312.1.patch, YARN-2312.2-2.patch, YARN-2312.2-3.patch, YARN-2312.2.patch, YARN-2312.4.patch, YARN-2312.5.patch, YARN-2312.6.patch, YARN-2312.7.patch After YARN-2229, {{ContainerId#getId}} only returns a partial value of the container id: the sequence number without the epoch. We should mark {{ContainerId#getId}} as deprecated and use {{ContainerId#getContainerId}} instead. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
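As a rough illustration of why the parse in YarnChild.java has to move from Integer.parseInt to Long.parseLong, here is a hedged sketch assuming the 2.6-era ContainerId factory method newContainerId(ApplicationAttemptId, long); after YARN-2229 the full container id is a 64-bit value whose upper bits carry the epoch.
{code}
// Sketch: the full 64-bit container id vs. the old int-sized sequence number.
import org.apache.hadoop.yarn.api.records.ApplicationAttemptId;
import org.apache.hadoop.yarn.api.records.ApplicationId;
import org.apache.hadoop.yarn.api.records.ContainerId;

public class ContainerIdParseSketch {
  public static void main(String[] args) {
    ApplicationId appId = ApplicationId.newInstance(1412900000000L, 42);
    ApplicationAttemptId attemptId = ApplicationAttemptId.newInstance(appId, 1);
    ContainerId cid = ContainerId.newContainerId(attemptId, 5L);

    long fullId = cid.getContainerId(); // full value, including the epoch bits
    int sequence = cid.getId();         // to be deprecated: sequence number only

    // A string form of the full id must therefore be parsed as a long, not an int.
    long parsed = Long.parseLong(Long.toString(fullId));
    System.out.println(fullId + " " + sequence + " " + parsed);
  }
}
{code}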
[jira] [Commented] (YARN-1582) Capacity Scheduler: add a maximum-allocation-mb setting per queue
[ https://issues.apache.org/jira/browse/YARN-1582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14166558#comment-14166558 ] Leitao Guo commented on YARN-1582: -- Any updates on this JIRA? Why not add yarn.scheduler.maximum-allocation-vcores to each queue as well? Capacity Scheduler: add a maximum-allocation-mb setting per queue -- Key: YARN-1582 URL: https://issues.apache.org/jira/browse/YARN-1582 Project: Hadoop YARN Issue Type: Improvement Components: capacityscheduler Affects Versions: 3.0.0, 0.23.10, 2.2.0 Reporter: Thomas Graves Assignee: Thomas Graves Attachments: YARN-1582-branch-0.23.patch We want to allow certain queues to use larger container sizes while limiting other queues to smaller container sizes. Setting it per queue will help prevent abuse, help limit the impact of reservations, and allow changes in the maximum container size to be rolled out more easily. One reason this is needed is that more application types are becoming available on YARN, and certain applications require more memory to run efficiently. While we want to allow for that, we don't want other applications to abuse it and start requesting bigger containers than they really need. Note that we could have this based on application type, but that might not be totally accurate either since, for example, you might want to allow certain users on MapReduce to use larger containers, while limiting other users of MapReduce to smaller containers. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
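A hedged sketch of how a per-queue cap could layer on top of the existing cluster-wide setting; only yarn.scheduler.maximum-allocation-mb is an existing key, while the per-queue property name below is hypothetical (it mirrors what this JIRA proposes).
{code}
// Sketch: a hypothetical per-queue maximum-allocation-mb layered on top of the
// existing cluster-wide yarn.scheduler.maximum-allocation-mb. The per-queue
// property name pattern is illustrative only.
import org.apache.hadoop.conf.Configuration;

public class PerQueueMaxAllocationSketch {
  static final String CLUSTER_MAX_MB = "yarn.scheduler.maximum-allocation-mb";
  // Hypothetical pattern, mirroring other capacity-scheduler per-queue keys:
  static final String QUEUE_MAX_MB_FMT = "yarn.scheduler.capacity.%s.maximum-allocation-mb";

  static int effectiveMaxAllocationMb(Configuration conf, String queuePath) {
    int clusterMax = conf.getInt(CLUSTER_MAX_MB, 8192);
    // A queue may only restrict, never exceed, the cluster-wide maximum.
    int queueMax = conf.getInt(String.format(QUEUE_MAX_MB_FMT, queuePath), clusterMax);
    return Math.min(queueMax, clusterMax);
  }

  public static void main(String[] args) {
    Configuration conf = new Configuration();
    conf.setInt(CLUSTER_MAX_MB, 16384);
    conf.setInt(String.format(QUEUE_MAX_MB_FMT, "root.small-jobs"), 4096);
    System.out.println(effectiveMaxAllocationMb(conf, "root.small-jobs")); // 4096
    System.out.println(effectiveMaxAllocationMb(conf, "root.large-jobs")); // 16384
  }
}
{code}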
[jira] [Commented] (YARN-2312) Marking ContainerId#getId as deprecated
[ https://issues.apache.org/jira/browse/YARN-2312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14166641#comment-14166641 ] Hadoop QA commented on YARN-2312: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12674129/YARN-2312.7.patch against trunk revision cb81bac. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 15 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:red}-1 release audit{color}. The applied patch generated 1 release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.mapred.pipes.TestPipeApplication {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/5360//testReport/ Release audit warnings: https://builds.apache.org/job/PreCommit-YARN-Build/5360//artifact/patchprocess/patchReleaseAuditProblems.txt Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5360//console This message is automatically generated. Marking ContainerId#getId as deprecated --- Key: YARN-2312 URL: https://issues.apache.org/jira/browse/YARN-2312 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Tsuyoshi OZAWA Assignee: Tsuyoshi OZAWA Attachments: YARN-2312-wip.patch, YARN-2312.1.patch, YARN-2312.2-2.patch, YARN-2312.2-3.patch, YARN-2312.2.patch, YARN-2312.4.patch, YARN-2312.5.patch, YARN-2312.6.patch, YARN-2312.7.patch After YARN-2229, {{ContainerId#getId}} only returns a partial value of the container id: the sequence number without the epoch. We should mark {{ContainerId#getId}} as deprecated and use {{ContainerId#getContainerId}} instead. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2671) ApplicationSubmissionContext change breaks the existing app submission
[ https://issues.apache.org/jira/browse/YARN-2671?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14166680#comment-14166680 ] Hudson commented on YARN-2671: -- SUCCESS: Integrated in Hadoop-Yarn-trunk #707 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/707/]) YARN-2671. Fix the Jira number in the change log. (zjshen: rev 5b12df6587eb4f37d09c9ffc35a0ea59694df831) * hadoop-yarn-project/CHANGES.txt ApplicationSubmissionContext change breaks the existing app submission -- Key: YARN-2671 URL: https://issues.apache.org/jira/browse/YARN-2671 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: Zhijie Shen Assignee: Wangda Tan Priority: Blocker Fix For: 2.6.0 Attachments: YARN-2617-20141009.1.patch After YARN-2493, app submission goes wrong with the following exception: {code} 2014-10-09 15:50:35,774 WARN [297524352@qtp-1314143300-2 - /ws/v1/cluster/apps] webapp.GenericExceptionHandler (GenericExceptionHandler.java:toResponse(98)) - INTERNAL_SERVER_ERROR java.lang.NullPointerException at org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.validateResourceRequest(SchedulerUtils.java:194) at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.validateResourceRequest(RMAppManager.java:390) at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.createAndPopulateNewRMApp(RMAppManager.java:346) at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.submitApplication(RMAppManager.java:273) at org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.submitApplication(ClientRMService.java:570) at org.apache.hadoop.yarn.server.resourcemanager.webapp.RMWebServices$2.run(RMWebServices.java:896) at org.apache.hadoop.yarn.server.resourcemanager.webapp.RMWebServices$2.run(RMWebServices.java:1) {code} This is because the resource is now put into the ResourceRequest of the ApplicationSubmissionContext rather than directly into the ApplicationSubmissionContext, so the sanity check cannot get the resource object from the context. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
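A minimal sketch of the submission-context shape behind this NPE, assuming the 2.6-era ApplicationSubmissionContext setters setResource and setAMContainerResourceRequest; it shows why validation that only reads context.getResource() would see null once the resource travels inside the AM ResourceRequest.
{code}
// Sketch: after YARN-2493 the AM resource ask can travel inside a ResourceRequest
// rather than directly on the ApplicationSubmissionContext, so validation code
// that only calls context.getResource() observes null.
import org.apache.hadoop.yarn.api.records.ApplicationSubmissionContext;
import org.apache.hadoop.yarn.api.records.Priority;
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.api.records.ResourceRequest;
import org.apache.hadoop.yarn.util.Records;

public class AmResourceSketch {
  public static void main(String[] args) {
    Resource amResource = Resource.newInstance(1024, 1);
    ResourceRequest amRequest = ResourceRequest.newInstance(
        Priority.newInstance(0), ResourceRequest.ANY, amResource, 1);

    ApplicationSubmissionContext context =
        Records.newRecord(ApplicationSubmissionContext.class);
    context.setAMContainerResourceRequest(amRequest);
    // Without the line below, context.getResource() returns null and the
    // RMAppManager sanity check throws the NullPointerException quoted above.
    context.setResource(amResource);

    System.out.println("resource on context: " + context.getResource());
  }
}
{code}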
[jira] [Commented] (YARN-2662) TestCgroupsLCEResourcesHandler leaks file descriptors.
[ https://issues.apache.org/jira/browse/YARN-2662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14166670#comment-14166670 ] Hudson commented on YARN-2662: -- SUCCESS: Integrated in Hadoop-Yarn-trunk #707 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/707/]) YARN-2662. TestCgroupsLCEResourcesHandler leaks file descriptors. Contributed by Chris Nauroth. (cnauroth: rev d3afd730acfa380ab5032be5ee296c5d73744518) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/util/TestCgroupsLCEResourcesHandler.java TestCgroupsLCEResourcesHandler leaks file descriptors. -- Key: YARN-2662 URL: https://issues.apache.org/jira/browse/YARN-2662 Project: Hadoop YARN Issue Type: Bug Components: test Reporter: Chris Nauroth Assignee: Chris Nauroth Priority: Minor Fix For: 2.6.0 Attachments: YARN-2662.1.patch {{TestCgroupsLCEResourcesHandler}} includes tests that write and read values from the various cgroups files. After the tests read from a file, they do not close it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2180) In-memory backing store for cache manager
[ https://issues.apache.org/jira/browse/YARN-2180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14166672#comment-14166672 ] Hudson commented on YARN-2180: -- SUCCESS: Integrated in Hadoop-Yarn-trunk #707 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/707/]) YARN-2180. [YARN-1492] In-memory backing store for cache manager. (Chris Trezzo via kasha) (kasha: rev 4f426fe2232ed90d8fdf8619fbdeae28d788b5c8) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/main/java/org/apache/hadoop/yarn/server/sharedcachemanager/store/SharedCacheResourceReference.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/main/java/org/apache/hadoop/yarn/server/sharedcachemanager/store/InMemorySCMStore.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/main/java/org/apache/hadoop/yarn/server/sharedcachemanager/store/SCMStore.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/main/java/org/apache/hadoop/yarn/server/sharedcachemanager/store/SharedCacheResource.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/sharedcache/SharedCacheUtil.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/test/java/org/apache/hadoop/yarn/server/sharedcachemanager/store/TestInMemorySCMStore.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/main/java/org/apache/hadoop/yarn/server/sharedcachemanager/SharedCacheManager.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/pom.xml * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/sharedcache/SharedCacheStructureUtil.java In-memory backing store for cache manager - Key: YARN-2180 URL: https://issues.apache.org/jira/browse/YARN-2180 Project: Hadoop YARN Issue Type: Sub-task Reporter: Chris Trezzo Assignee: Chris Trezzo Fix For: 2.7.0 Attachments: YARN-2180-trunk-v1.patch, YARN-2180-trunk-v2.patch, YARN-2180-trunk-v3.patch, YARN-2180-trunk-v4.patch, YARN-2180-trunk-v5.patch, YARN-2180-trunk-v6.patch, YARN-2180-trunk-v7.patch Implement an in-memory backing store for the cache manager. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1492) truly shared cache for jars (jobjar/libjar)
[ https://issues.apache.org/jira/browse/YARN-1492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=1416#comment-1416 ] Hudson commented on YARN-1492: -- SUCCESS: Integrated in Hadoop-Yarn-trunk #707 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/707/]) YARN-2180. [YARN-1492] In-memory backing store for cache manager. (Chris Trezzo via kasha) (kasha: rev 4f426fe2232ed90d8fdf8619fbdeae28d788b5c8) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/main/java/org/apache/hadoop/yarn/server/sharedcachemanager/store/SharedCacheResourceReference.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/main/java/org/apache/hadoop/yarn/server/sharedcachemanager/store/InMemorySCMStore.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/main/java/org/apache/hadoop/yarn/server/sharedcachemanager/store/SCMStore.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/main/java/org/apache/hadoop/yarn/server/sharedcachemanager/store/SharedCacheResource.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/sharedcache/SharedCacheUtil.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/test/java/org/apache/hadoop/yarn/server/sharedcachemanager/store/TestInMemorySCMStore.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/main/java/org/apache/hadoop/yarn/server/sharedcachemanager/SharedCacheManager.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/pom.xml * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/sharedcache/SharedCacheStructureUtil.java truly shared cache for jars (jobjar/libjar) --- Key: YARN-1492 URL: https://issues.apache.org/jira/browse/YARN-1492 Project: Hadoop YARN Issue Type: New Feature Affects Versions: 2.0.4-alpha Reporter: Sangjin Lee Assignee: Chris Trezzo Priority: Critical Attachments: YARN-1492-all-trunk-v1.patch, YARN-1492-all-trunk-v2.patch, YARN-1492-all-trunk-v3.patch, YARN-1492-all-trunk-v4.patch, YARN-1492-all-trunk-v5.patch, shared_cache_design.pdf, shared_cache_design_v2.pdf, shared_cache_design_v3.pdf, shared_cache_design_v4.pdf, shared_cache_design_v5.pdf, shared_cache_design_v6.pdf Currently there is the distributed cache that enables you to cache jars and files so that attempts from the same job can reuse them. However, sharing is limited with the distributed cache because it is normally on a per-job basis. On a large cluster, sometimes copying of jobjars and libjars becomes so prevalent that it consumes a large portion of the network bandwidth, not to speak of defeating the purpose of bringing compute to where data is. This is wasteful because in most cases code doesn't change much across many jobs. I'd like to propose and discuss feasibility of introducing a truly shared cache so that multiple jobs from multiple users can share and cache jars. This JIRA is to open the discussion. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2544) [YARN-796] Common server side PB changes (not include user API PB changes)
[ https://issues.apache.org/jira/browse/YARN-2544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14166674#comment-14166674 ] Hudson commented on YARN-2544: -- SUCCESS: Integrated in Hadoop-Yarn-trunk #707 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/707/]) YARN-2544. Added admin-API objects for using node-labels. Contributed by Wangda Tan. (vinodkv: rev 596702a02501e9cb09aabced168027189eaf02ba) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/RemoveFromClusterNodeLabelsResponse.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/api/TestPBImplRecords.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/AddToClusterNodeLabelsRequest.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/impl/pb/GetNodesToLabelsResponsePBImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/impl/pb/ReplaceLabelsOnNodeRequestPBImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/GetNodesToLabelsResponse.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/GetNodesToLabelsRequest.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/yarn_protos.proto * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/impl/pb/AddToClusterNodeLabelsRequestPBImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/QueueInfo.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/impl/pb/RemoveFromClusterNodeLabelsResponsePBImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/AddToClusterNodeLabelsResponse.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/impl/pb/GetClusterNodeLabelsResponsePBImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/GetClusterNodeLabelsRequest.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/server/yarn_server_resourcemanager_service_protos.proto * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/api/records/impl/pb/QueueInfoPBImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/ReplaceLabelsOnNodeRequest.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/impl/pb/GetNodesToLabelsRequestPBImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/ReplaceLabelsOnNodeResponse.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/RemoveFromClusterNodeLabelsRequest.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/impl/pb/ReplaceLabelsOnNodeResponsePBImpl.java * 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/ProtocolHATestBase.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/impl/pb/AddToClusterNodeLabelsResponsePBImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/GetClusterNodeLabelsResponse.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/impl/pb/RemoveFromClusterNodeLabelsRequestPBImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/impl/pb/GetClusterNodeLabelsRequestPBImpl.java [YARN-796] Common server side PB changes (not include user API PB changes) -- Key: YARN-2544 URL: https://issues.apache.org/jira/browse/YARN-2544 Project: Hadoop YARN Issue Type: Sub-task Components: api, client, resourcemanager Reporter: Wangda Tan Assignee: Wangda Tan Fix For: 2.6.0 Attachments: YARN-2544-20141008.1.patch, YARN-2544-20141008.2.patch, YARN-2544-20141009.1.patch, YARN-2544.patch, YARN-2544.patch, YARN-2544.patch
[jira] [Commented] (YARN-2617) NM does not need to send finished container whose APP is not running to RM
[ https://issues.apache.org/jira/browse/YARN-2617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14166681#comment-14166681 ] Hudson commented on YARN-2617: -- SUCCESS: Integrated in Hadoop-Yarn-trunk #707 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/707/]) YARN-2617. Fixed ApplicationSubmissionContext to still set resource for backward compatibility. Contributed by Wangda Tan. (zjshen: rev e532ed8faa8db4b008a5b8d3f82b48a1b314fa6c) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/ApplicationSubmissionContext.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/api/TestPBImplRecords.java NM does not need to send finished container whose APP is not running to RM -- Key: YARN-2617 URL: https://issues.apache.org/jira/browse/YARN-2617 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.6.0 Reporter: Jun Gong Assignee: Jun Gong Fix For: 2.6.0 Attachments: YARN-2617.2.patch, YARN-2617.3.patch, YARN-2617.4.patch, YARN-2617.5.patch, YARN-2617.5.patch, YARN-2617.5.patch, YARN-2617.6.patch, YARN-2617.patch We ([~chenchun]) were testing RM work-preserving restart and found the following logs when we ran a simple MapReduce PI job. The NM continuously reported completed containers whose application had already finished, even after the AM had finished. {code} 2014-09-26 17:00:42,228 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: Null container completed... 2014-09-26 17:00:42,228 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: Null container completed... 2014-09-26 17:00:43,230 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: Null container completed... 2014-09-26 17:00:43,230 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: Null container completed... 2014-09-26 17:00:44,233 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: Null container completed... 2014-09-26 17:00:44,233 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: Null container completed... {code} In the patch for YARN-1372, ApplicationImpl on the NM should guarantee that already completed applications are cleaned up. However, it only removes the appId from 'app.context.getApplications()' when ApplicationImpl receives the event 'ApplicationEventType.APPLICATION_LOG_HANDLING_FINISHED', and the NM might not receive this event for a long time, or might never receive it. * For NonAggregatingLogHandler, it waits for YarnConfiguration.NM_LOG_RETAIN_SECONDS, which is 3 * 60 * 60 sec by default, before it is scheduled to delete the application logs and send the event. * For LogAggregationService, it might fail (e.g. if the user does not have HDFS write permission), in which case it will not send the event. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
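For reference, a small sketch of the NonAggregatingLogHandler retention delay mentioned above, assuming the standard YarnConfiguration constants; it simply reads the 3-hour default that gates when APPLICATION_LOG_HANDLING_FINISHED can be sent.
{code}
// Sketch: the delay before local logs are deleted (and the log-handling-finished
// event can be sent) when log aggregation is disabled.
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class LogRetainSecondsSketch {
  public static void main(String[] args) {
    YarnConfiguration conf = new YarnConfiguration();
    long retainSeconds = conf.getLong(
        YarnConfiguration.NM_LOG_RETAIN_SECONDS,
        YarnConfiguration.DEFAULT_NM_LOG_RETAIN_SECONDS); // 3 * 60 * 60 by default
    System.out.println("Local container logs retained for " + retainSeconds + "s");
  }
}
{code}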
[jira] [Commented] (YARN-2583) Modify the LogDeletionService to support Log aggregation for LRS
[ https://issues.apache.org/jira/browse/YARN-2583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14166684#comment-14166684 ] Hudson commented on YARN-2583: -- SUCCESS: Integrated in Hadoop-Yarn-trunk #707 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/707/]) YARN-2583. Modified AggregatedLogDeletionService to be able to delete rolling aggregated logs. Contributed by Xuan Gong. (zjshen: rev cb81bac0029fce3a9726df3523f0b692cd3375b8) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/logaggregation/AggregatedLogDeletionService.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/logaggregation/TestAggregatedLogDeletionService.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/logaggregation/AppLogAggregatorImpl.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/logaggregation/TestLogAggregationService.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/logaggregation/LogAggregationService.java Modify the LogDeletionService to support Log aggregation for LRS Key: YARN-2583 URL: https://issues.apache.org/jira/browse/YARN-2583 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager, resourcemanager Reporter: Xuan Gong Assignee: Xuan Gong Fix For: 2.6.0 Attachments: YARN-2583.1.patch, YARN-2583.2.patch, YARN-2583.3.1.patch, YARN-2583.3.patch, YARN-2583.4.patch, YARN-2583.5.patch, YARN-2583.6.patch Currently, AggregatedLogDeletionService deletes old logs from HDFS. It checks the cut-off time: if all logs for an application are older than this cut-off time, the app-log-dir is deleted from HDFS. This does not work for LRS, since an LRS application is expected to keep running for a long time. There are two different scenarios: 1) If rollingIntervalSeconds is configured, new log files are continually uploaded to HDFS, so the number of log files for the application keeps growing and none of them are ever deleted. 2) If rollingIntervalSeconds is not configured, the log file can only be uploaded to HDFS after the application finishes. It is quite possible that the logs are uploaded after the cut-off time, which causes problems because by then the app-log-dir for the application in HDFS has already been deleted. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2493) [YARN-796] API changes for users
[ https://issues.apache.org/jira/browse/YARN-2493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14166671#comment-14166671 ] Hudson commented on YARN-2493: -- SUCCESS: Integrated in Hadoop-Yarn-trunk #707 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/707/]) YARN-2493. Added user-APIs for using node-labels. Contributed by Wangda Tan. (vinodkv: rev 180afa2f86f9854c536c3d4ff7476880e41ac48d) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/ResourceRequest.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/api/records/impl/pb/ResourceRequestPBImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/ApplicationSubmissionContext.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/api/records/impl/pb/ApplicationSubmissionContextPBImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/yarn_protos.proto * hadoop-yarn-project/CHANGES.txt [YARN-796] API changes for users Key: YARN-2493 URL: https://issues.apache.org/jira/browse/YARN-2493 Project: Hadoop YARN Issue Type: Sub-task Components: api Reporter: Wangda Tan Assignee: Wangda Tan Fix For: 2.6.0 Attachments: YARN-2493-20141008.1.patch, YARN-2493.patch, YARN-2493.patch, YARN-2493.patch, YARN-2493.patch, YARN-2493.patch This JIRA includes API changes for users of YARN-796, like changes in {{ResourceRequest}}, {{ApplicationSubmissionContext}}, etc. This is a common part of YARN-796. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2629) Make distributed shell use the domain-based timeline ACLs
[ https://issues.apache.org/jira/browse/YARN-2629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14166679#comment-14166679 ] Hudson commented on YARN-2629: -- SUCCESS: Integrated in Hadoop-Yarn-trunk #707 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/707/]) YARN-2629. Made the distributed shell use the domain-based timeline ACLs. Contributed by Zhijie Shen. (zjshen: rev 1d4612f5ad9678c952b416e798dccd20c88f96ef) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/client/api/impl/TimelineClientImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/src/main/java/org/apache/hadoop/yarn/applications/distributedshell/ApplicationMaster.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/src/test/java/org/apache/hadoop/yarn/applications/distributedshell/TestDistributedShell.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/src/main/java/org/apache/hadoop/yarn/applications/distributedshell/DSConstants.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/src/main/java/org/apache/hadoop/yarn/applications/distributedshell/Client.java Make distributed shell use the domain-based timeline ACLs - Key: YARN-2629 URL: https://issues.apache.org/jira/browse/YARN-2629 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Zhijie Shen Assignee: Zhijie Shen Fix For: 2.6.0 Attachments: YARN-2629.1.patch, YARN-2629.2.patch, YARN-2629.3.patch, YARN-2629.4.patch To demonstrate the usage of this feature (YARN-2102), it's good to make the distributed shell create the domain and post its timeline entities into this private space. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
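A hedged sketch of the domain-based flow the distributed shell adopts here: create a TimelineDomain with reader/writer ACLs, put it via TimelineClient, then tag posted entities with that domain id. The domain id, entity ids and user names below are illustrative.
{code}
// Sketch: creating a private timeline domain and posting an entity into it,
// roughly what the distributed shell does after this change. Ids and user
// names are illustrative.
import org.apache.hadoop.yarn.api.records.timeline.TimelineDomain;
import org.apache.hadoop.yarn.api.records.timeline.TimelineEntity;
import org.apache.hadoop.yarn.client.api.TimelineClient;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class TimelineDomainSketch {
  public static void main(String[] args) throws Exception {
    YarnConfiguration conf = new YarnConfiguration();
    TimelineClient client = TimelineClient.createTimelineClient();
    client.init(conf);
    client.start();
    try {
      TimelineDomain domain = new TimelineDomain();
      domain.setId("DS_DOMAIN_app_1412900000000_0042"); // illustrative id
      domain.setReaders("alice,bob");                   // who may read entities
      domain.setWriters("alice");                       // who may write entities
      client.putDomain(domain);

      TimelineEntity entity = new TimelineEntity();
      entity.setEntityType("DS_APP_ATTEMPT");
      entity.setEntityId("appattempt_1412900000000_0042_000001");
      entity.setDomainId(domain.getId());               // post into the private space
      client.putEntities(entity);
    } finally {
      client.stop();
    }
  }
}
{code}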
[jira] [Updated] (YARN-2198) Remove the need to run NodeManager as privileged account for Windows Secure Container Executor
[ https://issues.apache.org/jira/browse/YARN-2198?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Remus Rusanu updated YARN-2198: --- Attachment: YARN-2198.11.patch .11.patch is a diff from trunk and implements the code review feedback. Remove the need to run NodeManager as privileged account for Windows Secure Container Executor -- Key: YARN-2198 URL: https://issues.apache.org/jira/browse/YARN-2198 Project: Hadoop YARN Issue Type: Improvement Reporter: Remus Rusanu Assignee: Remus Rusanu Labels: security, windows Attachments: .YARN-2198.delta.10.patch, YARN-2198.1.patch, YARN-2198.11.patch, YARN-2198.2.patch, YARN-2198.3.patch, YARN-2198.delta.4.patch, YARN-2198.delta.5.patch, YARN-2198.delta.6.patch, YARN-2198.delta.7.patch, YARN-2198.separation.patch, YARN-2198.trunk.10.patch, YARN-2198.trunk.4.patch, YARN-2198.trunk.5.patch, YARN-2198.trunk.6.patch, YARN-2198.trunk.8.patch, YARN-2198.trunk.9.patch YARN-1972 introduces a Secure Windows Container Executor. However, this executor requires the process launching the container to be LocalSystem or a member of the local Administrators group. Since the process in question is the NodeManager, the requirement translates into running the entire NM as a privileged account, a very large surface area to review and protect. This proposal is to move the privileged operations into a dedicated NT service. The NM can run as a low-privilege account and communicate with the privileged NT service when it needs to launch a container. This would reduce the surface exposed to high privileges. There has to exist a secure, authenticated and authorized channel of communication between the NM and the privileged NT service. Possible alternatives are a new TCP endpoint, Java RPC etc. My proposal though would be to use Windows LPC (Local Procedure Calls), which is a Windows platform specific inter-process communication channel that satisfies all requirements and is easy to deploy. The privileged NT service would register and listen on an LPC port (NtCreatePort, NtListenPort). The NM would use JNI to interop with libwinutils which would host the LPC client code. The client would connect to the LPC port (NtConnectPort) and send a message requesting a container launch (NtRequestWaitReplyPort). LPC provides authentication and the privileged NT service can use authorization API (AuthZ) to validate the caller. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2198) Remove the need to run NodeManager as privileged account for Windows Secure Container Executor
[ https://issues.apache.org/jira/browse/YARN-2198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14166718#comment-14166718 ] Remus Rusanu commented on YARN-2198:
libwinutils.c CreateLogonForUser - confusing name; maybe CreateLogonTokenForUser? RR: Fixed
TestWinUtils - can we add testing specific to security? RR: Tracked by YARN-2636
ContainerLaunch launchContainer - nit, why userName here? It's user everywhere else. RR: fixed
getLocalWrapperScriptBuilder - why not an override instead of a conditional (see below wrt WindowsContainerExecutor)? WindowsSecureContainerExecutor - I really think there should be a WindowsContainerExecutor. RR: I left both as is; this predates the WSCE.
It looks like this is only a 64-bit build now, where it used to be 64 and 32. I assume this is intentional and ok? RR: correct. x86 was not possible to build from mvn, and was not required.
It would be really nice if we could start to separate out some of this new functionality from winutils, e.g., make the elevated service functionality independent. I see that there is a jira for doing so down the road, which is good... it looks like the elevated privileges are just around creating local directories and (obviously) spawning the process. If a stand-alone service just created and set permissions on those directories, and the java code simply checked for their existence and then moved on if they were present, I think that a lot of the back-and-forth of the elevation could be dropped to just one call to create the base directory and a second to spawn/hand back the output handles. Is that correct? RR: I actually intentionally avoided that. The LCE does it, and the result is a lot of duplication between Java code in DefaultContainerExecutor and C code in the native container-executor. With the WSCE I preferred to keep the logic in Java and use native methods just for primitive operations.
service.c "// We're now transfering ownership of the duplicated handles to the caller + // If the RPC call fails after this point the handles are leaked inside the NM process" - this is a little alarming. Doesn't the close() call clean this up, regardless of success/fail? RR: I added some more comments to clarify that only a process kill or hardware error can fail after this point. An atomic transfer is not possible.
Why is this conditional check different from all the others? RR: fixed
Nit: anonimous sp anonymous. RR: fixed
Just a line added, pls revert. RR: fixed
ElevatedFileSystem:delete() - it appears that the tests for existence, etc., are run in a non-elevated way, while the actions are elevated. Is it possible for permissions to be such that the non-elevated tests do not see files/directories which are present for permission reasons? Should those not be elevated also? RR: It is not possible under a correctly configured deployment. Explicitly overriding permissions can deny this, but that will always be possible (e.g. deny permission explicitly to LocalSystem).
streamReaderThread.run - using readLine() instead of following the simple buffer-copy idiom in ShellCommandExecutor has some efficiency issues; granted, it looks to be reading memory-sized data so it may be no big deal, but it would be nice to follow the buffer-copy pattern instead. RR: I forgot to address this. Todo.
ContainerExecutor comment on comment: RR: Fixed
ContainerLaunch public void sanitizeEnv(...) RR: This predates WSCE, I left it as is.
ContainerLocalizer LOG.info(String.format(nRet: %d, nRet)); - not sure this should be info level. RR: todo, forgot to address it.
getContainerClasspathJarPrivateDir not used in ContainerExecutor.java, we can remove that. RR: fixed
Unnecessary format change only in YarnConfiguration, we can revert. RR: fixed
Multiple places exceed the 80-column limit code convention. RR: I think I fixed all new Java code.
DefaultContainerExecutor#buildCommandExecutor RR: Fixed
Remove the need to run NodeManager as privileged account for Windows Secure Container Executor -- Key: YARN-2198 URL: https://issues.apache.org/jira/browse/YARN-2198 Project: Hadoop YARN Issue Type: Improvement Reporter: Remus Rusanu Assignee: Remus Rusanu Labels: security, windows Attachments: .YARN-2198.delta.10.patch, YARN-2198.1.patch, YARN-2198.11.patch, YARN-2198.2.patch, YARN-2198.3.patch, YARN-2198.delta.4.patch, YARN-2198.delta.5.patch, YARN-2198.delta.6.patch, YARN-2198.delta.7.patch, YARN-2198.separation.patch, YARN-2198.trunk.10.patch, YARN-2198.trunk.4.patch, YARN-2198.trunk.5.patch, YARN-2198.trunk.6.patch, YARN-2198.trunk.8.patch, YARN-2198.trunk.9.patch YARN-1972 introduces a Secure Windows
[jira] [Updated] (YARN-2198) Remove the need to run NodeManager as privileged account for Windows Secure Container Executor
[ https://issues.apache.org/jira/browse/YARN-2198?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Remus Rusanu updated YARN-2198: --- Attachment: YARN-2198.12.patch .12.patch is a diff from trunk and addresses the two TODO review items. Remove the need to run NodeManager as privileged account for Windows Secure Container Executor -- Key: YARN-2198 URL: https://issues.apache.org/jira/browse/YARN-2198 Project: Hadoop YARN Issue Type: Improvement Reporter: Remus Rusanu Assignee: Remus Rusanu Labels: security, windows Attachments: .YARN-2198.delta.10.patch, YARN-2198.1.patch, YARN-2198.11.patch, YARN-2198.12.patch, YARN-2198.2.patch, YARN-2198.3.patch, YARN-2198.delta.4.patch, YARN-2198.delta.5.patch, YARN-2198.delta.6.patch, YARN-2198.delta.7.patch, YARN-2198.separation.patch, YARN-2198.trunk.10.patch, YARN-2198.trunk.4.patch, YARN-2198.trunk.5.patch, YARN-2198.trunk.6.patch, YARN-2198.trunk.8.patch, YARN-2198.trunk.9.patch YARN-1972 introduces a Secure Windows Container Executor. However, this executor requires the process launching the container to be LocalSystem or a member of the local Administrators group. Since the process in question is the NodeManager, the requirement translates into running the entire NM as a privileged account, a very large surface area to review and protect. This proposal is to move the privileged operations into a dedicated NT service. The NM can run as a low-privilege account and communicate with the privileged NT service when it needs to launch a container. This would reduce the surface exposed to high privileges. There has to exist a secure, authenticated and authorized channel of communication between the NM and the privileged NT service. Possible alternatives are a new TCP endpoint, Java RPC etc. My proposal though would be to use Windows LPC (Local Procedure Calls), which is a Windows platform specific inter-process communication channel that satisfies all requirements and is easy to deploy. The privileged NT service would register and listen on an LPC port (NtCreatePort, NtListenPort). The NM would use JNI to interop with libwinutils which would host the LPC client code. The client would connect to the LPC port (NtConnectPort) and send a message requesting a container launch (NtRequestWaitReplyPort). LPC provides authentication and the privileged NT service can use authorization API (AuthZ) to validate the caller. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2198) Remove the need to run NodeManager as privileged account for Windows Secure Container Executor
[ https://issues.apache.org/jira/browse/YARN-2198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14166751#comment-14166751 ] Remus Rusanu commented on YARN-2198: .12.patch:
streamReaderThread.run - using readLine() instead of following the simple buffer-copy idiom in ShellCommandExecutor has some efficiency issues; granted, it looks to be reading memory-sized data so it may be no big deal, but it would be nice to follow the buffer-copy pattern instead. RR: Fixed
ContainerLocalizer LOG.info(String.format(nRet: %d, nRet)); - not sure this should be info level. RR: Fixed
Remove the need to run NodeManager as privileged account for Windows Secure Container Executor -- Key: YARN-2198 URL: https://issues.apache.org/jira/browse/YARN-2198 Project: Hadoop YARN Issue Type: Improvement Reporter: Remus Rusanu Assignee: Remus Rusanu Labels: security, windows Attachments: .YARN-2198.delta.10.patch, YARN-2198.1.patch, YARN-2198.11.patch, YARN-2198.12.patch, YARN-2198.2.patch, YARN-2198.3.patch, YARN-2198.delta.4.patch, YARN-2198.delta.5.patch, YARN-2198.delta.6.patch, YARN-2198.delta.7.patch, YARN-2198.separation.patch, YARN-2198.trunk.10.patch, YARN-2198.trunk.4.patch, YARN-2198.trunk.5.patch, YARN-2198.trunk.6.patch, YARN-2198.trunk.8.patch, YARN-2198.trunk.9.patch YARN-1972 introduces a Secure Windows Container Executor. However, this executor requires the process launching the container to be LocalSystem or a member of the local Administrators group. Since the process in question is the NodeManager, the requirement translates into running the entire NM as a privileged account, a very large surface area to review and protect. This proposal is to move the privileged operations into a dedicated NT service. The NM can run as a low-privilege account and communicate with the privileged NT service when it needs to launch a container. This would reduce the surface exposed to high privileges. There has to exist a secure, authenticated and authorized channel of communication between the NM and the privileged NT service. Possible alternatives are a new TCP endpoint, Java RPC etc. My proposal though would be to use Windows LPC (Local Procedure Calls), which is a Windows platform specific inter-process communication channel that satisfies all requirements and is easy to deploy. The privileged NT service would register and listen on an LPC port (NtCreatePort, NtListenPort). The NM would use JNI to interop with libwinutils which would host the LPC client code. The client would connect to the LPC port (NtConnectPort) and send a message requesting a container launch (NtRequestWaitReplyPort). LPC provides authentication and the privileged NT service can use authorization API (AuthZ) to validate the caller. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2198) Remove the need to run NodeManager as privileged account for Windows Secure Container Executor
[ https://issues.apache.org/jira/browse/YARN-2198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14166823#comment-14166823 ] Hadoop QA commented on YARN-2198: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12674161/YARN-2198.11.patch against trunk revision cb81bac. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 2 new Findbugs (version 2.0.3) warnings. {color:red}-1 release audit{color}. The applied patch generated 1 release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-common-project/hadoop-common hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/5361//testReport/ Release audit warnings: https://builds.apache.org/job/PreCommit-YARN-Build/5361//artifact/patchprocess/patchReleaseAuditProblems.txt Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/5361//artifact/patchprocess/newPatchFindbugsWarningshadoop-common.html Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5361//console This message is automatically generated. Remove the need to run NodeManager as privileged account for Windows Secure Container Executor -- Key: YARN-2198 URL: https://issues.apache.org/jira/browse/YARN-2198 Project: Hadoop YARN Issue Type: Improvement Reporter: Remus Rusanu Assignee: Remus Rusanu Labels: security, windows Attachments: .YARN-2198.delta.10.patch, YARN-2198.1.patch, YARN-2198.11.patch, YARN-2198.12.patch, YARN-2198.2.patch, YARN-2198.3.patch, YARN-2198.delta.4.patch, YARN-2198.delta.5.patch, YARN-2198.delta.6.patch, YARN-2198.delta.7.patch, YARN-2198.separation.patch, YARN-2198.trunk.10.patch, YARN-2198.trunk.4.patch, YARN-2198.trunk.5.patch, YARN-2198.trunk.6.patch, YARN-2198.trunk.8.patch, YARN-2198.trunk.9.patch YARN-1972 introduces a Secure Windows Container Executor. However, this executor requires the process launching the container to be LocalSystem or a member of the local Administrators group. Since the process in question is the NodeManager, the requirement translates into running the entire NM as a privileged account, a very large surface area to review and protect. This proposal is to move the privileged operations into a dedicated NT service. The NM can run as a low-privilege account and communicate with the privileged NT service when it needs to launch a container. This would reduce the surface exposed to high privileges. There has to exist a secure, authenticated and authorized channel of communication between the NM and the privileged NT service. Possible alternatives are a new TCP endpoint, Java RPC etc.
My proposal, though, would be to use Windows LPC (Local Procedure Calls), which is a Windows platform-specific inter-process communication channel that satisfies all requirements and is easy to deploy. The privileged NT service would register and listen on an LPC port (NtCreatePort, NtListenPort). The NM would use JNI to interop with libwinutils, which would host the LPC client code. The client would connect to the LPC port (NtConnectPort) and send a message requesting a container launch (NtRequestWaitReplyPort). LPC provides authentication, and the privileged NT service can use the authorization API (AuthZ) to validate the caller. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2198) Remove the need to run NodeManager as privileged account for Windows Secure Container Executor
[ https://issues.apache.org/jira/browse/YARN-2198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14166854#comment-14166854 ] Hadoop QA commented on YARN-2198: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12674163/YARN-2198.12.patch against trunk revision cb81bac. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 2 new Findbugs (version 2.0.3) warnings. {color:red}-1 release audit{color}. The applied patch generated 1 release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-common-project/hadoop-common hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager: org.apache.hadoop.ha.TestZKFailoverControllerStress {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/5362//testReport/ Release audit warnings: https://builds.apache.org/job/PreCommit-YARN-Build/5362//artifact/patchprocess/patchReleaseAuditProblems.txt Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/5362//artifact/patchprocess/newPatchFindbugsWarningshadoop-common.html Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5362//console This message is automatically generated. Remove the need to run NodeManager as privileged account for Windows Secure Container Executor -- Key: YARN-2198 URL: https://issues.apache.org/jira/browse/YARN-2198 Project: Hadoop YARN Issue Type: Improvement Reporter: Remus Rusanu Assignee: Remus Rusanu Labels: security, windows Attachments: .YARN-2198.delta.10.patch, YARN-2198.1.patch, YARN-2198.11.patch, YARN-2198.12.patch, YARN-2198.2.patch, YARN-2198.3.patch, YARN-2198.delta.4.patch, YARN-2198.delta.5.patch, YARN-2198.delta.6.patch, YARN-2198.delta.7.patch, YARN-2198.separation.patch, YARN-2198.trunk.10.patch, YARN-2198.trunk.4.patch, YARN-2198.trunk.5.patch, YARN-2198.trunk.6.patch, YARN-2198.trunk.8.patch, YARN-2198.trunk.9.patch YARN-1972 introduces a Secure Windows Container Executor. However this executor requires the process launching the container to be LocalSystem or a member of the a local Administrators group. Since the process in question is the NodeManager, the requirement translates to the entire NM to run as a privileged account, a very large surface area to review and protect. This proposal is to move the privileged operations into a dedicated NT service. The NM can run as a low privilege account and communicate with the privileged NT service when it needs to launch a container. This would reduce the surface exposed to the high privileges. There has to exist a secure, authenticated and authorized channel of communication between the NM and the privileged NT service. Possible alternatives are a new TCP endpoint, Java RPC etc. 
My proposal, though, would be to use Windows LPC (Local Procedure Calls), which is a Windows platform-specific inter-process communication channel that satisfies all requirements and is easy to deploy. The privileged NT service would register and listen on an LPC port (NtCreatePort, NtListenPort). The NM would use JNI to interop with libwinutils, which would host the LPC client code. The client would connect to the LPC port (NtConnectPort) and send a message requesting a container launch (NtRequestWaitReplyPort). LPC provides authentication, and the privileged NT service can use the authorization API (AuthZ) to validate the caller. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2629) Make distributed shell use the domain-based timeline ACLs
[ https://issues.apache.org/jira/browse/YARN-2629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14166871#comment-14166871 ] Hudson commented on YARN-2629: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #1897 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1897/]) YARN-2629. Made the distributed shell use the domain-based timeline ACLs. Contributed by Zhijie Shen. (zjshen: rev 1d4612f5ad9678c952b416e798dccd20c88f96ef) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/src/test/java/org/apache/hadoop/yarn/applications/distributedshell/TestDistributedShell.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/src/main/java/org/apache/hadoop/yarn/applications/distributedshell/Client.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/src/main/java/org/apache/hadoop/yarn/applications/distributedshell/ApplicationMaster.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/client/api/impl/TimelineClientImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/src/main/java/org/apache/hadoop/yarn/applications/distributedshell/DSConstants.java Make distributed shell use the domain-based timeline ACLs - Key: YARN-2629 URL: https://issues.apache.org/jira/browse/YARN-2629 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Zhijie Shen Assignee: Zhijie Shen Fix For: 2.6.0 Attachments: YARN-2629.1.patch, YARN-2629.2.patch, YARN-2629.3.patch, YARN-2629.4.patch For demonstration the usage of this feature (YARN-2102), it's good to make the distributed shell create the domain, and post its timeline entities into this private space. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
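The domain-based flow described above (create a private timeline domain, then post entities tagged with it) can be sketched as follows. The method names follow the 2.6-era timeline client API as best recalled, and the ids, readers, and writers are made-up placeholders; this is a sketch of the idea, not the actual distributed shell change.
{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.api.records.timeline.TimelineDomain;
import org.apache.hadoop.yarn.api.records.timeline.TimelineEntity;
import org.apache.hadoop.yarn.client.api.TimelineClient;

public class TimelineDomainExample {
  public static void main(String[] args) throws Exception {
    TimelineClient client = TimelineClient.createTimelineClient();
    client.init(new Configuration());
    client.start();

    TimelineDomain domain = new TimelineDomain();
    domain.setId("DS_DOMAIN_example");      // private space for this app's entities (illustrative id)
    domain.setReaders("alice,bob");         // who may read entities in the domain (illustrative)
    domain.setWriters("alice");             // who may write entities into it (illustrative)
    client.putDomain(domain);

    TimelineEntity entity = new TimelineEntity();
    entity.setEntityType("DS_APP_ATTEMPT");
    entity.setEntityId("appattempt_example_0001");
    entity.setDomainId(domain.getId());     // the entity lives in the private domain
    client.putEntities(entity);

    client.stop();
  }
}
{code}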
[jira] [Commented] (YARN-2671) ApplicationSubmissionContext change breaks the existing app submission
[ https://issues.apache.org/jira/browse/YARN-2671?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14166872#comment-14166872 ] Hudson commented on YARN-2671: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #1897 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1897/]) YARN-2671. Fix the Jira number in the change log. (zjshen: rev 5b12df6587eb4f37d09c9ffc35a0ea59694df831) * hadoop-yarn-project/CHANGES.txt ApplicationSubmissionContext change breaks the existing app submission -- Key: YARN-2671 URL: https://issues.apache.org/jira/browse/YARN-2671 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: Zhijie Shen Assignee: Wangda Tan Priority: Blocker Fix For: 2.6.0 Attachments: YARN-2617-20141009.1.patch After YARN-2493, app submission goes wrong with the following exception: {code} 2014-10-09 15:50:35,774 WARN [297524352@qtp-1314143300-2 - /ws/v1/cluster/apps] webapp.GenericExceptionHandler (GenericExceptionHandler.java:toResponse(98)) - INTERNAL_SERVER_ERROR java.lang.NullPointerException at org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.validateResourceRequest(SchedulerUtils.java:194) at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.validateResourceRequest(RMAppManager.java:390) at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.createAndPopulateNewRMApp(RMAppManager.java:346) at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.submitApplication(RMAppManager.java:273) at org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.submitApplication(ClientRMService.java:570) at org.apache.hadoop.yarn.server.resourcemanager.webapp.RMWebServices$2.run(RMWebServices.java:896) at org.apache.hadoop.yarn.server.resourcemanager.webapp.RMWebServices$2.run(RMWebServices.java:1) {code} This is because resource is putting into ResourceRequest of ApplicationSubmissionContext, but not directly into ApplicationSubmissionContext, therefore the sanity check won't get resource object from context. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
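A minimal sketch of the backward-compatible lookup implied by this bug is below: prefer the AM ResourceRequest's capability when it is present and fall back to the context-level resource otherwise. The helper class is hypothetical, and the accessor names are recalled from the 2.6-era ApplicationSubmissionContext API; this is not the committed fix.
{code}
import org.apache.hadoop.yarn.api.records.ApplicationSubmissionContext;
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.api.records.ResourceRequest;

public final class AmResourceLookup {
  private AmResourceLookup() {}

  /** Null-safe lookup that tolerates both old- and new-style submissions. */
  public static Resource getAmResource(ApplicationSubmissionContext context) {
    ResourceRequest amRequest = context.getAMContainerResourceRequest();
    if (amRequest != null && amRequest.getCapability() != null) {
      return amRequest.getCapability();   // new-style submission carries the AM ResourceRequest
    }
    return context.getResource();          // legacy submissions set the resource directly
  }
}
{code}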
[jira] [Commented] (YARN-2544) [YARN-796] Common server side PB changes (not include user API PB changes)
[ https://issues.apache.org/jira/browse/YARN-2544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14166866#comment-14166866 ] Hudson commented on YARN-2544: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #1897 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1897/]) YARN-2544. Added admin-API objects for using node-labels. Contributed by Wangda Tan. (vinodkv: rev 596702a02501e9cb09aabced168027189eaf02ba) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/RemoveFromClusterNodeLabelsResponse.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/GetClusterNodeLabelsRequest.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/GetNodesToLabelsRequest.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/impl/pb/RemoveFromClusterNodeLabelsRequestPBImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/GetClusterNodeLabelsResponse.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/impl/pb/AddToClusterNodeLabelsRequestPBImpl.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/impl/pb/AddToClusterNodeLabelsResponsePBImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/impl/pb/ReplaceLabelsOnNodeRequestPBImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/ProtocolHATestBase.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/ReplaceLabelsOnNodeRequest.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/impl/pb/GetNodesToLabelsResponsePBImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/impl/pb/GetClusterNodeLabelsResponsePBImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/RemoveFromClusterNodeLabelsRequest.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/AddToClusterNodeLabelsResponse.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/api/TestPBImplRecords.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/impl/pb/GetNodesToLabelsRequestPBImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/impl/pb/ReplaceLabelsOnNodeResponsePBImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/QueueInfo.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/ReplaceLabelsOnNodeResponse.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/server/yarn_server_resourcemanager_service_protos.proto * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/yarn_protos.proto * 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/impl/pb/RemoveFromClusterNodeLabelsResponsePBImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/GetNodesToLabelsResponse.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/AddToClusterNodeLabelsRequest.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/api/records/impl/pb/QueueInfoPBImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/impl/pb/GetClusterNodeLabelsRequestPBImpl.java [YARN-796] Common server side PB changes (not include user API PB changes) -- Key: YARN-2544 URL: https://issues.apache.org/jira/browse/YARN-2544 Project: Hadoop YARN Issue Type: Sub-task Components: api, client, resourcemanager Reporter: Wangda Tan Assignee: Wangda Tan Fix For: 2.6.0 Attachments: YARN-2544-20141008.1.patch, YARN-2544-20141008.2.patch, YARN-2544-20141009.1.patch, YARN-2544.patch, YARN-2544.patch, YARN-2544.patch
[jira] [Commented] (YARN-2617) NM does not need to send finished container whose APP is not running to RM
[ https://issues.apache.org/jira/browse/YARN-2617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14166873#comment-14166873 ] Hudson commented on YARN-2617: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #1897 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1897/]) YARN-2617. Fixed ApplicationSubmissionContext to still set resource for backward compatibility. Contributed by Wangda Tan. (zjshen: rev e532ed8faa8db4b008a5b8d3f82b48a1b314fa6c) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/api/TestPBImplRecords.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/ApplicationSubmissionContext.java NM does not need to send finished container whose APP is not running to RM -- Key: YARN-2617 URL: https://issues.apache.org/jira/browse/YARN-2617 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.6.0 Reporter: Jun Gong Assignee: Jun Gong Fix For: 2.6.0 Attachments: YARN-2617.2.patch, YARN-2617.3.patch, YARN-2617.4.patch, YARN-2617.5.patch, YARN-2617.5.patch, YARN-2617.5.patch, YARN-2617.6.patch, YARN-2617.patch We([~chenchun]) are testing RM work preserving restart and found the following logs when we ran a simple MapReduce task PI. NM continuously reported completed containers whose Application had already finished while AM had finished. {code} 2014-09-26 17:00:42,228 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: Null container completed... 2014-09-26 17:00:42,228 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: Null container completed... 2014-09-26 17:00:43,230 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: Null container completed... 2014-09-26 17:00:43,230 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: Null container completed... 2014-09-26 17:00:44,233 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: Null container completed... 2014-09-26 17:00:44,233 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: Null container completed... {code} In the patch for YARN-1372, ApplicationImpl on NM should guarantee to clean up already completed applications. But it will only remove appId from 'app.context.getApplications()' when ApplicaitonImpl received evnet 'ApplicationEventType.APPLICATION_LOG_HANDLING_FINISHED' , however NM might receive this event for a long time or could not receive. * For NonAggregatingLogHandler, it wait for YarnConfiguration.NM_LOG_RETAIN_SECONDS which is 3 * 60 * 60 sec by default, then it will be scheduled to delete Application logs and send the event. * For LogAggregationService, it might fail(e.g. if user does not have HDFS write permission), and it will not send the event. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
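The idea in the issue title can be illustrated with a small filter that only reports completed containers whose application the NM still tracks. The types below are simplified stand-ins, not the actual NodeStatusUpdater code.
{code}
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class CompletedContainerFilter {

  /**
   * completedContainers maps containerId to its owning applicationId;
   * activeApplications is the set of apps the NM context still tracks.
   * Only containers belonging to still-active apps are reported to the RM.
   */
  static List<String> selectContainersToReport(Map<String, String> completedContainers,
                                               Set<String> activeApplications) {
    List<String> report = new ArrayList<>();
    for (Map.Entry<String, String> entry : completedContainers.entrySet()) {
      if (activeApplications.contains(entry.getValue())) {
        report.add(entry.getKey());   // app is still running on this NM, so report the container
      }
    }
    return report;
  }
}
{code}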
[jira] [Commented] (YARN-2662) TestCgroupsLCEResourcesHandler leaks file descriptors.
[ https://issues.apache.org/jira/browse/YARN-2662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14166863#comment-14166863 ] Hudson commented on YARN-2662: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #1897 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1897/]) YARN-2662. TestCgroupsLCEResourcesHandler leaks file descriptors. Contributed by Chris Nauroth. (cnauroth: rev d3afd730acfa380ab5032be5ee296c5d73744518) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/util/TestCgroupsLCEResourcesHandler.java * hadoop-yarn-project/CHANGES.txt TestCgroupsLCEResourcesHandler leaks file descriptors. -- Key: YARN-2662 URL: https://issues.apache.org/jira/browse/YARN-2662 Project: Hadoop YARN Issue Type: Bug Components: test Reporter: Chris Nauroth Assignee: Chris Nauroth Priority: Minor Fix For: 2.6.0 Attachments: YARN-2662.1.patch {{TestCgroupsLCEResourcesHandler}} includes tests that write and read values from the various cgroups files. After the tests read from a file, they do not close it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
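A minimal sketch of the kind of fix the description implies, assuming a helper that reads back a value written to a cgroups file: try-with-resources releases the descriptor even if an assertion fails afterwards. The helper is illustrative, not the actual test code.
{code}
import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.IOException;

public class CgroupsFileCheck {
  /** Reads the first line of a cgroups file and always closes the underlying descriptor. */
  static String readFirstLine(File cgroupsFile) throws IOException {
    try (BufferedReader reader = new BufferedReader(new FileReader(cgroupsFile))) {
      return reader.readLine();
    }
  }
}
{code}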
[jira] [Commented] (YARN-2583) Modify the LogDeletionService to support Log aggregation for LRS
[ https://issues.apache.org/jira/browse/YARN-2583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14166876#comment-14166876 ] Hudson commented on YARN-2583: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #1897 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1897/]) YARN-2583. Modified AggregatedLogDeletionService to be able to delete rolling aggregated logs. Contributed by Xuan Gong. (zjshen: rev cb81bac0029fce3a9726df3523f0b692cd3375b8) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/logaggregation/TestLogAggregationService.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/logaggregation/AppLogAggregatorImpl.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/logaggregation/LogAggregationService.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/logaggregation/TestAggregatedLogDeletionService.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/logaggregation/AggregatedLogDeletionService.java Modify the LogDeletionService to support Log aggregation for LRS Key: YARN-2583 URL: https://issues.apache.org/jira/browse/YARN-2583 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager, resourcemanager Reporter: Xuan Gong Assignee: Xuan Gong Fix For: 2.6.0 Attachments: YARN-2583.1.patch, YARN-2583.2.patch, YARN-2583.3.1.patch, YARN-2583.3.patch, YARN-2583.4.patch, YARN-2583.5.patch, YARN-2583.6.patch Currently, AggregatedLogDeletionService will delete old logs from HDFS. It will check the cut-off-time, if all logs for this application is older than this cut-off-time. The app-log-dir from HDFS will be deleted. This will not work for LRS. We expect a LRS application can keep running for a long time. Two different scenarios: 1) If we configured the rollingIntervalSeconds, the new log file will be always uploaded to HDFS. The number of log files for this application will become larger and larger. And there is no log files will be deleted. 2) If we did not configure the rollingIntervalSeconds, the log file can only be uploaded to HDFS after the application is finished. It is very possible that the logs are uploaded after the cut-off-time. It will cause problem because at that time the app-log-dir for this application in HDFS has been deleted. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
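The per-file deletion idea in the commit message ("delete rolling aggregated logs") can be sketched roughly as follows: for a still-running application, remove only the aggregated node log files older than the cut-off instead of deleting or skipping the whole app-log-dir. The helper below is illustrative FileSystem usage, not the AggregatedLogDeletionService change itself.
{code}
import java.io.IOException;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class RollingLogCleaner {
  /** Deletes only the rolled-up node log files in an app-log-dir that are older than the cut-off. */
  public static void deleteOldNodeLogs(FileSystem fs, Path appLogDir, long cutoffMillis)
      throws IOException {
    for (FileStatus nodeLog : fs.listStatus(appLogDir)) {
      if (nodeLog.getModificationTime() < cutoffMillis) {
        fs.delete(nodeLog.getPath(), false);   // remove the stale rolled file, keep the directory
      }
    }
  }
}
{code}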
[jira] [Commented] (YARN-1492) truly shared cache for jars (jobjar/libjar)
[ https://issues.apache.org/jira/browse/YARN-1492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14166859#comment-14166859 ] Hudson commented on YARN-1492: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #1897 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1897/]) YARN-2180. [YARN-1492] In-memory backing store for cache manager. (Chris Trezzo via kasha) (kasha: rev 4f426fe2232ed90d8fdf8619fbdeae28d788b5c8) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/pom.xml * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/sharedcache/SharedCacheStructureUtil.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/main/java/org/apache/hadoop/yarn/server/sharedcachemanager/store/SCMStore.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/main/java/org/apache/hadoop/yarn/server/sharedcachemanager/store/InMemorySCMStore.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/sharedcache/SharedCacheUtil.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/test/java/org/apache/hadoop/yarn/server/sharedcachemanager/store/TestInMemorySCMStore.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/main/java/org/apache/hadoop/yarn/server/sharedcachemanager/store/SharedCacheResource.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/main/java/org/apache/hadoop/yarn/server/sharedcachemanager/store/SharedCacheResourceReference.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/main/java/org/apache/hadoop/yarn/server/sharedcachemanager/SharedCacheManager.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml truly shared cache for jars (jobjar/libjar) --- Key: YARN-1492 URL: https://issues.apache.org/jira/browse/YARN-1492 Project: Hadoop YARN Issue Type: New Feature Affects Versions: 2.0.4-alpha Reporter: Sangjin Lee Assignee: Chris Trezzo Priority: Critical Attachments: YARN-1492-all-trunk-v1.patch, YARN-1492-all-trunk-v2.patch, YARN-1492-all-trunk-v3.patch, YARN-1492-all-trunk-v4.patch, YARN-1492-all-trunk-v5.patch, shared_cache_design.pdf, shared_cache_design_v2.pdf, shared_cache_design_v3.pdf, shared_cache_design_v4.pdf, shared_cache_design_v5.pdf, shared_cache_design_v6.pdf Currently there is the distributed cache that enables you to cache jars and files so that attempts from the same job can reuse them. However, sharing is limited with the distributed cache because it is normally on a per-job basis. On a large cluster, sometimes copying of jobjars and libjars becomes so prevalent that it consumes a large portion of the network bandwidth, not to speak of defeating the purpose of bringing compute to where data is. This is wasteful because in most cases code doesn't change much across many jobs. I'd like to propose and discuss feasibility of introducing a truly shared cache so that multiple jobs from multiple users can share and cache jars. This JIRA is to open the discussion. 
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2180) In-memory backing store for cache manager
[ https://issues.apache.org/jira/browse/YARN-2180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14166865#comment-14166865 ] Hudson commented on YARN-2180: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #1897 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1897/]) YARN-2180. [YARN-1492] In-memory backing store for cache manager. (Chris Trezzo via kasha) (kasha: rev 4f426fe2232ed90d8fdf8619fbdeae28d788b5c8) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/pom.xml * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/sharedcache/SharedCacheStructureUtil.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/main/java/org/apache/hadoop/yarn/server/sharedcachemanager/store/SCMStore.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/main/java/org/apache/hadoop/yarn/server/sharedcachemanager/store/InMemorySCMStore.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/sharedcache/SharedCacheUtil.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/test/java/org/apache/hadoop/yarn/server/sharedcachemanager/store/TestInMemorySCMStore.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/main/java/org/apache/hadoop/yarn/server/sharedcachemanager/store/SharedCacheResource.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/main/java/org/apache/hadoop/yarn/server/sharedcachemanager/store/SharedCacheResourceReference.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/main/java/org/apache/hadoop/yarn/server/sharedcachemanager/SharedCacheManager.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml In-memory backing store for cache manager - Key: YARN-2180 URL: https://issues.apache.org/jira/browse/YARN-2180 Project: Hadoop YARN Issue Type: Sub-task Reporter: Chris Trezzo Assignee: Chris Trezzo Fix For: 2.7.0 Attachments: YARN-2180-trunk-v1.patch, YARN-2180-trunk-v2.patch, YARN-2180-trunk-v3.patch, YARN-2180-trunk-v4.patch, YARN-2180-trunk-v5.patch, YARN-2180-trunk-v6.patch, YARN-2180-trunk-v7.patch Implement an in-memory backing store for the cache manager. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2674) Distributed shell AM may re-launch containers if RM work preserving restart happens
[ https://issues.apache.org/jira/browse/YARN-2674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14166897#comment-14166897 ] Eric Payne commented on YARN-2674: -- [~chenchun], I have noticed this behavior as well. Thank you for posting this patch. +1 (non-binding) Distributed shell AM may re-launch containers if RM work preserving restart happens --- Key: YARN-2674 URL: https://issues.apache.org/jira/browse/YARN-2674 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Chun Chen Attachments: YARN-2674.1.patch Currently, if an RM work-preserving restart happens while the distributed shell is running, the distributed shell AM may re-launch all the containers, including new, running, and completed ones. We must make sure it won't re-launch the running/completed containers. We need to remove allocated containers from AMRMClientImpl#remoteRequestsTable once the AM receives them from the RM. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
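One way to picture the guard being asked for is a small AM-side tracker that launches each delivered container id at most once. This illustrates the intent only; it is not the proposed AMRMClientImpl#remoteRequestsTable change, and the class is hypothetical.
{code}
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

final class LaunchedContainerTracker {
  private final Set<String> launched = ConcurrentHashMap.newKeySet();

  /** Returns true exactly once per container id; duplicates re-delivered after an RM restart are ignored. */
  boolean shouldLaunch(String containerId) {
    return launched.add(containerId);
  }
}
{code}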
[jira] [Updated] (YARN-2636) Windows Secure Container Executor: add unit tests for WSCE
[ https://issues.apache.org/jira/browse/YARN-2636?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Remus Rusanu updated YARN-2636: --- Issue Type: Test (was: Sub-task) Parent: (was: YARN-2198) Windows Secure Container Executor: add unit tests for WSCE -- Key: YARN-2636 URL: https://issues.apache.org/jira/browse/YARN-2636 Project: Hadoop YARN Issue Type: Test Reporter: Remus Rusanu Assignee: Remus Rusanu Priority: Critical As the title says. The WSCE has no checked-in unit tests. Much of the functionality depends on the elevated hadoopwinutilsvc service and cannot be tested, but let's test what can be mocked in Java. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2636) Windows Secure Container Executor: add unit tests for WSCE
[ https://issues.apache.org/jira/browse/YARN-2636?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Remus Rusanu updated YARN-2636: --- Attachment: YARN-2636.doesnotwork.0.patch Windows Secure Container Executor: add unit tests for WSCE -- Key: YARN-2636 URL: https://issues.apache.org/jira/browse/YARN-2636 Project: Hadoop YARN Issue Type: Test Reporter: Remus Rusanu Assignee: Remus Rusanu Priority: Critical Attachments: YARN-2636.doesnotwork.0.patch As the title says. The WSCE has no checked-in unit tests. Much of the functionality depends on the elevated hadoopwinutilsvc service and cannot be tested, but let's test what can be mocked in Java. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2656) RM web services authentication filter should add support for proxy user
[ https://issues.apache.org/jira/browse/YARN-2656?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14166943#comment-14166943 ] Varun Vasudev commented on YARN-2656: - Thanks for the patch [~zjshen]. {quote} 2. Change proxyuser prefix to yarn.resourcemanager. The other DelegationTokenAuthenticationFilter use cases take the specific prefix as well instead of the common hadoop one. The benefit is that different use cases can have their individual proxyuser settings. {quote} I don't think we need to introduce a new variable to configure proxyusers. I think it's reasonable for now to assume that proxyusers are set up for RPC and REST. I think we should use hadoop.proxyusers instead of introducing yarn.resourcemanager.proxyusers. Apart from that, the patch looks good to me. Thanks again for all the help! RM web services authentication filter should add support for proxy user --- Key: YARN-2656 URL: https://issues.apache.org/jira/browse/YARN-2656 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: Varun Vasudev Assignee: Varun Vasudev Attachments: YARN-2656.3.patch, apache-yarn-2656.0.patch, apache-yarn-2656.1.patch, apache-yarn-2656.2.patch The DelegationTokenAuthenticationFilter adds support for doAs functionality. The RMAuthenticationFilter should expose this as well. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
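For reference, the common-prefix proxyuser settings the comment argues for follow the standard hadoop.proxyuser.<user>.hosts / .groups key pattern in Hadoop configuration. The superuser name and values below are made up for illustration.
{code}
import org.apache.hadoop.conf.Configuration;

public class ProxyUserConfigExample {
  /** Builds a configuration allowing a hypothetical "rmproxy" superuser to impersonate others. */
  public static Configuration build() {
    Configuration conf = new Configuration();
    conf.set("hadoop.proxyuser.rmproxy.hosts", "gateway.example.com");  // hosts the superuser may proxy from
    conf.set("hadoop.proxyuser.rmproxy.groups", "analysts,etl");        // groups whose members may be impersonated
    return conf;
  }
}
{code}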
[jira] [Commented] (YARN-2662) TestCgroupsLCEResourcesHandler leaks file descriptors.
[ https://issues.apache.org/jira/browse/YARN-2662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14166952#comment-14166952 ] Hudson commented on YARN-2662: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #1922 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1922/]) YARN-2662. TestCgroupsLCEResourcesHandler leaks file descriptors. Contributed by Chris Nauroth. (cnauroth: rev d3afd730acfa380ab5032be5ee296c5d73744518) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/util/TestCgroupsLCEResourcesHandler.java * hadoop-yarn-project/CHANGES.txt TestCgroupsLCEResourcesHandler leaks file descriptors. -- Key: YARN-2662 URL: https://issues.apache.org/jira/browse/YARN-2662 Project: Hadoop YARN Issue Type: Bug Components: test Reporter: Chris Nauroth Assignee: Chris Nauroth Priority: Minor Fix For: 2.6.0 Attachments: YARN-2662.1.patch {{TestCgroupsLCEResourcesHandler}} includes tests that write and read values from the various cgroups files. After the tests read from a file, they do not close it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1492) truly shared cache for jars (jobjar/libjar)
[ https://issues.apache.org/jira/browse/YARN-1492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14166948#comment-14166948 ] Hudson commented on YARN-1492: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #1922 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1922/]) YARN-2180. [YARN-1492] In-memory backing store for cache manager. (Chris Trezzo via kasha) (kasha: rev 4f426fe2232ed90d8fdf8619fbdeae28d788b5c8) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/test/java/org/apache/hadoop/yarn/server/sharedcachemanager/store/TestInMemorySCMStore.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/pom.xml * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/main/java/org/apache/hadoop/yarn/server/sharedcachemanager/SharedCacheManager.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/main/java/org/apache/hadoop/yarn/server/sharedcachemanager/store/SharedCacheResource.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/main/java/org/apache/hadoop/yarn/server/sharedcachemanager/store/SharedCacheResourceReference.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/main/java/org/apache/hadoop/yarn/server/sharedcachemanager/store/SCMStore.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/main/java/org/apache/hadoop/yarn/server/sharedcachemanager/store/InMemorySCMStore.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/sharedcache/SharedCacheUtil.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/sharedcache/SharedCacheStructureUtil.java truly shared cache for jars (jobjar/libjar) --- Key: YARN-1492 URL: https://issues.apache.org/jira/browse/YARN-1492 Project: Hadoop YARN Issue Type: New Feature Affects Versions: 2.0.4-alpha Reporter: Sangjin Lee Assignee: Chris Trezzo Priority: Critical Attachments: YARN-1492-all-trunk-v1.patch, YARN-1492-all-trunk-v2.patch, YARN-1492-all-trunk-v3.patch, YARN-1492-all-trunk-v4.patch, YARN-1492-all-trunk-v5.patch, shared_cache_design.pdf, shared_cache_design_v2.pdf, shared_cache_design_v3.pdf, shared_cache_design_v4.pdf, shared_cache_design_v5.pdf, shared_cache_design_v6.pdf Currently there is the distributed cache that enables you to cache jars and files so that attempts from the same job can reuse them. However, sharing is limited with the distributed cache because it is normally on a per-job basis. On a large cluster, sometimes copying of jobjars and libjars becomes so prevalent that it consumes a large portion of the network bandwidth, not to speak of defeating the purpose of bringing compute to where data is. This is wasteful because in most cases code doesn't change much across many jobs. I'd like to propose and discuss feasibility of introducing a truly shared cache so that multiple jobs from multiple users can share and cache jars. This JIRA is to open the discussion. 
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2493) [YARN-796] API changes for users
[ https://issues.apache.org/jira/browse/YARN-2493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14166953#comment-14166953 ] Hudson commented on YARN-2493: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #1922 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1922/]) YARN-2493. Added user-APIs for using node-labels. Contributed by Wangda Tan. (vinodkv: rev 180afa2f86f9854c536c3d4ff7476880e41ac48d) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/ResourceRequest.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/yarn_protos.proto * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/ApplicationSubmissionContext.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/api/records/impl/pb/ResourceRequestPBImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/api/records/impl/pb/ApplicationSubmissionContextPBImpl.java [YARN-796] API changes for users Key: YARN-2493 URL: https://issues.apache.org/jira/browse/YARN-2493 Project: Hadoop YARN Issue Type: Sub-task Components: api Reporter: Wangda Tan Assignee: Wangda Tan Fix For: 2.6.0 Attachments: YARN-2493-20141008.1.patch, YARN-2493.patch, YARN-2493.patch, YARN-2493.patch, YARN-2493.patch, YARN-2493.patch This JIRA includes API changes for users of YARN-796, like changes in {{ResourceRequest}}, {{ApplicationSubmissionContext}}, etc. This is a common part of YARN-796. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2180) In-memory backing store for cache manager
[ https://issues.apache.org/jira/browse/YARN-2180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14166954#comment-14166954 ] Hudson commented on YARN-2180: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #1922 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1922/]) YARN-2180. [YARN-1492] In-memory backing store for cache manager. (Chris Trezzo via kasha) (kasha: rev 4f426fe2232ed90d8fdf8619fbdeae28d788b5c8) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/test/java/org/apache/hadoop/yarn/server/sharedcachemanager/store/TestInMemorySCMStore.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/pom.xml * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/main/java/org/apache/hadoop/yarn/server/sharedcachemanager/SharedCacheManager.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/main/java/org/apache/hadoop/yarn/server/sharedcachemanager/store/SharedCacheResource.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/main/java/org/apache/hadoop/yarn/server/sharedcachemanager/store/SharedCacheResourceReference.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/main/java/org/apache/hadoop/yarn/server/sharedcachemanager/store/SCMStore.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/main/java/org/apache/hadoop/yarn/server/sharedcachemanager/store/InMemorySCMStore.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/sharedcache/SharedCacheUtil.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/sharedcache/SharedCacheStructureUtil.java In-memory backing store for cache manager - Key: YARN-2180 URL: https://issues.apache.org/jira/browse/YARN-2180 Project: Hadoop YARN Issue Type: Sub-task Reporter: Chris Trezzo Assignee: Chris Trezzo Fix For: 2.7.0 Attachments: YARN-2180-trunk-v1.patch, YARN-2180-trunk-v2.patch, YARN-2180-trunk-v3.patch, YARN-2180-trunk-v4.patch, YARN-2180-trunk-v5.patch, YARN-2180-trunk-v6.patch, YARN-2180-trunk-v7.patch Implement an in-memory backing store for the cache manager. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2583) Modify the LogDeletionService to support Log aggregation for LRS
[ https://issues.apache.org/jira/browse/YARN-2583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14166965#comment-14166965 ] Hudson commented on YARN-2583: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #1922 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1922/]) YARN-2583. Modified AggregatedLogDeletionService to be able to delete rolling aggregated logs. Contributed by Xuan Gong. (zjshen: rev cb81bac0029fce3a9726df3523f0b692cd3375b8) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/logaggregation/LogAggregationService.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/logaggregation/AggregatedLogDeletionService.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/logaggregation/TestLogAggregationService.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/logaggregation/AppLogAggregatorImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/logaggregation/TestAggregatedLogDeletionService.java Modify the LogDeletionService to support Log aggregation for LRS Key: YARN-2583 URL: https://issues.apache.org/jira/browse/YARN-2583 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager, resourcemanager Reporter: Xuan Gong Assignee: Xuan Gong Fix For: 2.6.0 Attachments: YARN-2583.1.patch, YARN-2583.2.patch, YARN-2583.3.1.patch, YARN-2583.3.patch, YARN-2583.4.patch, YARN-2583.5.patch, YARN-2583.6.patch Currently, AggregatedLogDeletionService will delete old logs from HDFS. It will check the cut-off-time, if all logs for this application is older than this cut-off-time. The app-log-dir from HDFS will be deleted. This will not work for LRS. We expect a LRS application can keep running for a long time. Two different scenarios: 1) If we configured the rollingIntervalSeconds, the new log file will be always uploaded to HDFS. The number of log files for this application will become larger and larger. And there is no log files will be deleted. 2) If we did not configure the rollingIntervalSeconds, the log file can only be uploaded to HDFS after the application is finished. It is very possible that the logs are uploaded after the cut-off-time. It will cause problem because at that time the app-log-dir for this application in HDFS has been deleted. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2671) ApplicationSubmissionContext change breaks the existing app submission
[ https://issues.apache.org/jira/browse/YARN-2671?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14166961#comment-14166961 ] Hudson commented on YARN-2671: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #1922 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1922/]) YARN-2671. Fix the Jira number in the change log. (zjshen: rev 5b12df6587eb4f37d09c9ffc35a0ea59694df831) * hadoop-yarn-project/CHANGES.txt ApplicationSubmissionContext change breaks the existing app submission -- Key: YARN-2671 URL: https://issues.apache.org/jira/browse/YARN-2671 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: Zhijie Shen Assignee: Wangda Tan Priority: Blocker Fix For: 2.6.0 Attachments: YARN-2617-20141009.1.patch After YARN-2493, app submission goes wrong with the following exception: {code} 2014-10-09 15:50:35,774 WARN [297524352@qtp-1314143300-2 - /ws/v1/cluster/apps] webapp.GenericExceptionHandler (GenericExceptionHandler.java:toResponse(98)) - INTERNAL_SERVER_ERROR java.lang.NullPointerException at org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.validateResourceRequest(SchedulerUtils.java:194) at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.validateResourceRequest(RMAppManager.java:390) at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.createAndPopulateNewRMApp(RMAppManager.java:346) at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.submitApplication(RMAppManager.java:273) at org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.submitApplication(ClientRMService.java:570) at org.apache.hadoop.yarn.server.resourcemanager.webapp.RMWebServices$2.run(RMWebServices.java:896) at org.apache.hadoop.yarn.server.resourcemanager.webapp.RMWebServices$2.run(RMWebServices.java:1) {code} This is because resource is putting into ResourceRequest of ApplicationSubmissionContext, but not directly into ApplicationSubmissionContext, therefore the sanity check won't get resource object from context. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2617) NM does not need to send finished container whose APP is not running to RM
[ https://issues.apache.org/jira/browse/YARN-2617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14166962#comment-14166962 ] Hudson commented on YARN-2617: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #1922 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1922/]) YARN-2617. Fixed ApplicationSubmissionContext to still set resource for backward compatibility. Contributed by Wangda Tan. (zjshen: rev e532ed8faa8db4b008a5b8d3f82b48a1b314fa6c) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/api/TestPBImplRecords.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/ApplicationSubmissionContext.java * hadoop-yarn-project/CHANGES.txt NM does not need to send finished container whose APP is not running to RM -- Key: YARN-2617 URL: https://issues.apache.org/jira/browse/YARN-2617 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.6.0 Reporter: Jun Gong Assignee: Jun Gong Fix For: 2.6.0 Attachments: YARN-2617.2.patch, YARN-2617.3.patch, YARN-2617.4.patch, YARN-2617.5.patch, YARN-2617.5.patch, YARN-2617.5.patch, YARN-2617.6.patch, YARN-2617.patch We([~chenchun]) are testing RM work preserving restart and found the following logs when we ran a simple MapReduce task PI. NM continuously reported completed containers whose Application had already finished while AM had finished. {code} 2014-09-26 17:00:42,228 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: Null container completed... 2014-09-26 17:00:42,228 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: Null container completed... 2014-09-26 17:00:43,230 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: Null container completed... 2014-09-26 17:00:43,230 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: Null container completed... 2014-09-26 17:00:44,233 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: Null container completed... 2014-09-26 17:00:44,233 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: Null container completed... {code} In the patch for YARN-1372, ApplicationImpl on NM should guarantee to clean up already completed applications. But it will only remove appId from 'app.context.getApplications()' when ApplicaitonImpl received evnet 'ApplicationEventType.APPLICATION_LOG_HANDLING_FINISHED' , however NM might receive this event for a long time or could not receive. * For NonAggregatingLogHandler, it wait for YarnConfiguration.NM_LOG_RETAIN_SECONDS which is 3 * 60 * 60 sec by default, then it will be scheduled to delete Application logs and send the event. * For LogAggregationService, it might fail(e.g. if user does not have HDFS write permission), and it will not send the event. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2629) Make distributed shell use the domain-based timeline ACLs
[ https://issues.apache.org/jira/browse/YARN-2629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14166960#comment-14166960 ] Hudson commented on YARN-2629: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #1922 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1922/]) YARN-2629. Made the distributed shell use the domain-based timeline ACLs. Contributed by Zhijie Shen. (zjshen: rev 1d4612f5ad9678c952b416e798dccd20c88f96ef) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/src/test/java/org/apache/hadoop/yarn/applications/distributedshell/TestDistributedShell.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/src/main/java/org/apache/hadoop/yarn/applications/distributedshell/Client.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/client/api/impl/TimelineClientImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/src/main/java/org/apache/hadoop/yarn/applications/distributedshell/ApplicationMaster.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/src/main/java/org/apache/hadoop/yarn/applications/distributedshell/DSConstants.java Make distributed shell use the domain-based timeline ACLs - Key: YARN-2629 URL: https://issues.apache.org/jira/browse/YARN-2629 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Zhijie Shen Assignee: Zhijie Shen Fix For: 2.6.0 Attachments: YARN-2629.1.patch, YARN-2629.2.patch, YARN-2629.3.patch, YARN-2629.4.patch For demonstration the usage of this feature (YARN-2102), it's good to make the distributed shell create the domain, and post its timeline entities into this private space. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2544) [YARN-796] Common server side PB changes (not include user API PB changes)
[ https://issues.apache.org/jira/browse/YARN-2544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14166955#comment-14166955 ] Hudson commented on YARN-2544: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #1922 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1922/]) YARN-2544. Added admin-API objects for using node-labels. Contributed by Wangda Tan. (vinodkv: rev 596702a02501e9cb09aabced168027189eaf02ba) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/ReplaceLabelsOnNodeRequest.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/impl/pb/RemoveFromClusterNodeLabelsResponsePBImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/api/TestPBImplRecords.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/AddToClusterNodeLabelsRequest.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/RemoveFromClusterNodeLabelsRequest.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/impl/pb/RemoveFromClusterNodeLabelsRequestPBImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/impl/pb/ReplaceLabelsOnNodeResponsePBImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/AddToClusterNodeLabelsResponse.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/GetClusterNodeLabelsResponse.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/QueueInfo.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/ProtocolHATestBase.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/impl/pb/GetClusterNodeLabelsRequestPBImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/GetNodesToLabelsRequest.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/GetNodesToLabelsResponse.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/impl/pb/ReplaceLabelsOnNodeRequestPBImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/GetClusterNodeLabelsRequest.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/ReplaceLabelsOnNodeResponse.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/server/yarn_server_resourcemanager_service_protos.proto * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/impl/pb/AddToClusterNodeLabelsRequestPBImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/yarn_protos.proto * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/api/records/impl/pb/QueueInfoPBImpl.java * 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/impl/pb/AddToClusterNodeLabelsResponsePBImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/impl/pb/GetClusterNodeLabelsResponsePBImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/impl/pb/GetNodesToLabelsRequestPBImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/RemoveFromClusterNodeLabelsResponse.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/impl/pb/GetNodesToLabelsResponsePBImpl.java [YARN-796] Common server side PB changes (not include user API PB changes) -- Key: YARN-2544 URL: https://issues.apache.org/jira/browse/YARN-2544 Project: Hadoop YARN Issue Type: Sub-task Components: api, client, resourcemanager Reporter: Wangda Tan Assignee: Wangda Tan Fix For: 2.6.0 Attachments: YARN-2544-20141008.1.patch, YARN-2544-20141008.2.patch, YARN-2544-20141009.1.patch, YARN-2544.patch, YARN-2544.patch,
[jira] [Updated] (YARN-2636) Windows Secure Container Executor: add unit tests for WSCE
[ https://issues.apache.org/jira/browse/YARN-2636?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Remus Rusanu updated YARN-2636: --- Attachment: (was: YARN-2636.doesnotwork.0.patch) Windows Secure Container Executor: add unit tests for WSCE -- Key: YARN-2636 URL: https://issues.apache.org/jira/browse/YARN-2636 Project: Hadoop YARN Issue Type: Test Reporter: Remus Rusanu Assignee: Remus Rusanu Priority: Critical As the title says. The WSCE has no checked-in unit tests. Much of the functionality depends on the elevated hadoopwinutilsvc service and cannot be tested, but let's test what can be mocked in Java. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2180) In-memory backing store for cache manager
[ https://issues.apache.org/jira/browse/YARN-2180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14167039#comment-14167039 ] Sangjin Lee commented on YARN-2180: --- Thanks [~kasha]! In-memory backing store for cache manager - Key: YARN-2180 URL: https://issues.apache.org/jira/browse/YARN-2180 Project: Hadoop YARN Issue Type: Sub-task Reporter: Chris Trezzo Assignee: Chris Trezzo Fix For: 2.7.0 Attachments: YARN-2180-trunk-v1.patch, YARN-2180-trunk-v2.patch, YARN-2180-trunk-v3.patch, YARN-2180-trunk-v4.patch, YARN-2180-trunk-v5.patch, YARN-2180-trunk-v6.patch, YARN-2180-trunk-v7.patch Implement an in-memory backing store for the cache manager. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2183) Cleaner service for cache manager
[ https://issues.apache.org/jira/browse/YARN-2183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14167093#comment-14167093 ] Hadoop QA commented on YARN-2183: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12673746/YARN-2183-trunk-v4.patch against trunk revision cb81bac. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:red}-1 release audit{color}. The applied patch generated 1 release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/5363//testReport/ Release audit warnings: https://builds.apache.org/job/PreCommit-YARN-Build/5363//artifact/patchprocess/patchReleaseAuditProblems.txt Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5363//console This message is automatically generated. Cleaner service for cache manager - Key: YARN-2183 URL: https://issues.apache.org/jira/browse/YARN-2183 Project: Hadoop YARN Issue Type: Sub-task Reporter: Chris Trezzo Assignee: Chris Trezzo Attachments: YARN-2183-trunk-v1.patch, YARN-2183-trunk-v2.patch, YARN-2183-trunk-v3.patch, YARN-2183-trunk-v4.patch Implement the cleaner service for the cache manager along with metrics for the service. This service is responsible for cleaning up old resource references in the manager and removing stale entries from the cache. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2183) Cleaner service for cache manager
[ https://issues.apache.org/jira/browse/YARN-2183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14167105#comment-14167105 ] Sangjin Lee commented on YARN-2183: --- The release audit warning is unrelated. It is about hadoop-yarn-project/hadoop-yarn/hadoop-yarn-registry/src/main/resources/.keep, which is a file that already exists on trunk, and not added by this patch. Cleaner service for cache manager - Key: YARN-2183 URL: https://issues.apache.org/jira/browse/YARN-2183 Project: Hadoop YARN Issue Type: Sub-task Reporter: Chris Trezzo Assignee: Chris Trezzo Attachments: YARN-2183-trunk-v1.patch, YARN-2183-trunk-v2.patch, YARN-2183-trunk-v3.patch, YARN-2183-trunk-v4.patch Implement the cleaner service for the cache manager along with metrics for the service. This service is responsible for cleaning up old resource references in the manager and removing stale entries from the cache. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2656) RM web services authentication filter should add support for proxy user
[ https://issues.apache.org/jira/browse/YARN-2656?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14167116#comment-14167116 ] Varun Vasudev commented on YARN-2656: - {quote} One more thing: RMAuthenticationHandler seems to be useless, we may want to discard it. Let's do it separately? {quote} Forgot to address this - lets do it as part of a separate ticket. RM web services authentication filter should add support for proxy user --- Key: YARN-2656 URL: https://issues.apache.org/jira/browse/YARN-2656 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: Varun Vasudev Assignee: Varun Vasudev Attachments: YARN-2656.3.patch, apache-yarn-2656.0.patch, apache-yarn-2656.1.patch, apache-yarn-2656.2.patch The DelegationTokenAuthenticationFilter adds support for doAs functionality. The RMAuthenticationFilter should expose this as well. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2651) Spin off the LogRollingInterval from LogAggregationContext
[ https://issues.apache.org/jira/browse/YARN-2651?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong updated YARN-2651: Attachment: YARN-2651.1.1.patch rebase the patch based on the latest trunk Spin off the LogRollingInterval from LogAggregationContext -- Key: YARN-2651 URL: https://issues.apache.org/jira/browse/YARN-2651 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager, resourcemanager Reporter: Xuan Gong Assignee: Xuan Gong Attachments: YARN-2651.1.1.patch, YARN-2651.1.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2656) RM web services authentication filter should add support for proxy user
[ https://issues.apache.org/jira/browse/YARN-2656?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14167130#comment-14167130 ] Zhijie Shen commented on YARN-2656: --- bq. I think we should use hadoop.proxyusers instead of introducing yarn.resourcemanager.proxyusers. KMS and HTTPFS are using this feature as well, but using their individual prefix before proxyusers. I think it makes sense because each component can edit its own proxyusers. In particular, it's desired in single-node cluster. Thinking about the property name again, should we use yarn.resourcemanager.webapp to be consistent with the existing YARN config names? RM web services authentication filter should add support for proxy user --- Key: YARN-2656 URL: https://issues.apache.org/jira/browse/YARN-2656 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: Varun Vasudev Assignee: Varun Vasudev Attachments: YARN-2656.3.patch, apache-yarn-2656.0.patch, apache-yarn-2656.1.patch, apache-yarn-2656.2.patch The DelegationTokenAuthenticationFilter adds support for doAs functionality. The RMAuthenticationFilter should expose this as well. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2656) RM web services authentication filter should add support for proxy user
[ https://issues.apache.org/jira/browse/YARN-2656?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14167137#comment-14167137 ] Varun Vasudev commented on YARN-2656: - Thanks for pointing out the KMS and HTTPFS examples. I think the yarn.resourcemanager.webapp.proxyuser prefix probably makes most sense. RM web services authentication filter should add support for proxy user --- Key: YARN-2656 URL: https://issues.apache.org/jira/browse/YARN-2656 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: Varun Vasudev Assignee: Varun Vasudev Attachments: YARN-2656.3.patch, apache-yarn-2656.0.patch, apache-yarn-2656.1.patch, apache-yarn-2656.2.patch The DelegationTokenAuthenticationFilter adds support for doAs functionality. The RMAuthenticationFilter should expose this as well. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
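For readers unfamiliar with the per-component proxyuser convention mentioned above (KMS and HTTPFS each carry their own prefix), the sketch below shows how the yarn.resourcemanager.webapp.proxyuser naming under discussion would be set and read through a Hadoop Configuration. The property names are the ones proposed in this thread and are illustrative only; they are not necessarily what the final patch committed.
{code}
// Illustrative only: per-component proxyuser properties using the prefix
// proposed in the discussion above (not necessarily the committed names).
import java.util.Map;
import org.apache.hadoop.conf.Configuration;

public class ProxyUserConfigSketch {
  private static final String PREFIX = "yarn.resourcemanager.webapp.proxyuser.";

  public static void main(String[] args) {
    Configuration conf = new Configuration();
    // Allow a hypothetical proxy user "gateway" to impersonate members of the
    // "analysts" group from a single host.
    conf.set(PREFIX + "gateway.hosts", "gateway.example.com");
    conf.set(PREFIX + "gateway.groups", "analysts");

    // A filter initializing proxy-user support would read everything under the prefix.
    Map<String, String> proxyProps =
        conf.getValByRegex("^" + PREFIX.replace(".", "\\.") + ".*");
    proxyProps.forEach((k, v) -> System.out.println(k + " = " + v));
  }
}
{code}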
[jira] [Created] (YARN-2675) the containersKilled metrics is not updated when the container is killed during localization.
zhihai xu created YARN-2675: --- Summary: the containersKilled metrics is not updated when the container is killed during localization. Key: YARN-2675 URL: https://issues.apache.org/jira/browse/YARN-2675 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.5.0 Reporter: zhihai xu Assignee: zhihai xu The containersKilled metrics is not updated when the container is killed during localization. We should add KILLING state in finished of ContainerImpl.java to update killedContainer. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2675) the containersKilled metrics is not updated when the container is killed during localization.
[ https://issues.apache.org/jira/browse/YARN-2675?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhihai xu updated YARN-2675: Attachment: YARN-2675.000.patch the containersKilled metrics is not updated when the container is killed during localization. - Key: YARN-2675 URL: https://issues.apache.org/jira/browse/YARN-2675 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.5.0 Reporter: zhihai xu Assignee: zhihai xu Attachments: YARN-2675.000.patch The containersKilled metrics is not updated when the container is killed during localization. We should add KILLING state in finished of ContainerImpl.java to update killedContainer. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2675) the containersKilled metrics is not updated when the container is killed during localization.
[ https://issues.apache.org/jira/browse/YARN-2675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14167168#comment-14167168 ] zhihai xu commented on YARN-2675: - I uploaded a patch YARN-2675.000.patch for review. the containersKilled metrics is not updated when the container is killed during localization. - Key: YARN-2675 URL: https://issues.apache.org/jira/browse/YARN-2675 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.5.0 Reporter: zhihai xu Assignee: zhihai xu Attachments: YARN-2675.000.patch The containersKilled metrics is not updated when the container is killed during localization. We should add KILLING state in finished of ContainerImpl.java to update killedContainer. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
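As a reader aid, here is a minimal, self-contained sketch of the kind of change the description above proposes: treat a container that is killed while still localizing (i.e. leaves through the KILLING state) as a killed container in the NodeManager metrics. The enum values and metric names below are stubbed stand-ins that only mirror the NM code of that era; this is not the actual YARN-2675 patch.
{code}
// Hypothetical, simplified sketch of the proposed fix: count containers killed
// during localization (KILLING state) in the killed-containers metric.
enum ContainerState { LOCALIZING, RUNNING, KILLING, EXITED_WITH_SUCCESS, EXITED_WITH_FAILURE }

interface NodeManagerMetricsStub {
  void completedContainer();
  void failedContainer();
  void killedContainer();
}

class ContainerFinishAccounting {
  private final NodeManagerMetricsStub metrics;

  ContainerFinishAccounting(NodeManagerMetricsStub metrics) {
    this.metrics = metrics;
  }

  /** Called when the container reaches DONE; stateBeforeFinish is the state it left from. */
  void onFinished(ContainerState stateBeforeFinish) {
    switch (stateBeforeFinish) {
      case EXITED_WITH_SUCCESS:
        metrics.completedContainer();
        break;
      case EXITED_WITH_FAILURE:
        metrics.failedContainer();
        break;
      case KILLING: // killed while still localizing -- previously not counted at all
        metrics.killedContainer();
        break;
      default:
        break;
    }
  }
}
{code}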
[jira] [Commented] (YARN-2183) Cleaner service for cache manager
[ https://issues.apache.org/jira/browse/YARN-2183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14167175#comment-14167175 ] Karthik Kambatla commented on YARN-2183: Thanks for updating the patch, Chris. Review comments on the current patch:
# YarnConfiguration - let us include units for time in the config names: period-mins, initial-delay-mins, resource-sleep-ms.
# CleanerService
## Don't need a SHUTDOWN_HOOK_PRIORITY specific to CleanerService; it is not being used anywhere either.
## Nit: In the constructor, can we explicitly initialize AtomicBoolean(false)?
## Nit: Rename scheduler to executor or scheduledExecutor?
## serviceStart: the exception thrown should have a message more amenable to the user. How about appending "It appears there is another CleanerService running in the cluster"?
## serviceStop: the log message that the background thread stopped should be moved to the else-block in the try, instead of after the catch?
{code}
try {
  if (!scheduler.awaitTermination(10, TimeUnit.SECONDS)) {
    LOG.warn("Gave up waiting for the cleaner task to shutdown.");
  }
} catch (InterruptedException e) {
  LOG.warn("The cleaner service was interrupted while shutting down the task.", e);
}
LOG.info("The background thread stopped.");
{code}
## runCleanerTask: Instead of checking if there is a scheduled cleaner task running here, why not just rely on the check in CleanerTask#run()? Agreed, we might be doing a little more work here unnecessarily, but not sure the savings are worth an extra check and an extra parameter in the CleanerTask constructor.
## How does a user use runCleanerTask? Instantiate another SCM? The SCM isn't listening to any requests. I can see the SCM being run in the RM, and one could potentially add yarn rmadmin -clean-shared-cache. In any case, given there is no way to reach a running SCM, I would remove runCleanerTask altogether for now, and add it back later when we need it? Thoughts?
## Should we worry about users starting SCMs with roots at different levels that can lead to multiple cleaners?
# CleanerTask:
## Nit: RENAMED_SUFFIX can be private
## Add @Override to run()
## The following code is duplicated in InMemoryStateStore as well. Maybe we should just add a static method to SharedCacheUtil?
{code}
StringBuilder pattern = new StringBuilder();
for (int i = 0; i < nestedLevel; i++) {
  pattern.append("*/");
}
pattern.append("*");
{code}
## process: the sleep between directories is in millis; do we really want to calculate nanos?
## Should cleanResourceReferences be moved to SCMStore?
## For the race condition (YARN-2663), would it help to handle the deletion of files on HDFS in store#remove?
# CleanerMetrics:
## Make initSingleton private and call it in getInstance if the instance is null?
## How about using MutableRate or MutableStat for the rates?
# Do we need CleanerMetricsCollector, or wouldn't CleanerMetrics extending MetricsSource suffice?
Cleaner service for cache manager - Key: YARN-2183 URL: https://issues.apache.org/jira/browse/YARN-2183 Project: Hadoop YARN Issue Type: Sub-task Reporter: Chris Trezzo Assignee: Chris Trezzo Attachments: YARN-2183-trunk-v1.patch, YARN-2183-trunk-v2.patch, YARN-2183-trunk-v3.patch, YARN-2183-trunk-v4.patch Implement the cleaner service for the cache manager along with metrics for the service. This service is responsible for cleaning up old resource references in the manager and removing stale entries from the cache. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
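To make the serviceStop suggestion above concrete, this is the shape being asked for: the "background thread stopped" message only makes sense when awaitTermination actually succeeds, so it belongs in an else branch rather than after the catch. This simply restructures the fragment quoted in the comment and assumes the same LOG and scheduler/executor fields; it is a sketch, not the committed code.
{code}
try {
  if (!executor.awaitTermination(10, TimeUnit.SECONDS)) {
    LOG.warn("Gave up waiting for the cleaner task to shutdown.");
  } else {
    // Only claim the background thread stopped once it really has.
    LOG.info("The background thread stopped.");
  }
} catch (InterruptedException e) {
  LOG.warn("The cleaner service was interrupted while shutting down the task.", e);
}
{code}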
[jira] [Commented] (YARN-90) NodeManager should identify failed disks becoming good back again
[ https://issues.apache.org/jira/browse/YARN-90?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14167222#comment-14167222 ] zhihai xu commented on YARN-90: --- In verifyDirUsingMkdir, the sequence of target.exists(), target.mkdir(), and FileUtils.deleteQuietly(target) is not atomic. What happens if another thread tries to create the same directory (target)? NodeManager should identify failed disks becoming good back again - Key: YARN-90 URL: https://issues.apache.org/jira/browse/YARN-90 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager Reporter: Ravi Gummadi Assignee: Varun Vasudev Attachments: YARN-90.1.patch, YARN-90.patch, YARN-90.patch, YARN-90.patch, YARN-90.patch, apache-yarn-90.0.patch, apache-yarn-90.1.patch, apache-yarn-90.2.patch, apache-yarn-90.3.patch, apache-yarn-90.4.patch, apache-yarn-90.5.patch, apache-yarn-90.6.patch, apache-yarn-90.7.patch, apache-yarn-90.8.patch MAPREDUCE-3121 makes NodeManager identify disk failures. But once a disk goes down, it is marked as failed forever. To reuse that disk (after it becomes good), NodeManager needs a restart. This JIRA is to improve NodeManager to reuse good disks (which could have been bad some time back). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
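The race being asked about is that another thread can create, observe, or delete the same probe directory between the exists/mkdir/delete calls. One common way to sidestep it is to probe with a name that is unique per check, so concurrent checks never touch the same path. The helper below is a hypothetical illustration of that pattern, not the actual verifyDirUsingMkdir code in the YARN-90 patches.
{code}
// Hypothetical sketch: a mkdir-based writability probe that avoids the
// exists()/mkdir()/deleteQuietly() race by using a unique name per check.
import java.io.File;
import java.io.IOException;
import java.util.UUID;

public final class DirProbe {
  private DirProbe() {}

  /** Verify 'dir' is writable by creating and removing a uniquely named child directory. */
  public static void verifyDirUsingMkdir(File dir) throws IOException {
    File probe = new File(dir, "probe-" + UUID.randomUUID());
    try {
      if (!probe.mkdir()) {
        throw new IOException("Could not create probe directory " + probe);
      }
    } finally {
      // Best-effort cleanup; a leftover empty probe directory is harmless.
      probe.delete();
    }
  }
}
{code}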
[jira] [Assigned] (YARN-2661) Container Localization is not resource limited
[ https://issues.apache.org/jira/browse/YARN-2661?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anubhav Dhoot reassigned YARN-2661: --- Assignee: Anubhav Dhoot Container Localization is not resource limited -- Key: YARN-2661 URL: https://issues.apache.org/jira/browse/YARN-2661 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Reporter: Anubhav Dhoot Assignee: Anubhav Dhoot Container localization itself can take up a lot of resources. Today this is not resource limited in any way and can adversely affect actual containers running on the node -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2312) Marking ContainerId#getId as deprecated
[ https://issues.apache.org/jira/browse/YARN-2312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14167243#comment-14167243 ] Jian He commented on YARN-2312: --- the latest patch looks good to me, the warnings should not be related. [~jlowe], do you have more comments ? Marking ContainerId#getId as deprecated --- Key: YARN-2312 URL: https://issues.apache.org/jira/browse/YARN-2312 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Tsuyoshi OZAWA Assignee: Tsuyoshi OZAWA Attachments: YARN-2312-wip.patch, YARN-2312.1.patch, YARN-2312.2-2.patch, YARN-2312.2-3.patch, YARN-2312.2.patch, YARN-2312.4.patch, YARN-2312.5.patch, YARN-2312.6.patch, YARN-2312.7.patch {{ContainerId#getId}} will only return partial value of containerId, only sequence number of container id without epoch, after YARN-2229. We should mark {{ContainerId#getId}} as deprecated and use {{ContainerId#getContainerId}} instead. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2651) Spin off the LogRollingInterval from LogAggregationContext
[ https://issues.apache.org/jira/browse/YARN-2651?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14167267#comment-14167267 ] Hadoop QA commented on YARN-2651: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12674203/YARN-2651.1.1.patch against trunk revision cb81bac. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 3 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:red}-1 release audit{color}. The applied patch generated 1 release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/5364//testReport/ Release audit warnings: https://builds.apache.org/job/PreCommit-YARN-Build/5364//artifact/patchprocess/patchReleaseAuditProblems.txt Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5364//console This message is automatically generated. Spin off the LogRollingInterval from LogAggregationContext -- Key: YARN-2651 URL: https://issues.apache.org/jira/browse/YARN-2651 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager, resourcemanager Reporter: Xuan Gong Assignee: Xuan Gong Attachments: YARN-2651.1.1.patch, YARN-2651.1.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2494) [YARN-796] Node label manager API and storage implementations
[ https://issues.apache.org/jira/browse/YARN-2494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14167283#comment-14167283 ] Vinod Kumar Vavilapalli commented on YARN-2494: --- Looks good now, +1. Checking this in. [YARN-796] Node label manager API and storage implementations - Key: YARN-2494 URL: https://issues.apache.org/jira/browse/YARN-2494 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Wangda Tan Assignee: Wangda Tan Attachments: YARN-2494.20141009-1.patch, YARN-2494.20141009-2.patch, YARN-2494.patch, YARN-2494.patch, YARN-2494.patch, YARN-2494.patch, YARN-2494.patch, YARN-2494.patch, YARN-2494.patch, YARN-2494.patch This JIRA includes APIs and storage implementations of node label manager, NodeLabelManager is an abstract class used to manage labels of nodes in the cluster, it has APIs to query/modify - Nodes according to given label - Labels according to given hostname - Add/remove labels - Set labels of nodes in the cluster - Persist/recover changes of labels/labels-on-nodes to/from storage And it has two implementations to store modifications - Memory based storage: It will not persist changes, so all labels will be lost when RM restart - FileSystem based storage: It will persist/recover to/from FileSystem (like HDFS), and all labels and labels-on-nodes will be recovered upon RM restart -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2494) [YARN-796] Node label manager API and storage implementations
[ https://issues.apache.org/jira/browse/YARN-2494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14167296#comment-14167296 ] Hudson commented on YARN-2494: -- FAILURE: Integrated in Hadoop-trunk-Commit #6237 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/6237/]) YARN-2494. Added NodeLabels Manager internal API and implementation. Contributed by Wangda Tan. (vinodkv: rev db7f1653198b950e89567c06898d64f6b930a0ee) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/nodelabels/DummyCommonNodeLabelsManager.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/nodelabels/DummyRMNodeLabelsManager.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/nodelabels/event/NodeLabelsStoreEvent.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/nodelabels/TestCommonNodeLabelsManager.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/nodelabels/NodeLabelsStore.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/nodelabels/event/UpdateNodeToLabelsMappingsEvent.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/nodelabels/CommonNodeLabelsManager.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/nodelabels/RMNodeLabelsManager.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/nodelabels/NodeLabelTestBase.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/nodelabels/TestFileSystemNodeLabelsStore.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/nodelabels/FileSystemNodeLabelsStore.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/nodelabels/event/StoreNewClusterNodeLabels.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/nodelabels/event/RemoveClusterNodeLabels.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/nodelabels/TestRMNodeLabelsManager.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/nodelabels/event/NodeLabelsStoreEventType.java [YARN-796] Node label manager API and storage implementations - Key: YARN-2494 URL: https://issues.apache.org/jira/browse/YARN-2494 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Wangda Tan Assignee: Wangda Tan Fix For: 2.6.0 Attachments: YARN-2494.20141009-1.patch, YARN-2494.20141009-2.patch, YARN-2494.patch, YARN-2494.patch, YARN-2494.patch, YARN-2494.patch, YARN-2494.patch, YARN-2494.patch, YARN-2494.patch, YARN-2494.patch This JIRA includes APIs and storage implementations of node label manager, NodeLabelManager is an abstract class used to manage labels of nodes in the cluster, it has APIs to query/modify - Nodes according to given label - Labels according to given hostname - Add/remove labels - Set labels of nodes in the cluster - 
Persist/recover changes of labels/labels-on-nodes to/from storage And it has two implementations to store modifications - Memory based storage: It will not persist changes, so all labels will be lost when RM restart - FileSystem based storage: It will persist/recover to/from FileSystem (like HDFS), and all labels and labels-on-nodes will be recovered upon RM restart -- This message was sent by Atlassian JIRA (v6.3.4#6332)
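For orientation, the sketch below gives an illustrative shape for the abstract manager this issue describes: query/modify operations over cluster labels and node-to-label mappings, plus a persistence hook that a memory-backed store can ignore and a FileSystem-backed store can implement. The method names paraphrase the bullet list in the description; they are not the committed CommonNodeLabelsManager/RMNodeLabelsManager API.
{code}
// Illustrative only: an abstract node-label manager shaped after the
// description above, with hypothetical method names.
import java.util.Map;
import java.util.Set;

public abstract class AbstractNodeLabelManager {
  /** Add labels to the cluster's label collection. */
  public abstract void addClusterLabels(Set<String> labels) throws Exception;

  /** Remove labels from the cluster's label collection. */
  public abstract void removeClusterLabels(Set<String> labels) throws Exception;

  /** Replace the labels on the given hosts. */
  public abstract void setLabelsOnNodes(Map<String, Set<String>> hostToLabels) throws Exception;

  /** Query nodes according to a given label. */
  public abstract Set<String> getNodesWithLabel(String label);

  /** Query labels according to a given hostname. */
  public abstract Set<String> getLabelsOnNode(String hostname);

  /**
   * Persist label changes. A memory-based store makes this a no-op, so labels
   * are lost on RM restart; a FileSystem-based store writes to HDFS (or similar)
   * so labels and node-to-label mappings survive an RM restart.
   */
  protected abstract void persistChanges() throws Exception;
}
{code}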
[jira] [Commented] (YARN-2651) Spin off the LogRollingInterval from LogAggregationContext
[ https://issues.apache.org/jira/browse/YARN-2651?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14167314#comment-14167314 ] Xuan Gong commented on YARN-2651: - -1 release audit is not related Spin off the LogRollingInterval from LogAggregationContext -- Key: YARN-2651 URL: https://issues.apache.org/jira/browse/YARN-2651 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager, resourcemanager Reporter: Xuan Gong Assignee: Xuan Gong Attachments: YARN-2651.1.1.patch, YARN-2651.1.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2668) yarn-registry JAR won't link against ZK 3.4.5
[ https://issues.apache.org/jira/browse/YARN-2668?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran updated YARN-2668: - Attachment: YARN-2668-002.patch Patch -002 also fixes test case by dropping an assertion check of a missing method yarn-registry JAR won't link against ZK 3.4.5 - Key: YARN-2668 URL: https://issues.apache.org/jira/browse/YARN-2668 Project: Hadoop YARN Issue Type: Sub-task Components: client Affects Versions: 2.6.0 Reporter: Steve Loughran Assignee: Steve Loughran Attachments: YARN-2668-001.patch, YARN-2668-002.patch Original Estimate: 0.5h Remaining Estimate: 0.5h It's been reported that the registry code doesn't link against ZK 3.4.5 as the enable/disable SASL client property isn't there, which went in with ZOOKEEPER-1657. pulling in the constant and {{isEnabled()}} check will ensure registry linkage, even though the ability for a client to disable SASL auth will be lost. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2668) yarn-registry JAR won't link against ZK 3.4.5
[ https://issues.apache.org/jira/browse/YARN-2668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14167333#comment-14167333 ] Chris Nauroth commented on YARN-2668: - Thanks for catching that, Steve. +1 for patch v2 pending jenkins. yarn-registry JAR won't link against ZK 3.4.5 - Key: YARN-2668 URL: https://issues.apache.org/jira/browse/YARN-2668 Project: Hadoop YARN Issue Type: Sub-task Components: client Affects Versions: 2.6.0 Reporter: Steve Loughran Assignee: Steve Loughran Attachments: YARN-2668-001.patch, YARN-2668-002.patch Original Estimate: 0.5h Remaining Estimate: 0.5h It's been reported that the registry code doesn't link against ZK 3.4.5 as the enable/disable SASL client property isn't there, which went in with ZOOKEEPER-1657. pulling in the constant and {{isEnabled()}} check will ensure registry linkage, even though the ability for a client to disable SASL auth will be lost. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
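For readers following the linkage problem: the idea described is to stop referencing the constant and isEnabled() helper that only exist in ZooKeeper 3.5 (ZOOKEEPER-1657) and instead carry the property name locally and read the system property directly. A sketch of that pattern follows; it assumes the standard "zookeeper.sasl.client" property with SASL enabled by default, and is not the committed YARN-2668 patch.
{code}
// Sketch: keep the SASL-client switch local so the registry code does not link
// against the ZooKeeper 3.5-only constant/isEnabled() method.
public final class ZkSaslClientFlag {
  /** System property ZooKeeper uses to enable/disable the SASL client (assumed here). */
  public static final String PROP_ZK_ENABLE_SASL_CLIENT = "zookeeper.sasl.client";

  /** SASL client support defaults to enabled. */
  public static final String DEFAULT_ZK_ENABLE_SASL_CLIENT = "true";

  /** Local stand-in for the 3.5-only ZooKeeperSaslClient.isEnabled() check. */
  public static boolean isSaslClientEnabled() {
    return Boolean.parseBoolean(
        System.getProperty(PROP_ZK_ENABLE_SASL_CLIENT, DEFAULT_ZK_ENABLE_SASL_CLIENT));
  }

  private ZkSaslClientFlag() {}
}
{code}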
[jira] [Created] (YARN-2676) Timeline authentication filter should add support for proxy user
Zhijie Shen created YARN-2676: - Summary: Timeline authentication filter should add support for proxy user Key: YARN-2676 URL: https://issues.apache.org/jira/browse/YARN-2676 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Zhijie Shen Assignee: Zhijie Shen -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-2677) registry punycoding of usernames doesn't fix all usernames to be DNS-valid
Steve Loughran created YARN-2677: Summary: registry punycoding of usernames doesn't fix all usernames to be DNS-valid Key: YARN-2677 URL: https://issues.apache.org/jira/browse/YARN-2677 Project: Hadoop YARN Issue Type: Sub-task Affects Versions: 2.6.0 Reporter: Steve Loughran Assignee: Steve Loughran The registry has a restriction to DNS-valid names only, to retain the future option of DNS exporting of the registry. To handle complex usernames, it punycodes the username first, using Java's {{java.net.IDN}} class. This turns out to only map high unicode to ASCII, and does nothing for ASCII-but-invalid-hostname chars, so users with DNS-illegal names (e.g. with an underscore in them) are stopped from being able to register. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2677) registry punycoding of usernames doesn't fix all usernames to be DNS-valid
[ https://issues.apache.org/jira/browse/YARN-2677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14167385#comment-14167385 ] Steve Loughran commented on YARN-2677: -- SLIDER-500 picked this up: {code} ServiceRecord{description='Slider Application Master'; external endpoints: {Endpoint{api='org.apache.slider.appmaster', addressType='host/port', protocolType='hadoop/protobuf', addresses=[ [ ts-oracle6-uns-accumulo-slider-1010-4.cs1cloud.internal 50767 ] ] }; Endpoint{api='org.apache.http.UI', addressType='uri', protocolType='webui', addresses=[ [ http://ts-oracle6-uns-accumulo-slider-1010-4.cs1cloud.internal:40805; ] ] }; Endpoint{api='org.apache.slider.management', addressType='uri', protocolType='REST', addresses=[ [ http://ts-oracle6-uns-accumulo-slider-1010-4.cs1cloud.internal:40805/ws/v1/slider/mgmt; ] ] }; Endpoint{api='org.apache.slider.publisher', addressType='uri', protocolType='REST', addresses=[ [ http://ts-oracle6-uns-accumulo-slider-1010-4.cs1cloud.internal:40805/ws/v1/slider/publisher; ] ] }; Endpoint{api='org.apache.slider.registry', addressType='uri', protocolType='REST', addresses=[ [ http://ts-oracle6-uns-accumulo-slider-1010-4.cs1cloud.internal:40805/ws/v1/slider/registry; ] ] }; Endpoint{api='org.apache.slider.publisher.configurations', addressType='uri', protocolType='REST', addresses=[ [ http://ts-oracle6-uns-accumulo-slider-1010-4.cs1cloud.internal:40805/ws/v1/slider/publisher/slider; ] ] }; }; internal endpoints: {Endpoint{api='org.apache.slider.agents.secure', addressType='uri', protocolType='REST', addresses=[ [ https://ts-oracle6-uns-accumulo-slider-1010-4.cs1cloud.internal:55485/ws/v1/slider/agents; ] ] }; Endpoint{api='org.apache.slider.agents.oneway', addressType='uri', protocolType='REST', addresses=[ [ https://ts-oracle6-uns-accumulo-slider-1010-4.cs1cloud.internal:55688/ws/v1/slider/agents; ] ] }; }, attributes: {yarn:persistence=application yarn:id=application_1412950151590_0009 }}Exception: `/users/hrt_qa/services/org-apache-slider/test-slider-accumulo-continuous-ingest': Invalid Path element hrt_qa14/10/10 15:44:53 ERROR main.ServiceLauncher: Exception: `/users/hrt_qa/services/org-apache-slider/test-slider-accumulo-continuous-ingest': Invalid Path element hrt_qaorg.apache.hadoop.registry.client.exceptions.InvalidPathnameException: `/users/hrt_qa/services/org-apache-slider/test-slider-accumulo-continuous-ingest': Invalid Path element hrt_qa at org.apache.hadoop.registry.client.binding.RegistryPathUtils.validateElementsAsDNS(RegistryPathUtils.java:78) at org.apache.hadoop.registry.client.impl.zk.RegistryOperationsService.validatePath(RegistryOperationsService.java:91) at org.apache.hadoop.registry.client.impl.zk.RegistryOperationsService.delete(RegistryOperationsService.java:151) at org.apache.slider.server.services.yarnregistry.YarnRegistryViewForProviders.putService(YarnRegistryViewForProviders.java:138) at org.apache.slider.server.appmaster.SliderAppMaster.registerServiceInstance(SliderAppMaster.java:1002) at org.apache.slider.server.appmaster.SliderAppMaster.createAndRunCluster(SliderAppMaster.java:846) at org.apache.slider.server.appmaster.SliderAppMaster.runService(SliderAppMaster.java:507) at org.apache.slider.core.main.ServiceLauncher.launchService(ServiceLauncher.java:186) at org.apache.slider.core.main.ServiceLauncher.launchServiceRobustly(ServiceLauncher.java:471) at org.apache.slider.core.main.ServiceLauncher.launchServiceAndExit(ServiceLauncher.java:401) at 
org.apache.slider.core.main.ServiceLauncher.serviceMain(ServiceLauncher.java:626) at org.apache.slider.server.appmaster.SliderAppMaster.main(SliderAppMaster.java:2124)14/10/10 15:44:53 INFO util.ExitUtil: Exiting with status 3214/10/10 15:44:53 INFO mortbay.log: Shutdown hook executing14/10/10 15:44:53 INFO mortbay.log: Stopped SslSelectChannelConnector@0.0.0.0:5548514/10/10 15:44:53 INFO mortbay.log: Stopped SslSelectChannelConnector@0.0.0.0:55688 {code} registry punycoding of usernames doesn't fix all usernames to be DNS-valid -- Key: YARN-2677 URL: https://issues.apache.org/jira/browse/YARN-2677 Project: Hadoop YARN Issue Type: Sub-task Components: api, resourcemanager Affects Versions: 2.6.0 Reporter: Steve Loughran Assignee: Steve Loughran The registry has a restriction DNS-valid names only to retain the future option of DNS exporting of the registry. to handle complex usernames, it punycodes the username first, using Java's {{java.net.IDN}} class. This turns out to only map high unicode- ASCII, and does nothing for ascii-but-invalid-hostname chars, so stopping users with DNS-illegal names (e.g.
[jira] [Commented] (YARN-2677) registry punycoding of usernames doesn't fix all usernames to be DNS-valid
[ https://issues.apache.org/jira/browse/YARN-2677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14167392#comment-14167392 ] Steve Loughran commented on YARN-2677: -- Proposed fix is # retain the restriction in the specification # retain the punycoding in the code # do NOT validate the hostname against DNS rules. If and when a switch to DNS exports takes place, the names will have to be escaped into DNS. The punycoding encoding guarantees that for a REST API, URLs paths will always be ASCII registry punycoding of usernames doesn't fix all usernames to be DNS-valid -- Key: YARN-2677 URL: https://issues.apache.org/jira/browse/YARN-2677 Project: Hadoop YARN Issue Type: Sub-task Components: api, resourcemanager Affects Versions: 2.6.0 Reporter: Steve Loughran Assignee: Steve Loughran The registry has a restriction DNS-valid names only to retain the future option of DNS exporting of the registry. to handle complex usernames, it punycodes the username first, using Java's {{java.net.IDN}} class. This turns out to only map high unicode- ASCII, and does nothing for ascii-but-invalid-hostname chars, so stopping users with DNS-illegal names (e.g. with an underscore in them) from being able to register -- This message was sent by Atlassian JIRA (v6.3.4#6332)
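A two-line demo of the java.net.IDN behaviour this issue describes: a name containing high unicode is converted to its ASCII punycode form, while an ASCII-but-DNS-invalid character such as an underscore passes through untouched, which is why usernames like hrt_qa still fail a strict DNS check after punycoding.
{code}
import java.net.IDN;

public class IdnDemo {
  public static void main(String[] args) {
    // High unicode is mapped to an ASCII ("xn--...") punycode label.
    System.out.println(IDN.toASCII("héllo"));
    // An underscore is already ASCII, so IDN leaves it alone even though it is
    // not a valid DNS hostname character.
    System.out.println(IDN.toASCII("hrt_qa"));
  }
}
{code}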
[jira] [Commented] (YARN-2668) yarn-registry JAR won't link against ZK 3.4.5
[ https://issues.apache.org/jira/browse/YARN-2668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14167399#comment-14167399 ] Hadoop QA commented on YARN-2668: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12674225/YARN-2668-002.patch against trunk revision db7f165. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:red}-1 release audit{color}. The applied patch generated 1 release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-registry. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/5365//testReport/ Release audit warnings: https://builds.apache.org/job/PreCommit-YARN-Build/5365//artifact/patchprocess/patchReleaseAuditProblems.txt Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5365//console This message is automatically generated. yarn-registry JAR won't link against ZK 3.4.5 - Key: YARN-2668 URL: https://issues.apache.org/jira/browse/YARN-2668 Project: Hadoop YARN Issue Type: Sub-task Components: client Affects Versions: 2.6.0 Reporter: Steve Loughran Assignee: Steve Loughran Attachments: YARN-2668-001.patch, YARN-2668-002.patch Original Estimate: 0.5h Remaining Estimate: 0.5h It's been reported that the registry code doesn't link against ZK 3.4.5 as the enable/disable SASL client property isn't there, which went in with ZOOKEEPER-1657. pulling in the constant and {{isEnabled()}} check will ensure registry linkage, even though the ability for a client to disable SASL auth will be lost. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (YARN-2423) TimelineClient should wrap all GET APIs to facilitate Java users
[ https://issues.apache.org/jira/browse/YARN-2423?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Kanter reassigned YARN-2423: --- Assignee: Robert Kanter TimelineClient should wrap all GET APIs to facilitate Java users Key: YARN-2423 URL: https://issues.apache.org/jira/browse/YARN-2423 Project: Hadoop YARN Issue Type: Sub-task Reporter: Zhijie Shen Assignee: Robert Kanter TimelineClient provides the Java method to put timeline entities. It's also good to wrap over all GET APIs (both entity and domain), and deserialize the json response into Java POJO objects. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2308) NPE happened when RM restart after CapacityScheduler queue configuration changed
[ https://issues.apache.org/jira/browse/YARN-2308?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Craig Welch updated YARN-2308: -- Attachment: YARN-2308.0.patch Here's a patch which does what I believe the consensus was - detects the condition and throws a descriptive exception which will not be caught / will result in rm exit NPE happened when RM restart after CapacityScheduler queue configuration changed - Key: YARN-2308 URL: https://issues.apache.org/jira/browse/YARN-2308 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager, scheduler Affects Versions: 2.6.0 Reporter: Wangda Tan Assignee: chang li Priority: Critical Attachments: YARN-2308.0.patch, jira2308.patch, jira2308.patch, jira2308.patch I encountered a NPE when RM restart {code} 2014-07-16 07:22:46,957 FATAL org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in handling event type APP_ATTEMPT_ADDED to the scheduler java.lang.NullPointerException at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.addApplicationAttempt(CapacityScheduler.java:566) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:922) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:98) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:594) at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:654) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:85) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationEventDispatcher.handle(ResourceManager.java:698) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationEventDispatcher.handle(ResourceManager.java:682) at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173) at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106) at java.lang.Thread.run(Thread.java:744) {code} And RM will be failed to restart. This is caused by queue configuration changed, I removed some queues and added new queues. So when RM restarts, it tries to recover history applications, and when any of queues of these applications removed, NPE will be raised. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
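A simplified sketch of the approach Craig describes above: during application-attempt recovery, detect that the recovered application's queue no longer exists after the configuration change and throw a descriptive exception (deliberately left uncaught, so the RM exits) instead of hitting the NullPointerException in the stack trace. This is a paraphrase of the described behaviour, not the code in YARN-2308.0.patch.
{code}
// Hypothetical, simplified sketch: fail fast with a descriptive exception
// instead of an NPE when a recovered application's queue is gone.
final class QueueRecoveryCheck {
  private QueueRecoveryCheck() {}

  static void ensureQueueStillExists(Object queueOrNull, String queueName, String appId) {
    if (queueOrNull == null) {
      throw new IllegalStateException("Queue " + queueName
          + " no longer exists; cannot recover application " + appId
          + " after the CapacityScheduler queue configuration changed.");
    }
  }
}
{code}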
[jira] [Commented] (YARN-2677) registry punycoding of usernames doesn't fix all usernames to be DNS-valid
[ https://issues.apache.org/jira/browse/YARN-2677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14167486#comment-14167486 ] Steve Loughran commented on YARN-2677: -- it also looks like somehow the full kerberos name creeps in —that needs to be name-shortened the way UGI does {code} 14/10/10 17:35:45 ERROR main.ServiceLauncher: Exception: `/users/hb...@hwqe.hortonworks.com/services/org-apache-slider/hbcluster24': Invalid Path element hb...@hwqe.hortonworks.com org.apache.hadoop.registry.client.exceptions.InvalidPathnameException: `/users/hb...@apache.org/services/org-apache-slider/hbcluster24': Invalid Path element hbase@@APACHE.ORG at org.apache.hadoop.registry.client.binding.RegistryPathUtils.validateElementsAsDNS(RegistryPathUtils.java:78) at org.apache.hadoop.registry.client.impl.zk.RegistryOperationsService.validatePath(RegistryOperationsService.java:91) at org.apache.hadoop.registry.client.impl.zk.RegistryOperationsService.delete(RegistryOperationsService.java:151) at org.apache.slider.server.services.yarnregistry.YarnRegistryViewForProviders.putService(YarnRegistryViewForProviders.java:138) at org.apache.slider.server.appmaster.SliderAppMaster.registerServiceInstance(SliderAppMaster.java:1002) at org.apache.slider.server.appmaster.SliderAppMaster.createAndRunCluster(SliderAppMaster.java:846) at org.apache.slider.server.appmaster.SliderAppMaster.runService(SliderAppMaster.java:507) {code} registry punycoding of usernames doesn't fix all usernames to be DNS-valid -- Key: YARN-2677 URL: https://issues.apache.org/jira/browse/YARN-2677 Project: Hadoop YARN Issue Type: Sub-task Components: api, resourcemanager Affects Versions: 2.6.0 Reporter: Steve Loughran Assignee: Steve Loughran The registry has a restriction DNS-valid names only to retain the future option of DNS exporting of the registry. to handle complex usernames, it punycodes the username first, using Java's {{java.net.IDN}} class. This turns out to only map high unicode- ASCII, and does nothing for ascii-but-invalid-hostname chars, so stopping users with DNS-illegal names (e.g. with an underscore in them) from being able to register -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2183) Cleaner service for cache manager
[ https://issues.apache.org/jira/browse/YARN-2183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14167496#comment-14167496 ] Sangjin Lee commented on YARN-2183: --- Thanks much for great feedback Karthik! I'm going over them, and I'll reply with a new patch and comments. Cleaner service for cache manager - Key: YARN-2183 URL: https://issues.apache.org/jira/browse/YARN-2183 Project: Hadoop YARN Issue Type: Sub-task Reporter: Chris Trezzo Assignee: Chris Trezzo Attachments: YARN-2183-trunk-v1.patch, YARN-2183-trunk-v2.patch, YARN-2183-trunk-v3.patch, YARN-2183-trunk-v4.patch Implement the cleaner service for the cache manager along with metrics for the service. This service is responsible for cleaning up old resource references in the manager and removing stale entries from the cache. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2571) RM to support YARN registry
[ https://issues.apache.org/jira/browse/YARN-2571?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14167508#comment-14167508 ] Hadoop QA commented on YARN-2571: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12673705/YARN-2571-002.patch against trunk revision db7f165. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 15 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:red}-1 release audit{color}. The applied patch generated 1 release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/5366//testReport/ Release audit warnings: https://builds.apache.org/job/PreCommit-YARN-Build/5366//artifact/patchprocess/patchReleaseAuditProblems.txt Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5366//console This message is automatically generated. RM to support YARN registry Key: YARN-2571 URL: https://issues.apache.org/jira/browse/YARN-2571 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Steve Loughran Assignee: Steve Loughran Attachments: YARN-2571-001.patch, YARN-2571-002.patch The RM needs to (optionally) integrate with the YARN registry: # startup: create the /services and /users paths with system ACLs (yarn, hdfs principals) # app-launch: create the user directory /users/$username with the relevant permissions (CRD) for them to create subnodes. # attempt, container, app completion: remove service records with the matching persistence and ID -- This message was sent by Atlassian JIRA (v6.3.4#6332)
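To summarize the three integration points listed in the YARN-2571 description, here is a hypothetical outline of how an RM-side registry hook could be wired. The RegistryOps interface below is a stand-in, not the real hadoop-yarn-registry API, and the ACL strings are purely illustrative.
{code}
// Hypothetical outline of the three RM/registry touch points described above.
interface RegistryOps {
  void createPathWithAcls(String path, String aclSpec) throws Exception;
  void purgeRecordsUnder(String path, String recordId) throws Exception;
}

class RmRegistryHook {
  private final RegistryOps registry;

  RmRegistryHook(RegistryOps registry) {
    this.registry = registry;
  }

  /** RM startup: create the system paths with system (yarn, hdfs) ACLs. */
  void onRmStart() throws Exception {
    registry.createPathWithAcls("/services", "sasl:yarn, sasl:hdfs");
    registry.createPathWithAcls("/users", "sasl:yarn, sasl:hdfs");
  }

  /** App launch: create /users/$username with create/read/delete rights for that user. */
  void onAppLaunch(String username) throws Exception {
    registry.createPathWithAcls("/users/" + username, "sasl:" + username);
  }

  /** Attempt/container/app completion: remove service records matching the finished ID. */
  void onEntityCompleted(String username, String id) throws Exception {
    registry.purgeRecordsUnder("/users/" + username, id);
  }
}
{code}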
[jira] [Updated] (YARN-2677) registry punycoding of usernames doesn't fix all usernames to be DNS-valid
[ https://issues.apache.org/jira/browse/YARN-2677?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran updated YARN-2677: - Attachment: YARN-2677-001.patch patch that # stops checking for validity of entries against DNS pattern # strips out kerberos realm/host info, so that {{hbase/host@REALM}} maps down to {{hbase}}. UGI should have been doing this in its shortname logic, but clearly not registry punycoding of usernames doesn't fix all usernames to be DNS-valid -- Key: YARN-2677 URL: https://issues.apache.org/jira/browse/YARN-2677 Project: Hadoop YARN Issue Type: Sub-task Components: api, resourcemanager Affects Versions: 2.6.0 Reporter: Steve Loughran Assignee: Steve Loughran Attachments: YARN-2677-001.patch The registry has a restriction DNS-valid names only to retain the future option of DNS exporting of the registry. to handle complex usernames, it punycodes the username first, using Java's {{java.net.IDN}} class. This turns out to only map high unicode- ASCII, and does nothing for ascii-but-invalid-hostname chars, so stopping users with DNS-illegal names (e.g. with an underscore in them) from being able to register -- This message was sent by Atlassian JIRA (v6.3.4#6332)
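A minimal sketch of the shortening the patch description mentions, covering only the simple case quoted there (hbase/host@REALM maps down to hbase): cut the principal at the first '/' or '@'. The real patch and UGI's auth_to_local rules handle more than this.
{code}
// Minimal sketch (simple case only): strip host and realm from a Kerberos
// principal so "hbase/host@REALM" maps down to "hbase".
public final class PrincipalNames {
  private PrincipalNames() {}

  public static String toShortName(String principal) {
    int cut = principal.length();
    for (char sep : new char[] {'/', '@'}) {
      int idx = principal.indexOf(sep);
      if (idx >= 0 && idx < cut) {
        cut = idx;
      }
    }
    return principal.substring(0, cut);
  }

  public static void main(String[] args) {
    System.out.println(toShortName("hbase/host@REALM")); // hbase
    System.out.println(toShortName("hbase@REALM"));      // hbase
    System.out.println(toShortName("hrt_qa"));           // hrt_qa (unchanged)
  }
}
{code}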
[jira] [Commented] (YARN-2308) NPE happened when RM restart after CapacityScheduler queue configuration changed
[ https://issues.apache.org/jira/browse/YARN-2308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14167615#comment-14167615 ] Hadoop QA commented on YARN-2308: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12674250/YARN-2308.0.patch against trunk revision d3d3d47. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:red}-1 release audit{color}. The applied patch generated 1 release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/5367//testReport/ Release audit warnings: https://builds.apache.org/job/PreCommit-YARN-Build/5367//artifact/patchprocess/patchReleaseAuditProblems.txt Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5367//console This message is automatically generated. NPE happened when RM restart after CapacityScheduler queue configuration changed - Key: YARN-2308 URL: https://issues.apache.org/jira/browse/YARN-2308 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager, scheduler Affects Versions: 2.6.0 Reporter: Wangda Tan Assignee: chang li Priority: Critical Attachments: YARN-2308.0.patch, jira2308.patch, jira2308.patch, jira2308.patch I encountered a NPE when RM restart {code} 2014-07-16 07:22:46,957 FATAL org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in handling event type APP_ATTEMPT_ADDED to the scheduler java.lang.NullPointerException at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.addApplicationAttempt(CapacityScheduler.java:566) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:922) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:98) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:594) at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:654) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:85) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationEventDispatcher.handle(ResourceManager.java:698) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationEventDispatcher.handle(ResourceManager.java:682) at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173) at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106) at 
java.lang.Thread.run(Thread.java:744) {code} And the RM then fails to restart. This is caused by the queue configuration change: I removed some queues and added new ones. So when the RM restarts, it tries to recover the previous applications, and when the queue of any of these applications has been removed, an NPE is raised. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
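For illustration, here is a small self-contained sketch (class and variable names are hypothetical; this is not the YARN-2308 patch) of the null-check style of fix implied by the stack trace above: when a recovered application's queue no longer exists in the configuration, reject the application instead of dereferencing a missing queue and crashing the scheduler dispatcher.
{code}
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch only: recovering applications against a queue map from which
// some queues were removed. Dereferencing the missing queue directly would throw the
// NullPointerException shown above; checking for null lets recovery reject the app
// while keeping the RM alive.
public class QueueRecoverySketch {
  public static void main(String[] args) {
    Map<String, String> queues = new HashMap<>();
    queues.put("default", "root.default");          // the "research" queue was removed

    for (String appQueue : new String[] {"default", "research"}) {
      String queue = queues.get(appQueue);
      if (queue == null) {
        System.out.println("Rejecting recovered app: queue " + appQueue + " no longer exists");
        continue;                                    // fail the app gracefully, keep the RM up
      }
      System.out.println("Recovered app into " + queue);
    }
  }
}
{code}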
[jira] [Commented] (YARN-2308) NPE happened when RM restart after CapacityScheduler queue configuration changed
[ https://issues.apache.org/jira/browse/YARN-2308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14167622#comment-14167622 ] Craig Welch commented on YARN-2308: --- The release audit is not related to this change, it appears to be related to the yarn registry change. NPE happened when RM restart after CapacityScheduler queue configuration changed - Key: YARN-2308 URL: https://issues.apache.org/jira/browse/YARN-2308 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager, scheduler Affects Versions: 2.6.0 Reporter: Wangda Tan Assignee: chang li Priority: Critical Attachments: YARN-2308.0.patch, jira2308.patch, jira2308.patch, jira2308.patch I encountered a NPE when RM restart {code} 2014-07-16 07:22:46,957 FATAL org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in handling event type APP_ATTEMPT_ADDED to the scheduler java.lang.NullPointerException at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.addApplicationAttempt(CapacityScheduler.java:566) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:922) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:98) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:594) at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:654) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:85) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationEventDispatcher.handle(ResourceManager.java:698) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationEventDispatcher.handle(ResourceManager.java:682) at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173) at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106) at java.lang.Thread.run(Thread.java:744) {code} And RM will be failed to restart. This is caused by queue configuration changed, I removed some queues and added new queues. So when RM restarts, it tries to recover history applications, and when any of queues of these applications removed, NPE will be raised. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2677) registry punycoding of usernames doesn't fix all usernames to be DNS-valid
[ https://issues.apache.org/jira/browse/YARN-2677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14167690#comment-14167690 ] Hadoop QA commented on YARN-2677: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12674272/YARN-2677-001.patch against trunk revision d3d3d47. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:red}-1 release audit{color}. The applied patch generated 1 release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-registry. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/5368//testReport/ Release audit warnings: https://builds.apache.org/job/PreCommit-YARN-Build/5368//artifact/patchprocess/patchReleaseAuditProblems.txt Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5368//console This message is automatically generated. registry punycoding of usernames doesn't fix all usernames to be DNS-valid -- Key: YARN-2677 URL: https://issues.apache.org/jira/browse/YARN-2677 Project: Hadoop YARN Issue Type: Sub-task Components: api, resourcemanager Affects Versions: 2.6.0 Reporter: Steve Loughran Assignee: Steve Loughran Attachments: YARN-2677-001.patch The registry has a restriction to DNS-valid names only, to retain the future option of DNS exporting of the registry. To handle complex usernames, it punycodes the username first, using Java's {{java.net.IDN}} class. This turns out to only map high unicode to ASCII, and does nothing for ASCII-but-invalid-hostname chars, so it stops users with DNS-illegal names (e.g. with an underscore in them) from being able to register -- This message was sent by Atlassian JIRA (v6.3.4#6332)
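A small self-contained example of the {{java.net.IDN}} behaviour described above (the class name is just for the demo): a non-ASCII username is punycoded into an ACE form, but an ASCII username containing a DNS-illegal character such as an underscore passes through unchanged when no STD 3 flags are given.
{code}
import java.net.IDN;

// Demonstrates the gap described in YARN-2677: IDN.toASCII handles high unicode,
// but does not fix ASCII-but-invalid-hostname characters such as '_'.
public class IdnDemo {
  public static void main(String[] args) {
    // Non-ASCII input is punycoded into a DNS-valid ACE label (an "xn--" form).
    System.out.println(IDN.toASCII("müller"));
    // ASCII input with an underscore is returned unchanged, and is still DNS-illegal,
    // because the default flags do not enforce STD 3 ASCII rules.
    System.out.println(IDN.toASCII("bob_smith"));
  }
}
{code}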
[jira] [Commented] (YARN-2641) improve node decommission latency in RM.
[ https://issues.apache.org/jira/browse/YARN-2641?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14167692#comment-14167692 ] Wei Yan commented on YARN-2641: --- bq. I think the actual decommission happen when NM receive shutdown from RM heartbeat back. Isn't it? So the latency between decommission CLI and node get decommissioned won't affected. Also, in most cases, resource scheduling is triggered by NM's heartbeat with RM. So the latency of decommission CLI and scheduling container on nodes won't get affected (except attempt scheduling). So IMO, this patch only improve the latency for attempt scheduling case. Do we have some other scenarios to address? From my understanding, currently if one NM fails or is killed, the RM cannot get that information until yarn.nm.liveness-monitor.expiry-interval-ms expires. That means all containers running on that failed NM are assumed to still be running, from both the RM and AM sides, until the timeout. However, [~zxu]'s point is that the RM doesn't need to wait that long to learn that the NM was killed; the RM can get this information directly when the refreshNodes command is triggered. For example, if the user removes one NM and then runs refreshNodes, the RM can quickly understand that the NM was killed and can notify all applications about that, without needing to wait for the heartbeat timeout. And the AMs can act on that quickly. improve node decommission latency in RM. Key: YARN-2641 URL: https://issues.apache.org/jira/browse/YARN-2641 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Affects Versions: 2.5.0 Reporter: zhihai xu Assignee: zhihai xu Attachments: YARN-2641.000.patch, YARN-2641.001.patch improve node decommission latency in RM. Currently the node decommission only happened after RM received nodeHeartbeat from the Node Manager. The node heartbeat interval is configurable. The default value is 1 second. It will be better to do the decommission during RM Refresh(NodesListManager) instead of nodeHeartbeat(ResourceTrackerService). This will be a much more serious issue: After RM is refreshed (refreshNodes), If the NM to be decommissioned is killed before NM sent heartbeat to RM. The RMNode will never be decommissioned in RM. The RMNode will only expire in RM after yarn.nm.liveness-monitor.expiry-interval-ms(default value 10 minutes) time. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
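To make the proposal concrete, here is a minimal self-contained sketch (hypothetical class and method names, not the YARN-2641 patch) of acting on the exclude list at refreshNodes time rather than waiting for the excluded NM's next heartbeat or for the liveness-monitor expiry.
{code}
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Illustrative sketch only: eagerly decommission nodes that appear in the exclude list
// at refreshNodes time, instead of waiting for a heartbeat that will never arrive if
// the NM was already killed, or for yarn.nm.liveness-monitor.expiry-interval-ms.
public class RefreshNodesSketch {
  private final Set<String> activeNodes = new HashSet<>();

  void refreshNodes(Set<String> excludeList) {
    for (String node : excludeList) {
      if (activeNodes.remove(node)) {
        // Applications can be notified immediately so their AMs stop assuming
        // containers on this node are still running.
        System.out.println("Decommissioned " + node + " and notified running applications");
      }
    }
  }

  public static void main(String[] args) {
    RefreshNodesSketch rm = new RefreshNodesSketch();
    rm.activeNodes.add("nm-host-1");
    rm.refreshNodes(new HashSet<>(Arrays.asList("nm-host-1")));
  }
}
{code}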
[jira] [Updated] (YARN-2566) IOException happen in startLocalizer of DefaultContainerExecutor due to not enough disk space for the first localDir.
[ https://issues.apache.org/jira/browse/YARN-2566?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karthik Kambatla updated YARN-2566: --- Priority: Critical (was: Major) Target Version/s: 2.6.0 IOException happen in startLocalizer of DefaultContainerExecutor due to not enough disk space for the first localDir. - Key: YARN-2566 URL: https://issues.apache.org/jira/browse/YARN-2566 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager Affects Versions: 2.5.0 Reporter: zhihai xu Assignee: zhihai xu Priority: Critical Attachments: YARN-2566.000.patch, YARN-2566.001.patch, YARN-2566.002.patch, YARN-2566.003.patch, YARN-2566.004.patch startLocalizer in DefaultContainerExecutor will only use the first localDir to copy the token file, if the copy is failed for first localDir due to not enough disk space in the first localDir, the localization will be failed even there are plenty of disk space in other localDirs. We see the following error for this case: {code} 2014-09-13 23:33:25,171 WARN org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Unable to create app directory /hadoop/d1/usercache/cloudera/appcache/application_1410663092546_0004 java.io.IOException: mkdir of /hadoop/d1/usercache/cloudera/appcache/application_1410663092546_0004 failed at org.apache.hadoop.fs.FileSystem.primitiveMkdir(FileSystem.java:1062) at org.apache.hadoop.fs.DelegateToFileSystem.mkdir(DelegateToFileSystem.java:157) at org.apache.hadoop.fs.FilterFs.mkdir(FilterFs.java:189) at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:721) at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:717) at org.apache.hadoop.fs.FSLinkResolver.resolve(FSLinkResolver.java:90) at org.apache.hadoop.fs.FileContext.mkdir(FileContext.java:717) at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.createDir(DefaultContainerExecutor.java:426) at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.createAppDirs(DefaultContainerExecutor.java:522) at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.startLocalizer(DefaultContainerExecutor.java:94) at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerRunner.run(ResourceLocalizationService.java:987) 2014-09-13 23:33:25,185 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: Localizer failed java.io.FileNotFoundException: File file:/hadoop/d1/usercache/cloudera/appcache/application_1410663092546_0004 does not exist at org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:511) at org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:724) at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:501) at org.apache.hadoop.fs.DelegateToFileSystem.getFileStatus(DelegateToFileSystem.java:111) at org.apache.hadoop.fs.DelegateToFileSystem.createInternal(DelegateToFileSystem.java:76) at org.apache.hadoop.fs.ChecksumFs$ChecksumFSOutputSummer.init(ChecksumFs.java:344) at org.apache.hadoop.fs.ChecksumFs.createInternal(ChecksumFs.java:390) at org.apache.hadoop.fs.AbstractFileSystem.create(AbstractFileSystem.java:577) at org.apache.hadoop.fs.FileContext$3.next(FileContext.java:677) at org.apache.hadoop.fs.FileContext$3.next(FileContext.java:673) at org.apache.hadoop.fs.FSLinkResolver.resolve(FSLinkResolver.java:90) at org.apache.hadoop.fs.FileContext.create(FileContext.java:673) at 
org.apache.hadoop.fs.FileContext$Util.copy(FileContext.java:2021) at org.apache.hadoop.fs.FileContext$Util.copy(FileContext.java:1963) at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.startLocalizer(DefaultContainerExecutor.java:102) at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerRunner.run(ResourceLocalizationService.java:987) 2014-09-13 23:33:25,186 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: Container container_1410663092546_0004_01_01 transitioned from LOCALIZING to LOCALIZATION_FAILED 2014-09-13 23:33:25,187 WARN org.apache.hadoop.yarn.server.nodemanager.NMAuditLogger: USER=cloudera OPERATION=Container Finished - Failed TARGET=ContainerImpl RESULT=FAILURE DESCRIPTION=Container failed with state: LOCALIZATION_FAILED APPID=application_1410663092546_0004
[jira] [Commented] (YARN-2566) IOException happen in startLocalizer of DefaultContainerExecutor due to not enough disk space for the first localDir.
[ https://issues.apache.org/jira/browse/YARN-2566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14167726#comment-14167726 ] Karthik Kambatla commented on YARN-2566: Really like the way the patch picks a directory - the chance of picking a directory is proportional to the available space on the directory. Review comments on the patch: # We should add comments to explain how we pick the directory. # If {{totalAvailable = 0}}, we should throw an IOException and fail the task early, instead of picking some directory and failing later. The test might need to be updated accordingly. # Then, the following while doesn't need to check for available-space-on-disk being zero. {code} while ((availableOnDisk[dir] == 0L) || (randomPosition > availableOnDisk[dir])) { randomPosition -= availableOnDisk[dir++]; } {code} # The test should use Assert.fail instead of Assert.assertTrue(msg, false) IOException happen in startLocalizer of DefaultContainerExecutor due to not enough disk space for the first localDir. - Key: YARN-2566 URL: https://issues.apache.org/jira/browse/YARN-2566 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager Affects Versions: 2.5.0 Reporter: zhihai xu Assignee: zhihai xu Priority: Critical Attachments: YARN-2566.000.patch, YARN-2566.001.patch, YARN-2566.002.patch, YARN-2566.003.patch, YARN-2566.004.patch startLocalizer in DefaultContainerExecutor will only use the first localDir to copy the token file, if the copy is failed for first localDir due to not enough disk space in the first localDir, the localization will be failed even there are plenty of disk space in other localDirs. We see the following error for this case: {code} 2014-09-13 23:33:25,171 WARN org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Unable to create app directory /hadoop/d1/usercache/cloudera/appcache/application_1410663092546_0004 java.io.IOException: mkdir of /hadoop/d1/usercache/cloudera/appcache/application_1410663092546_0004 failed at org.apache.hadoop.fs.FileSystem.primitiveMkdir(FileSystem.java:1062) at org.apache.hadoop.fs.DelegateToFileSystem.mkdir(DelegateToFileSystem.java:157) at org.apache.hadoop.fs.FilterFs.mkdir(FilterFs.java:189) at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:721) at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:717) at org.apache.hadoop.fs.FSLinkResolver.resolve(FSLinkResolver.java:90) at org.apache.hadoop.fs.FileContext.mkdir(FileContext.java:717) at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.createDir(DefaultContainerExecutor.java:426) at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.createAppDirs(DefaultContainerExecutor.java:522) at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.startLocalizer(DefaultContainerExecutor.java:94) at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerRunner.run(ResourceLocalizationService.java:987) 2014-09-13 23:33:25,185 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: Localizer failed java.io.FileNotFoundException: File file:/hadoop/d1/usercache/cloudera/appcache/application_1410663092546_0004 does not exist at org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:511) at org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:724) at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:501) at 
org.apache.hadoop.fs.DelegateToFileSystem.getFileStatus(DelegateToFileSystem.java:111) at org.apache.hadoop.fs.DelegateToFileSystem.createInternal(DelegateToFileSystem.java:76) at org.apache.hadoop.fs.ChecksumFs$ChecksumFSOutputSummer.init(ChecksumFs.java:344) at org.apache.hadoop.fs.ChecksumFs.createInternal(ChecksumFs.java:390) at org.apache.hadoop.fs.AbstractFileSystem.create(AbstractFileSystem.java:577) at org.apache.hadoop.fs.FileContext$3.next(FileContext.java:677) at org.apache.hadoop.fs.FileContext$3.next(FileContext.java:673) at org.apache.hadoop.fs.FSLinkResolver.resolve(FSLinkResolver.java:90) at org.apache.hadoop.fs.FileContext.create(FileContext.java:673) at org.apache.hadoop.fs.FileContext$Util.copy(FileContext.java:2021) at org.apache.hadoop.fs.FileContext$Util.copy(FileContext.java:1963) at
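A self-contained sketch of the directory-selection scheme praised in the review above: pick a local dir with probability proportional to its available space, and fail fast when no directory has space left. Names and the exact loop condition are illustrative, not copied from the YARN-2566 patch.
{code}
import java.io.IOException;
import java.util.Random;

// Illustrative sketch only: weighted-random choice of a local directory, where the
// chance of picking a directory is proportional to its available space.
public class WeightedDirPicker {
  static int pickDirectory(long[] availableOnDisk, Random random) throws IOException {
    long totalAvailable = 0L;
    for (long space : availableOnDisk) {
      totalAvailable += space;
    }
    if (totalAvailable == 0L) {
      // Fail localization early instead of picking a full directory and failing later.
      throw new IOException("Not enough space in any of the local directories");
    }
    // Draw a position in [0, totalAvailable) and walk the cumulative available space.
    long randomPosition = (long) (random.nextDouble() * totalAvailable);
    int dir = 0;
    while (randomPosition >= availableOnDisk[dir]) {
      randomPosition -= availableOnDisk[dir++];
    }
    return dir;
  }

  public static void main(String[] args) throws IOException {
    long[] available = {0L, 5L * 1024 * 1024, 20L * 1024 * 1024};
    System.out.println("Picked local dir index: " + pickDirectory(available, new Random()));
  }
}
{code}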
[jira] [Updated] (YARN-2493) [YARN-2493] API changes for users
[ https://issues.apache.org/jira/browse/YARN-2493?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sumit Kumar updated YARN-2493: -- Summary: [YARN-2493] API changes for users (was: [YARN-796] API changes for users) [YARN-2493] API changes for users - Key: YARN-2493 URL: https://issues.apache.org/jira/browse/YARN-2493 Project: Hadoop YARN Issue Type: Sub-task Components: api Reporter: Wangda Tan Assignee: Wangda Tan Fix For: 2.6.0 Attachments: YARN-2493-20141008.1.patch, YARN-2493.patch, YARN-2493.patch, YARN-2493.patch, YARN-2493.patch, YARN-2493.patch This JIRA includes API changes for users of YARN-796, like changes in {{ResourceRequest}}, {{ApplicationSubmissionContext}}, etc. This is a common part of YARN-796. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2496) [YARN-2496] Changes for capacity scheduler to support allocate resource respect labels
[ https://issues.apache.org/jira/browse/YARN-2496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sumit Kumar updated YARN-2496: -- Summary: [YARN-2496] Changes for capacity scheduler to support allocate resource respect labels (was: [YARN-796] Changes for capacity scheduler to support allocate resource respect labels) [YARN-2496] Changes for capacity scheduler to support allocate resource respect labels -- Key: YARN-2496 URL: https://issues.apache.org/jira/browse/YARN-2496 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Wangda Tan Assignee: Wangda Tan Attachments: YARN-2496-20141009-1.patch, YARN-2496.patch, YARN-2496.patch, YARN-2496.patch, YARN-2496.patch, YARN-2496.patch, YARN-2496.patch, YARN-2496.patch, YARN-2496.patch This JIRA Includes: - Add/parse labels option to {{capacity-scheduler.xml}} similar to other options of queue like capacity/maximum-capacity, etc. - Include a default-label-expression option in queue config, if an app doesn't specify label-expression, default-label-expression of queue will be used. - Check if labels can be accessed by the queue when submit an app with labels-expression to queue or update ResourceRequest with label-expression - Check labels on NM when trying to allocate ResourceRequest on the NM with label-expression - Respect labels when calculate headroom/user-limit -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2494) [YARN-2494] Node label manager API and storage implementations
[ https://issues.apache.org/jira/browse/YARN-2494?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sumit Kumar updated YARN-2494: -- Summary: [YARN-2494] Node label manager API and storage implementations (was: [YARN-796] Node label manager API and storage implementations) [YARN-2494] Node label manager API and storage implementations -- Key: YARN-2494 URL: https://issues.apache.org/jira/browse/YARN-2494 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Wangda Tan Assignee: Wangda Tan Fix For: 2.6.0 Attachments: YARN-2494.20141009-1.patch, YARN-2494.20141009-2.patch, YARN-2494.patch, YARN-2494.patch, YARN-2494.patch, YARN-2494.patch, YARN-2494.patch, YARN-2494.patch, YARN-2494.patch, YARN-2494.patch This JIRA includes APIs and storage implementations of node label manager, NodeLabelManager is an abstract class used to manage labels of nodes in the cluster, it has APIs to query/modify - Nodes according to given label - Labels according to given hostname - Add/remove labels - Set labels of nodes in the cluster - Persist/recover changes of labels/labels-on-nodes to/from storage And it has two implementations to store modifications - Memory based storage: It will not persist changes, so all labels will be lost when RM restart - FileSystem based storage: It will persist/recover to/from FileSystem (like HDFS), and all labels and labels-on-nodes will be recovered upon RM restart -- This message was sent by Atlassian JIRA (v6.3.4#6332)
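As a rough illustration of the API surface enumerated above, here is a hedged sketch of such a manager; the interface and method names are illustrative and do not reflect the committed YARN-2494 classes.
{code}
import java.io.IOException;
import java.util.Map;
import java.util.Set;

// Illustrative interface only: the query, mutation, and persistence operations listed
// in the description. A memory-backed implementation would make persist/recover no-ops;
// a filesystem-backed one would write to and replay from HDFS so labels survive RM restart.
public interface NodeLabelStoreSketch {
  // Query
  Set<String> getNodesWithLabel(String label);
  Set<String> getLabelsOnNode(String hostname);

  // Mutation
  void addLabels(Set<String> labels) throws IOException;
  void removeLabels(Set<String> labels) throws IOException;
  void setLabelsOnNodes(Map<String, Set<String>> hostToLabels) throws IOException;

  // Persistence
  void persist() throws IOException;
  void recover() throws IOException;
}
{code}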
[jira] [Updated] (YARN-2495) [YARN-2495] Allow admin specify labels in each NM (Distributed configuration)
[ https://issues.apache.org/jira/browse/YARN-2495?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sumit Kumar updated YARN-2495: -- Summary: [YARN-2495] Allow admin specify labels in each NM (Distributed configuration) (was: [YARN-796] Allow admin specify labels in each NM (Distributed configuration)) [YARN-2495] Allow admin specify labels in each NM (Distributed configuration) - Key: YARN-2495 URL: https://issues.apache.org/jira/browse/YARN-2495 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Wangda Tan Assignee: Naganarasimha G R Target of this JIRA is to allow admin specify labels in each NM, this covers - User can set labels in each NM (by setting yarn-site.xml or using script suggested by [~aw]) - NM will send labels to RM via ResourceTracker API - RM will set labels in NodeLabelManager when NM register/update labels -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2498) [YARN-2498] Respect labels in preemption policy of capacity scheduler
[ https://issues.apache.org/jira/browse/YARN-2498?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sumit Kumar updated YARN-2498: -- Summary: [YARN-2498] Respect labels in preemption policy of capacity scheduler (was: [YARN-796] Respect labels in preemption policy of capacity scheduler) [YARN-2498] Respect labels in preemption policy of capacity scheduler - Key: YARN-2498 URL: https://issues.apache.org/jira/browse/YARN-2498 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Wangda Tan Assignee: Wangda Tan Attachments: YARN-2498.patch, YARN-2498.patch, YARN-2498.patch, yarn-2498-implementation-notes.pdf There're 3 stages in ProportionalCapacityPreemptionPolicy, # Recursively calculate {{ideal_assigned}} for queue. This is depends on available resource, resource used/pending in each queue and guaranteed capacity of each queue. # Mark to-be preempted containers: For each over-satisfied queue, it will mark some containers will be preempted. # Notify scheduler about to-be preempted container. We need respect labels in the cluster for both #1 and #2: For #1, when there're some resource available in the cluster, we shouldn't assign it to a queue (by increasing {{ideal_assigned}}) if the queue cannot access such labels For #2, when we make decision about whether we need preempt a container, we need make sure, resource this container is *possibly* usable by a queue which is under-satisfied and has pending resource. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
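A self-contained sketch of the label check described for stage #1 above (class, method, and field names are illustrative, not from the YARN-2498 patch): idle resource tagged with a label only raises {{ideal_assigned}} for queues that can access that label.
{code}
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Illustrative sketch only: skip queues that cannot access the label of the idle
// resource, so their ideal_assigned is not inflated with capacity they cannot use.
public class LabelAwarePreemptionSketch {
  static long assignIdleResource(String resourceLabel, long idleMemoryMB, String queue,
      Map<String, Set<String>> accessibleLabels, Map<String, Long> idealAssigned) {
    Set<String> labels = accessibleLabels.get(queue);
    if (labels == null || !labels.contains(resourceLabel)) {
      return 0L; // queue cannot access this label; leave its ideal_assigned unchanged
    }
    idealAssigned.merge(queue, idleMemoryMB, Long::sum);
    return idleMemoryMB;
  }

  public static void main(String[] args) {
    Map<String, Set<String>> accessibleLabels = new HashMap<>();
    accessibleLabels.put("queueA", new HashSet<>(java.util.Arrays.asList("gpu")));
    accessibleLabels.put("queueB", new HashSet<>());
    Map<String, Long> idealAssigned = new HashMap<>();
    assignIdleResource("gpu", 4096L, "queueA", accessibleLabels, idealAssigned);
    assignIdleResource("gpu", 4096L, "queueB", accessibleLabels, idealAssigned);
    System.out.println(idealAssigned); // only queueA receives the labeled idle resource
  }
}
{code}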
[jira] [Updated] (YARN-2497) [YARN-2497] Changes for fair scheduler to support allocate resource respect labels
[ https://issues.apache.org/jira/browse/YARN-2497?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sumit Kumar updated YARN-2497: -- Summary: [YARN-2497] Changes for fair scheduler to support allocate resource respect labels (was: [YARN-796] Changes for fair scheduler to support allocate resource respect labels) [YARN-2497] Changes for fair scheduler to support allocate resource respect labels -- Key: YARN-2497 URL: https://issues.apache.org/jira/browse/YARN-2497 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Wangda Tan Assignee: Yuliya Feldman -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2499) [YARN-2499] Respect labels in preemption policy of fair scheduler
[ https://issues.apache.org/jira/browse/YARN-2499?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sumit Kumar updated YARN-2499: -- Summary: [YARN-2499] Respect labels in preemption policy of fair scheduler (was: [YARN-796] Respect labels in preemption policy of fair scheduler) [YARN-2499] Respect labels in preemption policy of fair scheduler - Key: YARN-2499 URL: https://issues.apache.org/jira/browse/YARN-2499 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Wangda Tan Assignee: Naganarasimha G R -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2505) [YARN-2505] Support get/add/remove/change labels in RM REST API
[ https://issues.apache.org/jira/browse/YARN-2505?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sumit Kumar updated YARN-2505: -- Summary: [YARN-2505] Support get/add/remove/change labels in RM REST API (was: [YARN-796] Support get/add/remove/change labels in RM REST API) [YARN-2505] Support get/add/remove/change labels in RM REST API --- Key: YARN-2505 URL: https://issues.apache.org/jira/browse/YARN-2505 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Wangda Tan Assignee: Craig Welch Attachments: YARN-2505.1.patch, YARN-2505.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2493) [YARN-2493] API changes for users
[ https://issues.apache.org/jira/browse/YARN-2493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14167746#comment-14167746 ] Wangda Tan commented on YARN-2493: -- [~ksumit], originally the YARN-796 in the title was to emphasize that it's a sub-task of YARN-796. There's no need to place a {{[YARN-2493]}} at the beginning of the title of the YARN-2493 JIRA itself. Could you please change them back? [YARN-2493] API changes for users - Key: YARN-2493 URL: https://issues.apache.org/jira/browse/YARN-2493 Project: Hadoop YARN Issue Type: Sub-task Components: api Reporter: Wangda Tan Assignee: Wangda Tan Fix For: 2.6.0 Attachments: YARN-2493-20141008.1.patch, YARN-2493.patch, YARN-2493.patch, YARN-2493.patch, YARN-2493.patch, YARN-2493.patch This JIRA includes API changes for users of YARN-796, like changes in {{ResourceRequest}}, {{ApplicationSubmissionContext}}, etc. This is a common part of YARN-796. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2493) [YARN-2493] API changes for users
[ https://issues.apache.org/jira/browse/YARN-2493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14167759#comment-14167759 ] Sumit Kumar commented on YARN-2493: --- I don't think that's a good idea (calling all these JIRAs YARN-796): 1. It's not clear what the individual JIRAs are. Most of the people in the community refer to JIRA numbers directly. 2. These JIRA ids are indexed in Google and other search engines and help everyone else when they are looking for specific work. 3. JIRA already provides a way of tracking the parent task. That means we can make these tasks children of the parent YARN-796. Sorry if this was discussed before and the consensus was still to name all the JIRAs with the YARN-796 prefix. If so, I will revert my changes. [YARN-2493] API changes for users - Key: YARN-2493 URL: https://issues.apache.org/jira/browse/YARN-2493 Project: Hadoop YARN Issue Type: Sub-task Components: api Reporter: Wangda Tan Assignee: Wangda Tan Fix For: 2.6.0 Attachments: YARN-2493-20141008.1.patch, YARN-2493.patch, YARN-2493.patch, YARN-2493.patch, YARN-2493.patch, YARN-2493.patch This JIRA includes API changes for users of YARN-796, like changes in {{ResourceRequest}}, {{ApplicationSubmissionContext}}, etc. This is a common part of YARN-796. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2493) [YARN-2493] API changes for users
[ https://issues.apache.org/jira/browse/YARN-2493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14167761#comment-14167761 ] Sumit Kumar commented on YARN-2493: --- Adding YARN-796 linking [YARN-2493] API changes for users - Key: YARN-2493 URL: https://issues.apache.org/jira/browse/YARN-2493 Project: Hadoop YARN Issue Type: Sub-task Components: api Reporter: Wangda Tan Assignee: Wangda Tan Fix For: 2.6.0 Attachments: YARN-2493-20141008.1.patch, YARN-2493.patch, YARN-2493.patch, YARN-2493.patch, YARN-2493.patch, YARN-2493.patch This JIRA includes API changes for users of YARN-796, like changes in {{ResourceRequest}}, {{ApplicationSubmissionContext}}, etc. This is a common part of YARN-796. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2493) [YARN-2493] API changes for users
[ https://issues.apache.org/jira/browse/YARN-2493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14167779#comment-14167779 ] Wangda Tan commented on YARN-2493: -- [~ksumit], I think it's better to have an "is part of" link instead of adding YARN-796 directly, thanks for your effort! But could you please remove the JIRA IDs like YARN-2493 from the titles of these JIRAs? I think that's a kind of duplication. Thanks, Wangda [YARN-2493] API changes for users - Key: YARN-2493 URL: https://issues.apache.org/jira/browse/YARN-2493 Project: Hadoop YARN Issue Type: Sub-task Components: api Reporter: Wangda Tan Assignee: Wangda Tan Fix For: 2.6.0 Attachments: YARN-2493-20141008.1.patch, YARN-2493.patch, YARN-2493.patch, YARN-2493.patch, YARN-2493.patch, YARN-2493.patch This JIRA includes API changes for users of YARN-796, like changes in {{ResourceRequest}}, {{ApplicationSubmissionContext}}, etc. This is a common part of YARN-796. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2656) RM web services authentication filter should add support for proxy user
[ https://issues.apache.org/jira/browse/YARN-2656?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen updated YARN-2656: -- Attachment: YARN-2656.4.patch Update the config name RM web services authentication filter should add support for proxy user --- Key: YARN-2656 URL: https://issues.apache.org/jira/browse/YARN-2656 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: Varun Vasudev Assignee: Varun Vasudev Attachments: YARN-2656.3.patch, YARN-2656.4.patch, apache-yarn-2656.0.patch, apache-yarn-2656.1.patch, apache-yarn-2656.2.patch The DelegationTokenAuthenticationFilter adds support for doAs functionality. The RMAuthenticationFilter should expose this as well. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2493) API changes for users
[ https://issues.apache.org/jira/browse/YARN-2493?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sumit Kumar updated YARN-2493: -- Summary: API changes for users (was: [YARN-2493] API changes for users) API changes for users - Key: YARN-2493 URL: https://issues.apache.org/jira/browse/YARN-2493 Project: Hadoop YARN Issue Type: Sub-task Components: api Reporter: Wangda Tan Assignee: Wangda Tan Fix For: 2.6.0 Attachments: YARN-2493-20141008.1.patch, YARN-2493.patch, YARN-2493.patch, YARN-2493.patch, YARN-2493.patch, YARN-2493.patch This JIRA includes API changes for users of YARN-796, like changes in {{ResourceRequest}}, {{ApplicationSubmissionContext}}, etc. This is a common part of YARN-796. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2498) Respect labels in preemption policy of capacity scheduler
[ https://issues.apache.org/jira/browse/YARN-2498?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sumit Kumar updated YARN-2498: -- Summary: Respect labels in preemption policy of capacity scheduler (was: [YARN-2498] Respect labels in preemption policy of capacity scheduler) Respect labels in preemption policy of capacity scheduler - Key: YARN-2498 URL: https://issues.apache.org/jira/browse/YARN-2498 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Wangda Tan Assignee: Wangda Tan Attachments: YARN-2498.patch, YARN-2498.patch, YARN-2498.patch, yarn-2498-implementation-notes.pdf There're 3 stages in ProportionalCapacityPreemptionPolicy, # Recursively calculate {{ideal_assigned}} for queue. This is depends on available resource, resource used/pending in each queue and guaranteed capacity of each queue. # Mark to-be preempted containers: For each over-satisfied queue, it will mark some containers will be preempted. # Notify scheduler about to-be preempted container. We need respect labels in the cluster for both #1 and #2: For #1, when there're some resource available in the cluster, we shouldn't assign it to a queue (by increasing {{ideal_assigned}}) if the queue cannot access such labels For #2, when we make decision about whether we need preempt a container, we need make sure, resource this container is *possibly* usable by a queue which is under-satisfied and has pending resource. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2499) Respect labels in preemption policy of fair scheduler
[ https://issues.apache.org/jira/browse/YARN-2499?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sumit Kumar updated YARN-2499: -- Summary: Respect labels in preemption policy of fair scheduler (was: [YARN-2499] Respect labels in preemption policy of fair scheduler) Respect labels in preemption policy of fair scheduler - Key: YARN-2499 URL: https://issues.apache.org/jira/browse/YARN-2499 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Wangda Tan Assignee: Naganarasimha G R -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2497) Changes for fair scheduler to support allocate resource respect labels
[ https://issues.apache.org/jira/browse/YARN-2497?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sumit Kumar updated YARN-2497: -- Summary: Changes for fair scheduler to support allocate resource respect labels (was: [YARN-2497] Changes for fair scheduler to support allocate resource respect labels) Changes for fair scheduler to support allocate resource respect labels -- Key: YARN-2497 URL: https://issues.apache.org/jira/browse/YARN-2497 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Wangda Tan Assignee: Yuliya Feldman -- This message was sent by Atlassian JIRA (v6.3.4#6332)