[jira] [Created] (YARN-3259) FairScheduler: Update to fairShare could be triggered early on node events instead of waiting for update interval
Anubhav Dhoot created YARN-3259: --- Summary: FairScheduler: Update to fairShare could be triggered early on node events instead of waiting for update interval Key: YARN-3259 URL: https://issues.apache.org/jira/browse/YARN-3259 Project: Hadoop YARN Issue Type: Improvement Components: fairscheduler Reporter: Anubhav Dhoot Assignee: Anubhav Dhoot Instead of waiting for the update interval unconditionally, we can trigger early updates on important events, e.g., node join and leave. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
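To make the idea concrete, here is a minimal sketch (not the attached patch; the class and method names below are invented for illustration) of calling the fair-share recomputation directly from the node join/leave handlers instead of only from the periodic update thread:
{code}
// Sketch only: recompute shares eagerly on node churn instead of waiting
// for the periodic update interval. Names are illustrative, not FairScheduler's.
public class EagerUpdateSchedulerSketch {
  private final Object updateLock = new Object();

  private void recomputeFairShares() {
    synchronized (updateLock) {
      // recompute queue demands, fair shares and starvation state here
    }
  }

  void onNodeAdded(String nodeId) {
    // ...track the new node's resources...
    recomputeFairShares();   // early update on node join
  }

  void onNodeRemoved(String nodeId) {
    // ...drop the node's resources...
    recomputeFairShares();   // early update on node leave
  }
}
{code}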
[jira] [Updated] (YARN-3248) Display count of nodes blacklisted by apps in the web UI
[ https://issues.apache.org/jira/browse/YARN-3248?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Vasudev updated YARN-3248: Attachment: apache-yarn-3248.0.patch Screenshot.jpg Uploaded patch with fix and screenshot showing the new apps UI Display count of nodes blacklisted by apps in the web UI Key: YARN-3248 URL: https://issues.apache.org/jira/browse/YARN-3248 Project: Hadoop YARN Issue Type: Improvement Components: capacityscheduler, resourcemanager Reporter: Varun Vasudev Assignee: Varun Vasudev Attachments: Screenshot.jpg, apache-yarn-3248.0.patch It would be really useful when debugging app performance and failure issues to get a count of the nodes blacklisted by individual apps displayed in the web UI. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3247) TestQueueMappings should use CapacityScheduler explicitly
[ https://issues.apache.org/jira/browse/YARN-3247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14336534#comment-14336534 ] Hudson commented on YARN-3247: -- FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #106 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/106/]) YARN-3247. TestQueueMappings should use CapacityScheduler explicitly. Contributed by Zhihai Xu. (ozawa: rev 6cbd9f1113fca9ff86fd6ffa783ecd54b147e0db) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestQueueMappings.java TestQueueMappings should use CapacityScheduler explicitly - Key: YARN-3247 URL: https://issues.apache.org/jira/browse/YARN-3247 Project: Hadoop YARN Issue Type: Bug Components: test Affects Versions: 2.6.0 Reporter: zhihai xu Assignee: zhihai xu Priority: Trivial Attachments: YARN-3247.000.patch TestQueueMappings is only supported by CapacityScheduler. We should configure CapacityScheduler for this test. Otherwise if the default scheduler is set to FairScheduler, the test will fail with the following message: {code} Running org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestQueueMappings Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 2.392 sec FAILURE! - in org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestQueueMappings testQueueMapping(org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestQueueMappings) Time elapsed: 2.202 sec ERROR! java.lang.ClassCastException: org.apache.hadoop.yarn.server.resourcemanager.scheduler.QueueMetrics cannot be cast to org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSQueueMetrics at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSQueueMetrics.forQueue(FSQueueMetrics.java:118) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.initScheduler(FairScheduler.java:1266) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.serviceInit(FairScheduler.java:1319) at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) at org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceInit(ResourceManager.java:558) at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.createAndInitActiveServices(ResourceManager.java:989) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceInit(ResourceManager.java:255) at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) at org.apache.hadoop.yarn.server.resourcemanager.MockRM.init(MockRM.java:108) at org.apache.hadoop.yarn.server.resourcemanager.MockRM.init(MockRM.java:103) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestQueueMappings.testQueueMapping(TestQueueMappings.java:143) {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
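For readers unfamiliar with the fix pattern, the usual way such a test pins the scheduler (whether or not the attached patch does exactly this) is to force CapacityScheduler in the configuration handed to MockRM; a hedged sketch using the public YarnConfiguration API:
{code}
import org.apache.hadoop.yarn.conf.YarnConfiguration;
import org.apache.hadoop.yarn.server.resourcemanager.scheduler.ResourceScheduler;
import org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler;
import org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacitySchedulerConfiguration;

public class QueueMappingTestConf {
  public static YarnConfiguration schedulerPinnedConf() {
    // Start from the CapacityScheduler-specific configuration so queue
    // mappings are understood, then force CapacityScheduler explicitly so the
    // test no longer depends on the cluster-wide default scheduler.
    CapacitySchedulerConfiguration csConf = new CapacitySchedulerConfiguration();
    YarnConfiguration conf = new YarnConfiguration(csConf);
    conf.setClass(YarnConfiguration.RM_SCHEDULER,
        CapacityScheduler.class, ResourceScheduler.class);
    return conf;
  }
}
{code}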
[jira] [Commented] (YARN-2980) Move health check script related functionality to hadoop-common
[ https://issues.apache.org/jira/browse/YARN-2980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14336526#comment-14336526 ] Hudson commented on YARN-2980: -- FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #106 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/106/]) YARN-2980. Move health check script related functionality to hadoop-common (Varun Saxena via aw) (aw: rev d4ac6822e1c5dfac504ced48f10ab57a55b49e93) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/webapp/TestNMWebServicesApps.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestNodeHealthService.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeHealthCheckerService.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/BaseContainerManagerTest.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/webapp/TestContainerLogsPage.java * hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/util/NodeHealthScriptRunner.java * hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/util/TestNodeHealthScriptRunner.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeManager.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/webapp/TestNMWebServicesContainers.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeHealthScriptRunner.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestEventFlow.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/webapp/TestNMWebServer.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/webapp/TestNMWebServices.java Move health check script related functionality to hadoop-common --- Key: YARN-2980 URL: https://issues.apache.org/jira/browse/YARN-2980 Project: Hadoop YARN Issue Type: Improvement Reporter: Ming Ma Assignee: Varun Saxena Fix For: 3.0.0 Attachments: YARN-2980.001.patch, YARN-2980.002.patch, YARN-2980.003.patch, YARN-2980.004.patch HDFS might want to leverage health check functionality available in YARN in both namenode https://issues.apache.org/jira/browse/HDFS-7400 and datanode https://issues.apache.org/jira/browse/HDFS-7441. We can move health check functionality including the protocol between hadoop daemons and health check script to hadoop-common. That will simplify the development and maintenance for both hadoop source code and health check script. Thoughts? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3249) Add the kill application to the Resource Manager Web UI
[ https://issues.apache.org/jira/browse/YARN-3249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14336452#comment-14336452 ] Hadoop QA commented on YARN-3249: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12700724/YARN-3249.3.patch against trunk revision ad8ed3e. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 5 new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.recovery.TestFSRMStateStore Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6731//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/6731//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6731//console This message is automatically generated. Add the kill application to the Resource Manager Web UI --- Key: YARN-3249 URL: https://issues.apache.org/jira/browse/YARN-3249 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Affects Versions: 2.6.0, 2.7.0 Reporter: Ryu Kobayashi Assignee: Ryu Kobayashi Priority: Minor Attachments: YARN-3249.2.patch, YARN-3249.2.patch, YARN-3249.3.patch, YARN-3249.patch, killapp-failed.log, screenshot.png We want to be able to kill an application from the Resource Manager web UI, similar to the JobTracker web UI. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
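For context on what a kill button in the web UI would ultimately drive, the programmatic equivalent through the public client API looks roughly like this (a sketch, not the patch under review; the application id values are placeholders):
{code}
import org.apache.hadoop.yarn.api.records.ApplicationId;
import org.apache.hadoop.yarn.client.api.YarnClient;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class KillApplicationSketch {
  public static void main(String[] args) throws Exception {
    YarnClient client = YarnClient.createYarnClient();
    client.init(new YarnConfiguration());
    client.start();
    try {
      // Placeholder cluster timestamp and sequence number; a UI would take
      // these from the application row the user clicked.
      ApplicationId appId = ApplicationId.newInstance(1424814313649L, 6);
      client.killApplication(appId);   // asks the RM to kill the application
    } finally {
      client.stop();
    }
  }
}
{code}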
[jira] [Commented] (YARN-1809) Synchronize RM and Generic History Service Web-UIs
[ https://issues.apache.org/jira/browse/YARN-1809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14336451#comment-14336451 ] Tsuyoshi Ozawa commented on YARN-1809: -- [~xgong] The failure of TestRMWebAppFairScheduler looks related - could you double check? Minor nits: ApplicationHistoryClientService.java: is cancelDelegationToken's comment correct? {code} + @Override + public CancelDelegationTokenResponse cancelDelegationToken( + CancelDelegationTokenRequest request) throws YarnException, IOException { +// TODO Auto-generated method stub +return null; + } {code} TestAHSWebServices.java: Noothing should be Nothing. {code} + protected void serviceStart() throws Exception { +// Do Noothing + } {code} Synchronize RM and Generic History Service Web-UIs -- Key: YARN-1809 URL: https://issues.apache.org/jira/browse/YARN-1809 Project: Hadoop YARN Issue Type: Improvement Reporter: Zhijie Shen Assignee: Xuan Gong Attachments: YARN-1809.1.patch, YARN-1809.10.patch, YARN-1809.11.patch, YARN-1809.2.patch, YARN-1809.3.patch, YARN-1809.4.patch, YARN-1809.5.patch, YARN-1809.5.patch, YARN-1809.6.patch, YARN-1809.7.patch, YARN-1809.8.patch, YARN-1809.9.patch After YARN-953, the web UI of the generic history service provides more information than that of the RM, namely the details about app attempts and containers. It's good to provide similar web UIs, but retrieve the data from separate sources, i.e., the RM cache and the history store respectively. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3258) FairScheduler: Need to add more logging to investigate allocations
[ https://issues.apache.org/jira/browse/YARN-3258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14336502#comment-14336502 ] Hadoop QA commented on YARN-3258: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12700747/YARN-3258.001.patch against trunk revision ad8ed3e. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 5 new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.recovery.TestFSRMStateStore org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6732//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/6732//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6732//console This message is automatically generated. FairScheduler: Need to add more logging to investigate allocations -- Key: YARN-3258 URL: https://issues.apache.org/jira/browse/YARN-3258 Project: Hadoop YARN Issue Type: Improvement Components: fairscheduler Reporter: Anubhav Dhoot Assignee: Anubhav Dhoot Priority: Minor Attachments: YARN-3258.001.patch It's hard to investigate allocation failures without any logging. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-3258) FairScheduler: Need to add more logging to investigate allocations
Anubhav Dhoot created YARN-3258: --- Summary: FairScheduler: Need to add more logging to investigate allocations Key: YARN-3258 URL: https://issues.apache.org/jira/browse/YARN-3258 Project: Hadoop YARN Issue Type: Improvement Components: fairscheduler Reporter: Anubhav Dhoot Assignee: Anubhav Dhoot Priority: Minor It's hard to investigate allocation failures without any logging. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
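As a sketch of the kind of logging the request is about (illustrative only, not the attached patch; the class and hook names are invented), guarded debug statements around the allocation decision points cost essentially nothing when debug logging is off:
{code}
import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;

public class AllocationLoggingSketch {
  private static final Log LOG = LogFactory.getLog(AllocationLoggingSketch.class);

  // Hypothetical hook called when a container request cannot be satisfied.
  void onAssignmentSkipped(String appId, String reason) {
    if (LOG.isDebugEnabled()) {
      LOG.debug("Skipping container assignment for " + appId + ": " + reason);
    }
  }
}
{code}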
[jira] [Commented] (YARN-3247) TestQueueMappings should use CapacityScheduler explicitly
[ https://issues.apache.org/jira/browse/YARN-3247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14336413#comment-14336413 ] Hudson commented on YARN-3247: -- SUCCESS: Integrated in Hadoop-Yarn-trunk #849 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/849/]) YARN-3247. TestQueueMappings should use CapacityScheduler explicitly. Contributed by Zhihai Xu. (ozawa: rev 6cbd9f1113fca9ff86fd6ffa783ecd54b147e0db) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestQueueMappings.java TestQueueMappings should use CapacityScheduler explicitly - Key: YARN-3247 URL: https://issues.apache.org/jira/browse/YARN-3247 Project: Hadoop YARN Issue Type: Bug Components: test Affects Versions: 2.6.0 Reporter: zhihai xu Assignee: zhihai xu Priority: Trivial Attachments: YARN-3247.000.patch TestQueueMappings is only supported by CapacityScheduler. We should configure CapacityScheduler for this test. Otherwise if the default scheduler is set to FairScheduler, the test will fail with the following message: {code} Running org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestQueueMappings Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 2.392 sec FAILURE! - in org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestQueueMappings testQueueMapping(org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestQueueMappings) Time elapsed: 2.202 sec ERROR! java.lang.ClassCastException: org.apache.hadoop.yarn.server.resourcemanager.scheduler.QueueMetrics cannot be cast to org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSQueueMetrics at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSQueueMetrics.forQueue(FSQueueMetrics.java:118) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.initScheduler(FairScheduler.java:1266) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.serviceInit(FairScheduler.java:1319) at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) at org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceInit(ResourceManager.java:558) at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.createAndInitActiveServices(ResourceManager.java:989) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceInit(ResourceManager.java:255) at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) at org.apache.hadoop.yarn.server.resourcemanager.MockRM.init(MockRM.java:108) at org.apache.hadoop.yarn.server.resourcemanager.MockRM.init(MockRM.java:103) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestQueueMappings.testQueueMapping(TestQueueMappings.java:143) {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3258) FairScheduler: Need to add more logging to investigate allocations
[ https://issues.apache.org/jira/browse/YARN-3258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anubhav Dhoot updated YARN-3258: Attachment: YARN-3258.001.patch FairScheduler: Need to add more logging to investigate allocations -- Key: YARN-3258 URL: https://issues.apache.org/jira/browse/YARN-3258 Project: Hadoop YARN Issue Type: Improvement Components: fairscheduler Reporter: Anubhav Dhoot Assignee: Anubhav Dhoot Priority: Minor Attachments: YARN-3258.001.patch It's hard to investigate allocation failures without any logging. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3249) Add the kill application to the Resource Manager Web UI
[ https://issues.apache.org/jira/browse/YARN-3249?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsuyoshi Ozawa updated YARN-3249: - Attachment: killapp-failed2.log [~ryu_kobayashi] I faced another 500 error. Please check the attached log. BTW, is it possible to add a test to TestRMWebServicesApps? It would be good to avoid this kind of error. Add the kill application to the Resource Manager Web UI --- Key: YARN-3249 URL: https://issues.apache.org/jira/browse/YARN-3249 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Affects Versions: 2.6.0, 2.7.0 Reporter: Ryu Kobayashi Assignee: Ryu Kobayashi Priority: Minor Attachments: YARN-3249.2.patch, YARN-3249.2.patch, YARN-3249.3.patch, YARN-3249.patch, killapp-failed.log, killapp-failed2.log, screenshot.png We want to be able to kill an application from the Resource Manager web UI, similar to the JobTracker web UI. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3247) TestQueueMappings should use CapacityScheduler explicitly
[ https://issues.apache.org/jira/browse/YARN-3247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14336513#comment-14336513 ] Hudson commented on YARN-3247: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #2047 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/2047/]) YARN-3247. TestQueueMappings should use CapacityScheduler explicitly. Contributed by Zhihai Xu. (ozawa: rev 6cbd9f1113fca9ff86fd6ffa783ecd54b147e0db) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestQueueMappings.java * hadoop-yarn-project/CHANGES.txt TestQueueMappings should use CapacityScheduler explicitly - Key: YARN-3247 URL: https://issues.apache.org/jira/browse/YARN-3247 Project: Hadoop YARN Issue Type: Bug Components: test Affects Versions: 2.6.0 Reporter: zhihai xu Assignee: zhihai xu Priority: Trivial Attachments: YARN-3247.000.patch TestQueueMappings is only supported by CapacityScheduler. We should configure CapacityScheduler for this test. Otherwise if the default scheduler is set to FairScheduler, the test will fail with the following message: {code} Running org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestQueueMappings Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 2.392 sec FAILURE! - in org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestQueueMappings testQueueMapping(org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestQueueMappings) Time elapsed: 2.202 sec ERROR! java.lang.ClassCastException: org.apache.hadoop.yarn.server.resourcemanager.scheduler.QueueMetrics cannot be cast to org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSQueueMetrics at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSQueueMetrics.forQueue(FSQueueMetrics.java:118) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.initScheduler(FairScheduler.java:1266) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.serviceInit(FairScheduler.java:1319) at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) at org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceInit(ResourceManager.java:558) at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.createAndInitActiveServices(ResourceManager.java:989) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceInit(ResourceManager.java:255) at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) at org.apache.hadoop.yarn.server.resourcemanager.MockRM.init(MockRM.java:108) at org.apache.hadoop.yarn.server.resourcemanager.MockRM.init(MockRM.java:103) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestQueueMappings.testQueueMapping(TestQueueMappings.java:143) {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2980) Move health check script related functionality to hadoop-common
[ https://issues.apache.org/jira/browse/YARN-2980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14336505#comment-14336505 ] Hudson commented on YARN-2980: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #2047 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/2047/]) YARN-2980. Move health check script related functionality to hadoop-common (Varun Saxena via aw) (aw: rev d4ac6822e1c5dfac504ced48f10ab57a55b49e93) * hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/util/TestNodeHealthScriptRunner.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeManager.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/BaseContainerManagerTest.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/webapp/TestContainerLogsPage.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/webapp/TestNMWebServer.java * hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/util/NodeHealthScriptRunner.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeHealthCheckerService.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/webapp/TestNMWebServices.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeHealthScriptRunner.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/webapp/TestNMWebServicesApps.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/webapp/TestNMWebServicesContainers.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestEventFlow.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestNodeHealthService.java Move health check script related functionality to hadoop-common --- Key: YARN-2980 URL: https://issues.apache.org/jira/browse/YARN-2980 Project: Hadoop YARN Issue Type: Improvement Reporter: Ming Ma Assignee: Varun Saxena Fix For: 3.0.0 Attachments: YARN-2980.001.patch, YARN-2980.002.patch, YARN-2980.003.patch, YARN-2980.004.patch HDFS might want to leverage health check functionality available in YARN in both namenode https://issues.apache.org/jira/browse/HDFS-7400 and datanode https://issues.apache.org/jira/browse/HDFS-7441. We can move health check functionality including the protocol between hadoop daemons and health check script to hadoop-common. That will simplify the development and maintenance for both hadoop source code and health check script. Thoughts? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3249) Add the kill application to the Resource Manager Web UI
[ https://issues.apache.org/jira/browse/YARN-3249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14336524#comment-14336524 ] Hadoop QA commented on YARN-3249: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12700767/killapp-failed2.log against trunk revision ad8ed3e. {color:red}-1 patch{color}. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6735//console This message is automatically generated. Add the kill application to the Resource Manager Web UI --- Key: YARN-3249 URL: https://issues.apache.org/jira/browse/YARN-3249 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Affects Versions: 2.6.0, 2.7.0 Reporter: Ryu Kobayashi Assignee: Ryu Kobayashi Priority: Minor Attachments: YARN-3249.2.patch, YARN-3249.2.patch, YARN-3249.3.patch, YARN-3249.patch, killapp-failed.log, killapp-failed2.log, screenshot.png We want to be able to kill an application from the Resource Manager web UI, similar to the JobTracker web UI. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3259) FairScheduler: Update to fairShare could be triggered early on node events instead of waiting for update interval
[ https://issues.apache.org/jira/browse/YARN-3259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14336459#comment-14336459 ] Hadoop QA commented on YARN-3259: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12700748/YARN-3259.001.patch against trunk revision ad8ed3e. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:red}-1 javac{color:red}. The patch appears to cause the build to fail. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6733//console This message is automatically generated. FairScheduler: Update to fairShare could be triggered early on node events instead of waiting for update interval -- Key: YARN-3259 URL: https://issues.apache.org/jira/browse/YARN-3259 Project: Hadoop YARN Issue Type: Improvement Components: fairscheduler Reporter: Anubhav Dhoot Assignee: Anubhav Dhoot Attachments: YARN-3259.001.patch Instead of waiting for the update interval unconditionally, we can trigger early updates on important events, e.g., node join and leave. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2776) In HA mode, can't set ip but hostname to yarn.resourcemanager.webapp.address.*
[ https://issues.apache.org/jira/browse/YARN-2776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14336540#comment-14336540 ] Mukesh Jha commented on YARN-2776: -- I'm facing this exact issue as described in https://issues.apache.org/jira/browse/SPARK-5837 (comments). From the driver logs [3] I can see that the SparkUI started on a specified port; my YARN app tracking URL [1] points to that port, which in turn is redirected to the proxy URL [2], which gives me java.net.BindException: Cannot assign requested address. If there were a port conflict, the SparkUI start would have had issues, but that is not the case. [1] YARN: application_1424814313649_0006 spark-realtime-MessageStoreWriter SPARK ciuser root.ciuser RUNNING UNDEFINED 10% http://host21.cloud.com:44648 [2] ProxyURL: http://host28.cloud.com:8088/proxy/application_1424814313649_0006/ [3] LOGS: 15/02/25 04:25:02 INFO util.Utils: Successfully started service 'SparkUI' on port 44648. 15/02/25 04:25:02 INFO ui.SparkUI: Started SparkUI at http://host21.cloud.com:44648 15/02/25 04:25:02 INFO cluster.YarnClusterScheduler: Created YarnClusterScheduler 15/02/25 04:25:02 INFO netty.NettyBlockTransferService: Server created on 41518 I am running a spark-streaming app inside YARN. I have the Spark History server running as well (Do we need it running to access the UI?). The app is running fine as expected, but Spark's web UI is not accessible. When I try to access the ApplicationMaster of the YARN application, I get the error below. java.net.BindException: Cannot assign requested address at java.net.PlainSocketImpl.socketBind(Native Method) at java.net.AbstractPlainSocketImpl.bind(AbstractPlainSocketImpl.java:376) at java.net.Socket.bind(Socket.java:631) at java.net.Socket.<init>(Socket.java:423) at java.net.Socket.<init>(Socket.java:280) at org.apache.commons.httpclient.protocol.DefaultProtocolSocketFactory.createSocket(DefaultProtocolSocketFactory.java:80) at org.apache.commons.httpclient.protocol.DefaultProtocolSocketFactory.createSocket(DefaultProtocolSocketFactory.java:122) at org.apache.commons.httpclient.HttpConnection.open(HttpConnection.java:707) at org.apache.commons.httpclient.HttpMethodDirector.executeWithRetry(HttpMethodDirector.java:387) at org.apache.commons.httpclient.HttpMethodDirector.executeMethod(HttpMethodDirector.java:171) at org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:397) at org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:346) at org.apache.hadoop.yarn.server.webproxy.WebAppProxyServlet.proxyLink(WebAppProxyServlet.java:188) at org.apache.hadoop.yarn.server.webproxy.WebAppProxyServlet.doGet(WebAppProxyServlet.java:345) at javax.servlet.http.HttpServlet.service(HttpServlet.java:707) at javax.servlet.http.HttpServlet.service(HttpServlet.java:820) at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:511) at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1221) at com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:66) at com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:900) at com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:834) at org.apache.hadoop.yarn.server.resourcemanager.webapp.RMWebAppFilter.doFilter(RMWebAppFilter.java:84) at com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:795) at
com.google.inject.servlet.FilterDefinition.doFilter(FilterDefinition.java:163) at com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:58) at com.google.inject.servlet.ManagedFilterPipeline.dispatch(ManagedFilterPipeline.java:118) at com.google.inject.servlet.GuiceFilter.doFilter(GuiceFilter.java:113) at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212) at org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter.doFilter(StaticUserWebFilter.java:109) at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212) at org.apache.hadoop.security.authentication.server.AuthenticationFilter.doFilter(AuthenticationFilter.java:592) at org.apache.hadoop.security.authentication.server.AuthenticationFilter.doFilter(AuthenticationFilter.java:555) at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212) at org.apache.hadoop.http.HttpServer2$QuotingInputFilter.doFilter(HttpServer2.java:1223) at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212) at org.apache.hadoop.http.NoCacheFilter.doFilter(NoCacheFilter.java:45) at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212) at org.apache.hadoop.http.NoCacheFilter.doFilter(NoCacheFilter.java:45) at
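The workaround implied by the issue title is to configure the per-RM web app addresses with resolvable hostnames rather than raw IPs; a minimal configuration sketch (RM ids and host names are placeholders, not from this report):
{code}
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class HaWebappAddressSketch {
  public static YarnConfiguration haWebappConf() {
    YarnConfiguration conf = new YarnConfiguration();
    // Placeholder RM ids and hosts; the point is to use hostnames, not IPs,
    // for the per-RM web app addresses in an HA setup.
    conf.set("yarn.resourcemanager.ha.rm-ids", "rm1,rm2");
    conf.set("yarn.resourcemanager.webapp.address.rm1", "rm1.example.com:8088");
    conf.set("yarn.resourcemanager.webapp.address.rm2", "rm2.example.com:8088");
    return conf;
  }
}
{code}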
[jira] [Commented] (YARN-2980) Move health check script related functionality to hadoop-common
[ https://issues.apache.org/jira/browse/YARN-2980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14336405#comment-14336405 ] Hudson commented on YARN-2980: -- SUCCESS: Integrated in Hadoop-Yarn-trunk #849 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/849/]) YARN-2980. Move health check script related functionality to hadoop-common (Varun Saxena via aw) (aw: rev d4ac6822e1c5dfac504ced48f10ab57a55b49e93) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeHealthScriptRunner.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestEventFlow.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestNodeHealthService.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/webapp/TestNMWebServer.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/webapp/TestNMWebServicesContainers.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/webapp/TestNMWebServicesApps.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeHealthCheckerService.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/webapp/TestNMWebServices.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeManager.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/BaseContainerManagerTest.java * hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/util/TestNodeHealthScriptRunner.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/webapp/TestContainerLogsPage.java * hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/util/NodeHealthScriptRunner.java Move health check script related functionality to hadoop-common --- Key: YARN-2980 URL: https://issues.apache.org/jira/browse/YARN-2980 Project: Hadoop YARN Issue Type: Improvement Reporter: Ming Ma Assignee: Varun Saxena Fix For: 3.0.0 Attachments: YARN-2980.001.patch, YARN-2980.002.patch, YARN-2980.003.patch, YARN-2980.004.patch HDFS might want to leverage health check functionality available in YARN in both namenode https://issues.apache.org/jira/browse/HDFS-7400 and datanode https://issues.apache.org/jira/browse/HDFS-7441. We can move health check functionality including the protocol between hadoop daemons and health check script to hadoop-common. That will simplify the development and maintenance for both hadoop source code and health check script. Thoughts? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3259) FairScheduler: Update to fairShare could be triggered early on node events instead of waiting for update interval
[ https://issues.apache.org/jira/browse/YARN-3259?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anubhav Dhoot updated YARN-3259: Attachment: YARN-3259.001.patch FairScheduler: Update to fairShare could be triggered early on node events instead of waiting for update interval -- Key: YARN-3259 URL: https://issues.apache.org/jira/browse/YARN-3259 Project: Hadoop YARN Issue Type: Improvement Components: fairscheduler Reporter: Anubhav Dhoot Assignee: Anubhav Dhoot Attachments: YARN-3259.001.patch Instead of waiting for the update interval unconditionally, we can trigger early updates on important events, e.g., node join and leave. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3247) TestQueueMappings should use CapacityScheduler explicitly
[ https://issues.apache.org/jira/browse/YARN-3247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14336595#comment-14336595 ] Hudson commented on YARN-3247: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #115 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/115/]) YARN-3247. TestQueueMappings should use CapacityScheduler explicitly. Contributed by Zhihai Xu. (ozawa: rev 6cbd9f1113fca9ff86fd6ffa783ecd54b147e0db) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestQueueMappings.java * hadoop-yarn-project/CHANGES.txt TestQueueMappings should use CapacityScheduler explicitly - Key: YARN-3247 URL: https://issues.apache.org/jira/browse/YARN-3247 Project: Hadoop YARN Issue Type: Bug Components: test Affects Versions: 2.6.0 Reporter: zhihai xu Assignee: zhihai xu Priority: Trivial Attachments: YARN-3247.000.patch TestQueueMappings is only supported by CapacityScheduler. We should configure CapacityScheduler for this test. Otherwise if the default scheduler is set to FairScheduler, the test will fail with the following message: {code} Running org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestQueueMappings Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 2.392 sec FAILURE! - in org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestQueueMappings testQueueMapping(org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestQueueMappings) Time elapsed: 2.202 sec ERROR! java.lang.ClassCastException: org.apache.hadoop.yarn.server.resourcemanager.scheduler.QueueMetrics cannot be cast to org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSQueueMetrics at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSQueueMetrics.forQueue(FSQueueMetrics.java:118) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.initScheduler(FairScheduler.java:1266) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.serviceInit(FairScheduler.java:1319) at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) at org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceInit(ResourceManager.java:558) at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.createAndInitActiveServices(ResourceManager.java:989) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceInit(ResourceManager.java:255) at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) at org.apache.hadoop.yarn.server.resourcemanager.MockRM.init(MockRM.java:108) at org.apache.hadoop.yarn.server.resourcemanager.MockRM.init(MockRM.java:103) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestQueueMappings.testQueueMapping(TestQueueMappings.java:143) {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3248) Display count of nodes blacklisted by apps in the web UI
[ https://issues.apache.org/jira/browse/YARN-3248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14336641#comment-14336641 ] Varun Vasudev commented on YARN-3248: - Test case failure and findbugs warnings are not related to this patch. There are no tests since it's a UI change. Display count of nodes blacklisted by apps in the web UI Key: YARN-3248 URL: https://issues.apache.org/jira/browse/YARN-3248 Project: Hadoop YARN Issue Type: Improvement Components: capacityscheduler, resourcemanager Reporter: Varun Vasudev Assignee: Varun Vasudev Attachments: Screenshot.jpg, apache-yarn-3248.0.patch It would be really useful when debugging app performance and failure issues to get a count of the nodes blacklisted by individual apps displayed in the web UI. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-3260) NPE if AM attempts to register before RM processes launch event
Jason Lowe created YARN-3260: Summary: NPE if AM attempts to register before RM processes launch event Key: YARN-3260 URL: https://issues.apache.org/jira/browse/YARN-3260 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.6.0 Reporter: Jason Lowe The RM on one of our clusters was running behind on processing AsyncDispatcher events, and this caused AMs to fail to register due to an NPE. The AM was launched and attempting to register before the RMAppAttemptImpl had processed the LAUNCHED event, and the client to AM token had not been generated yet. The NPE occurred because the ApplicationMasterService tried to encode the missing token. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
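One plausible defensive shape for a fix, sketched here with invented names (this is not the RM code nor a committed patch): don't encode the client-to-AM token material into the register response until the attempt has actually generated it.
{code}
import java.nio.ByteBuffer;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;

public class RegisterResponseSketch {
  // Hypothetical stand-ins for attempt state and the RPC response field.
  static SecretKey clientToAMMasterKey;      // stays null until LAUNCHED is processed
  static ByteBuffer responseMasterKey;

  static void fillRegisterResponse() {
    // Guard against the race: the AM can register before the RMAppAttempt has
    // processed the LAUNCHED event, so treat a missing key as "not yet" instead
    // of dereferencing it and throwing an NPE.
    if (clientToAMMasterKey != null) {
      responseMasterKey = ByteBuffer.wrap(clientToAMMasterKey.getEncoded());
    }
  }

  public static void main(String[] args) throws Exception {
    fillRegisterResponse();                  // key absent: nothing set, no NPE
    clientToAMMasterKey = KeyGenerator.getInstance("HmacSHA1").generateKey();
    fillRegisterResponse();                  // key present: encoded into the response
    System.out.println("master key encoded? " + (responseMasterKey != null));
  }
}
{code}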
[jira] [Commented] (YARN-3248) Display count of nodes blacklisted by apps in the web UI
[ https://issues.apache.org/jira/browse/YARN-3248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14336634#comment-14336634 ] Hadoop QA commented on YARN-3248: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12700766/apache-yarn-3248.0.patch against trunk revision ad8ed3e. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 5 new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.recovery.TestFSRMStateStore Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6734//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/6734//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6734//console This message is automatically generated. Display count of nodes blacklisted by apps in the web UI Key: YARN-3248 URL: https://issues.apache.org/jira/browse/YARN-3248 Project: Hadoop YARN Issue Type: Improvement Components: capacityscheduler, resourcemanager Reporter: Varun Vasudev Assignee: Varun Vasudev Attachments: Screenshot.jpg, apache-yarn-3248.0.patch It would be really useful when debugging app performance and failure issues to get a count of the nodes blacklisted by individual apps displayed in the web UI. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (YARN-3260) NPE if AM attempts to register before RM processes launch event
[ https://issues.apache.org/jira/browse/YARN-3260?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Naganarasimha G R reassigned YARN-3260: --- Assignee: Naganarasimha G R NPE if AM attempts to register before RM processes launch event --- Key: YARN-3260 URL: https://issues.apache.org/jira/browse/YARN-3260 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.6.0 Reporter: Jason Lowe Assignee: Naganarasimha G R The RM on one of our clusters was running behind on processing AsyncDispatcher events, and this caused AMs to fail to register due to an NPE. The AM was launched and attempting to register before the RMAppAttemptImpl had processed the LAUNCHED event, and the client to AM token had not been generated yet. The NPE occurred because the ApplicationMasterService tried to encode the missing token. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3247) TestQueueMappings should use CapacityScheduler explicitly
[ https://issues.apache.org/jira/browse/YARN-3247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14336621#comment-14336621 ] Hudson commented on YARN-3247: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #2065 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2065/]) YARN-3247. TestQueueMappings should use CapacityScheduler explicitly. Contributed by Zhihai Xu. (ozawa: rev 6cbd9f1113fca9ff86fd6ffa783ecd54b147e0db) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestQueueMappings.java * hadoop-yarn-project/CHANGES.txt TestQueueMappings should use CapacityScheduler explicitly - Key: YARN-3247 URL: https://issues.apache.org/jira/browse/YARN-3247 Project: Hadoop YARN Issue Type: Bug Components: test Affects Versions: 2.6.0 Reporter: zhihai xu Assignee: zhihai xu Priority: Trivial Attachments: YARN-3247.000.patch TestQueueMappings is only supported by CapacityScheduler. We should configure CapacityScheduler for this test. Otherwise if the default scheduler is set to FairScheduler, the test will fail with the following message: {code} Running org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestQueueMappings Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 2.392 sec FAILURE! - in org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestQueueMappings testQueueMapping(org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestQueueMappings) Time elapsed: 2.202 sec ERROR! java.lang.ClassCastException: org.apache.hadoop.yarn.server.resourcemanager.scheduler.QueueMetrics cannot be cast to org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSQueueMetrics at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSQueueMetrics.forQueue(FSQueueMetrics.java:118) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.initScheduler(FairScheduler.java:1266) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.serviceInit(FairScheduler.java:1319) at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) at org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceInit(ResourceManager.java:558) at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.createAndInitActiveServices(ResourceManager.java:989) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceInit(ResourceManager.java:255) at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) at org.apache.hadoop.yarn.server.resourcemanager.MockRM.init(MockRM.java:108) at org.apache.hadoop.yarn.server.resourcemanager.MockRM.init(MockRM.java:103) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestQueueMappings.testQueueMapping(TestQueueMappings.java:143) {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2980) Move health check script related functionality to hadoop-common
[ https://issues.apache.org/jira/browse/YARN-2980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14336587#comment-14336587 ] Hudson commented on YARN-2980: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #115 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/115/]) YARN-2980. Move health check script related functionality to hadoop-common (Varun Saxena via aw) (aw: rev d4ac6822e1c5dfac504ced48f10ab57a55b49e93) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/BaseContainerManagerTest.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/webapp/TestNMWebServer.java * hadoop-yarn-project/CHANGES.txt * hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/util/TestNodeHealthScriptRunner.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/webapp/TestContainerLogsPage.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/webapp/TestNMWebServices.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeHealthCheckerService.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/webapp/TestNMWebServicesApps.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeManager.java * hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/util/NodeHealthScriptRunner.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestNodeHealthService.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/webapp/TestNMWebServicesContainers.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestEventFlow.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeHealthScriptRunner.java Move health check script related functionality to hadoop-common --- Key: YARN-2980 URL: https://issues.apache.org/jira/browse/YARN-2980 Project: Hadoop YARN Issue Type: Improvement Reporter: Ming Ma Assignee: Varun Saxena Fix For: 3.0.0 Attachments: YARN-2980.001.patch, YARN-2980.002.patch, YARN-2980.003.patch, YARN-2980.004.patch HDFS might want to leverage health check functionality available in YARN in both namenode https://issues.apache.org/jira/browse/HDFS-7400 and datanode https://issues.apache.org/jira/browse/HDFS-7441. We can move health check functionality including the protocol between hadoop daemons and health check script to hadoop-common. That will simplify the development and maintenance for both hadoop source code and health check script. Thoughts? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3260) NPE if AM attempts to register before RM processes launch event
[ https://issues.apache.org/jira/browse/YARN-3260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14336647#comment-14336647 ] Jason Lowe commented on YARN-3260: -- Sample stack trace from the AM side when it fails to register:
{noformat}
2015-02-25 06:26:26,908 ERROR [main] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Exception while registering
java.lang.NullPointerException: java.lang.NullPointerException
    at org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService.registerApplicationMaster(ApplicationMasterService.java:294)
    at org.apache.hadoop.yarn.api.impl.pb.service.ApplicationMasterProtocolPBServiceImpl.registerApplicationMaster(ApplicationMasterProtocolPBServiceImpl.java:90)
    at org.apache.hadoop.yarn.proto.ApplicationMasterProtocol$ApplicationMasterProtocolService$2.callBlockingMethod(ApplicationMasterProtocol.java:95)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:619)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:962)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2079)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2075)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1694)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2073)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:525)
    at org.apache.hadoop.yarn.ipc.RPCUtil.instantiateException(RPCUtil.java:53)
    at org.apache.hadoop.yarn.ipc.RPCUtil.unwrapAndThrowException(RPCUtil.java:107)
    at org.apache.hadoop.yarn.api.impl.pb.client.ApplicationMasterProtocolPBClientImpl.registerApplicationMaster(ApplicationMasterProtocolPBClientImpl.java:109)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:601)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
    at com.sun.proxy.$Proxy80.registerApplicationMaster(Unknown Source)
    at org.apache.hadoop.mapreduce.v2.app.rm.RMCommunicator.register(RMCommunicator.java:161)
    at org.apache.hadoop.mapreduce.v2.app.rm.RMCommunicator.serviceStart(RMCommunicator.java:122)
    at org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator.serviceStart(RMContainerAllocator.java:241)
    at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
    at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$ContainerAllocatorRouter.serviceStart(MRAppMaster.java:819)
    at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
    at org.apache.hadoop.service.CompositeService.serviceStart(CompositeService.java:120)
    at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.serviceStart(MRAppMaster.java:1087)
    at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
    at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$4.run(MRAppMaster.java:1500)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1694)
    at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.initAndStartAppMaster(MRAppMaster.java:1496)
    at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.main(MRAppMaster.java:1429)
Caused by: org.apache.hadoop.ipc.RemoteException(java.lang.NullPointerException): java.lang.NullPointerException
    at org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService.registerApplicationMaster(ApplicationMasterService.java:294)
    at org.apache.hadoop.yarn.api.impl.pb.service.ApplicationMasterProtocolPBServiceImpl.registerApplicationMaster(ApplicationMasterProtocolPBServiceImpl.java:90)
    at org.apache.hadoop.yarn.proto.ApplicationMasterProtocol$ApplicationMasterProtocolService$2.callBlockingMethod(ApplicationMasterProtocol.java:95)
    at
[jira] [Commented] (YARN-3239) WebAppProxy does not support a final tracking url which has query fragments and params
[ https://issues.apache.org/jira/browse/YARN-3239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14336678#comment-14336678 ] Jason Lowe commented on YARN-3239: -- Committing this. WebAppProxy does not support a final tracking url which has query fragments and params --- Key: YARN-3239 URL: https://issues.apache.org/jira/browse/YARN-3239 Project: Hadoop YARN Issue Type: Bug Reporter: Hitesh Shah Assignee: Jian He Attachments: YARN-3239.1.patch Examples of failures: Expected: {{http://uihost:8080/#/main/views/TEZ/0.5.2.2.2.2.0-947/tez?viewPath=%2F%23%2Ftez-app%2Fapplication_1424384418229_0005}} Actual: {{http://uihost:8080}} Tried with a minor change to remove the #. Saw a different issue: Expected: {{http://uihost:8080/views/TEZ/0.5.2.2.2.2.0-947/tez?viewPath=%2F%23%2Ftez-app%2Fapplication_1424388018547_0001}} Actual: {{http://uihost:8080/views/TEZ/0.5.2.2.2.2.0-947/tez/}} yarn application -status appId returns the expected value correctly. However, invoking an http get on http://rm:8088/proxy/appId/ returns the wrong value. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2980) Move health check script related functionality to hadoop-common
[ https://issues.apache.org/jira/browse/YARN-2980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14336613#comment-14336613 ] Hudson commented on YARN-2980: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #2065 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2065/]) YARN-2980. Move health check script related functionality to hadoop-common (Varun Saxena via aw) (aw: rev d4ac6822e1c5dfac504ced48f10ab57a55b49e93) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeHealthScriptRunner.java * hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/util/TestNodeHealthScriptRunner.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeHealthCheckerService.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/webapp/TestNMWebServicesContainers.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/webapp/TestNMWebServer.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/webapp/TestNMWebServices.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestNodeHealthService.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestEventFlow.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/webapp/TestNMWebServicesApps.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeManager.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/webapp/TestContainerLogsPage.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/BaseContainerManagerTest.java * hadoop-yarn-project/CHANGES.txt * hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/util/NodeHealthScriptRunner.java Move health check script related functionality to hadoop-common --- Key: YARN-2980 URL: https://issues.apache.org/jira/browse/YARN-2980 Project: Hadoop YARN Issue Type: Improvement Reporter: Ming Ma Assignee: Varun Saxena Fix For: 3.0.0 Attachments: YARN-2980.001.patch, YARN-2980.002.patch, YARN-2980.003.patch, YARN-2980.004.patch HDFS might want to leverage health check functionality available in YARN in both namenode https://issues.apache.org/jira/browse/HDFS-7400 and datanode https://issues.apache.org/jira/browse/HDFS-7441. We can move health check functionality including the protocol between hadoop daemons and health check script to hadoop-common. That will simplify the development and maintenance for both hadoop source code and health check script. Thoughts? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3260) NPE if AM attempts to register before RM processes launch event
[ https://issues.apache.org/jira/browse/YARN-3260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14336665#comment-14336665 ] Naganarasimha G R commented on YARN-3260: - Hi [~jlowe], I would like to work on this issue but if you are already planning to fix this please feel free to reassign. NPE if AM attempts to register before RM processes launch event --- Key: YARN-3260 URL: https://issues.apache.org/jira/browse/YARN-3260 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.6.0 Reporter: Jason Lowe Assignee: Naganarasimha G R The RM on one of our clusters was running behind on processing AsyncDispatcher events, and this caused AMs to fail to register due to an NPE. The AM was launched and attempting to register before the RMAppAttemptImpl had processed the LAUNCHED event, and the client to AM token had not been generated yet. The NPE occurred because the ApplicationMasterService tried to encode the missing token. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3239) WebAppProxy does not support a final tracking url which has query fragments and params
[ https://issues.apache.org/jira/browse/YARN-3239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14336693#comment-14336693 ] Hudson commented on YARN-3239: -- FAILURE: Integrated in Hadoop-trunk-Commit #7198 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/7198/]) YARN-3239. WebAppProxy does not support a final tracking url which has query fragments and params. Contributed by Jian He (jlowe: rev 1a68fc43464d3948418f453bb2f80df7ce773097) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-web-proxy/src/main/java/org/apache/hadoop/yarn/server/webproxy/WebAppProxyServlet.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-web-proxy/src/test/java/org/apache/hadoop/yarn/server/webproxy/TestWebAppProxyServlet.java WebAppProxy does not support a final tracking url which has query fragments and params --- Key: YARN-3239 URL: https://issues.apache.org/jira/browse/YARN-3239 Project: Hadoop YARN Issue Type: Bug Reporter: Hitesh Shah Assignee: Jian He Fix For: 2.7.0 Attachments: YARN-3239.1.patch Examples of failures: Expected: {{http://uihost:8080/#/main/views/TEZ/0.5.2.2.2.2.0-947/tez?viewPath=%2F%23%2Ftez-app%2Fapplication_1424384418229_0005}} Actual: {{http://uihost:8080}} Tried with a minor change to remove the #. Saw a different issue: Expected: {{http://uihost:8080/views/TEZ/0.5.2.2.2.2.0-947/tez?viewPath=%2F%23%2Ftez-app%2Fapplication_1424388018547_0001}} Actual: {{http://uihost:8080/views/TEZ/0.5.2.2.2.2.0-947/tez/}} yarn application -status appId returns the expected value correctly. However, invoking an http get on http://rm:8088/proxy/appId/ returns the wrong value. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3231) FairScheduler changing queueMaxRunningApps on the fly will cause all pending job stuck
[ https://issues.apache.org/jira/browse/YARN-3231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14336697#comment-14336697 ] Ming Ma commented on YARN-3231: --- LGTM. FairScheduler changing queueMaxRunningApps on the fly will cause all pending job stuck -- Key: YARN-3231 URL: https://issues.apache.org/jira/browse/YARN-3231 Project: Hadoop YARN Issue Type: Bug Reporter: Siqi Li Assignee: Siqi Li Priority: Critical Attachments: YARN-3231.v1.patch, YARN-3231.v2.patch When a queue is piling up with a lot of pending jobs due to the maxRunningApps limit, we want to increase this property on the fly to make some of the pending jobs active. However, once we increase the limit, all pending jobs were not assigned any resource and were stuck forever. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2902) Killing a container that is localizing can orphan resources in the DOWNLOADING state
[ https://issues.apache.org/jira/browse/YARN-2902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14336724#comment-14336724 ] Jason Lowe commented on YARN-2902: -- We still need to do this for APPLICATION resources. It is true that those resources will be cleaned up when the application finishes, but that could be hours or days later. And as for PUBLIC resources, Sangjin confirmed earlier he's seen the orphaning occur with those resources as well, so it must be occurring somehow even for those. [~sjlee0] do you have any ideas on how PUBLIC resources ended up hung in a DOWNLOADING state? I'm wondering if this is specific to the shared cache setup or if there's a code path we're missing. I don't think we should special case the resource types to fix this. Again I think the cleanest approach is to make sure we send an event to the LocalizedResource when a container localizer (or maybe just the container itself) is killed, and let that state machine handle it appropriately (e.g.: try to remove the _tmp file if the resource was in the downloading state, ignore it if it's already localized, etc.). Killing a container that is localizing can orphan resources in the DOWNLOADING state Key: YARN-2902 URL: https://issues.apache.org/jira/browse/YARN-2902 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager Affects Versions: 2.5.0 Reporter: Jason Lowe Assignee: Varun Saxena Fix For: 2.7.0 Attachments: YARN-2902.002.patch, YARN-2902.patch If a container is in the process of localizing when it is stopped/killed then resources are left in the DOWNLOADING state. If no other container comes along and requests these resources they linger around with no reference counts but aren't cleaned up during normal cache cleanup scans since it will never delete resources in the DOWNLOADING state even if their reference count is zero. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
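For illustration, a rough sketch of the event-driven cleanup described in the comment above; the class, event, and method names below are hypothetical and not part of the existing NodeManager code or any posted patch:
{code}
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.yarn.server.nodemanager.DeletionService;

// Hypothetical sketch: when a container (or its localizer) is killed, dispatch an
// "abort" notification to every LocalizedResource it was fetching and let the
// resource's state machine decide what to do, instead of special-casing visibilities.
class LocalizedResourceSketch {
  enum State { DOWNLOADING, LOCALIZED, RELEASED }

  private State state = State.DOWNLOADING;

  void handleLocalizationAborted(Path downloadDest, DeletionService delService,
      String user, int refCount) {
    switch (state) {
      case DOWNLOADING:
        // The download never finished: remove the partially written <dest>_tmp dir
        // so it does not linger with a zero ref count forever.
        if (refCount == 0) {
          delService.delete(user, new Path(downloadDest + "_tmp"));
        }
        state = State.RELEASED;
        break;
      case LOCALIZED:
        // Already usable by other containers; nothing to clean up here.
        break;
      default:
        break;
    }
  }
}
{code}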
[jira] [Commented] (YARN-3247) TestQueueMappings should use CapacityScheduler explicitly
[ https://issues.apache.org/jira/browse/YARN-3247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14336747#comment-14336747 ] zhihai xu commented on YARN-3247: - Thanks [~ozawa] for reviewing and committing the patch! Greatly appreciated. zhihai TestQueueMappings should use CapacityScheduler explicitly - Key: YARN-3247 URL: https://issues.apache.org/jira/browse/YARN-3247 Project: Hadoop YARN Issue Type: Bug Components: test Affects Versions: 2.6.0 Reporter: zhihai xu Assignee: zhihai xu Priority: Trivial Attachments: YARN-3247.000.patch TestQueueMappings is only supported by CapacityScheduler. We should configure CapacityScheduler for this test. Otherwise if the default scheduler is set to FairScheduler, the test will fail with the following message: {code} Running org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestQueueMappings Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 2.392 sec FAILURE! - in org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestQueueMappings testQueueMapping(org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestQueueMappings) Time elapsed: 2.202 sec ERROR! java.lang.ClassCastException: org.apache.hadoop.yarn.server.resourcemanager.scheduler.QueueMetrics cannot be cast to org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSQueueMetrics at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSQueueMetrics.forQueue(FSQueueMetrics.java:118) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.initScheduler(FairScheduler.java:1266) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.serviceInit(FairScheduler.java:1319) at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) at org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceInit(ResourceManager.java:558) at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.createAndInitActiveServices(ResourceManager.java:989) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceInit(ResourceManager.java:255) at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) at org.apache.hadoop.yarn.server.resourcemanager.MockRM.init(MockRM.java:108) at org.apache.hadoop.yarn.server.resourcemanager.MockRM.init(MockRM.java:103) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestQueueMappings.testQueueMapping(TestQueueMappings.java:143) {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2902) Killing a container that is localizing can orphan resources in the DOWNLOADING state
[ https://issues.apache.org/jira/browse/YARN-2902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14336763#comment-14336763 ] Hadoop QA commented on YARN-2902: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12685607/YARN-2902.002.patch against trunk revision 1a68fc4. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6736//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6736//console This message is automatically generated. Killing a container that is localizing can orphan resources in the DOWNLOADING state Key: YARN-2902 URL: https://issues.apache.org/jira/browse/YARN-2902 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager Affects Versions: 2.5.0 Reporter: Jason Lowe Assignee: Varun Saxena Fix For: 2.7.0 Attachments: YARN-2902.002.patch, YARN-2902.patch If a container is in the process of localizing when it is stopped/killed then resources are left in the DOWNLOADING state. If no other container comes along and requests these resources they linger around with no reference counts but aren't cleaned up during normal cache cleanup scans since it will never delete resources in the DOWNLOADING state even if their reference count is zero. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3248) Display count of nodes blacklisted by apps in the web UI
[ https://issues.apache.org/jira/browse/YARN-3248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14336770#comment-14336770 ] Tsuyoshi Ozawa commented on YARN-3248: -- [~vvasudev] Thanks for taking this issue. This change looks very useful. I have some comments against the 1st patch: The blacklist is an instance of HashSet, so it can throw ConcurrentModificationException when the blacklist is modified in another thread. One alternative is to use Collections.newSetFromMap(new ConcurrentHashMap<Object, Boolean>()) instead of HashSet.
{code}
+  public Set<String> getBlacklistedNodes() {
+    return this.appSchedulingInfo.getBlackList();
+  }
{code}
If AbstractYarnScheduler#getApplicationAttempt() can be used, I think it's more straightforward and simple. What do you think?
{code}
+  private CapacityScheduler scheduler = null;
{code}
Could you add tests to TestRMWebServicesApps? Display count of nodes blacklisted by apps in the web UI Key: YARN-3248 URL: https://issues.apache.org/jira/browse/YARN-3248 Project: Hadoop YARN Issue Type: Improvement Components: capacityscheduler, resourcemanager Reporter: Varun Vasudev Assignee: Varun Vasudev Attachments: Screenshot.jpg, apache-yarn-3248.0.patch It would be really useful when debugging app performance and failure issues to get a count of the nodes blacklisted by individual apps displayed in the web UI. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
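For reference, a minimal self-contained illustration of the thread-safe alternative suggested in the comment above, using only JDK classes; the class and variable names are made up for the example:
{code}
import java.util.Collections;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

public class BlacklistSetExample {
  public static void main(String[] args) {
    // A Set view backed by a ConcurrentHashMap: safe for a web UI thread to read
    // while the scheduler thread keeps adding newly blacklisted nodes, unlike HashSet.
    Set<String> blacklistedNodes =
        Collections.newSetFromMap(new ConcurrentHashMap<String, Boolean>());

    blacklistedNodes.add("node-1.example.com");   // writer (scheduler) side
    System.out.println("blacklisted count = " + blacklistedNodes.size()); // reader side
  }
}
{code}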
[jira] [Commented] (YARN-2902) Killing a container that is localizing can orphan resources in the DOWNLOADING state
[ https://issues.apache.org/jira/browse/YARN-2902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14336691#comment-14336691 ] Varun Saxena commented on YARN-2902: [~jlowe], I looked into it and was able to simulate the issue for PRIVATE resources. I think we need to handle this only for PRIVATE resources. APPLICATION resources will be cleaned up when the application finishes, and PUBLIC resources should not remain orphaned as we do not kill or stop the PublicLocalizer in between. To download a resource, FSDownload appends _tmp to the end of the directory to which the resource will be downloaded. While processing a heartbeat from the ContainerLocalizer, the NM sends back the destination path for the resource to be downloaded. We also download one resource at a time. So we can store this destination path in a queue in LocalizerRunner whenever we send a new path for download, and remove it when the fetch is successful. When the container is killed (which causes the LocalizerRunner to be cleaned up), we can take the path from the front of the queue and submit the associated temp path to DeletionService for deletion, if the ref count for the resource is 0. We cannot do this cleanup in ContainerLocalizer, as the LCE launches it as a separate process and kills it when the LocalizerRunner is interrupted. Killing a container that is localizing can orphan resources in the DOWNLOADING state Key: YARN-2902 URL: https://issues.apache.org/jira/browse/YARN-2902 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager Affects Versions: 2.5.0 Reporter: Jason Lowe Assignee: Varun Saxena Fix For: 2.7.0 Attachments: YARN-2902.002.patch, YARN-2902.patch If a container is in the process of localizing when it is stopped/killed then resources are left in the DOWNLOADING state. If no other container comes along and requests these resources they linger around with no reference counts but aren't cleaned up during normal cache cleanup scans since it will never delete resources in the DOWNLOADING state even if their reference count is zero. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
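A minimal sketch of the bookkeeping described in the comment above, assuming a hypothetical pendingPaths deque inside LocalizerRunner; the class, field, and method names are illustrative, not the actual patch:
{code}
import java.util.ArrayDeque;
import java.util.Deque;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.yarn.server.nodemanager.DeletionService;

// Illustrative only: track the destination paths handed to the ContainerLocalizer so
// the partially downloaded <path>_tmp directory can be removed if the container is
// killed while a download is still in flight.
class LocalizerRunnerSketch {
  private final Deque<Path> pendingPaths = new ArrayDeque<Path>();
  private final DeletionService delService;
  private final String user;

  LocalizerRunnerSketch(DeletionService delService, String user) {
    this.delService = delService;
    this.user = user;
  }

  // Called when a heartbeat response hands out a new download destination.
  synchronized void recordDownloadPath(Path destination) {
    pendingPaths.addLast(destination);
  }

  // Called when the localizer reports a successful fetch for the head resource.
  synchronized void fetchSucceeded() {
    pendingPaths.pollFirst();
  }

  // Called from the kill/cleanup path; only the in-flight download can have a _tmp dir.
  synchronized void cleanupOnKill(int refCount) {
    Path inFlight = pendingPaths.peekFirst();
    if (inFlight != null && refCount == 0) {
      delService.delete(user, new Path(inFlight + "_tmp"));
    }
  }
}
{code}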
[jira] [Commented] (YARN-3039) [Aggregator wireup] Implement ATS writer service discovery
[ https://issues.apache.org/jira/browse/YARN-3039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14336707#comment-14336707 ] Junping Du commented on YARN-3039: -- Thanks [~Naganarasimha] and [~rkanter] for review and comments! bq. I feel AM should be informed of AggregatorAddr as early as register itself than currently being done in ApplicationMasterService.allocate(). That's a good point. Another idea (from Vinod in an offline discussion) is to add a blocking call in AMRMClient to get the aggregator address directly from the RM. AMRMClient can be wrapped into TimelineClient so that the aggregator address lookup and aggregator failures can be handled transparently. Thoughts? bq. For NM's too, would it be better to update during registering itself (may be recovered during recovery, not sure though) thoughts? I think the NM case is slightly different here: the NM needs this knowledge whenever the first container of the app gets allocated/launched, so getting things updated in the heartbeat sounds good enough, doesn't it? In addition, if adding a new API in AMRMClient can be accepted, the NM will use TimelineClient too and so can handle service discovery automatically. bq. Was not clear about source of RMAppEventType.AGGREGATOR_UPDATE. Based on YARN-3030 (Aggregators collection through NM's Aux service), PerNodeAggregatorServer(Aux service) launches AppLevelAggregatorService, so will AppLevelAggregatorService inform RM about the aggregator for the application? and then RM will inform NM about the appAggregatorAddr as part of heart beat response ? if this is the flow will there be chances of race condition where in before NM gets appAggregatorAddr from RM, NM might require to post some AM container Entities/events? I think we can discuss this flow in two scenarios: the first-time launch of the app aggregator, and the app aggregator failing over to another NM. For the first-time launch, the NM aux service will bind the app aggregator to the perNodeAggregator when the AM container gets allocated (per YARN-3030). The NM will notify the RM that this new appAggregator is ready for use in its next heartbeat to the RM (missing in this patch). After receiving this message from the NM, the RM will update its aggregator list and send RMAppEventType.AGGREGATOR_UPDATE to trigger persisting the updated aggregator list in the RMStateStore (for RM failover). When the app aggregator fails over, the AM or NMs (whoever called putEntities with the TimelineClient) will notify the RM of the failure; the RM first verifies that this app aggregator is out of service, and then kicks off rebinding the appAggregator to another NM's perNodeAggregatorService when the next heartbeat comes. When it hears back from the new NM, the RM does the same thing as in the first case. One gap here today is that we launch the appAggregatorService (via the NM's auxiliary service) whenever the AM container gets launched, whether it is the first launch or a reschedule after a failure. As in my earlier comments above, an AM container failing over and being rescheduled to another NM may not have to cause a rebind of the aggregator service, just as the app's aggregator going out of service may not cause the AM container to get killed. So I think the appAggregatorService should get launched by the NM automatically only in the first attempt and be taken care of by the RM in subsequent attempts. About the race condition between the NM heartbeat and posting entities, I don't think posting entities should block any major logic, especially the NM heartbeat. In addition, if we make TimelineClient handle service discovery automatically, this will never happen. What do you think? bq. Sorry for not commenting earlier.
Thanks for taking this up Junping Du. No worry. Thanks! bq. Not using YARN-913 is fine if it's not going to make sense. I haven't looked too closely at it either; it just sounded like it might be helpful here. Agree. My feeling now is that service discovery there is coupled tightly with service lifecycle management. Our app aggregator service does not live inside a dedicated container but has many deployment options, and its consumers include YARN components, not only the AM. So I think YARN-913 may not be the best fit at this moment. [~ste...@apache.org] is the main author of YARN-913. Steve, do you have any comments here? bq. Given that a particular NM is only interested in the Applications that are running on it, is there some way to have it only receive the aggregator info for those apps? This would decrease the amount of throw away data that gets sent. In the current patch, the RM only sends the NM the aggregator list for the active apps on it. Please check the code in ResourceTrackerService:
{code}
+    ConcurrentMap<ApplicationId, String> liveAppAggregatorsMap =
+        new ConcurrentHashMap<ApplicationId, String>();
+    List<ApplicationId> keepAliveApps = remoteNodeStatus.getKeepAliveApplications();
+    if (keepAliveApps != null) {
+      ConcurrentMap<ApplicationId, RMApp> rmApps = rmContext.getRMApps();
+      for (ApplicationId appId : keepAliveApps) {
+
[jira] [Commented] (YARN-1853) Allow containers to be ran under real user even in insecure mode
[ https://issues.apache.org/jira/browse/YARN-1853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14336828#comment-14336828 ] Hadoop QA commented on YARN-1853: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12640461/YARN-1853-trunk.patch against trunk revision 5731c0e. {color:red}-1 patch{color}. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6737//console This message is automatically generated. Allow containers to be ran under real user even in insecure mode Key: YARN-1853 URL: https://issues.apache.org/jira/browse/YARN-1853 Project: Hadoop YARN Issue Type: Improvement Components: nodemanager, resourcemanager Affects Versions: 2.3.0 Reporter: Andrey Stepachev Attachments: YARN-1853-trunk.patch, YARN-1853.patch Currently unsecure cluster runs all containers under one user (typically nobody). That is not appropriate, because yarn applications doesn't play well with hdfs having enabled permissions. Yarn applications try to write data (as expected) into /user/nobody regardless of user, who launched application. Another sideeffect is that it is not possible to configure cgroups for particular users. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1853) Allow containers to be ran under real user even in insecure mode
[ https://issues.apache.org/jira/browse/YARN-1853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14336805#comment-14336805 ] Ravi Prakash commented on YARN-1853: This seems like a dupe of YARN-2424 . Andrey! Could you please confirm? Allow containers to be ran under real user even in insecure mode Key: YARN-1853 URL: https://issues.apache.org/jira/browse/YARN-1853 Project: Hadoop YARN Issue Type: Improvement Components: nodemanager, resourcemanager Affects Versions: 2.3.0 Reporter: Andrey Stepachev Attachments: YARN-1853-trunk.patch, YARN-1853.patch Currently unsecure cluster runs all containers under one user (typically nobody). That is not appropriate, because yarn applications doesn't play well with hdfs having enabled permissions. Yarn applications try to write data (as expected) into /user/nobody regardless of user, who launched application. Another sideeffect is that it is not possible to configure cgroups for particular users. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3240) [Data Mode] Implement client API to put generic entities
[ https://issues.apache.org/jira/browse/YARN-3240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14336811#comment-14336811 ] Zhijie Shen commented on YARN-3240: --- Thanks for committing the patch. PS: .keep seems to be missed, I added it in a followup commit. [Data Mode] Implement client API to put generic entities Key: YARN-3240 URL: https://issues.apache.org/jira/browse/YARN-3240 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Zhijie Shen Assignee: Zhijie Shen Fix For: YARN-2928 Attachments: YARN-3240.1.patch, YARN-3240.2.patch, YARN-3240.3.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-977) Interface for users/AM to know actual usage by the container
[ https://issues.apache.org/jira/browse/YARN-977?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14336808#comment-14336808 ] Ravi Prakash commented on YARN-977: --- Is this related to YARN-1856? Interface for users/AM to know actual usage by the container Key: YARN-977 URL: https://issues.apache.org/jira/browse/YARN-977 Project: Hadoop YARN Issue Type: Improvement Reporter: Omkar Vinit Joshi Today we allocate resource (memory and cpu) and node manager starts the container with requested resource [I am assuming they are using cgroups]. But there is definitely a possibility of users requesting more than what they actually may need during the execution of their container/job-task. If we add a way for users/AM to know the actual usage of the requested/completed container then they may optimize it for next run.. This will be helpful for AM to optimize cpu/memory resource requests by querying NM/RM to know avg/max cpu/memory usage of the container or may be containers belonging to application. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1853) Allow containers to be ran under real user even in insecure mode
[ https://issues.apache.org/jira/browse/YARN-1853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14336880#comment-14336880 ] Andrey Stepachev commented on YARN-1853: [~raviprak] Not exactly. Basically it does the same, but this patch also adds a check that the user actually exists, and sends a reject if not. Without that check the RM will fail with an exception and the user will not know that the request failed due to a misconfiguration in the user/group mapping. Allow containers to be ran under real user even in insecure mode Key: YARN-1853 URL: https://issues.apache.org/jira/browse/YARN-1853 Project: Hadoop YARN Issue Type: Improvement Components: nodemanager, resourcemanager Affects Versions: 2.3.0 Reporter: Andrey Stepachev Attachments: YARN-1853-trunk.patch, YARN-1853.patch Currently unsecure cluster runs all containers under one user (typically nobody). That is not appropriate, because yarn applications doesn't play well with hdfs having enabled permissions. Yarn applications try to write data (as expected) into /user/nobody regardless of user, who launched application. Another sideeffect is that it is not possible to configure cgroups for particular users. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (YARN-1943) Multitenant LinuxContainerExecutor is incompatible with Simple Security mode.
[ https://issues.apache.org/jira/browse/YARN-1943?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ravi Prakash resolved YARN-1943. Resolution: Duplicate Marking as dupe of YARN-2424. Please reopen if my understanding is incorrect Multitenant LinuxContainerExecutor is incompatible with Simple Security mode. - Key: YARN-1943 URL: https://issues.apache.org/jira/browse/YARN-1943 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.3.0 Reporter: jay vyas Priority: Critical Labels: linux Fix For: 2.3.0 As of hadoop 2.3.0, commit cc74a18c makes it so that nonsecureLocalUser replaces the user who submits a job if security is disabled: {noformat} return UserGroupInformation.isSecurityEnabled() ? user : nonsecureLocalUser; {noformat} However, the only way to enable security, is to NOT use SIMPLE authentication mode: {noformat} public static boolean isSecurityEnabled() { return !isAuthenticationMethodEnabled(AuthenticationMethod.SIMPLE); } {noformat} Thus, the framework ENFORCES that SIMPLE login security -- nonSecureuser for submission of LinuxExecutorContainer. This results in a confusing issue, wherein we submit a job as sally and then get an exception that user nobody is not whitelisted and has UID MAX_ID. My proposed solution is that we should be able to leverage LinuxContainerExector regardless of hadoop's view of the security settings on the cluster, i.e. decouple LinuxContainerExecutor logic from the isSecurityEnabled return value. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1809) Synchronize RM and Generic History Service Web-UIs
[ https://issues.apache.org/jira/browse/YARN-1809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14337092#comment-14337092 ] Hadoop QA commented on YARN-1809: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12700829/YARN-1809.12.patch against trunk revision 5731c0e. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 6 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 5 new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.timeline.security.TestTimelineAuthenticationFilter org.apache.hadoop.yarn.server.applicationhistoryservice.TestApplicationHistoryServer org.apache.hadoop.yarn.server.resourcemanager.security.TestRMDelegationTokens Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6739//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/6739//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6739//console This message is automatically generated. Synchronize RM and Generic History Service Web-UIs -- Key: YARN-1809 URL: https://issues.apache.org/jira/browse/YARN-1809 Project: Hadoop YARN Issue Type: Improvement Reporter: Zhijie Shen Assignee: Xuan Gong Attachments: YARN-1809.1.patch, YARN-1809.10.patch, YARN-1809.11.patch, YARN-1809.12.patch, YARN-1809.2.patch, YARN-1809.3.patch, YARN-1809.4.patch, YARN-1809.5.patch, YARN-1809.5.patch, YARN-1809.6.patch, YARN-1809.7.patch, YARN-1809.8.patch, YARN-1809.9.patch After YARN-953, the web-UI of generic history service is provide more information than that of RM, the details about app attempt and container. It's good to provide similar web-UIs, but retrieve the data from separate source, i.e., RM cache and history store respectively. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3031) [Storage abstraction] Create backing storage write interface for ATS writers
[ https://issues.apache.org/jira/browse/YARN-3031?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vrushali C updated YARN-3031: - Attachment: (was: YARN-3031.03.patch) [Storage abstraction] Create backing storage write interface for ATS writers Key: YARN-3031 URL: https://issues.apache.org/jira/browse/YARN-3031 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Sangjin Lee Assignee: Vrushali C Attachments: Sequence_diagram_write_interaction.2.png, Sequence_diagram_write_interaction.png, YARN-3031.01.patch, YARN-3031.02.patch Per design in YARN-2928, come up with the interface for the ATS writer to write to various backing storages. The interface should be created to capture the right level of abstractions so that it will enable all backing storage implementations to implement it efficiently. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3168) Convert site documentation from apt to markdown
[ https://issues.apache.org/jira/browse/YARN-3168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14337011#comment-14337011 ] Allen Wittenauer commented on YARN-3168: some nits: * table of contents have extra spacing (global) * RM restart doc has broken rendering for containerid * docker: job config has broken rendering * yarn secure containers: link title is broken for nested jobs Convert site documentation from apt to markdown --- Key: YARN-3168 URL: https://issues.apache.org/jira/browse/YARN-3168 Project: Hadoop YARN Issue Type: Improvement Components: documentation Affects Versions: 2.7.0 Reporter: Allen Wittenauer Assignee: Gururaj Shetty Attachments: YARN-3168-00.patch, YARN-3168.20150224.1.patch, YARN-3168.20150225.2.patch YARN analog to HADOOP-11495 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3197) Confusing log generated by CapacityScheduler
[ https://issues.apache.org/jira/browse/YARN-3197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14336923#comment-14336923 ] Wangda Tan commented on YARN-3197: -- There's one difference: when the container id and app id are null, it is {{containerId=xx completed with status=yyy from completed or unknown application id=zzz}}. And when the RMContainer is not null but we cannot find the app id, it should be: {{containerId=xx completed with status=yyy from completed application id=zzz}}. The 2nd one's application shouldn't be indicated as unknown. Confusing log generated by CapacityScheduler Key: YARN-3197 URL: https://issues.apache.org/jira/browse/YARN-3197 Project: Hadoop YARN Issue Type: Bug Components: capacityscheduler Affects Versions: 2.6.0 Reporter: Hitesh Shah Assignee: Varun Saxena Priority: Minor Attachments: YARN-3197.001.patch, YARN-3197.002.patch 2015-02-12 20:35:39,968 INFO capacity.CapacityScheduler (CapacityScheduler.java:completedContainer(1190)) - Null container completed... 2015-02-12 20:35:39,968 INFO capacity.CapacityScheduler (CapacityScheduler.java:completedContainer(1190)) - Null container completed... 2015-02-12 20:35:39,968 INFO capacity.CapacityScheduler (CapacityScheduler.java:completedContainer(1190)) - Null container completed... 2015-02-12 20:35:40,960 INFO capacity.CapacityScheduler (CapacityScheduler.java:completedContainer(1190)) - Null container completed... 2015-02-12 20:35:40,960 INFO capacity.CapacityScheduler (CapacityScheduler.java:completedContainer(1190)) - Null container completed... -- This message was sent by Atlassian JIRA (v6.3.4#6332)
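For illustration, a rough sketch of log statements that would distinguish the two cases described in the comment above; the helper method signature and variable names are hypothetical, not the actual YARN-3197 patch:
{code}
// Hypothetical helper: emit different messages depending on whether the
// application is unknown or merely already completed.
private void logCompletedContainer(ContainerStatus status, ApplicationId appId,
    boolean applicationKnown) {
  if (applicationKnown) {
    LOG.info("containerId=" + status.getContainerId() + " completed with status="
        + status.getState() + " from completed application id=" + appId);
  } else {
    LOG.info("containerId=" + status.getContainerId() + " completed with status="
        + status.getState() + " from completed or unknown application id=" + appId);
  }
}
{code}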
[jira] [Created] (YARN-3261) rewrite resourcemanager restart doc to remove roadmap bits
Allen Wittenauer created YARN-3261: -- Summary: rewrite resourcemanager restart doc to remove roadmap bits Key: YARN-3261 URL: https://issues.apache.org/jira/browse/YARN-3261 Project: Hadoop YARN Issue Type: Bug Reporter: Allen Wittenauer Another mixture of roadmap and instruction manual that seems to be ever present in a lot of the recently written documentation. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3031) [Storage abstraction] Create backing storage write interface for ATS writers
[ https://issues.apache.org/jira/browse/YARN-3031?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vrushali C updated YARN-3031: - Attachment: YARN-3031.03.patch [Storage abstraction] Create backing storage write interface for ATS writers Key: YARN-3031 URL: https://issues.apache.org/jira/browse/YARN-3031 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Sangjin Lee Assignee: Vrushali C Attachments: Sequence_diagram_write_interaction.2.png, Sequence_diagram_write_interaction.png, YARN-3031.01.patch, YARN-3031.02.patch, YARN-3031.03.patch Per design in YARN-2928, come up with the interface for the ATS writer to write to various backing storages. The interface should be created to capture the right level of abstractions so that it will enable all backing storage implementations to implement it efficiently. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3031) [Storage abstraction] Create backing storage write interface for ATS writers
[ https://issues.apache.org/jira/browse/YARN-3031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14337008#comment-14337008 ] Hadoop QA commented on YARN-3031: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12700828/YARN-3031.03.patch against trunk revision 5731c0e. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6738//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6738//console This message is automatically generated. [Storage abstraction] Create backing storage write interface for ATS writers Key: YARN-3031 URL: https://issues.apache.org/jira/browse/YARN-3031 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Sangjin Lee Assignee: Vrushali C Attachments: Sequence_diagram_write_interaction.2.png, Sequence_diagram_write_interaction.png, YARN-3031.01.patch, YARN-3031.02.patch, YARN-3031.03.patch Per design in YARN-2928, come up with the interface for the ATS writer to write to various backing storages. The interface should be created to capture the right level of abstractions so that it will enable all backing storage implementations to implement it efficiently. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3031) [Storage abstraction] Create backing storage write interface for ATS writers
[ https://issues.apache.org/jira/browse/YARN-3031?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vrushali C updated YARN-3031: - Attachment: YARN-3031.03.patch Uploading a new patch with Identifier instead of ApplicationId as per [~zjshen]'s review suggestion. [Storage abstraction] Create backing storage write interface for ATS writers Key: YARN-3031 URL: https://issues.apache.org/jira/browse/YARN-3031 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Sangjin Lee Assignee: Vrushali C Attachments: Sequence_diagram_write_interaction.2.png, Sequence_diagram_write_interaction.png, YARN-3031.01.patch, YARN-3031.02.patch, YARN-3031.03.patch Per design in YARN-2928, come up with the interface for the ATS writer to write to various backing storages. The interface should be created to capture the right level of abstractions so that it will enable all backing storage implementations to implement it efficiently. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3031) [Storage abstraction] Create backing storage write interface for ATS writers
[ https://issues.apache.org/jira/browse/YARN-3031?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vrushali C updated YARN-3031: - Attachment: YARN-3031.03.patch [Storage abstraction] Create backing storage write interface for ATS writers Key: YARN-3031 URL: https://issues.apache.org/jira/browse/YARN-3031 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Sangjin Lee Assignee: Vrushali C Attachments: Sequence_diagram_write_interaction.2.png, Sequence_diagram_write_interaction.png, YARN-3031.01.patch, YARN-3031.02.patch, YARN-3031.03.patch Per design in YARN-2928, come up with the interface for the ATS writer to write to various backing storages. The interface should be created to capture the right level of abstractions so that it will enable all backing storage implementations to implement it efficiently. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3031) [Storage abstraction] Create backing storage write interface for ATS writers
[ https://issues.apache.org/jira/browse/YARN-3031?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vrushali C updated YARN-3031: - Attachment: (was: YARN-3031.03.patch) [Storage abstraction] Create backing storage write interface for ATS writers Key: YARN-3031 URL: https://issues.apache.org/jira/browse/YARN-3031 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Sangjin Lee Assignee: Vrushali C Attachments: Sequence_diagram_write_interaction.2.png, Sequence_diagram_write_interaction.png, YARN-3031.01.patch, YARN-3031.02.patch, YARN-3031.03.patch Per design in YARN-2928, come up with the interface for the ATS writer to write to various backing storages. The interface should be created to capture the right level of abstractions so that it will enable all backing storage implementations to implement it efficiently. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-1809) Synchronize RM and Generic History Service Web-UIs
[ https://issues.apache.org/jira/browse/YARN-1809?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong updated YARN-1809: Attachment: YARN-1809.12.patch get rid of the changes for FairSchedulerAppsBlock, let us file a separate ticket for FairScheduler changes. Synchronize RM and Generic History Service Web-UIs -- Key: YARN-1809 URL: https://issues.apache.org/jira/browse/YARN-1809 Project: Hadoop YARN Issue Type: Improvement Reporter: Zhijie Shen Assignee: Xuan Gong Attachments: YARN-1809.1.patch, YARN-1809.10.patch, YARN-1809.11.patch, YARN-1809.12.patch, YARN-1809.2.patch, YARN-1809.3.patch, YARN-1809.4.patch, YARN-1809.5.patch, YARN-1809.5.patch, YARN-1809.6.patch, YARN-1809.7.patch, YARN-1809.8.patch, YARN-1809.9.patch After YARN-953, the web-UI of generic history service is provide more information than that of RM, the details about app attempt and container. It's good to provide similar web-UIs, but retrieve the data from separate source, i.e., RM cache and history store respectively. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3197) Confusing log generated by CapacityScheduler
[ https://issues.apache.org/jira/browse/YARN-3197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Saxena updated YARN-3197: --- Attachment: YARN-3197.003.patch Confusing log generated by CapacityScheduler Key: YARN-3197 URL: https://issues.apache.org/jira/browse/YARN-3197 Project: Hadoop YARN Issue Type: Bug Components: capacityscheduler Affects Versions: 2.6.0 Reporter: Hitesh Shah Assignee: Varun Saxena Priority: Minor Attachments: YARN-3197.001.patch, YARN-3197.002.patch, YARN-3197.003.patch 2015-02-12 20:35:39,968 INFO capacity.CapacityScheduler (CapacityScheduler.java:completedContainer(1190)) - Null container completed... 2015-02-12 20:35:39,968 INFO capacity.CapacityScheduler (CapacityScheduler.java:completedContainer(1190)) - Null container completed... 2015-02-12 20:35:39,968 INFO capacity.CapacityScheduler (CapacityScheduler.java:completedContainer(1190)) - Null container completed... 2015-02-12 20:35:40,960 INFO capacity.CapacityScheduler (CapacityScheduler.java:completedContainer(1190)) - Null container completed... 2015-02-12 20:35:40,960 INFO capacity.CapacityScheduler (CapacityScheduler.java:completedContainer(1190)) - Null container completed... -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3039) [Aggregator wireup] Implement ATS app-appgregator service discovery
[ https://issues.apache.org/jira/browse/YARN-3039?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junping Du updated YARN-3039: - Summary: [Aggregator wireup] Implement ATS app-appgregator service discovery (was: [Aggregator wireup] Implement ATS writer service discovery) [Aggregator wireup] Implement ATS app-appgregator service discovery --- Key: YARN-3039 URL: https://issues.apache.org/jira/browse/YARN-3039 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Sangjin Lee Assignee: Junping Du Attachments: Service Binding for applicationaggregator of ATS (draft).pdf, YARN-3039-no-test.patch Per design in YARN-2928, implement ATS writer service discovery. This is essential for off-node clients to send writes to the right ATS writer. This should also handle the case of AM failures. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3031) [Storage abstraction] Create backing storage write interface for ATS writers
[ https://issues.apache.org/jira/browse/YARN-3031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14336994#comment-14336994 ] Junping Du commented on YARN-3031: -- Hi [~vrushalic], thanks for updating the patch. A quick question here: it looks like we only specify three tracks for an entity: FLOW, USER and QUEUE
{code}
+public enum AggregateUpTo {
+  FLOW,
+  USER,
+  QUEUE
+}
{code}
But we have more tracks (types) in YARN-3041.
{code}
+public enum TimelineEntityType {
+  YARN_CLUSTER,
+  YARN_FLOW,
+  YARN_APPLICATION,
+  YARN_APPLICATION_ATTEMPT,
+  YARN_CONTAINER,
+  YARN_USER,
+  YARN_QUEUE;
{code}
Maybe we should add more here? [Storage abstraction] Create backing storage write interface for ATS writers Key: YARN-3031 URL: https://issues.apache.org/jira/browse/YARN-3031 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Sangjin Lee Assignee: Vrushali C Attachments: Sequence_diagram_write_interaction.2.png, Sequence_diagram_write_interaction.png, YARN-3031.01.patch, YARN-3031.02.patch, YARN-3031.03.patch Per design in YARN-2928, come up with the interface for the ATS writer to write to various backing storages. The interface should be created to capture the right level of abstractions so that it will enable all backing storage implementations to implement it efficiently. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3197) Confusing log generated by CapacityScheduler
[ https://issues.apache.org/jira/browse/YARN-3197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14337144#comment-14337144 ] Vinod Kumar Vavilapalli commented on YARN-3197: --- You can infer the ApplicationID from the ContainerID, so there is no need to print both. Confusing log generated by CapacityScheduler Key: YARN-3197 URL: https://issues.apache.org/jira/browse/YARN-3197 Project: Hadoop YARN Issue Type: Bug Components: capacityscheduler Affects Versions: 2.6.0 Reporter: Hitesh Shah Assignee: Varun Saxena Priority: Minor Attachments: YARN-3197.001.patch, YARN-3197.002.patch, YARN-3197.003.patch 2015-02-12 20:35:39,968 INFO capacity.CapacityScheduler (CapacityScheduler.java:completedContainer(1190)) - Null container completed... 2015-02-12 20:35:39,968 INFO capacity.CapacityScheduler (CapacityScheduler.java:completedContainer(1190)) - Null container completed... 2015-02-12 20:35:39,968 INFO capacity.CapacityScheduler (CapacityScheduler.java:completedContainer(1190)) - Null container completed... 2015-02-12 20:35:40,960 INFO capacity.CapacityScheduler (CapacityScheduler.java:completedContainer(1190)) - Null container completed... 2015-02-12 20:35:40,960 INFO capacity.CapacityScheduler (CapacityScheduler.java:completedContainer(1190)) - Null container completed... -- This message was sent by Atlassian JIRA (v6.3.4#6332)
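For reference, a minimal sketch of that derivation using the public YARN records API; the containerId variable is assumed to be in scope:
{code}
import org.apache.hadoop.yarn.api.records.ApplicationId;
import org.apache.hadoop.yarn.api.records.ContainerId;

// Every ContainerId carries its ApplicationAttemptId, which in turn carries the
// ApplicationId, so logging the container id alone is enough to identify the app.
ApplicationId appId = containerId.getApplicationAttemptId().getApplicationId();
{code}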
[jira] [Commented] (YARN-3031) [Storage abstraction] Create backing storage write interface for ATS writers
[ https://issues.apache.org/jira/browse/YARN-3031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14337148#comment-14337148 ] Vinod Kumar Vavilapalli commented on YARN-3031: --- Quick comment: Following the proposal at YARN-3166, let's put the storage APIs in hadoop-yarn-server/hadoop-yarn-server-timelineservice under package o.a.h.yarn.server.timelineservice.storage? We shouldn't be exposing these to application writers as part of the hadoop-yarn-api records. [Storage abstraction] Create backing storage write interface for ATS writers Key: YARN-3031 URL: https://issues.apache.org/jira/browse/YARN-3031 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Sangjin Lee Assignee: Vrushali C Attachments: Sequence_diagram_write_interaction.2.png, Sequence_diagram_write_interaction.png, YARN-3031.01.patch, YARN-3031.02.patch, YARN-3031.03.patch Per design in YARN-2928, come up with the interface for the ATS writer to write to various backing storages. The interface should be created to capture the right level of abstractions so that it will enable all backing storage implementations to implement it efficiently. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3254) HealthReport should include disk full information
[ https://issues.apache.org/jira/browse/YARN-3254?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Akira AJISAKA updated YARN-3254: Attachment: YARN-3254-002.patch v2 patch fixes log formatting. HealthReport should include disk full information - Key: YARN-3254 URL: https://issues.apache.org/jira/browse/YARN-3254 Project: Hadoop YARN Issue Type: Improvement Components: nodemanager Affects Versions: 2.6.0 Reporter: Akira AJISAKA Assignee: Akira AJISAKA Attachments: Screen Shot 2015-02-24 at 17.57.39.png, YARN-3254-001.patch, YARN-3254-002.patch When a NodeManager's local disk gets almost full, the NodeManager sends a health report to ResourceManager that local/log dir is bad and the message is displayed on ResourceManager Web UI. It's difficult for users to detect why the dir is bad. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3254) HealthReport should include disk full information
[ https://issues.apache.org/jira/browse/YARN-3254?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Akira AJISAKA updated YARN-3254: Attachment: YARN-3254-001.patch Attaching a patch to add a new public method {{getDisksHealthReport(boolean, boolean)}}, and deprecate the existing method {{getDisksHealthReport(boolean)}} for backward compatibility. I'll attach a screen shot later. HealthReport should include disk full information - Key: YARN-3254 URL: https://issues.apache.org/jira/browse/YARN-3254 Project: Hadoop YARN Issue Type: Improvement Components: nodemanager Affects Versions: 2.6.0 Reporter: Akira AJISAKA Assignee: Akira AJISAKA Attachments: Screen Shot 2015-02-24 at 17.57.39.png, YARN-3254-001.patch When a NodeManager's local disk gets almost full, the NodeManager sends a health report to ResourceManager that local/log dir is bad and the message is displayed on ResourceManager Web UI. It's difficult for users to detect why the dir is bad. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
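For illustration, a rough sketch of the compatibility pattern described in the comment above; the parameter names and the meaning of the second boolean are assumptions, not necessarily what the attached patch does:
{code}
/**
 * Illustrative only: the old single-argument overload keeps working by delegating
 * to the new two-argument method and is marked deprecated.
 */
@Deprecated
public String getDisksHealthReport(boolean listGoodDirs) {
  // Preserve the old behaviour: no disk-full details in the report.
  return getDisksHealthReport(listGoodDirs, false);
}

public String getDisksHealthReport(boolean listGoodDirs, boolean includeFullDirDetails) {
  StringBuilder report = new StringBuilder();
  // ... existing good/bad local-dir and log-dir report is built here ...
  if (includeFullDirDetails) {
    report.append(" (details about which dirs are full would be appended here)");
  }
  return report.toString();
}
{code}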
[jira] [Updated] (YARN-3262) Suface application resource requests table
[ https://issues.apache.org/jira/browse/YARN-3262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated YARN-3262: -- Description: It would be useful to surface the resource requests table on the application web page to facilitate scheduling analysis and debugging. (was: It would be useful to surface the resource requests table on the web page to facilitate scheduling analysis and debugging.) Suface application resource requests table -- Key: YARN-3262 URL: https://issues.apache.org/jira/browse/YARN-3262 Project: Hadoop YARN Issue Type: Improvement Components: yarn Reporter: Jian He Assignee: Jian He It would be useful to surface the resource requests table on the application web page to facilitate scheduling analysis and debugging. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-3262) Surface application resource requests table
Jian He created YARN-3262: - Summary: Suface application resource requests table Key: YARN-3262 URL: https://issues.apache.org/jira/browse/YARN-3262 Project: Hadoop YARN Issue Type: Improvement Components: yarn Reporter: Jian He Assignee: Jian He It would be useful to surface the resource requests table on the web page to facilitate scheduling analysis and debugging. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2893) AMLauncher: sporadic job failures due to EOFException in readTokenStorageStream
[ https://issues.apache.org/jira/browse/YARN-2893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14337301#comment-14337301 ] zhihai xu commented on YARN-2893: - I think I know what causes this issue. It is most likely in the following code: {code} Credentials credentials = new Credentials(); DataInputByteBuffer dibb = new DataInputByteBuffer(); if (container.getTokens() != null) { // TODO: Don't do this kind of checks everywhere. dibb.reset(container.getTokens()); credentials.readTokenStorageStream(dibb); } {code} We didn't rewind the token buffer after credentials.readTokenStorageStream(dibb). I checked the code in DataInputByteBuffer: DataInputByteBuffer.read moves the position of the ByteBuffer (container.getTokens); a HeapByteBuffer is used for container.getTokens. {code} public int read(byte[] b, int off, int len) { if (bidx >= buffers.length) { return -1; } int cur = 0; do { int rem = Math.min(len, buffers[bidx].remaining()); buffers[bidx].get(b, off, rem); cur += rem; off += rem; len -= rem; } while (len > 0 && ++bidx < buffers.length); pos += cur; return cur; } {code} So if an exception happens in AMLauncher.setupTokens before the ByteBuffer is replaced in container.setTokens, the position of the tokens ByteBuffer will be at the end, and we will see this issue the next time we retry. So I think it would be good to add container.getTokens().rewind() after credentials.readTokenStorageStream(dibb);. I will create a patch for this. AMLauncher: sporadic job failures due to EOFException in readTokenStorageStream -- Key: YARN-2893 URL: https://issues.apache.org/jira/browse/YARN-2893 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.4.0 Reporter: Gera Shegalov Assignee: zhihai xu MapReduce jobs on our clusters experience sporadic failures due to corrupt tokens in the AM launch context. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
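A minimal, self-contained sketch of the rewind idea described above. This is an illustration only, not the actual YARN-2893 patch; the helper class and method names are hypothetical.
{code}
import java.io.IOException;
import java.nio.ByteBuffer;
import org.apache.hadoop.io.DataInputByteBuffer;
import org.apache.hadoop.security.Credentials;

// Hypothetical helper: reading the token storage stream advances the backing
// ByteBuffer's position, so rewind it afterwards to keep the tokens readable
// on a later retry.
public class TokenBufferRewindSketch {
  public static Credentials parseTokens(ByteBuffer tokens) throws IOException {
    Credentials credentials = new Credentials();
    if (tokens != null) {
      DataInputByteBuffer dibb = new DataInputByteBuffer();
      tokens.rewind();                          // defensive: start at position 0
      dibb.reset(tokens);
      credentials.readTokenStorageStream(dibb); // consumes the buffer contents
      tokens.rewind();                          // restore position for the next reader
    }
    return credentials;
  }
}
{code}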
[jira] [Updated] (YARN-3251) CapacityScheduler deadlock when computing absolute max avail capacity
[ https://issues.apache.org/jira/browse/YARN-3251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-3251: - Attachment: (was: YARN-3251.trunk.1.patch) CapacityScheduler deadlock when computing absolute max avail capacity - Key: YARN-3251 URL: https://issues.apache.org/jira/browse/YARN-3251 Project: Hadoop YARN Issue Type: Sub-task Affects Versions: 2.6.0 Reporter: Jason Lowe Assignee: Wangda Tan Priority: Blocker Attachments: YARN-3251.1.patch The ResourceManager can deadlock in the CapacityScheduler when computing the absolute max available capacity for user limits and headroom. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3251) CapacityScheduler deadlock when computing absolute max avail capacity
[ https://issues.apache.org/jira/browse/YARN-3251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-3251: - Attachment: YARN-3251.trunk.1.patch Attached ver.1 patch against trunk (YARN-3251.trunk.1.patch) CapacityScheduler deadlock when computing absolute max avail capacity - Key: YARN-3251 URL: https://issues.apache.org/jira/browse/YARN-3251 Project: Hadoop YARN Issue Type: Sub-task Affects Versions: 2.6.0 Reporter: Jason Lowe Assignee: Wangda Tan Priority: Blocker Attachments: YARN-3251.1.patch, YARN-3251.trunk.1.patch The ResourceManager can deadlock in the CapacityScheduler when computing the absolute max available capacity for user limits and headroom. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3251) CapacityScheduler deadlock when computing absolute max avail capacity
[ https://issues.apache.org/jira/browse/YARN-3251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14337305#comment-14337305 ] Wangda Tan commented on YARN-3251: -- Found a few typos; removed the patch and will upload it again. CapacityScheduler deadlock when computing absolute max avail capacity - Key: YARN-3251 URL: https://issues.apache.org/jira/browse/YARN-3251 Project: Hadoop YARN Issue Type: Sub-task Affects Versions: 2.6.0 Reporter: Jason Lowe Assignee: Wangda Tan Priority: Blocker Attachments: YARN-3251.1.patch The ResourceManager can deadlock in the CapacityScheduler when computing the absolute max available capacity for user limits and headroom. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3262) Surface application resource requests table
[ https://issues.apache.org/jira/browse/YARN-3262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14337311#comment-14337311 ] Jason Lowe commented on YARN-3262: -- Duplicate of YARN-451? Suface application resource requests table -- Key: YARN-3262 URL: https://issues.apache.org/jira/browse/YARN-3262 Project: Hadoop YARN Issue Type: Improvement Components: yarn Reporter: Jian He Assignee: Jian He It would be useful to surface the resource requests table on the application web page to facilitate scheduling analysis and debugging. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3251) CapacityScheduler deadlock when computing absolute max avail capacity
[ https://issues.apache.org/jira/browse/YARN-3251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-3251: - Attachment: YARN-3251.trunk.1.patch Attached ver.1 patch against trunk CapacityScheduler deadlock when computing absolute max avail capacity - Key: YARN-3251 URL: https://issues.apache.org/jira/browse/YARN-3251 Project: Hadoop YARN Issue Type: Sub-task Affects Versions: 2.6.0 Reporter: Jason Lowe Assignee: Wangda Tan Priority: Blocker Attachments: YARN-3251.1.patch, YARN-3251.trunk.1.patch The ResourceManager can deadlock in the CapacityScheduler when computing the absolute max available capacity for user limits and headroom. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3254) HealthReport should include disk full information
[ https://issues.apache.org/jira/browse/YARN-3254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14337317#comment-14337317 ] Hadoop QA commented on YARN-3254: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12700863/YARN-3254-001.patch against trunk revision 5731c0e. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6741//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6741//console This message is automatically generated. HealthReport should include disk full information - Key: YARN-3254 URL: https://issues.apache.org/jira/browse/YARN-3254 Project: Hadoop YARN Issue Type: Improvement Components: nodemanager Affects Versions: 2.6.0 Reporter: Akira AJISAKA Assignee: Akira AJISAKA Attachments: Screen Shot 2015-02-24 at 17.57.39.png, YARN-3254-001.patch, YARN-3254-002.patch When a NodeManager's local disk gets almost full, the NodeManager sends a health report to ResourceManager that local/log dir is bad and the message is displayed on ResourceManager Web UI. It's difficult for users to detect why the dir is bad. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3262) Surface application resource requests table
[ https://issues.apache.org/jira/browse/YARN-3262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14337319#comment-14337319 ] Jian He commented on YARN-3262: --- Not exactly; this JIRA is to grab the AppSchedulingInfo#requests and dump it on the UI. I have a patch almost ready; attaching the screenshot. Surface application resource requests table -- Key: YARN-3262 URL: https://issues.apache.org/jira/browse/YARN-3262 Project: Hadoop YARN Issue Type: Improvement Components: yarn Reporter: Jian He Assignee: Jian He It would be useful to surface the resource requests table on the application web page to facilitate scheduling analysis and debugging. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-3263) ContainerManagerImpl#parseCredentials doesn't rewind the ByteBuffer after credentials.readTokenStorageStream
zhihai xu created YARN-3263: --- Summary: ContainerManagerImpl#parseCredentials doesn't rewind the ByteBuffer after credentials.readTokenStorageStream Key: YARN-3263 URL: https://issues.apache.org/jira/browse/YARN-3263 Project: Hadoop YARN Issue Type: Bug Reporter: zhihai xu Assignee: zhihai xu ContainerManagerImpl#parseCredentials doesn't rewind the ByteBuffer after credentials.readTokenStorageStream. So the next time we access the tokens, we will get an EOFException. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3263) ContainerManagerImpl#parseCredentials doesn't rewind the ByteBuffer after credentials.readTokenStorageStream
[ https://issues.apache.org/jira/browse/YARN-3263?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhihai xu updated YARN-3263: Description: ContainerManagerImpl#parseCredentials doesn't rewind the ByteBuffer after credentials.readTokenStorageStream. So the next time we access the tokens, we will get an EOFException. The following is the code for parseCredentials in ContainerManagerImpl. {code} private Credentials parseCredentials(ContainerLaunchContext launchContext) throws IOException { Credentials credentials = new Credentials(); // Parse credentials ByteBuffer tokens = launchContext.getTokens(); if (tokens != null) { DataInputByteBuffer buf = new DataInputByteBuffer(); tokens.rewind(); buf.reset(tokens); credentials.readTokenStorageStream(buf); if (LOG.isDebugEnabled()) { for (Token<? extends TokenIdentifier> tk : credentials.getAllTokens()) { LOG.debug(tk.getService() + " = " + tk.toString()); } } } // End of parsing credentials return credentials; } {code} was:ContainerManagerImpl#parseCredentials doesn't rewind the ByteBuffer after credentials.readTokenStorageStream. So the next time we access the tokens, we will get an EOFException. ContainerManagerImpl#parseCredentials doesn't rewind the ByteBuffer after credentials.readTokenStorageStream -- Key: YARN-3263 URL: https://issues.apache.org/jira/browse/YARN-3263 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Reporter: zhihai xu Assignee: zhihai xu ContainerManagerImpl#parseCredentials doesn't rewind the ByteBuffer after credentials.readTokenStorageStream. So the next time we access the tokens, we will get an EOFException. The following is the code for parseCredentials in ContainerManagerImpl. {code} private Credentials parseCredentials(ContainerLaunchContext launchContext) throws IOException { Credentials credentials = new Credentials(); // Parse credentials ByteBuffer tokens = launchContext.getTokens(); if (tokens != null) { DataInputByteBuffer buf = new DataInputByteBuffer(); tokens.rewind(); buf.reset(tokens); credentials.readTokenStorageStream(buf); if (LOG.isDebugEnabled()) { for (Token<? extends TokenIdentifier> tk : credentials.getAllTokens()) { LOG.debug(tk.getService() + " = " + tk.toString()); } } } // End of parsing credentials return credentials; } {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3262) Surface application resource requests table
[ https://issues.apache.org/jira/browse/YARN-3262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated YARN-3262: -- Attachment: resource requests.png attached a screenshot to demonstrate the idea Suface application resource requests table -- Key: YARN-3262 URL: https://issues.apache.org/jira/browse/YARN-3262 Project: Hadoop YARN Issue Type: Improvement Components: yarn Reporter: Jian He Assignee: Jian He Attachments: resource requests.png It would be useful to surface the resource requests table on the application web page to facilitate scheduling analysis and debugging. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3251) CapacityScheduler deadlock when computing absolute max avail capacity
[ https://issues.apache.org/jira/browse/YARN-3251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14337348#comment-14337348 ] Vinod Kumar Vavilapalli commented on YARN-3251: --- bq. Since the target of your patch is to make a quick fix for an old version, it's better to create a patch in branch-2.6, and the patch you created will be committed to branch-2.6 as well. I noticed some functionalities and interfaces being used in your patch are not part of 2.6. And the patch I'm working on now will remove the CSQueueUtils.computeMaxAvailResource, so there's no need to add an intermediate fix in branch-2. How about we have a separate JIRA solely focused on 2.6.1 - as we have two separate patches and two different contributors? CapacityScheduler deadlock when computing absolute max avail capacity - Key: YARN-3251 URL: https://issues.apache.org/jira/browse/YARN-3251 Project: Hadoop YARN Issue Type: Sub-task Affects Versions: 2.6.0 Reporter: Jason Lowe Assignee: Wangda Tan Priority: Blocker Attachments: YARN-3251.1.patch, YARN-3251.trunk.1.patch The ResourceManager can deadlock in the CapacityScheduler when computing the absolute max available capacity for user limits and headroom. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3254) HealthReport should include disk full information
[ https://issues.apache.org/jira/browse/YARN-3254?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Akira AJISAKA updated YARN-3254: Attachment: Screen Shot 2015-02-25 at 14.38.10.png Attaching a screenshot after applying v2 patch. HealthReport should include disk full information - Key: YARN-3254 URL: https://issues.apache.org/jira/browse/YARN-3254 Project: Hadoop YARN Issue Type: Improvement Components: nodemanager Affects Versions: 2.6.0 Reporter: Akira AJISAKA Assignee: Akira AJISAKA Attachments: Screen Shot 2015-02-24 at 17.57.39.png, Screen Shot 2015-02-25 at 14.38.10.png, YARN-3254-001.patch, YARN-3254-002.patch When a NodeManager's local disk gets almost full, the NodeManager sends a health report to ResourceManager that local/log dir is bad and the message is displayed on ResourceManager Web UI. It's difficult for users to detect why the dir is bad. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3262) Surface application resource requests table
[ https://issues.apache.org/jira/browse/YARN-3262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated YARN-3262: -- Description: It would be useful to surface the outstanding resource requests table on the application web page to facilitate scheduling analysis and debugging. (was: It would be useful to surface the resource requests table on the application web page to facilitate scheduling analysis and debugging.) Suface application resource requests table -- Key: YARN-3262 URL: https://issues.apache.org/jira/browse/YARN-3262 Project: Hadoop YARN Issue Type: Improvement Components: yarn Reporter: Jian He Assignee: Jian He Attachments: YARN-3262.1.patch, resource requests.png It would be useful to surface the outstanding resource requests table on the application web page to facilitate scheduling analysis and debugging. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3125) [Event producers] Change distributed shell to use new timeline service
[ https://issues.apache.org/jira/browse/YARN-3125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14337459#comment-14337459 ] Zhijie Shen commented on YARN-3125: --- Hi Junping, thanks for updating the patch. Some more comments: 1. One side-by-side change is missed around the following code. {code} if(timelineClient != null) { publishApplicationAttemptEvent(timelineClient, appAttemptID.toString(), DSEvent.DS_APP_ATTEMPT_END, domainId, appSubmitterUgi); } {code} 2. Client needs to be changed too: taking CLI option and appending it as the args of the command to run AM. [Event producers] Change distributed shell to use new timeline service -- Key: YARN-3125 URL: https://issues.apache.org/jira/browse/YARN-3125 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Zhijie Shen Assignee: Junping Du Attachments: YARN-3125.patch, YARN-3125v2.patch We can start with changing distributed shell to use new timeline service once the framework is completed, in which way we can quickly verify the next gen is working fine end-to-end. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2423) TimelineClient should wrap all GET APIs to facilitate Java users
[ https://issues.apache.org/jira/browse/YARN-2423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14337670#comment-14337670 ] Vinod Kumar Vavilapalli commented on YARN-2423: --- Sorry, I misspoke. Yes, without this patch, they are the REST APIs; that is all we have. bq. Are they declared as InterfaceAudience(Public)? If they're not, we're not willing to use those in Spark, because there's no commitment from Yarn to keep them stable. In my opinion, the APIs being added in this JIRA are no more stable than the REST APIs. They will possibly change the moment our data-model via YARN-2928 changes and correspondingly the REST APIs change. So, If Spark doesn't want to code against this in a shimmed layer, I am not sure I can be of much help. FWIW, I am trying as much to maintain compatibility of what exists so far. I'm not trying to block this patch, just pointing out potential breakages in the near future given YARN-2928. TimelineClient should wrap all GET APIs to facilitate Java users Key: YARN-2423 URL: https://issues.apache.org/jira/browse/YARN-2423 Project: Hadoop YARN Issue Type: Sub-task Reporter: Zhijie Shen Assignee: Robert Kanter Attachments: YARN-2423.004.patch, YARN-2423.005.patch, YARN-2423.006.patch, YARN-2423.007.patch, YARN-2423.patch, YARN-2423.patch, YARN-2423.patch TimelineClient provides the Java method to put timeline entities. It's also good to wrap over all GET APIs (both entity and domain), and deserialize the json response into Java POJO objects. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1809) Synchronize RM and Generic History Service Web-UIs
[ https://issues.apache.org/jira/browse/YARN-1809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14337672#comment-14337672 ] Zhijie Shen commented on YARN-1809: --- [~xgong], thanks for moving this work forward. I've tried the patch. It seems to work overall, but a couple of links are broken. The reason is that the old patch is about a year old, while the RM and AHS web UIs have since changed to some extent. Synchronize RM and Generic History Service Web-UIs -- Key: YARN-1809 URL: https://issues.apache.org/jira/browse/YARN-1809 Project: Hadoop YARN Issue Type: Improvement Reporter: Zhijie Shen Assignee: Xuan Gong Attachments: YARN-1809.1.patch, YARN-1809.10.patch, YARN-1809.11.patch, YARN-1809.12.patch, YARN-1809.2.patch, YARN-1809.3.patch, YARN-1809.4.patch, YARN-1809.5.patch, YARN-1809.5.patch, YARN-1809.6.patch, YARN-1809.7.patch, YARN-1809.8.patch, YARN-1809.9.patch After YARN-953, the web-UI of the generic history service provides more information than that of the RM, namely the details about app attempts and containers. It's good to provide similar web-UIs, but retrieve the data from separate sources, i.e., the RM cache and the history store respectively. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3265) CapacityScheduler deadlock when computing absolute max avail capacity (fix for trunk/branch-2)
[ https://issues.apache.org/jira/browse/YARN-3265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14337701#comment-14337701 ] Hadoop QA commented on YARN-3265: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12700919/YARN-3265.1.patch against trunk revision d140d76. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 5 new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.scheduler.TestResourceUsage org.apache.hadoop.yarn.server.resourcemanager.recovery.TestFSRMStateStore Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6748//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/6748//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6748//console This message is automatically generated. CapacityScheduler deadlock when computing absolute max avail capacity (fix for trunk/branch-2) -- Key: YARN-3265 URL: https://issues.apache.org/jira/browse/YARN-3265 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler, resourcemanager Reporter: Wangda Tan Assignee: Wangda Tan Priority: Blocker Attachments: YARN-3265.1.patch This patch is trying to solve the same problem described in YARN-3251, but this is a longer term fix for trunk and branch-2. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (YARN-3173) start-yarn.sh script can't aware how many RMs to be started.
[ https://issues.apache.org/jira/browse/YARN-3173?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer resolved YARN-3173. Resolution: Duplicate I'm going to close this as a dupe of HADOOP-11590. I've got the requested functionality of starting HA RM nodes as part of that patch already. start-yarn.sh script can't aware how many RMs to be started. - Key: YARN-3173 URL: https://issues.apache.org/jira/browse/YARN-3173 Project: Hadoop YARN Issue Type: Bug Components: scripts Affects Versions: 3.0.0 Reporter: BOB Priority: Minor When more than one RM is configured, for example in an HA cluster, using the start-yarn.sh script to start the YARN cluster brings up only one ResourceManager, on the node where start-yarn.sh is executed. I think YARN should detect how many RMs have been configured at the beginning and start them all in the start-yarn.sh script. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3262) Surface application resource requests table
[ https://issues.apache.org/jira/browse/YARN-3262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14337756#comment-14337756 ] Hadoop QA commented on YARN-3262: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12700928/YARN-3262.2.patch against trunk revision d140d76. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 5 new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesApps org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebAppFairScheduler Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6750//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/6750//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6750//console This message is automatically generated. Suface application resource requests table -- Key: YARN-3262 URL: https://issues.apache.org/jira/browse/YARN-3262 Project: Hadoop YARN Issue Type: Improvement Components: yarn Reporter: Jian He Assignee: Jian He Attachments: YARN-3262.1.patch, YARN-3262.2.patch, resource requests.png It would be useful to surface the outstanding resource requests table on the application web page to facilitate scheduling analysis and debugging. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2423) TimelineClient should wrap all GET APIs to facilitate Java users
[ https://issues.apache.org/jira/browse/YARN-2423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14337682#comment-14337682 ] Marcelo Vanzin commented on YARN-2423: -- Hi Vinod, I think the point that Robert was trying to make is that adding these APIs might force Yarn to maintain compatibility for it. So it would allow clients to code against the public API and have a reasonable expectation that it wouldn't break. But I understand that with the redesign it might be hard to maintain compatibility. I guess it's a choice you guys have to make, but the lack of a public, stable read API is definitely a barrier for Spark adopting this feature. (I understand we could write code to talk to the REST server directly, but you seem to imply that this approach would also run into compatibility issues after the redesign.) TimelineClient should wrap all GET APIs to facilitate Java users Key: YARN-2423 URL: https://issues.apache.org/jira/browse/YARN-2423 Project: Hadoop YARN Issue Type: Sub-task Reporter: Zhijie Shen Assignee: Robert Kanter Attachments: YARN-2423.004.patch, YARN-2423.005.patch, YARN-2423.006.patch, YARN-2423.007.patch, YARN-2423.patch, YARN-2423.patch, YARN-2423.patch TimelineClient provides the Java method to put timeline entities. It's also good to wrap over all GET APIs (both entity and domain), and deserialize the json response into Java POJO objects. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3251) CapacityScheduler deadlock when computing absolute max avail capacity
[ https://issues.apache.org/jira/browse/YARN-3251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14337443#comment-14337443 ] Hadoop QA commented on YARN-3251: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12700874/YARN-3251.trunk.1.patch against trunk revision caa42ad. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 6 new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.scheduler.TestResourceUsage Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6743//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/6743//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6743//console This message is automatically generated. CapacityScheduler deadlock when computing absolute max avail capacity - Key: YARN-3251 URL: https://issues.apache.org/jira/browse/YARN-3251 Project: Hadoop YARN Issue Type: Sub-task Affects Versions: 2.6.0 Reporter: Jason Lowe Assignee: Wangda Tan Priority: Blocker Attachments: YARN-3251.1.patch, YARN-3251.trunk.1.patch The ResourceManager can deadlock in the CapacityScheduler when computing the absolute max available capacity for user limits and headroom. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3251) CapacityScheduler deadlock when computing absolute max avail capacity (short term fix for 2.6.1)
[ https://issues.apache.org/jira/browse/YARN-3251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-3251: - Target Version/s: 2.6.1 (was: 2.7.0, 2.6.1) CapacityScheduler deadlock when computing absolute max avail capacity (short term fix for 2.6.1) Key: YARN-3251 URL: https://issues.apache.org/jira/browse/YARN-3251 Project: Hadoop YARN Issue Type: Sub-task Affects Versions: 2.6.0 Reporter: Jason Lowe Assignee: Wangda Tan Priority: Blocker Attachments: YARN-3251.1.patch, YARN-3251.trunk.1.patch The ResourceManager can deadlock in the CapacityScheduler when computing the absolute max available capacity for user limits and headroom. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2423) TimelineClient should wrap all GET APIs to facilitate Java users
[ https://issues.apache.org/jira/browse/YARN-2423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14337630#comment-14337630 ] Marcelo Vanzin commented on YARN-2423: -- Hi [~vinodkv], What are these APIs? Are they declared as InterfaceAudience(Public)? If they're not, we're not willing to use those in Spark, because there's no commitment from Yarn to keep them stable. The only TimelineClient I see on branch-2.6 has no methods to read data from the ATS. TimelineClient should wrap all GET APIs to facilitate Java users Key: YARN-2423 URL: https://issues.apache.org/jira/browse/YARN-2423 Project: Hadoop YARN Issue Type: Sub-task Reporter: Zhijie Shen Assignee: Robert Kanter Attachments: YARN-2423.004.patch, YARN-2423.005.patch, YARN-2423.006.patch, YARN-2423.007.patch, YARN-2423.patch, YARN-2423.patch, YARN-2423.patch TimelineClient provides the Java method to put timeline entities. It's also good to wrap over all GET APIs (both entity and domain), and deserialize the json response into Java POJO objects. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3131) YarnClientImpl should check FAILED and KILLED state in submitApplication
[ https://issues.apache.org/jira/browse/YARN-3131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14337354#comment-14337354 ] Jason Lowe commented on YARN-3131: -- bq. We can just simply check for failToSubmitStates? Why do we also need to check for waitingStates? If we only check for failToSubmitStates then we'll continue to loop waiting for those failed states. We need to check for waitingStates because that's the typical looping condition. We need to keep polling for a new application report as long as the app is in the NEW, NEW_SAVING, or SUBMITTED state since those states indicate the RM hasn't finished accepting the app yet. When it's not in one of the waiting states, we need to check if it's one of the failed states to decide to throw rather than just return indicating it was successful. YarnClientImpl should check FAILED and KILLED state in submitApplication Key: YARN-3131 URL: https://issues.apache.org/jira/browse/YARN-3131 Project: Hadoop YARN Issue Type: Bug Reporter: Chang Li Assignee: Chang Li Attachments: yarn_3131_v1.patch, yarn_3131_v2.patch, yarn_3131_v3.patch, yarn_3131_v4.patch, yarn_3131_v5.patch, yarn_3131_v6.patch Just run into a issue when submit a job into a non-existent queue and YarnClient raise no exception. Though that job indeed get submitted successfully and just failed immediately after, it will be better if YarnClient can handle the immediate fail situation like YarnRunner does -- This message was sent by Atlassian JIRA (v6.3.4#6332)
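The loop structure Jason describes can be sketched roughly as follows. This is an illustration of the logic only, not YarnClientImpl itself; the helper method and the Supplier-based state fetcher are hypothetical.
{code}
import java.util.EnumSet;
import java.util.function.Supplier;
import org.apache.hadoop.yarn.api.records.YarnApplicationState;

// Sketch: poll while the app is still being accepted, then throw if it ended
// up in a failed-to-submit state; otherwise submission succeeded.
public class SubmitLoopSketch {
  private static final EnumSet<YarnApplicationState> WAITING_STATES =
      EnumSet.of(YarnApplicationState.NEW,
                 YarnApplicationState.NEW_SAVING,
                 YarnApplicationState.SUBMITTED);
  private static final EnumSet<YarnApplicationState> FAIL_TO_SUBMIT_STATES =
      EnumSet.of(YarnApplicationState.FAILED,
                 YarnApplicationState.KILLED);

  public static void waitForAcceptance(Supplier<YarnApplicationState> stateFetcher)
      throws InterruptedException {
    YarnApplicationState state = stateFetcher.get();
    while (WAITING_STATES.contains(state)) {
      Thread.sleep(200);                 // poll interval chosen arbitrarily here
      state = stateFetcher.get();
    }
    if (FAIL_TO_SUBMIT_STATES.contains(state)) {
      throw new IllegalStateException("Application failed to submit: " + state);
    }
    // otherwise the RM accepted the application
  }
}
{code}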
[jira] [Commented] (YARN-3254) HealthReport should include disk full information
[ https://issues.apache.org/jira/browse/YARN-3254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14337376#comment-14337376 ] Akira AJISAKA commented on YARN-3254: - I reconsidered this; after all, the issue is not a problem because * Admins can read the NodeManager's log and find a message such as the following: {code} 2015-02-26 07:34:22,485 WARN org.apache.hadoop.yarn.server.nodemanager.DirectoryCollection: Directory /usr/local/20150225-YARN-3254-2/logs/userlogs error, used space above threshold of 90.0%, removing from list of valid directories {code} * This patch is still incompatible, as the JMX information is actually changed. HealthReport should include disk full information - Key: YARN-3254 URL: https://issues.apache.org/jira/browse/YARN-3254 Project: Hadoop YARN Issue Type: Improvement Components: nodemanager Affects Versions: 2.6.0 Reporter: Akira AJISAKA Assignee: Akira AJISAKA Attachments: Screen Shot 2015-02-24 at 17.57.39.png, Screen Shot 2015-02-25 at 14.38.10.png, YARN-3254-001.patch, YARN-3254-002.patch When a NodeManager's local disk gets almost full, the NodeManager sends a health report to ResourceManager that local/log dir is bad and the message is displayed on ResourceManager Web UI. It's difficult for users to detect why the dir is bad. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3131) YarnClientImpl should check FAILED and KILLED state in submitApplication
[ https://issues.apache.org/jira/browse/YARN-3131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14337470#comment-14337470 ] Chang Li commented on YARN-3131: Thanks [~jlowe] for commenting. [~vinodkv] I have fixed the space issue. Jason's explanation is reasonable and I also agree that the check for waitingStates is logical and necessary. Do you have any other concern? Thanks. YarnClientImpl should check FAILED and KILLED state in submitApplication Key: YARN-3131 URL: https://issues.apache.org/jira/browse/YARN-3131 Project: Hadoop YARN Issue Type: Bug Reporter: Chang Li Assignee: Chang Li Attachments: yarn_3131_v1.patch, yarn_3131_v2.patch, yarn_3131_v3.patch, yarn_3131_v4.patch, yarn_3131_v5.patch, yarn_3131_v6.patch, yarn_3131_v7.patch Just run into a issue when submit a job into a non-existent queue and YarnClient raise no exception. Though that job indeed get submitted successfully and just failed immediately after, it will be better if YarnClient can handle the immediate fail situation like YarnRunner does -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3251) CapacityScheduler deadlock when computing absolute max avail capacity (short term fix for 2.6.1)
[ https://issues.apache.org/jira/browse/YARN-3251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-3251: - Attachment: (was: YARN-3251.trunk.1.patch) CapacityScheduler deadlock when computing absolute max avail capacity (short term fix for 2.6.1) Key: YARN-3251 URL: https://issues.apache.org/jira/browse/YARN-3251 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.6.0 Reporter: Jason Lowe Assignee: Craig Welch Priority: Blocker Attachments: YARN-3251.1.patch The ResourceManager can deadlock in the CapacityScheduler when computing the absolute max available capacity for user limits and headroom. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3080) The DockerContainerExecutor could not write the right pid to container pidFile
[ https://issues.apache.org/jira/browse/YARN-3080?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Abin Shahab updated YARN-3080: -- Attachment: YARN-3080.patch Fixed the findbug errors. The DockerContainerExecutor could not write the right pid to container pidFile -- Key: YARN-3080 URL: https://issues.apache.org/jira/browse/YARN-3080 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.6.0 Reporter: Beckham007 Assignee: Abin Shahab Attachments: YARN-3080.patch, YARN-3080.patch, YARN-3080.patch The docker_container_executor_session.sh is like this: {quote} #!/usr/bin/env bash echo `/usr/bin/docker inspect --format {{.State.Pid}} container_1421723685222_0008_01_02` /data/nm_restart/hadoop-2.4.1/data/yarn/local/nmPrivate/application_1421723685222_0008/container_1421723685222_0008_01_02/container_1421723685222_0008_01_02.pid.tmp /bin/mv -f /data/nm_restart/hadoop-2.4.1/data/yarn/local/nmPrivate/application_1421723685222_0008/container_1421723685222_0008_01_02/container_1421723685222_0008_01_02.pid.tmp /data/nm_restart/hadoop-2.4.1/data/yarn/local/nmPrivate/application_1421723685222_0008/container_1421723685222_0008_01_02/container_1421723685222_0008_01_02.pid /usr/bin/docker run --rm --name container_1421723685222_0008_01_02 -e GAIA_HOST_IP=c162 -e GAIA_API_SERVER=10.6.207.226:8080 -e GAIA_CLUSTER_ID=shpc-nm_restart -e GAIA_QUEUE=root.tdwadmin -e GAIA_APP_NAME=test_nm_docker -e GAIA_INSTANCE_ID=1 -e GAIA_CONTAINER_ID=container_1421723685222_0008_01_02 --memory=32M --cpu-shares=1024 -v /data/nm_restart/hadoop-2.4.1/data/yarn/container-logs/application_1421723685222_0008/container_1421723685222_0008_01_02:/data/nm_restart/hadoop-2.4.1/data/yarn/container-logs/application_1421723685222_0008/container_1421723685222_0008_01_02 -v /data/nm_restart/hadoop-2.4.1/data/yarn/local/usercache/tdwadmin/appcache/application_1421723685222_0008/container_1421723685222_0008_01_02:/data/nm_restart/hadoop-2.4.1/data/yarn/local/usercache/tdwadmin/appcache/application_1421723685222_0008/container_1421723685222_0008_01_02 -P -e A=B --privileged=true docker.oa.com:8080/library/centos7 bash /data/nm_restart/hadoop-2.4.1/data/yarn/local/usercache/tdwadmin/appcache/application_1421723685222_0008/container_1421723685222_0008_01_02/launch_container.sh {quote} The DockerContainerExecutor use docker inspect before docker run, so the docker inspect couldn't get the right pid for the docker, signalContainer() and nm restart would fail. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3251) CapacityScheduler deadlock when computing absolute max avail capacity (short term fix for 2.6.1)
[ https://issues.apache.org/jira/browse/YARN-3251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14337571#comment-14337571 ] Wangda Tan commented on YARN-3251: -- Removed patch for trunk and uploaded the same one to YARN-3265, reassigned this to [~cwelch]. CapacityScheduler deadlock when computing absolute max avail capacity (short term fix for 2.6.1) Key: YARN-3251 URL: https://issues.apache.org/jira/browse/YARN-3251 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.6.0 Reporter: Jason Lowe Assignee: Craig Welch Priority: Blocker Attachments: YARN-3251.1.patch The ResourceManager can deadlock in the CapacityScheduler when computing the absolute max available capacity for user limits and headroom. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3031) [Storage abstraction] Create backing storage write interface for ATS writers
[ https://issues.apache.org/jira/browse/YARN-3031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14337401#comment-14337401 ] Vrushali C commented on YARN-3031: -- Hi [~djp] The AggregateUpTo enum has the tracks to aggregate along, the TimelineEntityType enum has the types of entities that can exist. There may not be aggregations along all entity types. For example: a query can be, give all the apps run by this user in the last week. This will read the data that is aggregated along the USER track. I think I can rename those with a YARN prefix though. [Storage abstraction] Create backing storage write interface for ATS writers Key: YARN-3031 URL: https://issues.apache.org/jira/browse/YARN-3031 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Sangjin Lee Assignee: Vrushali C Attachments: Sequence_diagram_write_interaction.2.png, Sequence_diagram_write_interaction.png, YARN-3031.01.patch, YARN-3031.02.patch, YARN-3031.03.patch Per design in YARN-2928, come up with the interface for the ATS writer to write to various backing storages. The interface should be created to capture the right level of abstractions so that it will enable all backing storage implementations to implement it efficiently. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
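For illustration, the kind of enum being discussed might look roughly like the sketch below. The enum name and most of the constants are assumptions; only the USER track is mentioned in the comment, and the actual names live in the YARN-3031 patches.
{code}
// Hypothetical sketch: the tracks an ATS entity's data could be aggregated
// along, as distinct from the entity types themselves.
public enum YarnTimelineAggregationTrack {
  NONE,   // no aggregation
  USER,   // e.g. "all apps run by this user in the last week"
  FLOW,   // assumption: aggregate across runs of the same flow
  QUEUE   // assumption: aggregate across apps in a queue
}
{code}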
[jira] [Commented] (YARN-3251) CapacityScheduler deadlock when computing absolute max avail capacity
[ https://issues.apache.org/jira/browse/YARN-3251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14337465#comment-14337465 ] Hadoop QA commented on YARN-3251: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12700875/YARN-3251.trunk.1.patch against trunk revision caa42ad. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 5 new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.scheduler.TestResourceUsage Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6744//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/6744//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6744//console This message is automatically generated. CapacityScheduler deadlock when computing absolute max avail capacity - Key: YARN-3251 URL: https://issues.apache.org/jira/browse/YARN-3251 Project: Hadoop YARN Issue Type: Sub-task Affects Versions: 2.6.0 Reporter: Jason Lowe Assignee: Wangda Tan Priority: Blocker Attachments: YARN-3251.1.patch, YARN-3251.trunk.1.patch The ResourceManager can deadlock in the CapacityScheduler when computing the absolute max available capacity for user limits and headroom. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3262) Surface application resource requests table
[ https://issues.apache.org/jira/browse/YARN-3262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated YARN-3262: -- Attachment: YARN-3262.2.patch Suface application resource requests table -- Key: YARN-3262 URL: https://issues.apache.org/jira/browse/YARN-3262 Project: Hadoop YARN Issue Type: Improvement Components: yarn Reporter: Jian He Assignee: Jian He Attachments: YARN-3262.1.patch, YARN-3262.2.patch, resource requests.png It would be useful to surface the outstanding resource requests table on the application web page to facilitate scheduling analysis and debugging. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3262) Surface application resource requests table
[ https://issues.apache.org/jira/browse/YARN-3262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14337464#comment-14337464 ] Hadoop QA commented on YARN-3262: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12700884/YARN-3262.1.patch against trunk revision caa42ad. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:red}-1 javac{color}. The applied patch generated 1151 javac compiler warnings (more than the trunk's current 208 warnings). {color:red}-1 javadoc{color}. The javadoc tool appears to have generated 47 warning messages. See https://builds.apache.org/job/PreCommit-YARN-Build/6746//artifact/patchprocess/diffJavadocWarnings.txt for details. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs-httpfs. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6746//testReport/ Javac warnings: https://builds.apache.org/job/PreCommit-YARN-Build/6746//artifact/patchprocess/diffJavacWarnings.txt Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6746//console This message is automatically generated. Suface application resource requests table -- Key: YARN-3262 URL: https://issues.apache.org/jira/browse/YARN-3262 Project: Hadoop YARN Issue Type: Improvement Components: yarn Reporter: Jian He Assignee: Jian He Attachments: YARN-3262.1.patch, resource requests.png It would be useful to surface the outstanding resource requests table on the application web page to facilitate scheduling analysis and debugging. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3131) YarnClientImpl should check FAILED and KILLED state in submitApplication
[ https://issues.apache.org/jira/browse/YARN-3131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14337463#comment-14337463 ] Hadoop QA commented on YARN-3131: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12700886/yarn_3131_v7.patch against trunk revision caa42ad. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6747//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6747//console This message is automatically generated. YarnClientImpl should check FAILED and KILLED state in submitApplication Key: YARN-3131 URL: https://issues.apache.org/jira/browse/YARN-3131 Project: Hadoop YARN Issue Type: Bug Reporter: Chang Li Assignee: Chang Li Attachments: yarn_3131_v1.patch, yarn_3131_v2.patch, yarn_3131_v3.patch, yarn_3131_v4.patch, yarn_3131_v5.patch, yarn_3131_v6.patch, yarn_3131_v7.patch Just run into a issue when submit a job into a non-existent queue and YarnClient raise no exception. Though that job indeed get submitted successfully and just failed immediately after, it will be better if YarnClient can handle the immediate fail situation like YarnRunner does -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3265) CapacityScheduler deadlock when computing absolute max avail capacity (fix for trunk/branch-2)
[ https://issues.apache.org/jira/browse/YARN-3265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-3265: - Priority: Blocker (was: Major) CapacityScheduler deadlock when computing absolute max avail capacity (fix for trunk/branch-2) -- Key: YARN-3265 URL: https://issues.apache.org/jira/browse/YARN-3265 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler, resourcemanager Reporter: Wangda Tan Assignee: Wangda Tan Priority: Blocker This patch is trying to solve the same problem described in YARN-3251, but this is a longer term fix for trunk and branch-2. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3265) CapacityScheduler deadlock when computing absolute max avail capacity (fix for trunk/branch-2)
[ https://issues.apache.org/jira/browse/YARN-3265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-3265: - Attachment: YARN-3265.1.patch CapacityScheduler deadlock when computing absolute max avail capacity (fix for trunk/branch-2) -- Key: YARN-3265 URL: https://issues.apache.org/jira/browse/YARN-3265 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler, resourcemanager Reporter: Wangda Tan Assignee: Wangda Tan Priority: Blocker Attachments: YARN-3265.1.patch This patch is trying to solve the same problem described in YARN-3251, but this is a longer term fix for trunk and branch-2. -- This message was sent by Atlassian JIRA (v6.3.4#6332)