[jira] [Commented] (YARN-3558) Additional containers getting reserved from RM in case of Fair scheduler
[ https://issues.apache.org/jira/browse/YARN-3558?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14551884#comment-14551884 ] Sunil G commented on YARN-3558: --- Hi [~bibinchundatt] Could you please upload the RM logs. Additional containers getting reserved from RM in case of Fair scheduler Key: YARN-3558 URL: https://issues.apache.org/jira/browse/YARN-3558 Project: Hadoop YARN Issue Type: Bug Components: fairscheduler, resourcemanager Affects Versions: 2.7.0 Environment: OS :Suse 11 Sp3 Setup : 2 RM 2 NM Scheduler : Fair scheduler Reporter: Bibin A Chundatt Submit PI job with 16 maps Total container expected : 16 MAPS + 1 Reduce + 1 AM Total containers reserved by RM is 21 Below set of containers are not being used for execution container_1430213948957_0001_01_20 container_1430213948957_0001_01_19 RM Containers reservation and states {code} Processing container_1430213948957_0001_01_01 of type START Processing container_1430213948957_0001_01_01 of type ACQUIRED Processing container_1430213948957_0001_01_01 of type LAUNCHED Processing container_1430213948957_0001_01_02 of type START Processing container_1430213948957_0001_01_03 of type START Processing container_1430213948957_0001_01_02 of type ACQUIRED Processing container_1430213948957_0001_01_03 of type ACQUIRED Processing container_1430213948957_0001_01_04 of type START Processing container_1430213948957_0001_01_05 of type START Processing container_1430213948957_0001_01_04 of type ACQUIRED Processing container_1430213948957_0001_01_05 of type ACQUIRED Processing container_1430213948957_0001_01_02 of type LAUNCHED Processing container_1430213948957_0001_01_04 of type LAUNCHED Processing container_1430213948957_0001_01_06 of type RESERVED Processing container_1430213948957_0001_01_03 of type LAUNCHED Processing container_1430213948957_0001_01_05 of type LAUNCHED Processing container_1430213948957_0001_01_07 of type START Processing container_1430213948957_0001_01_07 of type ACQUIRED Processing container_1430213948957_0001_01_07 of type LAUNCHED Processing container_1430213948957_0001_01_08 of type RESERVED Processing container_1430213948957_0001_01_02 of type FINISHED Processing container_1430213948957_0001_01_06 of type START Processing container_1430213948957_0001_01_06 of type ACQUIRED Processing container_1430213948957_0001_01_06 of type LAUNCHED Processing container_1430213948957_0001_01_04 of type FINISHED Processing container_1430213948957_0001_01_09 of type START Processing container_1430213948957_0001_01_09 of type ACQUIRED Processing container_1430213948957_0001_01_09 of type LAUNCHED Processing container_1430213948957_0001_01_10 of type RESERVED Processing container_1430213948957_0001_01_03 of type FINISHED Processing container_1430213948957_0001_01_08 of type START Processing container_1430213948957_0001_01_08 of type ACQUIRED Processing container_1430213948957_0001_01_08 of type LAUNCHED Processing container_1430213948957_0001_01_05 of type FINISHED Processing container_1430213948957_0001_01_11 of type START Processing container_1430213948957_0001_01_11 of type ACQUIRED Processing container_1430213948957_0001_01_11 of type LAUNCHED Processing container_1430213948957_0001_01_07 of type FINISHED Processing container_1430213948957_0001_01_12 of type START Processing container_1430213948957_0001_01_12 of type ACQUIRED Processing container_1430213948957_0001_01_12 of type LAUNCHED Processing container_1430213948957_0001_01_13 of type RESERVED Processing container_1430213948957_0001_01_06 of 
type FINISHED Processing container_1430213948957_0001_01_10 of type START Processing container_1430213948957_0001_01_10 of type ACQUIRED Processing container_1430213948957_0001_01_10 of type LAUNCHED Processing container_1430213948957_0001_01_09 of type FINISHED Processing container_1430213948957_0001_01_14 of type START Processing container_1430213948957_0001_01_14 of type ACQUIRED Processing container_1430213948957_0001_01_14 of type LAUNCHED Processing container_1430213948957_0001_01_15 of type RESERVED Processing container_1430213948957_0001_01_08 of type FINISHED Processing container_1430213948957_0001_01_13 of type START Processing container_1430213948957_0001_01_16 of type RESERVED Processing container_1430213948957_0001_01_13 of type ACQUIRED Processing container_1430213948957_0001_01_13 of type LAUNCHED Processing container_1430213948957_0001_01_11 of
[jira] [Updated] (YARN-3591) Resource Localisation on a bad disk causes subsequent containers failure
[ https://issues.apache.org/jira/browse/YARN-3591?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lavkesh Lahngir updated YARN-3591: -- Attachment: YARN-3591.4.patch Resource Localisation on a bad disk causes subsequent containers failure - Key: YARN-3591 URL: https://issues.apache.org/jira/browse/YARN-3591 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.7.0 Reporter: Lavkesh Lahngir Assignee: Lavkesh Lahngir Attachments: 0001-YARN-3591.1.patch, 0001-YARN-3591.patch, YARN-3591.2.patch, YARN-3591.3.patch, YARN-3591.4.patch It happens when a resource is localised on a disk and, after localisation, that disk goes bad. The NM keeps paths for localised resources in memory. At the time of a resource request, isResourcePresent(rsrc) is called, which calls file.exists() on the localised path. In some cases when the disk has gone bad, inodes are still cached and file.exists() returns true, but at the time of reading the file will not open. Note: file.exists() actually calls stat64 natively, which returns true because it was able to find inode information from the OS. A proposal is to call file.list() on the parent path of the resource, which will call open() natively. If the disk is good it should return an array of paths with length at least 1. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
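A minimal, self-contained sketch of the proposal above, using a hypothetical helper class rather than the NM's actual isResourcePresent() code: File.exists() can answer from cached inode metadata even when the disk has failed, whereas listing the parent directory forces an open() and fails on a bad disk.
{code}
import java.io.File;

// Hypothetical helper for illustration only; not the NodeManager implementation.
class LocalizedResourceCheck {
  static boolean isResourceReadable(File localizedPath) {
    // exists() maps to a native stat64(); cached inode information can still
    // report "true" after the disk backing the path has gone bad.
    if (!localizedPath.exists()) {
      return false;
    }
    // list() opens and reads the parent directory, which returns null on a
    // bad disk; on a healthy disk it returns at least the resource itself.
    String[] entries = localizedPath.getParentFile().list();
    return entries != null && entries.length >= 1;
  }
}
{code}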
[jira] [Commented] (YARN-3675) FairScheduler: RM quits when node removal races with continuous scheduling on the same node
[ https://issues.apache.org/jira/browse/YARN-3675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14551885#comment-14551885 ] Anubhav Dhoot commented on YARN-3675: - This fixes the issue where scheduling can happen after the node has been removed. Because of this, when the application is removed it will clean up its reserved and completed containers, and at that time it will try to call a method on the FSSchedulerNode, which is null. Here is the trace of the same instance as above, showing the scheduling happening just after the node is removed. Looking at continuousSchedulingAttempt, we get the reference to the node before we take the scheduler lock when calling attemptScheduling. {noformat} hadoop-YARN-1-RESOURCEMANAGER-hostname.log.out:2015-05-11 00:27:42,793 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: Removed node nmhostname:8041 hadoop-YARN-1-RESOURCEMANAGER-hostname.log.out:2015-05-11 00:27:42,793 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerNode: Assigned container container_e25_1431107530707_159950_01_21 of capacity memory:2048, vCores:1 on host nmhostname:8041, which has 1 containers, memory:2048, vCores:1 used an hadoop-YARN-1-RESOURCEMANAGER-hostname.log.out:2015-05-11 00:27:42,796 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt: Making reservation: node=nmhostname app_id=application_1431107530707_159852 {noformat} FairScheduler: RM quits when node removal races with continuous scheduling on the same node - Key: YARN-3675 URL: https://issues.apache.org/jira/browse/YARN-3675 Project: Hadoop YARN Issue Type: Bug Components: fairscheduler Reporter: Anubhav Dhoot Assignee: Anubhav Dhoot Attachments: YARN-3675.001.patch With continuous scheduling, scheduling can be done on a node that's just been removed, causing errors like the one below. {noformat} 12:28:53.782 AM FATAL org.apache.hadoop.yarn.server.resourcemanager.ResourceManager Error in handling event type APP_ATTEMPT_REMOVED to the scheduler java.lang.NullPointerException at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt.unreserve(FSAppAttempt.java:469) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.completedContainer(FairScheduler.java:815) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.removeApplicationAttempt(FairScheduler.java:763) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:1217) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:111) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:684) at java.lang.Thread.run(Thread.java:745) 12:28:53.783 AM INFO org.apache.hadoop.yarn.server.resourcemanager.ResourceManager Exiting, bbye.. {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
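A self-contained sketch of the race and the guard implied by the comment above, with stand-in types instead of the real FairScheduler/FSSchedulerNode classes (an illustration, not the actual patch): the continuous-scheduling thread should re-resolve the node after taking the scheduler lock and skip it if the node was removed in the meantime.
{code}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Stand-in types; names are placeholders, not the real scheduler classes.
class ContinuousSchedulingSketch {
  static class Node { /* placeholder for FSSchedulerNode */ }

  private final Map<String, Node> nodes = new ConcurrentHashMap<>();
  private final Object schedulerLock = new Object();

  void continuousSchedulingAttempt(String nodeId) {
    synchronized (schedulerLock) {
      // Re-resolve the node under the lock: it may have been removed after
      // the continuous-scheduling thread picked its id from the node list.
      Node node = nodes.get(nodeId);
      if (node == null) {
        return; // node removed concurrently; do not schedule or reserve on it
      }
      attemptScheduling(node);
    }
  }

  private void attemptScheduling(Node node) {
    // allocate or reserve containers on the node (omitted)
  }
}
{code}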
[jira] [Commented] (YARN-3646) Applications are getting stuck sometimes in case of retry policy forever
[ https://issues.apache.org/jira/browse/YARN-3646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14551895#comment-14551895 ] Hadoop QA commented on YARN-3646: - \\ \\ | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 14m 46s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 4 new or modified test files. | | {color:green}+1{color} | javac | 7m 35s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 43s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 22s | The applied patch does not increase the total number of release audit warnings. | | {color:green}+1{color} | checkstyle | 2m 44s | There were no new checkstyle issues. | | {color:green}+1{color} | whitespace | 0m 1s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 35s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 33s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 3m 48s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | common tests | 23m 54s | Tests passed in hadoop-common. | | {color:green}+1{color} | yarn tests | 6m 54s | Tests passed in hadoop-yarn-client. | | {color:green}+1{color} | yarn tests | 1m 56s | Tests passed in hadoop-yarn-common. | | | | 73m 55s | | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12734062/YARN-3646.001.patch | | Optional Tests | javac unit findbugs checkstyle javadoc | | git revision | trunk / ce53c8e | | hadoop-common test log | https://builds.apache.org/job/PreCommit-YARN-Build/8017/artifact/patchprocess/testrun_hadoop-common.txt | | hadoop-yarn-client test log | https://builds.apache.org/job/PreCommit-YARN-Build/8017/artifact/patchprocess/testrun_hadoop-yarn-client.txt | | hadoop-yarn-common test log | https://builds.apache.org/job/PreCommit-YARN-Build/8017/artifact/patchprocess/testrun_hadoop-yarn-common.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8017/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf902.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8017/console | This message was automatically generated. Applications are getting stuck some times in case of retry policy forever - Key: YARN-3646 URL: https://issues.apache.org/jira/browse/YARN-3646 Project: Hadoop YARN Issue Type: Bug Components: client Reporter: Raju Bairishetti Attachments: YARN-3646.001.patch, YARN-3646.patch We have set *yarn.resourcemanager.connect.wait-ms* to -1 to use FOREVER retry policy. Yarn client is infinitely retrying in case of exceptions from the RM as it is using retrying policy as FOREVER. The problem is it is retrying for all kinds of exceptions (like ApplicationNotFoundException), even though it is not a connection failure. Due to this my application is not progressing further. *Yarn client should not retry infinitely in case of non connection failures.* We have written a simple yarn-client which is trying to get an application report for an invalid or older appId. 
ResourceManager is throwing an ApplicationNotFoundException as this is an invalid or older appId. But because of retry policy FOREVER, client is keep on retrying for getting the application report and ResourceManager is throwing ApplicationNotFoundException continuously. {code} private void testYarnClientRetryPolicy() throws Exception{ YarnConfiguration conf = new YarnConfiguration(); conf.setInt(YarnConfiguration.RESOURCEMANAGER_CONNECT_MAX_WAIT_MS, -1); YarnClient yarnClient = YarnClient.createYarnClient(); yarnClient.init(conf); yarnClient.start(); ApplicationId appId = ApplicationId.newInstance(1430126768987L, 10645); ApplicationReport report = yarnClient.getApplicationReport(appId); } {code} *RM logs:* {noformat} 15/05/14 16:33:24 INFO ipc.Server: IPC Server handler 21 on 8032, call org.apache.hadoop.yarn.api.ApplicationClientProtocolPB.getApplicationReport from 10.14.120.231:61621 Call#875162 Retry#0 org.apache.hadoop.yarn.exceptions.ApplicationNotFoundException: Application with id 'application_1430126768987_10645' doesn't exist in RM. at
[jira] [Updated] (YARN-3675) FairScheduler: RM quits when node removal races with continuous scheduling on the same node
[ https://issues.apache.org/jira/browse/YARN-3675?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anubhav Dhoot updated YARN-3675: Attachment: YARN-3675.001.patch FairScheduler: RM quits when node removal races with continousscheduling on the same node - Key: YARN-3675 URL: https://issues.apache.org/jira/browse/YARN-3675 Project: Hadoop YARN Issue Type: Bug Components: fairscheduler Reporter: Anubhav Dhoot Assignee: Anubhav Dhoot Attachments: YARN-3675.001.patch With continuous scheduling, scheduling can be done on a node thats just removed causing errors like below. {noformat} 12:28:53.782 AM FATAL org.apache.hadoop.yarn.server.resourcemanager.ResourceManager Error in handling event type APP_ATTEMPT_REMOVED to the scheduler java.lang.NullPointerException at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt.unreserve(FSAppAttempt.java:469) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.completedContainer(FairScheduler.java:815) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.removeApplicationAttempt(FairScheduler.java:763) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:1217) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:111) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:684) at java.lang.Thread.run(Thread.java:745) 12:28:53.783 AMINFO org.apache.hadoop.yarn.server.resourcemanager.ResourceManager Exiting, bbye.. {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3675) FairScheduler: RM quits when node removal races with continuous scheduling on the same node
[ https://issues.apache.org/jira/browse/YARN-3675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14551946#comment-14551946 ] Hadoop QA commented on YARN-3675: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 14m 42s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 1 new or modified test files. | | {color:green}+1{color} | javac | 7m 36s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 33s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 22s | The applied patch does not increase the total number of release audit warnings. | | {color:red}-1{color} | checkstyle | 0m 47s | The applied patch generated 1 new checkstyle issues (total was 74, now 75). | | {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 31s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 33s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 1m 14s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | yarn tests | 50m 8s | Tests passed in hadoop-yarn-server-resourcemanager. | | | | 86m 30s | | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12734074/YARN-3675.001.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / ce53c8e | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/8018/artifact/patchprocess/diffcheckstylehadoop-yarn-server-resourcemanager.txt | | hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/8018/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8018/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf905.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8018/console | This message was automatically generated. FairScheduler: RM quits when node removal races with continousscheduling on the same node - Key: YARN-3675 URL: https://issues.apache.org/jira/browse/YARN-3675 Project: Hadoop YARN Issue Type: Bug Components: fairscheduler Reporter: Anubhav Dhoot Assignee: Anubhav Dhoot Attachments: YARN-3675.001.patch With continuous scheduling, scheduling can be done on a node thats just removed causing errors like below. 
{noformat} 12:28:53.782 AM FATAL org.apache.hadoop.yarn.server.resourcemanager.ResourceManager Error in handling event type APP_ATTEMPT_REMOVED to the scheduler java.lang.NullPointerException at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt.unreserve(FSAppAttempt.java:469) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.completedContainer(FairScheduler.java:815) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.removeApplicationAttempt(FairScheduler.java:763) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:1217) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:111) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:684) at java.lang.Thread.run(Thread.java:745) 12:28:53.783 AMINFO org.apache.hadoop.yarn.server.resourcemanager.ResourceManager Exiting, bbye.. {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-3689) FifoComparator logic is wrong: in the compare method of FifoPolicy.java, s1 and s2 should be swapped when comparing priority
zhoulinlin created YARN-3689: Summary: FifoComparator logic is wrong. In method compare in FifoPolicy.java file, the s1 and s2 should change position when compare priority Key: YARN-3689 URL: https://issues.apache.org/jira/browse/YARN-3689 Project: Hadoop YARN Issue Type: Bug Components: fairscheduler, scheduler Affects Versions: 2.5.0 Reporter: zhoulinlin In method compare in FifoPolicy.java file, the s1 and s2 should change position when compare priority. I did a test. Configured the schedulerpolicy fifo, submitted 2 jobs to the same queue. The result is below: 2015-05-20 11:57:41,449 DEBUG org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSLeafQueue: before sort -- 2015-05-20 11:57:41,449 DEBUG org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSLeafQueue: appName:application_1432094103221_0001 appPririty:4 appStartTime:1432094170038 2015-05-20 11:57:41,449 DEBUG org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSLeafQueue: appName:application_1432094103221_0002 appPririty:2 appStartTime:1432094173131 2015-05-20 11:57:41,449 DEBUG org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSLeafQueue: after sort % 2015-05-20 11:57:41,449 DEBUG org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSLeafQueue: appName:application_1432094103221_0001 appPririty:4 appStartTime:1432094170038 2015-05-20 11:57:41,449 DEBUG org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSLeafQueue: appName:application_1432094103221_0002 appPririty:2 appStartTime:1432094173131 But when change the s1 and s2 position like below: public int compare(Schedulable s1, Schedulable s2) { int res = s2.getPriority().compareTo(s1.getPriority()); .} The result: 2015-05-20 11:36:37,119 DEBUG org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSLeafQueue: before sort -- 2015-05-20 11:36:37,119 DEBUG org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSLeafQueue: appName:application_1432090734333_0009 appPririty:4 appStartTime:1432092992503 2015-05-20 11:36:37,119 DEBUG org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSLeafQueue: appName:application_1432090734333_0010 appPririty:2 appStartTime:1432092996437 2015-05-20 11:36:37,119 DEBUG org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSLeafQueue: after sort % 2015-05-20 11:36:37,119 DEBUG org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSLeafQueue: appName:application_1432090734333_0010 appPririty:2 appStartTime:1432092996437 2015-05-20 11:36:37,119 DEBUG org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSLeafQueue: appName:application_1432090734333_0009 appPririty:4 appStartTime:1432092992503 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
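For illustration, below is a self-contained sketch of the comparator variant the reporter tested, using a stand-in Schedulable with an integer priority instead of the real FifoPolicy types; whether higher priority should indeed sort first depends on how Priority.compareTo is defined, so this is only a sketch of the reported change.
{code}
import java.util.Comparator;

// Stand-in interface for illustration; not the real Schedulable/Priority classes.
class FifoComparatorSketch {
  interface Schedulable {
    int getPriority();
    long getStartTime();
  }

  // Reporter's variant: compare s2 against s1 so that higher priority sorts
  // first, falling back to start time (FIFO) when priorities are equal.
  static final Comparator<Schedulable> FIFO = (s1, s2) -> {
    int res = Integer.compare(s2.getPriority(), s1.getPriority());
    if (res == 0) {
      res = Long.compare(s1.getStartTime(), s2.getStartTime());
    }
    return res;
  };
}
{code}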
[jira] [Updated] (YARN-3543) ApplicationReport should be able to tell whether the Application is AM managed or not.
[ https://issues.apache.org/jira/browse/YARN-3543?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rohith updated YARN-3543: - Attachment: (was: 0003-YARN-3543.patch) ApplicationReport should be able to tell whether the Application is AM managed or not. --- Key: YARN-3543 URL: https://issues.apache.org/jira/browse/YARN-3543 Project: Hadoop YARN Issue Type: Improvement Components: api Affects Versions: 2.6.0 Reporter: Spandan Dutta Assignee: Rohith Labels: BB2015-05-TBR Attachments: 0001-YARN-3543.patch, 0001-YARN-3543.patch, 0002-YARN-3543.patch, 0002-YARN-3543.patch, 0003-YARN-3543.patch, 0004-YARN-3543.patch, YARN-3543-AH.PNG, YARN-3543-RM.PNG Currently we can know whether the application submitted by the user is AM managed from the applicationSubmissionContext. This can be only done at the time when the user submits the job. We should have access to this info from the ApplicationReport as well so that we can check whether an app is AM managed or not anytime. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (YARN-3690) 'mvn site' fails on JDK8
[ https://issues.apache.org/jira/browse/YARN-3690?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brahma Reddy Battula reassigned YARN-3690: -- Assignee: Brahma Reddy Battula 'mvn site' fails on JDK8 Key: YARN-3690 URL: https://issues.apache.org/jira/browse/YARN-3690 Project: Hadoop YARN Issue Type: Bug Components: documentation Environment: CentOS 7.0, Oracle JDK 8u45. Reporter: Akira AJISAKA Assignee: Brahma Reddy Battula 'mvn site' failed by the following error: {noformat} [ERROR] /home/aajisaka/git/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/factories/package-info.java:18: error: package org.apache.hadoop.yarn.factories has already been annotated [ERROR] @InterfaceAudience.LimitedPrivate({ MapReduce, YARN }) [ERROR] ^ [ERROR] java.lang.AssertionError [ERROR] at com.sun.tools.javac.util.Assert.error(Assert.java:126) [ERROR] at com.sun.tools.javac.util.Assert.check(Assert.java:45) [ERROR] at com.sun.tools.javac.code.SymbolMetadata.setDeclarationAttributesWithCompletion(SymbolMetadata.java:161) [ERROR] at com.sun.tools.javac.code.Symbol.setDeclarationAttributesWithCompletion(Symbol.java:215) [ERROR] at com.sun.tools.javac.comp.MemberEnter.actualEnterAnnotations(MemberEnter.java:952) [ERROR] at com.sun.tools.javac.comp.MemberEnter.access$600(MemberEnter.java:64) [ERROR] at com.sun.tools.javac.comp.MemberEnter$5.run(MemberEnter.java:876) [ERROR] at com.sun.tools.javac.comp.Annotate.flush(Annotate.java:143) [ERROR] at com.sun.tools.javac.comp.Annotate.enterDone(Annotate.java:129) [ERROR] at com.sun.tools.javac.comp.Enter.complete(Enter.java:512) [ERROR] at com.sun.tools.javac.comp.Enter.main(Enter.java:471) [ERROR] at com.sun.tools.javadoc.JavadocEnter.main(JavadocEnter.java:78) [ERROR] at com.sun.tools.javadoc.JavadocTool.getRootDocImpl(JavadocTool.java:186) [ERROR] at com.sun.tools.javadoc.Start.parseAndExecute(Start.java:346) [ERROR] at com.sun.tools.javadoc.Start.begin(Start.java:219) [ERROR] at com.sun.tools.javadoc.Start.begin(Start.java:205) [ERROR] at com.sun.tools.javadoc.Main.execute(Main.java:64) [ERROR] at com.sun.tools.javadoc.Main.main(Main.java:54) [ERROR] javadoc: error - fatal error [ERROR] [ERROR] Command line was: /usr/java/jdk1.8.0_45/jre/../bin/javadoc -J-Xmx1024m @options @packages [ERROR] [ERROR] Refer to the generated Javadoc files in '/home/aajisaka/git/hadoop/target/site/hadoop-project/api' dir. [ERROR] - [Help 1] [ERROR] [ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch. [ERROR] Re-run Maven using the -X switch to enable full debug logging. [ERROR] [ERROR] For more information about the errors and possible solutions, please read the following articles: [ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoExecutionException {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3690) 'mvn site' fails on JDK8
[ https://issues.apache.org/jira/browse/YARN-3690?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Akira AJISAKA updated YARN-3690: Description: 'mvn site' failed by the following error: {noformat} [ERROR] /home/aajisaka/git/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/factories/package-info.java:18: error: package org.apache.hadoop.yarn.factories has already been annotated [ERROR] @InterfaceAudience.LimitedPrivate({ MapReduce, YARN }) [ERROR] ^ [ERROR] java.lang.AssertionError [ERROR] at com.sun.tools.javac.util.Assert.error(Assert.java:126) [ERROR] at com.sun.tools.javac.util.Assert.check(Assert.java:45) [ERROR] at com.sun.tools.javac.code.SymbolMetadata.setDeclarationAttributesWithCompletion(SymbolMetadata.java:161) [ERROR] at com.sun.tools.javac.code.Symbol.setDeclarationAttributesWithCompletion(Symbol.java:215) [ERROR] at com.sun.tools.javac.comp.MemberEnter.actualEnterAnnotations(MemberEnter.java:952) [ERROR] at com.sun.tools.javac.comp.MemberEnter.access$600(MemberEnter.java:64) [ERROR] at com.sun.tools.javac.comp.MemberEnter$5.run(MemberEnter.java:876) [ERROR] at com.sun.tools.javac.comp.Annotate.flush(Annotate.java:143) [ERROR] at com.sun.tools.javac.comp.Annotate.enterDone(Annotate.java:129) [ERROR] at com.sun.tools.javac.comp.Enter.complete(Enter.java:512) [ERROR] at com.sun.tools.javac.comp.Enter.main(Enter.java:471) [ERROR] at com.sun.tools.javadoc.JavadocEnter.main(JavadocEnter.java:78) [ERROR] at com.sun.tools.javadoc.JavadocTool.getRootDocImpl(JavadocTool.java:186) [ERROR] at com.sun.tools.javadoc.Start.parseAndExecute(Start.java:346) [ERROR] at com.sun.tools.javadoc.Start.begin(Start.java:219) [ERROR] at com.sun.tools.javadoc.Start.begin(Start.java:205) [ERROR] at com.sun.tools.javadoc.Main.execute(Main.java:64) [ERROR] at com.sun.tools.javadoc.Main.main(Main.java:54) [ERROR] javadoc: error - fatal error [ERROR] [ERROR] Command line was: /usr/java/jdk1.8.0_45/jre/../bin/javadoc -J-Xmx1024m @options @packages [ERROR] [ERROR] Refer to the generated Javadoc files in '/home/aajisaka/git/hadoop/target/site/hadoop-project/api' dir. [ERROR] - [Help 1] [ERROR] [ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch. [ERROR] Re-run Maven using the -X switch to enable full debug logging. 
[ERROR] [ERROR] For more information about the errors and possible solutions, please read the following articles: [ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoExecutionException {noformat} was: {noformat} [ERROR] /home/aajisaka/git/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/factories/package-info.java:18: error: package org.apache.hadoop.yarn.factories has already been annotated [ERROR] @InterfaceAudience.LimitedPrivate({ MapReduce, YARN }) [ERROR] ^ [ERROR] java.lang.AssertionError [ERROR] at com.sun.tools.javac.util.Assert.error(Assert.java:126) [ERROR] at com.sun.tools.javac.util.Assert.check(Assert.java:45) [ERROR] at com.sun.tools.javac.code.SymbolMetadata.setDeclarationAttributesWithCompletion(SymbolMetadata.java:161) [ERROR] at com.sun.tools.javac.code.Symbol.setDeclarationAttributesWithCompletion(Symbol.java:215) [ERROR] at com.sun.tools.javac.comp.MemberEnter.actualEnterAnnotations(MemberEnter.java:952) [ERROR] at com.sun.tools.javac.comp.MemberEnter.access$600(MemberEnter.java:64) [ERROR] at com.sun.tools.javac.comp.MemberEnter$5.run(MemberEnter.java:876) [ERROR] at com.sun.tools.javac.comp.Annotate.flush(Annotate.java:143) [ERROR] at com.sun.tools.javac.comp.Annotate.enterDone(Annotate.java:129) [ERROR] at com.sun.tools.javac.comp.Enter.complete(Enter.java:512) [ERROR] at com.sun.tools.javac.comp.Enter.main(Enter.java:471) [ERROR] at com.sun.tools.javadoc.JavadocEnter.main(JavadocEnter.java:78) [ERROR] at com.sun.tools.javadoc.JavadocTool.getRootDocImpl(JavadocTool.java:186) [ERROR] at com.sun.tools.javadoc.Start.parseAndExecute(Start.java:346) [ERROR] at com.sun.tools.javadoc.Start.begin(Start.java:219) [ERROR] at com.sun.tools.javadoc.Start.begin(Start.java:205) [ERROR] at com.sun.tools.javadoc.Main.execute(Main.java:64) [ERROR] at com.sun.tools.javadoc.Main.main(Main.java:54) [ERROR] javadoc: error - fatal error [ERROR] [ERROR] Command line was: /usr/java/jdk1.8.0_45/jre/../bin/javadoc -J-Xmx1024m @options @packages [ERROR] [ERROR] Refer to the generated Javadoc files in '/home/aajisaka/git/hadoop/target/site/hadoop-project/api' dir. [ERROR] - [Help 1] [ERROR] [ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch. [ERROR] Re-run Maven using the -X switch to enable full debug logging. [ERROR] [ERROR] For more information about the errors and possible solutions, please read the following articles: [ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoExecutionException {noformat} 'mvn site' fails on JDK8
[jira] [Commented] (YARN-3601) Fix UT TestRMFailover.testRMWebAppRedirect
[ https://issues.apache.org/jira/browse/YARN-3601?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14552181#comment-14552181 ] Hudson commented on YARN-3601: -- FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #202 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/202/]) YARN-3601. Fix UT TestRMFailover.testRMWebAppRedirect. Contributed by Weiwei Yang (xgong: rev 5009ad4a7f712fc578b461ecec53f7f97eaaed0c) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/TestRMFailover.java * hadoop-yarn-project/CHANGES.txt Fix UT TestRMFailover.testRMWebAppRedirect -- Key: YARN-3601 URL: https://issues.apache.org/jira/browse/YARN-3601 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager, webapp Environment: Red Hat Enterprise Linux Workstation release 6.5 (Santiago) Reporter: Weiwei Yang Assignee: Weiwei Yang Priority: Critical Labels: test Fix For: 2.7.1 Attachments: YARN-3601.001.patch This test case was not working since the commit from YARN-2605. It failed with NPE exception. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3565) NodeHeartbeatRequest/RegisterNodeManagerRequest should use NodeLabel object instead of String
[ https://issues.apache.org/jira/browse/YARN-3565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14552208#comment-14552208 ] Hudson commented on YARN-3565: -- SUCCESS: Integrated in Hadoop-Yarn-trunk #933 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/933/]) YARN-3565. NodeHeartbeatRequest/RegisterNodeManagerRequest should use NodeLabel object instead of String. (Naganarasimha G R via wangda) (wangda: rev b37da52a1c4fb3da2bd21bfadc5ec61c5f953a59) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/NodeHeartbeatRequest.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceTrackerService.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/nodelabels/NodeLabelTestBase.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/test/java/org/apache/hadoop/yarn/TestYarnServerApiClasses.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestNodeStatusUpdaterForLabels.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/proto/yarn_server_common_service_protos.proto * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/impl/pb/RegisterNodeManagerRequestPBImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeStatusUpdaterImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/impl/pb/NodeHeartbeatRequestPBImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/nodelabels/NodeLabelsProvider.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/RegisterNodeManagerRequest.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/nodelabels/CommonNodeLabelsManager.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestResourceTrackerService.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/yarn_protos.proto NodeHeartbeatRequest/RegisterNodeManagerRequest should use NodeLabel object instead of String - Key: YARN-3565 URL: https://issues.apache.org/jira/browse/YARN-3565 Project: Hadoop YARN Issue Type: Sub-task Components: api, client, resourcemanager Reporter: Wangda Tan Assignee: Naganarasimha G R Priority: Blocker Fix For: 2.8.0 Attachments: YARN-3565-20150502-1.patch, YARN-3565.20150515-1.patch, YARN-3565.20150516-1.patch, YARN-3565.20150519-1.patch Now NM HB/Register uses SetString, it will be hard to add new fields if we want to support specifying NodeLabel type such as exclusivity/constraints, etc. We need to make sure rolling upgrade works. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2821) Distributed shell app master becomes unresponsive sometimes
[ https://issues.apache.org/jira/browse/YARN-2821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14552209#comment-14552209 ] Hudson commented on YARN-2821: -- SUCCESS: Integrated in Hadoop-Yarn-trunk #933 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/933/]) YARN-2821. Fixed a problem that DistributedShell AM may hang if restarted. Contributed by Varun Vasudev (jianhe: rev 7438966586f1896ab3e8b067d47a4af28a894106) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/src/main/java/org/apache/hadoop/yarn/applications/distributedshell/ApplicationMaster.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/pom.xml * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/src/test/java/org/apache/hadoop/yarn/applications/distributedshell/TestDSAppMaster.java Distributed shell app master becomes unresponsive sometimes --- Key: YARN-2821 URL: https://issues.apache.org/jira/browse/YARN-2821 Project: Hadoop YARN Issue Type: Bug Components: applications/distributed-shell Affects Versions: 2.5.1 Reporter: Varun Vasudev Assignee: Varun Vasudev Fix For: 2.8.0 Attachments: YARN-2821.002.patch, YARN-2821.003.patch, YARN-2821.004.patch, YARN-2821.005.patch, apache-yarn-2821.0.patch, apache-yarn-2821.1.patch We've noticed that once in a while the distributed shell app master becomes unresponsive and is eventually killed by the RM. snippet of the logs - {noformat} 14/11/04 18:21:37 INFO distributedshell.ApplicationMaster: appattempt_1415123350094_0017_01 received 0 previous attempts' running containers on AM registration. 14/11/04 18:21:37 INFO distributedshell.ApplicationMaster: Requested container ask: Capability[memory:10, vCores:1]Priority[0] 14/11/04 18:21:37 INFO distributedshell.ApplicationMaster: Requested container ask: Capability[memory:10, vCores:1]Priority[0] 14/11/04 18:21:37 INFO distributedshell.ApplicationMaster: Requested container ask: Capability[memory:10, vCores:1]Priority[0] 14/11/04 18:21:37 INFO distributedshell.ApplicationMaster: Requested container ask: Capability[memory:10, vCores:1]Priority[0] 14/11/04 18:21:37 INFO distributedshell.ApplicationMaster: Requested container ask: Capability[memory:10, vCores:1]Priority[0] 14/11/04 18:21:38 INFO impl.AMRMClientImpl: Received new token for : onprem-tez2:45454 14/11/04 18:21:38 INFO distributedshell.ApplicationMaster: Got response from RM for container ask, allocatedCnt=1 14/11/04 18:21:38 INFO distributedshell.ApplicationMaster: Launching shell command on a new container., containerId=container_1415123350094_0017_01_02, containerNode=onprem-tez2:45454, containerNodeURI=onprem-tez2:50060, containerResourceMemory1024, containerResourceVirtualCores1 14/11/04 18:21:38 INFO distributedshell.ApplicationMaster: Setting up container launch container for containerid=container_1415123350094_0017_01_02 14/11/04 18:21:39 INFO impl.NMClientAsyncImpl: Processing Event EventType: START_CONTAINER for Container container_1415123350094_0017_01_02 14/11/04 18:21:39 INFO impl.ContainerManagementProtocolProxy: Opening proxy : onprem-tez2:45454 14/11/04 18:21:39 INFO impl.NMClientAsyncImpl: Processing Event EventType: QUERY_CONTAINER for Container container_1415123350094_0017_01_02 14/11/04 18:21:39 INFO impl.ContainerManagementProtocolProxy: Opening proxy : onprem-tez2:45454 14/11/04 18:21:39 INFO impl.AMRMClientImpl: Received new token 
for : onprem-tez3:45454 14/11/04 18:21:39 INFO impl.AMRMClientImpl: Received new token for : onprem-tez4:45454 14/11/04 18:21:39 INFO distributedshell.ApplicationMaster: Got response from RM for container ask, allocatedCnt=3 14/11/04 18:21:39 INFO distributedshell.ApplicationMaster: Launching shell command on a new container., containerId=container_1415123350094_0017_01_03, containerNode=onprem-tez2:45454, containerNodeURI=onprem-tez2:50060, containerResourceMemory1024, containerResourceVirtualCores1 14/11/04 18:21:39 INFO distributedshell.ApplicationMaster: Launching shell command on a new container., containerId=container_1415123350094_0017_01_04, containerNode=onprem-tez3:45454, containerNodeURI=onprem-tez3:50060, containerResourceMemory1024, containerResourceVirtualCores1 14/11/04 18:21:39 INFO distributedshell.ApplicationMaster: Launching shell command on a new container., containerId=container_1415123350094_0017_01_05, containerNode=onprem-tez4:45454, containerNodeURI=onprem-tez4:50060, containerResourceMemory1024, containerResourceVirtualCores1 14/11/04 18:21:39 INFO
[jira] [Commented] (YARN-3677) Fix findbugs warnings in yarn-server-resourcemanager
[ https://issues.apache.org/jira/browse/YARN-3677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14552214#comment-14552214 ] Hudson commented on YARN-3677: -- SUCCESS: Integrated in Hadoop-Yarn-trunk #933 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/933/]) YARN-3677. Fix findbugs warnings in yarn-server-resourcemanager. Contributed by Vinod Kumar Vavilapalli. (ozawa: rev 7401e5b5e8060b6b027d714b5ceb641fcfe5b598) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMAppManager.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/FileSystemRMStateStore.java Fix findbugs warnings in yarn-server-resourcemanager Key: YARN-3677 URL: https://issues.apache.org/jira/browse/YARN-3677 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: Akira AJISAKA Assignee: Vinod Kumar Vavilapalli Priority: Minor Labels: newbie Fix For: 2.7.1 Attachments: YARN-3677-20150519.txt There is 1 findbugs warning in FileSystemRMStateStore.java. {noformat} Inconsistent synchronization of FileSystemRMStateStore.isHDFS; locked 66% of time Unsynchronized access at FileSystemRMStateStore.java: [line 156] Field org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore.isHDFS Synchronized 66% of the time Synchronized access at FileSystemRMStateStore.java: [line 148] Synchronized access at FileSystemRMStateStore.java: [line 859] {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
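The warning above describes a generic pattern: a field written while holding a lock but read without it. A minimal sketch of one common remedy, assuming nothing about the actual FileSystemRMStateStore change, is to make the flag volatile (or read it under the same lock):
{code}
// Illustrative sketch only; field and method names are placeholders.
class StateStoreSketch {
  private volatile boolean isHDFS; // volatile makes the unsynchronized read well-defined

  synchronized void detectFileSystem(boolean onHdfs) {
    isHDFS = onHdfs; // written under the lock, as findbugs observed
  }

  boolean usesHdfs() {
    return isHDFS;   // previously an unsynchronized read -> "inconsistent synchronization"
  }
}
{code}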
[jira] [Commented] (YARN-3583) Support of NodeLabel object instead of plain String in YarnClient side.
[ https://issues.apache.org/jira/browse/YARN-3583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14552203#comment-14552203 ] Hudson commented on YARN-3583: -- SUCCESS: Integrated in Hadoop-Yarn-trunk #933 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/933/]) YARN-3583. Support of NodeLabel object instead of plain String in YarnClient side. (Sunil G via wangda) (wangda: rev 563eb1ad2ae848a23bbbf32ebfaf107e8fa14e87) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/impl/pb/GetNodesToLabelsResponsePBImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/GetLabelsToNodesResponse.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/api/impl/TestYarnClient.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/yarn_protos.proto * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/server/yarn_server_resourcemanager_service_protos.proto * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/yarn_service_protos.proto * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/impl/pb/ReplaceLabelsOnNodeRequestPBImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/api/impl/YarnClientImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ClientRMService.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestClientRMService.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/api/YarnClient.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/impl/pb/GetLabelsToNodesResponsePBImpl.java * hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/main/java/org/apache/hadoop/mapred/ResourceMgrDelegate.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/GetNodesToLabelsResponse.java Support of NodeLabel object instead of plain String in YarnClient side. --- Key: YARN-3583 URL: https://issues.apache.org/jira/browse/YARN-3583 Project: Hadoop YARN Issue Type: Sub-task Components: client Affects Versions: 2.6.0 Reporter: Sunil G Assignee: Sunil G Fix For: 2.8.0 Attachments: 0001-YARN-3583.patch, 0002-YARN-3583.patch, 0003-YARN-3583.patch, 0004-YARN-3583.patch Similar to YARN-3521, use NodeLabel objects in YarnClient side apis. getLabelsToNodes/getNodeToLabels api's can use NodeLabel object instead of using plain label name. This will help to bring other label details such as Exclusivity to client side. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3601) Fix UT TestRMFailover.testRMWebAppRedirect
[ https://issues.apache.org/jira/browse/YARN-3601?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14552204#comment-14552204 ] Hudson commented on YARN-3601: -- SUCCESS: Integrated in Hadoop-Yarn-trunk #933 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/933/]) YARN-3601. Fix UT TestRMFailover.testRMWebAppRedirect. Contributed by Weiwei Yang (xgong: rev 5009ad4a7f712fc578b461ecec53f7f97eaaed0c) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/TestRMFailover.java Fix UT TestRMFailover.testRMWebAppRedirect -- Key: YARN-3601 URL: https://issues.apache.org/jira/browse/YARN-3601 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager, webapp Environment: Red Hat Enterprise Linux Workstation release 6.5 (Santiago) Reporter: Weiwei Yang Assignee: Weiwei Yang Priority: Critical Labels: test Fix For: 2.7.1 Attachments: YARN-3601.001.patch This test case was not working since the commit from YARN-2605. It failed with NPE exception. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3302) TestDockerContainerExecutor should run automatically if it can detect docker in the usual place
[ https://issues.apache.org/jira/browse/YARN-3302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14552212#comment-14552212 ] Hudson commented on YARN-3302: -- SUCCESS: Integrated in Hadoop-Yarn-trunk #933 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/933/]) YARN-3302. TestDockerContainerExecutor should run automatically if it can detect docker in the usual place (Ravindra Kumar Naik via raviprak) (raviprak: rev c97f32e7b9d9e1d4c80682cc01741579166174d1) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestDockerContainerExecutor.java * hadoop-yarn-project/CHANGES.txt TestDockerContainerExecutor should run automatically if it can detect docker in the usual place --- Key: YARN-3302 URL: https://issues.apache.org/jira/browse/YARN-3302 Project: Hadoop YARN Issue Type: Sub-task Affects Versions: 2.6.0 Reporter: Ravi Prakash Assignee: Ravindra Kumar Naik Attachments: YARN-3302-trunk.001.patch, YARN-3302-trunk.002.patch, YARN-3302-trunk.003.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3646) Applications are getting stuck sometimes in case of retry policy forever
[ https://issues.apache.org/jira/browse/YARN-3646?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raju Bairishetti updated YARN-3646: --- Attachment: YARN-3646.002.patch [~rohithsharma] Thanks for the review and comments. Attached a new patch Applications are getting stuck some times in case of retry policy forever - Key: YARN-3646 URL: https://issues.apache.org/jira/browse/YARN-3646 Project: Hadoop YARN Issue Type: Bug Components: client Reporter: Raju Bairishetti Attachments: YARN-3646.001.patch, YARN-3646.002.patch, YARN-3646.patch We have set *yarn.resourcemanager.connect.wait-ms* to -1 to use FOREVER retry policy. Yarn client is infinitely retrying in case of exceptions from the RM as it is using retrying policy as FOREVER. The problem is it is retrying for all kinds of exceptions (like ApplicationNotFoundException), even though it is not a connection failure. Due to this my application is not progressing further. *Yarn client should not retry infinitely in case of non connection failures.* We have written a simple yarn-client which is trying to get an application report for an invalid or older appId. ResourceManager is throwing an ApplicationNotFoundException as this is an invalid or older appId. But because of retry policy FOREVER, client is keep on retrying for getting the application report and ResourceManager is throwing ApplicationNotFoundException continuously. {code} private void testYarnClientRetryPolicy() throws Exception{ YarnConfiguration conf = new YarnConfiguration(); conf.setInt(YarnConfiguration.RESOURCEMANAGER_CONNECT_MAX_WAIT_MS, -1); YarnClient yarnClient = YarnClient.createYarnClient(); yarnClient.init(conf); yarnClient.start(); ApplicationId appId = ApplicationId.newInstance(1430126768987L, 10645); ApplicationReport report = yarnClient.getApplicationReport(appId); } {code} *RM logs:* {noformat} 15/05/14 16:33:24 INFO ipc.Server: IPC Server handler 21 on 8032, call org.apache.hadoop.yarn.api.ApplicationClientProtocolPB.getApplicationReport from 10.14.120.231:61621 Call#875162 Retry#0 org.apache.hadoop.yarn.exceptions.ApplicationNotFoundException: Application with id 'application_1430126768987_10645' doesn't exist in RM. at org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getApplicationReport(ClientRMService.java:284) at org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getApplicationReport(ApplicationClientProtocolPBServiceImpl.java:145) at org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:321) at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1556) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007) 15/05/14 16:33:24 INFO ipc.Server: IPC Server handler 47 on 8032, call org.apache.hadoop.yarn.api.ApplicationClientProtocolPB.getApplicationReport from 10.14.120.231:61621 Call#875163 Retry#0 {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
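A minimal sketch of the behaviour the issue asks for, not the actual YarnClient patch: retry indefinitely only for connection-level failures and rethrow application-level errors such as ApplicationNotFoundException immediately (all class and method names below other than that exception are made up for illustration).
{code}
import java.net.ConnectException;
import java.util.concurrent.Callable;
import java.util.concurrent.TimeUnit;

// Hypothetical wrapper; not the real retry-policy classes in hadoop-common.
class ConnectionOnlyForeverRetry {
  static <T> T invoke(Callable<T> call) throws Exception {
    while (true) {
      try {
        return call.call();
      } catch (ConnectException e) {
        // Connection-level failure: the RM may be down or failing over,
        // so keep retrying as the FOREVER policy intends.
        TimeUnit.SECONDS.sleep(1);
      } catch (Exception e) {
        // Anything else (e.g. ApplicationNotFoundException) is a definitive
        // answer from the RM; retrying forever will never change it.
        throw e;
      }
    }
  }
}
{code}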
[jira] [Commented] (YARN-3646) Applications are getting stuck sometimes in case of retry policy forever
[ https://issues.apache.org/jira/browse/YARN-3646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14552225#comment-14552225 ] Rohith commented on YARN-3646: -- +1 lgtm (non-binding).. wait for jenkins report!! Applications are getting stuck some times in case of retry policy forever - Key: YARN-3646 URL: https://issues.apache.org/jira/browse/YARN-3646 Project: Hadoop YARN Issue Type: Bug Components: client Reporter: Raju Bairishetti Attachments: YARN-3646.001.patch, YARN-3646.002.patch, YARN-3646.patch We have set *yarn.resourcemanager.connect.wait-ms* to -1 to use FOREVER retry policy. Yarn client is infinitely retrying in case of exceptions from the RM as it is using retrying policy as FOREVER. The problem is it is retrying for all kinds of exceptions (like ApplicationNotFoundException), even though it is not a connection failure. Due to this my application is not progressing further. *Yarn client should not retry infinitely in case of non connection failures.* We have written a simple yarn-client which is trying to get an application report for an invalid or older appId. ResourceManager is throwing an ApplicationNotFoundException as this is an invalid or older appId. But because of retry policy FOREVER, client is keep on retrying for getting the application report and ResourceManager is throwing ApplicationNotFoundException continuously. {code} private void testYarnClientRetryPolicy() throws Exception{ YarnConfiguration conf = new YarnConfiguration(); conf.setInt(YarnConfiguration.RESOURCEMANAGER_CONNECT_MAX_WAIT_MS, -1); YarnClient yarnClient = YarnClient.createYarnClient(); yarnClient.init(conf); yarnClient.start(); ApplicationId appId = ApplicationId.newInstance(1430126768987L, 10645); ApplicationReport report = yarnClient.getApplicationReport(appId); } {code} *RM logs:* {noformat} 15/05/14 16:33:24 INFO ipc.Server: IPC Server handler 21 on 8032, call org.apache.hadoop.yarn.api.ApplicationClientProtocolPB.getApplicationReport from 10.14.120.231:61621 Call#875162 Retry#0 org.apache.hadoop.yarn.exceptions.ApplicationNotFoundException: Application with id 'application_1430126768987_10645' doesn't exist in RM. at org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getApplicationReport(ClientRMService.java:284) at org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getApplicationReport(ApplicationClientProtocolPBServiceImpl.java:145) at org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:321) at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1556) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007) 15/05/14 16:33:24 INFO ipc.Server: IPC Server handler 47 on 8032, call org.apache.hadoop.yarn.api.ApplicationClientProtocolPB.getApplicationReport from 10.14.120.231:61621 Call#875163 Retry#0 {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3543) ApplicationReport should be able to tell whether the Application is AM managed or not.
[ https://issues.apache.org/jira/browse/YARN-3543?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14552165#comment-14552165 ] Rohith commented on YARN-3543: -- The build machine is not able to run all those tests in one shot. A similar issue was faced earlier in YARN-2784. I think we need to split the JIRA into the proto change, WebUI change, AH change, and more. ApplicationReport should be able to tell whether the Application is AM managed or not. --- Key: YARN-3543 URL: https://issues.apache.org/jira/browse/YARN-3543 Project: Hadoop YARN Issue Type: Improvement Components: api Affects Versions: 2.6.0 Reporter: Spandan Dutta Assignee: Rohith Labels: BB2015-05-TBR Attachments: 0001-YARN-3543.patch, 0001-YARN-3543.patch, 0002-YARN-3543.patch, 0002-YARN-3543.patch, 0003-YARN-3543.patch, 0004-YARN-3543.patch, 0004-YARN-3543.patch, YARN-3543-AH.PNG, YARN-3543-RM.PNG Currently we can know whether the application submitted by the user is AM managed from the applicationSubmissionContext. This can only be done at the time when the user submits the job. We should have access to this info from the ApplicationReport as well so that we can check whether an app is AM managed at any time. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-3690) 'mvn site' fails on JDK8
Akira AJISAKA created YARN-3690: --- Summary: 'mvn site' fails on JDK8 Key: YARN-3690 URL: https://issues.apache.org/jira/browse/YARN-3690 Project: Hadoop YARN Issue Type: Bug Components: documentation Environment: CentOS 7.0, Oracle JDK 8u45. Reporter: Akira AJISAKA {noformat} [ERROR] /home/aajisaka/git/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/factories/package-info.java:18: error: package org.apache.hadoop.yarn.factories has already been annotated [ERROR] @InterfaceAudience.LimitedPrivate({ MapReduce, YARN }) [ERROR] ^ [ERROR] java.lang.AssertionError [ERROR] at com.sun.tools.javac.util.Assert.error(Assert.java:126) [ERROR] at com.sun.tools.javac.util.Assert.check(Assert.java:45) [ERROR] at com.sun.tools.javac.code.SymbolMetadata.setDeclarationAttributesWithCompletion(SymbolMetadata.java:161) [ERROR] at com.sun.tools.javac.code.Symbol.setDeclarationAttributesWithCompletion(Symbol.java:215) [ERROR] at com.sun.tools.javac.comp.MemberEnter.actualEnterAnnotations(MemberEnter.java:952) [ERROR] at com.sun.tools.javac.comp.MemberEnter.access$600(MemberEnter.java:64) [ERROR] at com.sun.tools.javac.comp.MemberEnter$5.run(MemberEnter.java:876) [ERROR] at com.sun.tools.javac.comp.Annotate.flush(Annotate.java:143) [ERROR] at com.sun.tools.javac.comp.Annotate.enterDone(Annotate.java:129) [ERROR] at com.sun.tools.javac.comp.Enter.complete(Enter.java:512) [ERROR] at com.sun.tools.javac.comp.Enter.main(Enter.java:471) [ERROR] at com.sun.tools.javadoc.JavadocEnter.main(JavadocEnter.java:78) [ERROR] at com.sun.tools.javadoc.JavadocTool.getRootDocImpl(JavadocTool.java:186) [ERROR] at com.sun.tools.javadoc.Start.parseAndExecute(Start.java:346) [ERROR] at com.sun.tools.javadoc.Start.begin(Start.java:219) [ERROR] at com.sun.tools.javadoc.Start.begin(Start.java:205) [ERROR] at com.sun.tools.javadoc.Main.execute(Main.java:64) [ERROR] at com.sun.tools.javadoc.Main.main(Main.java:54) [ERROR] javadoc: error - fatal error [ERROR] [ERROR] Command line was: /usr/java/jdk1.8.0_45/jre/../bin/javadoc -J-Xmx1024m @options @packages [ERROR] [ERROR] Refer to the generated Javadoc files in '/home/aajisaka/git/hadoop/target/site/hadoop-project/api' dir. [ERROR] - [Help 1] [ERROR] [ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch. [ERROR] Re-run Maven using the -X switch to enable full debug logging. [ERROR] [ERROR] For more information about the errors and possible solutions, please read the following articles: [ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoExecutionException {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3344) procfs stat file is not in the expected format warning
[ https://issues.apache.org/jira/browse/YARN-3344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ravindra Kumar Naik updated YARN-3344: -- Attachment: YARN-3344-trunk.004.patch updated patch with formatting issue fixed procfs stat file is not in the expected format warning -- Key: YARN-3344 URL: https://issues.apache.org/jira/browse/YARN-3344 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.6.0 Reporter: Jon Bringhurst Assignee: Ravindra Kumar Naik Attachments: YARN-3344-branch-trunk.001.patch, YARN-3344-branch-trunk.002.patch, YARN-3344-branch-trunk.003.patch, YARN-3344-trunk.004.patch Although this doesn't appear to be causing any functional issues, it is spamming our log files quite a bit. :) It appears that the regex in ProcfsBasedProcessTree doesn't work for all /proc/pid/stat files. Here's the error I'm seeing: {noformat} source_host: asdf, method: constructProcessInfo, level: WARN, message: Unexpected: procfs stat file is not in the expected format for process with pid 6953 file: ProcfsBasedProcessTree.java, line_number: 514, class: org.apache.hadoop.yarn.util.ProcfsBasedProcessTree, {noformat} And here's the basic info on process with pid 6953: {noformat} [asdf ~]$ cat /proc/6953/stat 6953 (python2.6 /expo) S 1871 1871 1871 0 -1 4202496 9364 1080 0 0 25 3 0 0 20 0 1 0 144918696 205295616 5856 18446744073709551615 1 1 0 0 0 0 0 16781312 2 18446744073709551615 0 0 17 13 0 0 0 0 0 [asdf ~]$ ps aux|grep 6953 root 6953 0.0 0.0 200484 23424 ?S21:44 0:00 python2.6 /export/apps/salt/minion-scripts/module-sync.py jbringhu 13481 0.0 0.0 105312 872 pts/0S+ 22:13 0:00 grep -i 6953 [asdf ~]$ {noformat} This is using 2.6.32-431.11.2.el6.x86_64 in RHEL 6.5. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
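The stat line above shows why the field-by-field regex breaks: the comm field ("python2.6 /expo") contains a space, so a pattern that assumes whitespace-free fields stops matching. A hedged sketch of a more tolerant parse, purely illustrative and not the attached patch:
{code}
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ProcStatParseSketch {
  // comm is everything between the first '(' and the last ')' and may itself
  // contain spaces or parentheses; the greedy (.*) captures up to the last ')'.
  private static final Pattern STAT_LINE =
      Pattern.compile("^(\\d+)\\s+\\((.*)\\)\\s+(\\S)\\s+(\\d+)\\s+.*$");

  public static void main(String[] args) {
    String line = "6953 (python2.6 /expo) S 1871 1871 1871 0 -1 4202496 9364 1080 0 0";
    Matcher m = STAT_LINE.matcher(line);
    if (m.matches()) {
      System.out.println("pid=" + m.group(1) + " comm=" + m.group(2)
          + " state=" + m.group(3) + " ppid=" + m.group(4));
    } else {
      System.out.println("procfs stat file is not in the expected format");
    }
  }
}
{code}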
[jira] [Commented] (YARN-3690) 'mvn site' fails on JDK8
[ https://issues.apache.org/jira/browse/YARN-3690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14552145#comment-14552145 ] Akira AJISAKA commented on YARN-3690: - The problem is: * There are two package-info.java files for org.apache.hadoop.yarn.factories; one is in hadoop-yarn-common and the other is in hadoop-yarn-api. * Both of them are annotated. 'mvn site' fails on JDK8 Key: YARN-3690 URL: https://issues.apache.org/jira/browse/YARN-3690 Project: Hadoop YARN Issue Type: Bug Components: documentation Environment: CentOS 7.0, Oracle JDK 8u45. Reporter: Akira AJISAKA Assignee: Brahma Reddy Battula 'mvn site' fails with the following error: {noformat} [ERROR] /home/aajisaka/git/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/factories/package-info.java:18: error: package org.apache.hadoop.yarn.factories has already been annotated [ERROR] @InterfaceAudience.LimitedPrivate({ MapReduce, YARN }) [ERROR] ^ [ERROR] java.lang.AssertionError [ERROR] at com.sun.tools.javac.util.Assert.error(Assert.java:126) [ERROR] at com.sun.tools.javac.util.Assert.check(Assert.java:45) [ERROR] at com.sun.tools.javac.code.SymbolMetadata.setDeclarationAttributesWithCompletion(SymbolMetadata.java:161) [ERROR] at com.sun.tools.javac.code.Symbol.setDeclarationAttributesWithCompletion(Symbol.java:215) [ERROR] at com.sun.tools.javac.comp.MemberEnter.actualEnterAnnotations(MemberEnter.java:952) [ERROR] at com.sun.tools.javac.comp.MemberEnter.access$600(MemberEnter.java:64) [ERROR] at com.sun.tools.javac.comp.MemberEnter$5.run(MemberEnter.java:876) [ERROR] at com.sun.tools.javac.comp.Annotate.flush(Annotate.java:143) [ERROR] at com.sun.tools.javac.comp.Annotate.enterDone(Annotate.java:129) [ERROR] at com.sun.tools.javac.comp.Enter.complete(Enter.java:512) [ERROR] at com.sun.tools.javac.comp.Enter.main(Enter.java:471) [ERROR] at com.sun.tools.javadoc.JavadocEnter.main(JavadocEnter.java:78) [ERROR] at com.sun.tools.javadoc.JavadocTool.getRootDocImpl(JavadocTool.java:186) [ERROR] at com.sun.tools.javadoc.Start.parseAndExecute(Start.java:346) [ERROR] at com.sun.tools.javadoc.Start.begin(Start.java:219) [ERROR] at com.sun.tools.javadoc.Start.begin(Start.java:205) [ERROR] at com.sun.tools.javadoc.Main.execute(Main.java:64) [ERROR] at com.sun.tools.javadoc.Main.main(Main.java:54) [ERROR] javadoc: error - fatal error [ERROR] [ERROR] Command line was: /usr/java/jdk1.8.0_45/jre/../bin/javadoc -J-Xmx1024m @options @packages [ERROR] [ERROR] Refer to the generated Javadoc files in '/home/aajisaka/git/hadoop/target/site/hadoop-project/api' dir. [ERROR] - [Help 1] [ERROR] [ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch. [ERROR] Re-run Maven using the -X switch to enable full debug logging. [ERROR] [ERROR] For more information about the errors and possible solutions, please read the following articles: [ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoExecutionException {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
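To make the failure mode concrete: when two modules (hadoop-yarn-api and hadoop-yarn-common) each ship an annotated package-info.java for the same package, JDK8's javadoc sees the package annotated twice and hits the assertion above. A schematic of the offending shape, with the annotation taken from the error (string quotes restored) and the file layout illustrative:
{code}
// hadoop-yarn-api: .../org/apache/hadoop/yarn/factories/package-info.java
@InterfaceAudience.LimitedPrivate({ "MapReduce", "YARN" })
package org.apache.hadoop.yarn.factories;
import org.apache.hadoop.classification.InterfaceAudience;

// hadoop-yarn-common also contains a package-info.java declaring and annotating
// the very same package; JDK8 javadoc then fails with
// "package org.apache.hadoop.yarn.factories has already been annotated".
// Keeping the annotation in only one module (or removing the duplicate file)
// avoids the assertion.
{code}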
[jira] [Commented] (YARN-3344) procfs stat file is not in the expected format warning
[ https://issues.apache.org/jira/browse/YARN-3344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14552195#comment-14552195 ] Hadoop QA commented on YARN-3344: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 15m 1s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 1 new or modified test files. | | {color:green}+1{color} | javac | 7m 47s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 46s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 23s | The applied patch does not increase the total number of release audit warnings. | | {color:red}-1{color} | checkstyle | 0m 57s | The applied patch generated 2 new checkstyle issues (total was 43, now 42). | | {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 37s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 33s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 1m 25s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | yarn tests | 2m 4s | Tests passed in hadoop-yarn-common. | | | | 39m 39s | | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12734110/YARN-3344-trunk.004.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / 4aa730c | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/8022/artifact/patchprocess/diffcheckstylehadoop-yarn-common.txt | | hadoop-yarn-common test log | https://builds.apache.org/job/PreCommit-YARN-Build/8022/artifact/patchprocess/testrun_hadoop-yarn-common.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8022/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf903.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8022/console | This message was automatically generated. procfs stat file is not in the expected format warning -- Key: YARN-3344 URL: https://issues.apache.org/jira/browse/YARN-3344 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.6.0 Reporter: Jon Bringhurst Assignee: Ravindra Kumar Naik Attachments: YARN-3344-branch-trunk.001.patch, YARN-3344-branch-trunk.002.patch, YARN-3344-branch-trunk.003.patch, YARN-3344-trunk.004.patch Although this doesn't appear to be causing any functional issues, it is spamming our log files quite a bit. :) It appears that the regex in ProcfsBasedProcessTree doesn't work for all /proc/pid/stat files. 
Here's the error I'm seeing: {noformat} source_host: asdf, method: constructProcessInfo, level: WARN, message: Unexpected: procfs stat file is not in the expected format for process with pid 6953 file: ProcfsBasedProcessTree.java, line_number: 514, class: org.apache.hadoop.yarn.util.ProcfsBasedProcessTree, {noformat} And here's the basic info on process with pid 6953: {noformat} [asdf ~]$ cat /proc/6953/stat 6953 (python2.6 /expo) S 1871 1871 1871 0 -1 4202496 9364 1080 0 0 25 3 0 0 20 0 1 0 144918696 205295616 5856 18446744073709551615 1 1 0 0 0 0 0 16781312 2 18446744073709551615 0 0 17 13 0 0 0 0 0 [asdf ~]$ ps aux|grep 6953 root 6953 0.0 0.0 200484 23424 ?S21:44 0:00 python2.6 /export/apps/salt/minion-scripts/module-sync.py jbringhu 13481 0.0 0.0 105312 872 pts/0S+ 22:13 0:00 grep -i 6953 [asdf ~]$ {noformat} This is using 2.6.32-431.11.2.el6.x86_64 in RHEL 6.5. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3543) ApplicationReport should be able to tell whether the Application is AM managed or not.
[ https://issues.apache.org/jira/browse/YARN-3543?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14552125#comment-14552125 ] Hadoop QA commented on YARN-3543: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 14m 47s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 14 new or modified test files. | | {color:green}+1{color} | javac | 7m 35s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 36s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 23s | The applied patch does not increase the total number of release audit warnings. | | {color:red}-1{color} | checkstyle | 2m 16s | The applied patch generated 1 new checkstyle issues (total was 14, now 14). | | {color:green}+1{color} | whitespace | 0m 11s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 39s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 34s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 7m 9s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:red}-1{color} | mapreduce tests | 116m 37s | Tests failed in hadoop-mapreduce-client-jobclient. | | {color:green}+1{color} | yarn tests | 0m 26s | Tests passed in hadoop-yarn-api. | | {color:red}-1{color} | yarn tests | 6m 37s | Tests failed in hadoop-yarn-client. | | {color:green}+1{color} | yarn tests | 2m 3s | Tests passed in hadoop-yarn-common. | | {color:red}-1{color} | yarn tests | 0m 19s | Tests failed in hadoop-yarn-server-applicationhistoryservice. | | {color:green}+1{color} | yarn tests | 0m 28s | Tests passed in hadoop-yarn-server-common. | | {color:red}-1{color} | yarn tests | 0m 22s | Tests failed in hadoop-yarn-server-resourcemanager. 
| | | | 171m 57s | | \\ \\ || Reason || Tests || | Failed unit tests | hadoop.yarn.client.api.impl.TestAHSClient | | | hadoop.yarn.client.TestApplicationClientProtocolOnHA | | | hadoop.yarn.client.cli.TestYarnCLI | | | hadoop.yarn.client.api.impl.TestYarnClient | | Timed out tests | org.apache.hadoop.mapreduce.TestMRJobClient | | | org.apache.hadoop.mapreduce.TestMapReduceLazyOutput | | Failed build | hadoop-yarn-server-applicationhistoryservice | | | hadoop-yarn-server-resourcemanager | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12734085/0004-YARN-3543.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / ce53c8e | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/8021/artifact/patchprocess/diffcheckstylehadoop-yarn-api.txt | | hadoop-mapreduce-client-jobclient test log | https://builds.apache.org/job/PreCommit-YARN-Build/8021/artifact/patchprocess/testrun_hadoop-mapreduce-client-jobclient.txt | | hadoop-yarn-api test log | https://builds.apache.org/job/PreCommit-YARN-Build/8021/artifact/patchprocess/testrun_hadoop-yarn-api.txt | | hadoop-yarn-client test log | https://builds.apache.org/job/PreCommit-YARN-Build/8021/artifact/patchprocess/testrun_hadoop-yarn-client.txt | | hadoop-yarn-common test log | https://builds.apache.org/job/PreCommit-YARN-Build/8021/artifact/patchprocess/testrun_hadoop-yarn-common.txt | | hadoop-yarn-server-applicationhistoryservice test log | https://builds.apache.org/job/PreCommit-YARN-Build/8021/artifact/patchprocess/testrun_hadoop-yarn-server-applicationhistoryservice.txt | | hadoop-yarn-server-common test log | https://builds.apache.org/job/PreCommit-YARN-Build/8021/artifact/patchprocess/testrun_hadoop-yarn-server-common.txt | | hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/8021/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8021/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf903.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8021/console | This message was automatically generated. ApplicationReport should be able to tell whether the Application is AM managed or not. --- Key: YARN-3543 URL: https://issues.apache.org/jira/browse/YARN-3543 Project: Hadoop YARN Issue Type: Improvement Components: api Affects Versions: 2.6.0 Reporter:
[jira] [Commented] (YARN-2821) Distributed shell app master becomes unresponsive sometimes
[ https://issues.apache.org/jira/browse/YARN-2821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14552186#comment-14552186 ] Hudson commented on YARN-2821: -- FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #202 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/202/]) YARN-2821. Fixed a problem that DistributedShell AM may hang if restarted. Contributed by Varun Vasudev (jianhe: rev 7438966586f1896ab3e8b067d47a4af28a894106) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/src/main/java/org/apache/hadoop/yarn/applications/distributedshell/ApplicationMaster.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/src/test/java/org/apache/hadoop/yarn/applications/distributedshell/TestDSAppMaster.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/pom.xml * hadoop-yarn-project/CHANGES.txt Distributed shell app master becomes unresponsive sometimes --- Key: YARN-2821 URL: https://issues.apache.org/jira/browse/YARN-2821 Project: Hadoop YARN Issue Type: Bug Components: applications/distributed-shell Affects Versions: 2.5.1 Reporter: Varun Vasudev Assignee: Varun Vasudev Fix For: 2.8.0 Attachments: YARN-2821.002.patch, YARN-2821.003.patch, YARN-2821.004.patch, YARN-2821.005.patch, apache-yarn-2821.0.patch, apache-yarn-2821.1.patch We've noticed that once in a while the distributed shell app master becomes unresponsive and is eventually killed by the RM. snippet of the logs - {noformat} 14/11/04 18:21:37 INFO distributedshell.ApplicationMaster: appattempt_1415123350094_0017_01 received 0 previous attempts' running containers on AM registration. 14/11/04 18:21:37 INFO distributedshell.ApplicationMaster: Requested container ask: Capability[memory:10, vCores:1]Priority[0] 14/11/04 18:21:37 INFO distributedshell.ApplicationMaster: Requested container ask: Capability[memory:10, vCores:1]Priority[0] 14/11/04 18:21:37 INFO distributedshell.ApplicationMaster: Requested container ask: Capability[memory:10, vCores:1]Priority[0] 14/11/04 18:21:37 INFO distributedshell.ApplicationMaster: Requested container ask: Capability[memory:10, vCores:1]Priority[0] 14/11/04 18:21:37 INFO distributedshell.ApplicationMaster: Requested container ask: Capability[memory:10, vCores:1]Priority[0] 14/11/04 18:21:38 INFO impl.AMRMClientImpl: Received new token for : onprem-tez2:45454 14/11/04 18:21:38 INFO distributedshell.ApplicationMaster: Got response from RM for container ask, allocatedCnt=1 14/11/04 18:21:38 INFO distributedshell.ApplicationMaster: Launching shell command on a new container., containerId=container_1415123350094_0017_01_02, containerNode=onprem-tez2:45454, containerNodeURI=onprem-tez2:50060, containerResourceMemory1024, containerResourceVirtualCores1 14/11/04 18:21:38 INFO distributedshell.ApplicationMaster: Setting up container launch container for containerid=container_1415123350094_0017_01_02 14/11/04 18:21:39 INFO impl.NMClientAsyncImpl: Processing Event EventType: START_CONTAINER for Container container_1415123350094_0017_01_02 14/11/04 18:21:39 INFO impl.ContainerManagementProtocolProxy: Opening proxy : onprem-tez2:45454 14/11/04 18:21:39 INFO impl.NMClientAsyncImpl: Processing Event EventType: QUERY_CONTAINER for Container container_1415123350094_0017_01_02 14/11/04 18:21:39 INFO impl.ContainerManagementProtocolProxy: Opening proxy : onprem-tez2:45454 14/11/04 18:21:39 INFO impl.AMRMClientImpl: Received 
new token for : onprem-tez3:45454 14/11/04 18:21:39 INFO impl.AMRMClientImpl: Received new token for : onprem-tez4:45454 14/11/04 18:21:39 INFO distributedshell.ApplicationMaster: Got response from RM for container ask, allocatedCnt=3 14/11/04 18:21:39 INFO distributedshell.ApplicationMaster: Launching shell command on a new container., containerId=container_1415123350094_0017_01_03, containerNode=onprem-tez2:45454, containerNodeURI=onprem-tez2:50060, containerResourceMemory1024, containerResourceVirtualCores1 14/11/04 18:21:39 INFO distributedshell.ApplicationMaster: Launching shell command on a new container., containerId=container_1415123350094_0017_01_04, containerNode=onprem-tez3:45454, containerNodeURI=onprem-tez3:50060, containerResourceMemory1024, containerResourceVirtualCores1 14/11/04 18:21:39 INFO distributedshell.ApplicationMaster: Launching shell command on a new container., containerId=container_1415123350094_0017_01_05, containerNode=onprem-tez4:45454, containerNodeURI=onprem-tez4:50060, containerResourceMemory1024, containerResourceVirtualCores1 14/11/04
[jira] [Commented] (YARN-3302) TestDockerContainerExecutor should run automatically if it can detect docker in the usual place
[ https://issues.apache.org/jira/browse/YARN-3302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14552190#comment-14552190 ] Hudson commented on YARN-3302: -- FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #202 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/202/]) YARN-3302. TestDockerContainerExecutor should run automatically if it can detect docker in the usual place (Ravindra Kumar Naik via raviprak) (raviprak: rev c97f32e7b9d9e1d4c80682cc01741579166174d1) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestDockerContainerExecutor.java * hadoop-yarn-project/CHANGES.txt TestDockerContainerExecutor should run automatically if it can detect docker in the usual place --- Key: YARN-3302 URL: https://issues.apache.org/jira/browse/YARN-3302 Project: Hadoop YARN Issue Type: Sub-task Affects Versions: 2.6.0 Reporter: Ravi Prakash Assignee: Ravindra Kumar Naik Attachments: YARN-3302-trunk.001.patch, YARN-3302-trunk.002.patch, YARN-3302-trunk.003.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
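The idea behind the YARN-3302 change integrated above, sketched below: instead of requiring a manual property to enable the test, probe a couple of usual docker locations and skip (via a JUnit assumption) when none is executable. The paths and test body here are illustrative, not the committed TestDockerContainerExecutor code.
{code}
import static org.junit.Assume.assumeTrue;

import java.io.File;

import org.junit.Before;
import org.junit.Test;

public class DockerDetectionSketch {
  @Before
  public void requireDocker() {
    // Skip (rather than fail) the test when no docker binary is found in the usual places.
    boolean dockerPresent = new File("/usr/bin/docker").canExecute()
        || new File("/usr/local/bin/docker").canExecute();
    assumeTrue(dockerPresent);
  }

  @Test
  public void testRunsOnlyWhenDockerIsAvailable() {
    // Real container-launch assertions would go here; omitted in this sketch.
  }
}
{code}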
[jira] [Commented] (YARN-3677) Fix findbugs warnings in yarn-server-resourcemanager
[ https://issues.apache.org/jira/browse/YARN-3677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14552192#comment-14552192 ] Hudson commented on YARN-3677: -- FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #202 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/202/]) YARN-3677. Fix findbugs warnings in yarn-server-resourcemanager. Contributed by Vinod Kumar Vavilapalli. (ozawa: rev 7401e5b5e8060b6b027d714b5ceb641fcfe5b598) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/FileSystemRMStateStore.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMAppManager.java Fix findbugs warnings in yarn-server-resourcemanager Key: YARN-3677 URL: https://issues.apache.org/jira/browse/YARN-3677 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: Akira AJISAKA Assignee: Vinod Kumar Vavilapalli Priority: Minor Labels: newbie Fix For: 2.7.1 Attachments: YARN-3677-20150519.txt There is 1 findbugs warning in FileSystemRMStateStore.java. {noformat} Inconsistent synchronization of FileSystemRMStateStore.isHDFS; locked 66% of time Unsynchronized access at FileSystemRMStateStore.java: [line 156] Field org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore.isHDFS Synchronized 66% of the time Synchronized access at FileSystemRMStateStore.java: [line 148] Synchronized access at FileSystemRMStateStore.java: [line 859] {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
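For readers unfamiliar with this findbugs pattern: the "inconsistent synchronization" warning fires when a field is written under a lock in some places and read without it elsewhere. The sketch below shows the general shape of the warning and one conventional remedy (make the field volatile, or read it under the same lock); it is not the change that was actually committed in YARN-3677.
{code}
public class InconsistentSyncSketch {
  // volatile keeps the lock-free read below consistent with the locked write,
  // which is one conventional way to silence the inconsistent-synchronization warning.
  private volatile boolean isHDFS;

  public synchronized void startInternal(boolean onHdfs) {
    this.isHDFS = onHdfs;   // write performed while holding the instance lock
  }

  public boolean usesHdfs() {
    return isHDFS;          // unsynchronized read; safe because the field is volatile
  }
}
{code}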
[jira] [Commented] (YARN-3565) NodeHeartbeatRequest/RegisterNodeManagerRequest should use NodeLabel object instead of String
[ https://issues.apache.org/jira/browse/YARN-3565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14552185#comment-14552185 ] Hudson commented on YARN-3565: -- FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #202 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/202/]) YARN-3565. NodeHeartbeatRequest/RegisterNodeManagerRequest should use NodeLabel object instead of String. (Naganarasimha G R via wangda) (wangda: rev b37da52a1c4fb3da2bd21bfadc5ec61c5f953a59) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/nodelabels/NodeLabelsProvider.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/impl/pb/RegisterNodeManagerRequestPBImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/proto/yarn_server_common_service_protos.proto * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/test/java/org/apache/hadoop/yarn/TestYarnServerApiClasses.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeStatusUpdaterImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/RegisterNodeManagerRequest.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestResourceTrackerService.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/yarn_protos.proto * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/NodeHeartbeatRequest.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/impl/pb/NodeHeartbeatRequestPBImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/nodelabels/NodeLabelTestBase.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceTrackerService.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/nodelabels/CommonNodeLabelsManager.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestNodeStatusUpdaterForLabels.java NodeHeartbeatRequest/RegisterNodeManagerRequest should use NodeLabel object instead of String - Key: YARN-3565 URL: https://issues.apache.org/jira/browse/YARN-3565 Project: Hadoop YARN Issue Type: Sub-task Components: api, client, resourcemanager Reporter: Wangda Tan Assignee: Naganarasimha G R Priority: Blocker Fix For: 2.8.0 Attachments: YARN-3565-20150502-1.patch, YARN-3565.20150515-1.patch, YARN-3565.20150516-1.patch, YARN-3565.20150519-1.patch Now NM HB/Register uses Set<String>; it will be hard to add new fields if we want to support specifying NodeLabel type such as exclusivity/constraints, etc. We need to make sure rolling upgrade works. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
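The motivation in YARN-3565 is extensibility: a bare Set<String> can only carry label names, while a NodeLabel record can carry additional attributes (exclusivity today, constraints later) without changing the heartbeat/register wire format again. A small sketch, assuming the NodeLabel record's newInstance(name, exclusivity), getName() and isExclusive() accessors from the related node-label work:
{code}
import java.util.HashSet;
import java.util.Set;

import org.apache.hadoop.yarn.api.records.NodeLabel;

public class NodeLabelVsStringSketch {
  public static void main(String[] args) {
    // Plain strings: only the label name can be reported by the NM.
    Set<String> asStrings = new HashSet<String>();
    asStrings.add("GPU");

    // Structured records: extra attributes ride along, and new fields can be
    // added to the record without another protocol change.
    Set<NodeLabel> asObjects = new HashSet<NodeLabel>();
    asObjects.add(NodeLabel.newInstance("GPU", false));

    for (NodeLabel label : asObjects) {
      System.out.println(label.getName() + " exclusive=" + label.isExclusive());
    }
  }
}
{code}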
[jira] [Commented] (YARN-3583) Support of NodeLabel object instead of plain String in YarnClient side.
[ https://issues.apache.org/jira/browse/YARN-3583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14552180#comment-14552180 ] Hudson commented on YARN-3583: -- FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #202 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/202/]) YARN-3583. Support of NodeLabel object instead of plain String in YarnClient side. (Sunil G via wangda) (wangda: rev 563eb1ad2ae848a23bbbf32ebfaf107e8fa14e87) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ClientRMService.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/api/impl/YarnClientImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/yarn_protos.proto * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/api/YarnClient.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/impl/pb/GetNodesToLabelsResponsePBImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestClientRMService.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/impl/pb/ReplaceLabelsOnNodeRequestPBImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/server/yarn_server_resourcemanager_service_protos.proto * hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/main/java/org/apache/hadoop/mapred/ResourceMgrDelegate.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/api/impl/TestYarnClient.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/yarn_service_protos.proto * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/impl/pb/GetLabelsToNodesResponsePBImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/GetLabelsToNodesResponse.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/GetNodesToLabelsResponse.java Support of NodeLabel object instead of plain String in YarnClient side. --- Key: YARN-3583 URL: https://issues.apache.org/jira/browse/YARN-3583 Project: Hadoop YARN Issue Type: Sub-task Components: client Affects Versions: 2.6.0 Reporter: Sunil G Assignee: Sunil G Fix For: 2.8.0 Attachments: 0001-YARN-3583.patch, 0002-YARN-3583.patch, 0003-YARN-3583.patch, 0004-YARN-3583.patch Similar to YARN-3521, use NodeLabel objects in YarnClient side apis. getLabelsToNodes/getNodeToLabels api's can use NodeLabel object instead of using plain label name. This will help to bring other label details such as Exclusivity to client side. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2336) Fair scheduler REST api returns a missing '[' bracket JSON for deep queue tree
[ https://issues.apache.org/jira/browse/YARN-2336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14552058#comment-14552058 ] Hadoop QA commented on YARN-2336: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 17m 39s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 2 new or modified test files. | | {color:green}+1{color} | javac | 7m 31s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 34s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 23s | The applied patch does not increase the total number of release audit warnings. | | {color:green}+1{color} | site | 2m 57s | Site still builds. | | {color:red}-1{color} | checkstyle | 0m 45s | The applied patch generated 1 new checkstyle issues (total was 8, now 8). | | {color:green}+1{color} | whitespace | 0m 1s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 33s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 32s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 1m 14s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | yarn tests | 49m 57s | Tests passed in hadoop-yarn-server-resourcemanager. | | | | 92m 12s | | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12734018/YARN-2336.009.patch | | Optional Tests | javadoc javac unit findbugs checkstyle site | | git revision | trunk / ce53c8e | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/8020/artifact/patchprocess/diffcheckstylehadoop-yarn-server-resourcemanager.txt | | hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/8020/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8020/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf905.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8020/console | This message was automatically generated. Fair scheduler REST api returns a missing '[' bracket JSON for deep queue tree -- Key: YARN-2336 URL: https://issues.apache.org/jira/browse/YARN-2336 Project: Hadoop YARN Issue Type: Bug Components: fairscheduler Affects Versions: 2.4.1, 2.6.0 Reporter: Kenji Kikushima Assignee: Akira AJISAKA Labels: BB2015-05-RFC Attachments: YARN-2336-2.patch, YARN-2336-3.patch, YARN-2336-4.patch, YARN-2336.005.patch, YARN-2336.007.patch, YARN-2336.008.patch, YARN-2336.009.patch, YARN-2336.009.patch, YARN-2336.patch When we have sub queues in Fair Scheduler, REST api returns a missing '[' blacket JSON for childQueues. This issue found by [~ajisakaa] at YARN-1050. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
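To illustrate the YARN-2336 symptom: with nested sub-queues, the scheduler REST response emitted the childQueues array without its opening '[' bracket, so parsers expecting a JSON array failed on deep queue trees. The fragment below is a hypothetical shape inferred from the summary, not captured output; queue names are made up.
{noformat}
broken:    "childQueues": {"queueName": "root.a.a1", ...}, {"queueName": "root.a.a2", ...} ]
expected:  "childQueues": [ {"queueName": "root.a.a1", ...}, {"queueName": "root.a.a2", ...} ]
{noformat}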
[jira] [Updated] (YARN-3543) ApplicationReport should be able to tell whether the Application is AM managed or not.
[ https://issues.apache.org/jira/browse/YARN-3543?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rohith updated YARN-3543: - Attachment: 0004-YARN-3543.patch Attached same patch to kick off Jenkins ApplicationReport should be able to tell whether the Application is AM managed or not. --- Key: YARN-3543 URL: https://issues.apache.org/jira/browse/YARN-3543 Project: Hadoop YARN Issue Type: Improvement Components: api Affects Versions: 2.6.0 Reporter: Spandan Dutta Assignee: Rohith Labels: BB2015-05-TBR Attachments: 0001-YARN-3543.patch, 0001-YARN-3543.patch, 0002-YARN-3543.patch, 0002-YARN-3543.patch, 0003-YARN-3543.patch, 0004-YARN-3543.patch, 0004-YARN-3543.patch, YARN-3543-AH.PNG, YARN-3543-RM.PNG Currently we can know whether the application submitted by the user is AM managed from the applicationSubmissionContext. This can be only done at the time when the user submits the job. We should have access to this info from the ApplicationReport as well so that we can check whether an app is AM managed or not anytime. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2336) Fair scheduler REST api returns a missing '[' bracket JSON for deep queue tree
[ https://issues.apache.org/jira/browse/YARN-2336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14551962#comment-14551962 ] Akira AJISAKA commented on YARN-2336: - The test failure looks unrelated to the patch. Kicked off https://builds.apache.org/job/PreCommit-YARN-Build/8020/ Fair scheduler REST api returns a missing '[' bracket JSON for deep queue tree -- Key: YARN-2336 URL: https://issues.apache.org/jira/browse/YARN-2336 Project: Hadoop YARN Issue Type: Bug Components: fairscheduler Affects Versions: 2.4.1, 2.6.0 Reporter: Kenji Kikushima Assignee: Akira AJISAKA Labels: BB2015-05-RFC Attachments: YARN-2336-2.patch, YARN-2336-3.patch, YARN-2336-4.patch, YARN-2336.005.patch, YARN-2336.007.patch, YARN-2336.008.patch, YARN-2336.009.patch, YARN-2336.009.patch, YARN-2336.patch When we have sub queues in Fair Scheduler, the REST API returns JSON with a missing '[' bracket for childQueues. This issue was found by [~ajisakaa] at YARN-1050. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3591) Resource Localisation on a bad disk causes subsequent containers failure
[ https://issues.apache.org/jira/browse/YARN-3591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14551976#comment-14551976 ] Hadoop QA commented on YARN-3591: - \\ \\ | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 14m 42s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 1 new or modified test files. | | {color:green}+1{color} | javac | 7m 33s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 37s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 22s | The applied patch does not increase the total number of release audit warnings. | | {color:green}+1{color} | checkstyle | 0m 20s | There were no new checkstyle issues. | | {color:green}+1{color} | whitespace | 0m 1s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 33s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 33s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 1m 2s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | yarn tests | 6m 28s | Tests passed in hadoop-yarn-server-nodemanager. | | | | 42m 15s | | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12734083/YARN-3591.4.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / ce53c8e | | hadoop-yarn-server-nodemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/8019/artifact/patchprocess/testrun_hadoop-yarn-server-nodemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8019/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf907.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8019/console | This message was automatically generated. Resource Localisation on a bad disk causes subsequent containers failure - Key: YARN-3591 URL: https://issues.apache.org/jira/browse/YARN-3591 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.7.0 Reporter: Lavkesh Lahngir Assignee: Lavkesh Lahngir Attachments: 0001-YARN-3591.1.patch, 0001-YARN-3591.patch, YARN-3591.2.patch, YARN-3591.3.patch, YARN-3591.4.patch It happens when a resource has been localised on a disk and, after localisation, that disk goes bad. The NM keeps paths to localised resources in memory. At resource-request time, isResourcePresent(rsrc) is called, which calls file.exists() on the localised path. In some cases, when the disk has gone bad, inodes are still cached and file.exists() returns true, but at read time the file will not open. Note: file.exists() actually calls stat64 natively, which returns true because it can find the inode information from the OS. The proposal is to call file.list() on the parent path of the resource, which calls open() natively; if the disk is good it should return an array of paths with length at least 1. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
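A minimal sketch of the YARN-3591 proposal described above (not the attached patch): validate a localized resource by listing its parent directory, which forces a real open() on the disk, instead of trusting file.exists() alone. The path in main() is illustrative.
{code}
import java.io.File;

public class LocalResourcePresenceSketch {
  /**
   * exists() can return true from cached inode metadata even when the disk has
   * gone bad; list() on the parent forces an open() and fails on a bad disk.
   */
  public static boolean isResourcePresent(File localizedPath) {
    File parent = localizedPath.getParentFile();
    if (parent == null) {
      return false;
    }
    String[] children = parent.list();
    // A healthy disk returns an array with at least one entry for a localized resource.
    return children != null && children.length >= 1 && localizedPath.exists();
  }

  public static void main(String[] args) {
    File rsrc = new File("/tmp/nm-local-dir/filecache/10/job.jar"); // illustrative path
    System.out.println("resource present: " + isResourcePresent(rsrc));
  }
}
{code}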
[jira] [Commented] (YARN-3646) Applications are getting stuck some times in case of retry policy forever
[ https://issues.apache.org/jira/browse/YARN-3646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14552091#comment-14552091 ] Rohith commented on YARN-3646: -- Thanks for updating the patch. Some comments on the tests:
# I think we can remove the tests added in the hadoop-common project, since the yarn-client test verifies the required functionality. The hadoop-common test was basically mocking the RMProxy functionality, and that test passed even without the RMProxy fix.
# The code never reaches {{Assert.fail()}}; better to remove it.
# Catch ApplicationNotFoundException instead of catching Throwable. I think you can add {{expected = ApplicationNotFoundException.class}} to the @Test annotation, like below.
{code}
@Test(timeout = 3, expected = ApplicationNotFoundException.class)
public void testClientWithRetryPolicyForEver() throws Exception {
  YarnConfiguration conf = new YarnConfiguration();
  conf.setInt(YarnConfiguration.RESOURCEMANAGER_CONNECT_MAX_WAIT_MS, -1);

  ResourceManager rm = null;
  YarnClient yarnClient = null;
  try {
    // start rm
    rm = new ResourceManager();
    rm.init(conf);
    rm.start();

    yarnClient = YarnClient.createYarnClient();
    yarnClient.init(conf);
    yarnClient.start();

    // create invalid application id
    ApplicationId appId = ApplicationId.newInstance(1430126768987L, 10645);

    // RM should throw ApplicationNotFoundException exception
    yarnClient.getApplicationReport(appId);
  } finally {
    if (yarnClient != null) {
      yarnClient.stop();
    }
    if (rm != null) {
      rm.stop();
    }
  }
}
{code}
# Can you rename the test to reflect the actual functionality being tested, e.g. {{testShouldNotRetryForeverForNonNetworkExceptions}}?
Applications are getting stuck some times in case of retry policy forever - Key: YARN-3646 URL: https://issues.apache.org/jira/browse/YARN-3646 Project: Hadoop YARN Issue Type: Bug Components: client Reporter: Raju Bairishetti Attachments: YARN-3646.001.patch, YARN-3646.patch We have set *yarn.resourcemanager.connect.wait-ms* to -1 to use the FOREVER retry policy. The YARN client retries infinitely on exceptions from the RM because its retry policy is FOREVER. The problem is that it retries for all kinds of exceptions (such as ApplicationNotFoundException), even when the failure is not a connection failure. Because of this, my application is not progressing further. *The YARN client should not retry infinitely for non-connection failures.* We have written a simple YARN client that tries to get an application report for an invalid or older appId. The ResourceManager throws an ApplicationNotFoundException since the appId is invalid or old. But because of the FOREVER retry policy, the client keeps retrying the getApplicationReport call and the ResourceManager keeps throwing ApplicationNotFoundException.
{code}
private void testYarnClientRetryPolicy() throws Exception {
  YarnConfiguration conf = new YarnConfiguration();
  conf.setInt(YarnConfiguration.RESOURCEMANAGER_CONNECT_MAX_WAIT_MS, -1);

  YarnClient yarnClient = YarnClient.createYarnClient();
  yarnClient.init(conf);
  yarnClient.start();

  ApplicationId appId = ApplicationId.newInstance(1430126768987L, 10645);
  ApplicationReport report = yarnClient.getApplicationReport(appId);
}
{code}
*RM logs:* {noformat} 15/05/14 16:33:24 INFO ipc.Server: IPC Server handler 21 on 8032, call org.apache.hadoop.yarn.api.ApplicationClientProtocolPB.getApplicationReport from 10.14.120.231:61621 Call#875162 Retry#0 org.apache.hadoop.yarn.exceptions.ApplicationNotFoundException: Application with id 'application_1430126768987_10645' doesn't exist in RM.
at org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getApplicationReport(ClientRMService.java:284) at org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getApplicationReport(ApplicationClientProtocolPBServiceImpl.java:145) at org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:321) at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at
[jira] [Updated] (YARN-3344) procfs stat file is not in the expected format warning
[ https://issues.apache.org/jira/browse/YARN-3344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ravindra Kumar Naik updated YARN-3344: -- Attachment: (was: YARN-3344-branch-trunk.001.patch) procfs stat file is not in the expected format warning -- Key: YARN-3344 URL: https://issues.apache.org/jira/browse/YARN-3344 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.6.0 Reporter: Jon Bringhurst Assignee: Ravindra Kumar Naik Attachments: YARN-3344-branch-trunk.003.patch, YARN-3344-trunk.004.patch Although this doesn't appear to be causing any functional issues, it is spamming our log files quite a bit. :) It appears that the regex in ProcfsBasedProcessTree doesn't work for all /proc/pid/stat files. Here's the error I'm seeing: {noformat} source_host: asdf, method: constructProcessInfo, level: WARN, message: Unexpected: procfs stat file is not in the expected format for process with pid 6953 file: ProcfsBasedProcessTree.java, line_number: 514, class: org.apache.hadoop.yarn.util.ProcfsBasedProcessTree, {noformat} And here's the basic info on process with pid 6953: {noformat} [asdf ~]$ cat /proc/6953/stat 6953 (python2.6 /expo) S 1871 1871 1871 0 -1 4202496 9364 1080 0 0 25 3 0 0 20 0 1 0 144918696 205295616 5856 18446744073709551615 1 1 0 0 0 0 0 16781312 2 18446744073709551615 0 0 17 13 0 0 0 0 0 [asdf ~]$ ps aux|grep 6953 root 6953 0.0 0.0 200484 23424 ?S21:44 0:00 python2.6 /export/apps/salt/minion-scripts/module-sync.py jbringhu 13481 0.0 0.0 105312 872 pts/0S+ 22:13 0:00 grep -i 6953 [asdf ~]$ {noformat} This is using 2.6.32-431.11.2.el6.x86_64 in RHEL 6.5. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3344) procfs stat file is not in the expected format warning
[ https://issues.apache.org/jira/browse/YARN-3344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ravindra Kumar Naik updated YARN-3344: -- Attachment: YARN-3344-trunk.005.patch updated patch with checkstyle issue handled procfs stat file is not in the expected format warning -- Key: YARN-3344 URL: https://issues.apache.org/jira/browse/YARN-3344 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.6.0 Reporter: Jon Bringhurst Assignee: Ravindra Kumar Naik Attachments: YARN-3344-trunk.005.patch Although this doesn't appear to be causing any functional issues, it is spamming our log files quite a bit. :) It appears that the regex in ProcfsBasedProcessTree doesn't work for all /proc/pid/stat files. Here's the error I'm seeing: {noformat} source_host: asdf, method: constructProcessInfo, level: WARN, message: Unexpected: procfs stat file is not in the expected format for process with pid 6953 file: ProcfsBasedProcessTree.java, line_number: 514, class: org.apache.hadoop.yarn.util.ProcfsBasedProcessTree, {noformat} And here's the basic info on process with pid 6953: {noformat} [asdf ~]$ cat /proc/6953/stat 6953 (python2.6 /expo) S 1871 1871 1871 0 -1 4202496 9364 1080 0 0 25 3 0 0 20 0 1 0 144918696 205295616 5856 18446744073709551615 1 1 0 0 0 0 0 16781312 2 18446744073709551615 0 0 17 13 0 0 0 0 0 [asdf ~]$ ps aux|grep 6953 root 6953 0.0 0.0 200484 23424 ?S21:44 0:00 python2.6 /export/apps/salt/minion-scripts/module-sync.py jbringhu 13481 0.0 0.0 105312 872 pts/0S+ 22:13 0:00 grep -i 6953 [asdf ~]$ {noformat} This is using 2.6.32-431.11.2.el6.x86_64 in RHEL 6.5. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3344) procfs stat file is not in the expected format warning
[ https://issues.apache.org/jira/browse/YARN-3344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ravindra Kumar Naik updated YARN-3344: -- Attachment: (was: YARN-3344-trunk.004.patch) procfs stat file is not in the expected format warning -- Key: YARN-3344 URL: https://issues.apache.org/jira/browse/YARN-3344 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.6.0 Reporter: Jon Bringhurst Assignee: Ravindra Kumar Naik Although this doesn't appear to be causing any functional issues, it is spamming our log files quite a bit. :) It appears that the regex in ProcfsBasedProcessTree doesn't work for all /proc/pid/stat files. Here's the error I'm seeing: {noformat} source_host: asdf, method: constructProcessInfo, level: WARN, message: Unexpected: procfs stat file is not in the expected format for process with pid 6953 file: ProcfsBasedProcessTree.java, line_number: 514, class: org.apache.hadoop.yarn.util.ProcfsBasedProcessTree, {noformat} And here's the basic info on process with pid 6953: {noformat} [asdf ~]$ cat /proc/6953/stat 6953 (python2.6 /expo) S 1871 1871 1871 0 -1 4202496 9364 1080 0 0 25 3 0 0 20 0 1 0 144918696 205295616 5856 18446744073709551615 1 1 0 0 0 0 0 16781312 2 18446744073709551615 0 0 17 13 0 0 0 0 0 [asdf ~]$ ps aux|grep 6953 root 6953 0.0 0.0 200484 23424 ?S21:44 0:00 python2.6 /export/apps/salt/minion-scripts/module-sync.py jbringhu 13481 0.0 0.0 105312 872 pts/0S+ 22:13 0:00 grep -i 6953 [asdf ~]$ {noformat} This is using 2.6.32-431.11.2.el6.x86_64 in RHEL 6.5. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3344) procfs stat file is not in the expected format warning
[ https://issues.apache.org/jira/browse/YARN-3344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ravindra Kumar Naik updated YARN-3344: -- Attachment: (was: YARN-3344-branch-trunk.003.patch) procfs stat file is not in the expected format warning -- Key: YARN-3344 URL: https://issues.apache.org/jira/browse/YARN-3344 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.6.0 Reporter: Jon Bringhurst Assignee: Ravindra Kumar Naik Although this doesn't appear to be causing any functional issues, it is spamming our log files quite a bit. :) It appears that the regex in ProcfsBasedProcessTree doesn't work for all /proc/pid/stat files. Here's the error I'm seeing: {noformat} source_host: asdf, method: constructProcessInfo, level: WARN, message: Unexpected: procfs stat file is not in the expected format for process with pid 6953 file: ProcfsBasedProcessTree.java, line_number: 514, class: org.apache.hadoop.yarn.util.ProcfsBasedProcessTree, {noformat} And here's the basic info on process with pid 6953: {noformat} [asdf ~]$ cat /proc/6953/stat 6953 (python2.6 /expo) S 1871 1871 1871 0 -1 4202496 9364 1080 0 0 25 3 0 0 20 0 1 0 144918696 205295616 5856 18446744073709551615 1 1 0 0 0 0 0 16781312 2 18446744073709551615 0 0 17 13 0 0 0 0 0 [asdf ~]$ ps aux|grep 6953 root 6953 0.0 0.0 200484 23424 ?S21:44 0:00 python2.6 /export/apps/salt/minion-scripts/module-sync.py jbringhu 13481 0.0 0.0 105312 872 pts/0S+ 22:13 0:00 grep -i 6953 [asdf ~]$ {noformat} This is using 2.6.32-431.11.2.el6.x86_64 in RHEL 6.5. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3344) procfs stat file is not in the expected format warning
[ https://issues.apache.org/jira/browse/YARN-3344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ravindra Kumar Naik updated YARN-3344: -- Attachment: (was: YARN-3344-branch-trunk.002.patch) procfs stat file is not in the expected format warning -- Key: YARN-3344 URL: https://issues.apache.org/jira/browse/YARN-3344 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.6.0 Reporter: Jon Bringhurst Assignee: Ravindra Kumar Naik Attachments: YARN-3344-branch-trunk.003.patch, YARN-3344-trunk.004.patch Although this doesn't appear to be causing any functional issues, it is spamming our log files quite a bit. :) It appears that the regex in ProcfsBasedProcessTree doesn't work for all /proc/pid/stat files. Here's the error I'm seeing: {noformat} source_host: asdf, method: constructProcessInfo, level: WARN, message: Unexpected: procfs stat file is not in the expected format for process with pid 6953 file: ProcfsBasedProcessTree.java, line_number: 514, class: org.apache.hadoop.yarn.util.ProcfsBasedProcessTree, {noformat} And here's the basic info on process with pid 6953: {noformat} [asdf ~]$ cat /proc/6953/stat 6953 (python2.6 /expo) S 1871 1871 1871 0 -1 4202496 9364 1080 0 0 25 3 0 0 20 0 1 0 144918696 205295616 5856 18446744073709551615 1 1 0 0 0 0 0 16781312 2 18446744073709551615 0 0 17 13 0 0 0 0 0 [asdf ~]$ ps aux|grep 6953 root 6953 0.0 0.0 200484 23424 ?S21:44 0:00 python2.6 /export/apps/salt/minion-scripts/module-sync.py jbringhu 13481 0.0 0.0 105312 872 pts/0S+ 22:13 0:00 grep -i 6953 [asdf ~]$ {noformat} This is using 2.6.32-431.11.2.el6.x86_64 in RHEL 6.5. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1902) Allocation of too many containers when a second request is done with the same resource capability
[ https://issues.apache.org/jira/browse/YARN-1902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14552375#comment-14552375 ] MENG DING commented on YARN-1902: - I have been experimenting with the idea of changing AppSchedulingInfo to maintain a total request table and a fulfilled allocation table, and then calculating the difference of the two tables as the real outstanding request table used for scheduling. All was fine until I realized that this cannot handle one use case where an AMRMClient, right before sending the allocation heartbeat, removes all container requests and adds new container requests at the same priority and location (possibly with different resource capability). AppSchedulingInfo does not know about this, and may not treat the newly added container requests as outstanding requests. I agree that currently I do not see a clean solution without affecting backward compatibility. Allocation of too many containers when a second request is done with the same resource capability - Key: YARN-1902 URL: https://issues.apache.org/jira/browse/YARN-1902 Project: Hadoop YARN Issue Type: Bug Components: client Affects Versions: 2.2.0, 2.3.0, 2.4.0 Reporter: Sietse T. Au Assignee: Sietse T. Au Labels: client Attachments: YARN-1902.patch, YARN-1902.v2.patch, YARN-1902.v3.patch Regarding AMRMClientImpl. Scenario 1: Given a ContainerRequest x with Resource y, when addContainerRequest is called z times with x, allocate is called, and at least one of the z allocated containers is started, then if another addContainerRequest call is done and subsequently an allocate call to the RM, (z+1) containers will be allocated, where 1 container is expected. Scenario 2: No containers are started between the allocate calls. Analyzing debug logs of the AMRMClientImpl, I have found that indeed (z+1) containers are requested in both scenarios, but that only in the second scenario is the correct behavior observed. Looking at the implementation I have found that this (z+1) request is caused by the structure of the remoteRequestsTable. The consequence of the Map<Resource, ResourceRequestInfo> structure is that ResourceRequestInfo does not hold any information about whether a request has been sent to the RM yet or not. There are workarounds for this, such as releasing the excess containers received. The solution implemented is to initialize a new ResourceRequest in ResourceRequestInfo when a request has been successfully sent to the RM. The patch includes a test in which scenario one is tested. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
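A schematic reproduction of the YARN-1902 scenarios above, assuming it runs inside an application master container with valid AM credentials (allocations arrive asynchronously, so the counts seen per heartbeat vary); it is meant only to show where the extra (z+1)-th container can come from, not to serve as a test:
{code}
import org.apache.hadoop.yarn.api.protocolrecords.AllocateResponse;
import org.apache.hadoop.yarn.api.records.Priority;
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.client.api.AMRMClient;
import org.apache.hadoop.yarn.client.api.AMRMClient.ContainerRequest;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class OverAllocationScenarioSketch {
  public static void main(String[] args) throws Exception {
    YarnConfiguration conf = new YarnConfiguration();
    AMRMClient<ContainerRequest> rmClient = AMRMClient.createAMRMClient();
    rmClient.init(conf);
    rmClient.start();
    rmClient.registerApplicationMaster("", 0, "");

    Resource capability = Resource.newInstance(1024, 1);  // the same Resource y every time
    Priority priority = Priority.newInstance(0);

    // addContainerRequest called z times with the same request x.
    int z = 3;
    for (int i = 0; i < z; i++) {
      rmClient.addContainerRequest(new ContainerRequest(capability, null, null, priority));
    }
    AllocateResponse first = rmClient.allocate(0.1f);
    System.out.println("allocated so far: " + first.getAllocatedContainers().size());

    // One more request with the identical capability. Per the issue, subsequent
    // allocate calls can end up handing back (z + 1) containers instead of 1,
    // because the remoteRequestsTable does not record what was already sent to the RM.
    rmClient.addContainerRequest(new ContainerRequest(capability, null, null, priority));
    AllocateResponse second = rmClient.allocate(0.2f);
    System.out.println("allocated after extra request: " + second.getAllocatedContainers().size());
  }
}
{code}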
[jira] [Commented] (YARN-3302) TestDockerContainerExecutor should run automatically if it can detect docker in the usual place
[ https://issues.apache.org/jira/browse/YARN-3302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14552557#comment-14552557 ] Hudson commented on YARN-3302: -- SUCCESS: Integrated in Hadoop-Mapreduce-trunk-Java8 #201 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/201/]) YARN-3302. TestDockerContainerExecutor should run automatically if it can detect docker in the usual place (Ravindra Kumar Naik via raviprak) (raviprak: rev c97f32e7b9d9e1d4c80682cc01741579166174d1) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestDockerContainerExecutor.java TestDockerContainerExecutor should run automatically if it can detect docker in the usual place --- Key: YARN-3302 URL: https://issues.apache.org/jira/browse/YARN-3302 Project: Hadoop YARN Issue Type: Sub-task Affects Versions: 2.6.0 Reporter: Ravi Prakash Assignee: Ravindra Kumar Naik Attachments: YARN-3302-trunk.001.patch, YARN-3302-trunk.002.patch, YARN-3302-trunk.003.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3677) Fix findbugs warnings in yarn-server-resourcemanager
[ https://issues.apache.org/jira/browse/YARN-3677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14552559#comment-14552559 ] Hudson commented on YARN-3677: -- SUCCESS: Integrated in Hadoop-Mapreduce-trunk-Java8 #201 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/201/]) YARN-3677. Fix findbugs warnings in yarn-server-resourcemanager. Contributed by Vinod Kumar Vavilapalli. (ozawa: rev 7401e5b5e8060b6b027d714b5ceb641fcfe5b598) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/FileSystemRMStateStore.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMAppManager.java * hadoop-yarn-project/CHANGES.txt Fix findbugs warnings in yarn-server-resourcemanager Key: YARN-3677 URL: https://issues.apache.org/jira/browse/YARN-3677 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: Akira AJISAKA Assignee: Vinod Kumar Vavilapalli Priority: Minor Labels: newbie Fix For: 2.7.1 Attachments: YARN-3677-20150519.txt There is 1 findbugs warning in FileSystemRMStateStore.java. {noformat} Inconsistent synchronization of FileSystemRMStateStore.isHDFS; locked 66% of time Unsynchronized access at FileSystemRMStateStore.java: [line 156] Field org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore.isHDFS Synchronized 66% of the time Synchronized access at FileSystemRMStateStore.java: [line 148] Synchronized access at FileSystemRMStateStore.java: [line 859] {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
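For readers unfamiliar with this class of findbugs warning, the sketch below shows the usual repair pattern in the abstract: either synchronize every access to the flagged field or, when the field is a simple flag written once and read elsewhere, mark it volatile. The names are illustrative; this is not the actual YARN-3677 change.
{code}
// Generic sketch of fixing an "inconsistent synchronization" findbugs warning.
// The class and field names here are illustrative, not the real RM code.
public class StateStoreExample {
  // Before: plain field, written under a lock but read without one -> warning.
  // After: volatile guarantees visibility for the unsynchronized readers.
  private volatile boolean isHDFS;

  public synchronized void initialize(boolean onHDFS) {
    this.isHDFS = onHDFS;          // write still happens under the lock
  }

  public boolean usesHDFS() {
    return isHDFS;                 // lock-free read is now safe to report
  }
}
{code}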
[jira] [Commented] (YARN-3583) Support of NodeLabel object instead of plain String in YarnClient side.
[ https://issues.apache.org/jira/browse/YARN-3583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14552548#comment-14552548 ] Hudson commented on YARN-3583: -- SUCCESS: Integrated in Hadoop-Mapreduce-trunk-Java8 #201 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/201/]) YARN-3583. Support of NodeLabel object instead of plain String in YarnClient side. (Sunil G via wangda) (wangda: rev 563eb1ad2ae848a23bbbf32ebfaf107e8fa14e87) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/yarn_protos.proto * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestClientRMService.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/server/yarn_server_resourcemanager_service_protos.proto * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/impl/pb/GetNodesToLabelsResponsePBImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/GetLabelsToNodesResponse.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/api/impl/TestYarnClient.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/impl/pb/GetLabelsToNodesResponsePBImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/api/impl/YarnClientImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ClientRMService.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/GetNodesToLabelsResponse.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/yarn_service_protos.proto * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/api/YarnClient.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/impl/pb/ReplaceLabelsOnNodeRequestPBImpl.java * hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/main/java/org/apache/hadoop/mapred/ResourceMgrDelegate.java Support of NodeLabel object instead of plain String in YarnClient side. --- Key: YARN-3583 URL: https://issues.apache.org/jira/browse/YARN-3583 Project: Hadoop YARN Issue Type: Sub-task Components: client Affects Versions: 2.6.0 Reporter: Sunil G Assignee: Sunil G Fix For: 2.8.0 Attachments: 0001-YARN-3583.patch, 0002-YARN-3583.patch, 0003-YARN-3583.patch, 0004-YARN-3583.patch Similar to YARN-3521, use NodeLabel objects in YarnClient side apis. getLabelsToNodes/getNodeToLabels api's can use NodeLabel object instead of using plain label name. This will help to bring other label details such as Exclusivity to client side. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3565) NodeHeartbeatRequest/RegisterNodeManagerRequest should use NodeLabel object instead of String
[ https://issues.apache.org/jira/browse/YARN-3565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14552553#comment-14552553 ] Hudson commented on YARN-3565: -- SUCCESS: Integrated in Hadoop-Mapreduce-trunk-Java8 #201 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/201/]) YARN-3565. NodeHeartbeatRequest/RegisterNodeManagerRequest should use NodeLabel object instead of String. (Naganarasimha G R via wangda) (wangda: rev b37da52a1c4fb3da2bd21bfadc5ec61c5f953a59) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/impl/pb/RegisterNodeManagerRequestPBImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestNodeStatusUpdaterForLabels.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/nodelabels/NodeLabelTestBase.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceTrackerService.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/yarn_protos.proto * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/proto/yarn_server_common_service_protos.proto * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestResourceTrackerService.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeStatusUpdaterImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/NodeHeartbeatRequest.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/nodelabels/NodeLabelsProvider.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/RegisterNodeManagerRequest.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/nodelabels/CommonNodeLabelsManager.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/impl/pb/NodeHeartbeatRequestPBImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/test/java/org/apache/hadoop/yarn/TestYarnServerApiClasses.java NodeHeartbeatRequest/RegisterNodeManagerRequest should use NodeLabel object instead of String - Key: YARN-3565 URL: https://issues.apache.org/jira/browse/YARN-3565 Project: Hadoop YARN Issue Type: Sub-task Components: api, client, resourcemanager Reporter: Wangda Tan Assignee: Naganarasimha G R Priority: Blocker Fix For: 2.8.0 Attachments: YARN-3565-20150502-1.patch, YARN-3565.20150515-1.patch, YARN-3565.20150516-1.patch, YARN-3565.20150519-1.patch Now NM HB/Register uses SetString, it will be hard to add new fields if we want to support specifying NodeLabel type such as exclusivity/constraints, etc. We need to make sure rolling upgrade works. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3565) NodeHeartbeatRequest/RegisterNodeManagerRequest should use NodeLabel object instead of String
[ https://issues.apache.org/jira/browse/YARN-3565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14552596#comment-14552596 ] Hudson commented on YARN-3565: -- SUCCESS: Integrated in Hadoop-Mapreduce-trunk #2149 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2149/]) YARN-3565. NodeHeartbeatRequest/RegisterNodeManagerRequest should use NodeLabel object instead of String. (Naganarasimha G R via wangda) (wangda: rev b37da52a1c4fb3da2bd21bfadc5ec61c5f953a59) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestNodeStatusUpdaterForLabels.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/impl/pb/NodeHeartbeatRequestPBImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceTrackerService.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/nodelabels/NodeLabelsProvider.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/nodelabels/NodeLabelTestBase.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeStatusUpdaterImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/NodeHeartbeatRequest.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/proto/yarn_server_common_service_protos.proto * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/impl/pb/RegisterNodeManagerRequestPBImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/test/java/org/apache/hadoop/yarn/TestYarnServerApiClasses.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestResourceTrackerService.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/yarn_protos.proto * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/RegisterNodeManagerRequest.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/nodelabels/CommonNodeLabelsManager.java NodeHeartbeatRequest/RegisterNodeManagerRequest should use NodeLabel object instead of String - Key: YARN-3565 URL: https://issues.apache.org/jira/browse/YARN-3565 Project: Hadoop YARN Issue Type: Sub-task Components: api, client, resourcemanager Reporter: Wangda Tan Assignee: Naganarasimha G R Priority: Blocker Fix For: 2.8.0 Attachments: YARN-3565-20150502-1.patch, YARN-3565.20150515-1.patch, YARN-3565.20150516-1.patch, YARN-3565.20150519-1.patch Now NM HB/Register uses SetString, it will be hard to add new fields if we want to support specifying NodeLabel type such as exclusivity/constraints, etc. We need to make sure rolling upgrade works. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2821) Distributed shell app master becomes unresponsive sometimes
[ https://issues.apache.org/jira/browse/YARN-2821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14552597#comment-14552597 ] Hudson commented on YARN-2821: -- SUCCESS: Integrated in Hadoop-Mapreduce-trunk #2149 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2149/]) YARN-2821. Fixed a problem that DistributedShell AM may hang if restarted. Contributed by Varun Vasudev (jianhe: rev 7438966586f1896ab3e8b067d47a4af28a894106) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/pom.xml * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/src/main/java/org/apache/hadoop/yarn/applications/distributedshell/ApplicationMaster.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/src/test/java/org/apache/hadoop/yarn/applications/distributedshell/TestDSAppMaster.java Distributed shell app master becomes unresponsive sometimes --- Key: YARN-2821 URL: https://issues.apache.org/jira/browse/YARN-2821 Project: Hadoop YARN Issue Type: Bug Components: applications/distributed-shell Affects Versions: 2.5.1 Reporter: Varun Vasudev Assignee: Varun Vasudev Fix For: 2.8.0 Attachments: YARN-2821.002.patch, YARN-2821.003.patch, YARN-2821.004.patch, YARN-2821.005.patch, apache-yarn-2821.0.patch, apache-yarn-2821.1.patch We've noticed that once in a while the distributed shell app master becomes unresponsive and is eventually killed by the RM. snippet of the logs - {noformat} 14/11/04 18:21:37 INFO distributedshell.ApplicationMaster: appattempt_1415123350094_0017_01 received 0 previous attempts' running containers on AM registration. 14/11/04 18:21:37 INFO distributedshell.ApplicationMaster: Requested container ask: Capability[memory:10, vCores:1]Priority[0] 14/11/04 18:21:37 INFO distributedshell.ApplicationMaster: Requested container ask: Capability[memory:10, vCores:1]Priority[0] 14/11/04 18:21:37 INFO distributedshell.ApplicationMaster: Requested container ask: Capability[memory:10, vCores:1]Priority[0] 14/11/04 18:21:37 INFO distributedshell.ApplicationMaster: Requested container ask: Capability[memory:10, vCores:1]Priority[0] 14/11/04 18:21:37 INFO distributedshell.ApplicationMaster: Requested container ask: Capability[memory:10, vCores:1]Priority[0] 14/11/04 18:21:38 INFO impl.AMRMClientImpl: Received new token for : onprem-tez2:45454 14/11/04 18:21:38 INFO distributedshell.ApplicationMaster: Got response from RM for container ask, allocatedCnt=1 14/11/04 18:21:38 INFO distributedshell.ApplicationMaster: Launching shell command on a new container., containerId=container_1415123350094_0017_01_02, containerNode=onprem-tez2:45454, containerNodeURI=onprem-tez2:50060, containerResourceMemory1024, containerResourceVirtualCores1 14/11/04 18:21:38 INFO distributedshell.ApplicationMaster: Setting up container launch container for containerid=container_1415123350094_0017_01_02 14/11/04 18:21:39 INFO impl.NMClientAsyncImpl: Processing Event EventType: START_CONTAINER for Container container_1415123350094_0017_01_02 14/11/04 18:21:39 INFO impl.ContainerManagementProtocolProxy: Opening proxy : onprem-tez2:45454 14/11/04 18:21:39 INFO impl.NMClientAsyncImpl: Processing Event EventType: QUERY_CONTAINER for Container container_1415123350094_0017_01_02 14/11/04 18:21:39 INFO impl.ContainerManagementProtocolProxy: Opening proxy : onprem-tez2:45454 14/11/04 18:21:39 INFO impl.AMRMClientImpl: Received 
new token for : onprem-tez3:45454 14/11/04 18:21:39 INFO impl.AMRMClientImpl: Received new token for : onprem-tez4:45454 14/11/04 18:21:39 INFO distributedshell.ApplicationMaster: Got response from RM for container ask, allocatedCnt=3 14/11/04 18:21:39 INFO distributedshell.ApplicationMaster: Launching shell command on a new container., containerId=container_1415123350094_0017_01_03, containerNode=onprem-tez2:45454, containerNodeURI=onprem-tez2:50060, containerResourceMemory1024, containerResourceVirtualCores1 14/11/04 18:21:39 INFO distributedshell.ApplicationMaster: Launching shell command on a new container., containerId=container_1415123350094_0017_01_04, containerNode=onprem-tez3:45454, containerNodeURI=onprem-tez3:50060, containerResourceMemory1024, containerResourceVirtualCores1 14/11/04 18:21:39 INFO distributedshell.ApplicationMaster: Launching shell command on a new container., containerId=container_1415123350094_0017_01_05, containerNode=onprem-tez4:45454, containerNodeURI=onprem-tez4:50060, containerResourceMemory1024, containerResourceVirtualCores1 14/11/04
[jira] [Commented] (YARN-3302) TestDockerContainerExecutor should run automatically if it can detect docker in the usual place
[ https://issues.apache.org/jira/browse/YARN-3302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14552601#comment-14552601 ] Hudson commented on YARN-3302: -- SUCCESS: Integrated in Hadoop-Mapreduce-trunk #2149 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2149/]) YARN-3302. TestDockerContainerExecutor should run automatically if it can detect docker in the usual place (Ravindra Kumar Naik via raviprak) (raviprak: rev c97f32e7b9d9e1d4c80682cc01741579166174d1) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestDockerContainerExecutor.java * hadoop-yarn-project/CHANGES.txt TestDockerContainerExecutor should run automatically if it can detect docker in the usual place --- Key: YARN-3302 URL: https://issues.apache.org/jira/browse/YARN-3302 Project: Hadoop YARN Issue Type: Sub-task Affects Versions: 2.6.0 Reporter: Ravi Prakash Assignee: Ravindra Kumar Naik Attachments: YARN-3302-trunk.001.patch, YARN-3302-trunk.002.patch, YARN-3302-trunk.003.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3583) Support of NodeLabel object instead of plain String in YarnClient side.
[ https://issues.apache.org/jira/browse/YARN-3583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14552591#comment-14552591 ] Hudson commented on YARN-3583: -- SUCCESS: Integrated in Hadoop-Mapreduce-trunk #2149 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2149/]) YARN-3583. Support of NodeLabel object instead of plain String in YarnClient side. (Sunil G via wangda) (wangda: rev 563eb1ad2ae848a23bbbf32ebfaf107e8fa14e87) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ClientRMService.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/yarn_protos.proto * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/yarn_service_protos.proto * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/impl/pb/GetLabelsToNodesResponsePBImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/impl/pb/GetNodesToLabelsResponsePBImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/api/impl/YarnClientImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/GetLabelsToNodesResponse.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestClientRMService.java * hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/main/java/org/apache/hadoop/mapred/ResourceMgrDelegate.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/impl/pb/ReplaceLabelsOnNodeRequestPBImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/GetNodesToLabelsResponse.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/api/impl/TestYarnClient.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/api/YarnClient.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/server/yarn_server_resourcemanager_service_protos.proto Support of NodeLabel object instead of plain String in YarnClient side. --- Key: YARN-3583 URL: https://issues.apache.org/jira/browse/YARN-3583 Project: Hadoop YARN Issue Type: Sub-task Components: client Affects Versions: 2.6.0 Reporter: Sunil G Assignee: Sunil G Fix For: 2.8.0 Attachments: 0001-YARN-3583.patch, 0002-YARN-3583.patch, 0003-YARN-3583.patch, 0004-YARN-3583.patch Similar to YARN-3521, use NodeLabel objects in YarnClient side apis. getLabelsToNodes/getNodeToLabels api's can use NodeLabel object instead of using plain label name. This will help to bring other label details such as Exclusivity to client side. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3677) Fix findbugs warnings in yarn-server-resourcemanager
[ https://issues.apache.org/jira/browse/YARN-3677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14552603#comment-14552603 ] Hudson commented on YARN-3677: -- SUCCESS: Integrated in Hadoop-Mapreduce-trunk #2149 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2149/]) YARN-3677. Fix findbugs warnings in yarn-server-resourcemanager. Contributed by Vinod Kumar Vavilapalli. (ozawa: rev 7401e5b5e8060b6b027d714b5ceb641fcfe5b598) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMAppManager.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/FileSystemRMStateStore.java Fix findbugs warnings in yarn-server-resourcemanager Key: YARN-3677 URL: https://issues.apache.org/jira/browse/YARN-3677 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: Akira AJISAKA Assignee: Vinod Kumar Vavilapalli Priority: Minor Labels: newbie Fix For: 2.7.1 Attachments: YARN-3677-20150519.txt There is 1 findbugs warning in FileSystemRMStateStore.java. {noformat} Inconsistent synchronization of FileSystemRMStateStore.isHDFS; locked 66% of time Unsynchronized access at FileSystemRMStateStore.java: [line 156] Field org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore.isHDFS Synchronized 66% of the time Synchronized access at FileSystemRMStateStore.java: [line 148] Synchronized access at FileSystemRMStateStore.java: [line 859] {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3626) On Windows localized resources are not moved to the front of the classpath when they should be
[ https://issues.apache.org/jira/browse/YARN-3626?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14552686#comment-14552686 ] Craig Welch commented on YARN-3626: --- Checkstyle looks insignificant. [~cnauroth], [~vinodkv], I've changed the approach to use the environment instead of configuration as suggested, can one of you review pls? On Windows localized resources are not moved to the front of the classpath when they should be -- Key: YARN-3626 URL: https://issues.apache.org/jira/browse/YARN-3626 Project: Hadoop YARN Issue Type: Bug Components: yarn Environment: Windows Reporter: Craig Welch Assignee: Craig Welch Fix For: 2.7.1 Attachments: YARN-3626.0.patch, YARN-3626.11.patch, YARN-3626.14.patch, YARN-3626.4.patch, YARN-3626.6.patch, YARN-3626.9.patch In response to the mapreduce.job.user.classpath.first setting the classpath is ordered differently so that localized resources will appear before system classpath resources when tasks execute. On Windows this does not work because the localized resources are not linked into their final location when the classpath jar is created. To compensate for that localized jar resources are added directly to the classpath generated for the jar rather than being discovered from the localized directories. Unfortunately, they are always appended to the classpath, and so are never preferred over system resources. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3601) Fix UT TestRMFailover.testRMWebAppRedirect
[ https://issues.apache.org/jira/browse/YARN-3601?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14552549#comment-14552549 ] Hudson commented on YARN-3601: -- SUCCESS: Integrated in Hadoop-Mapreduce-trunk-Java8 #201 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/201/]) YARN-3601. Fix UT TestRMFailover.testRMWebAppRedirect. Contributed by Weiwei Yang (xgong: rev 5009ad4a7f712fc578b461ecec53f7f97eaaed0c) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/TestRMFailover.java Fix UT TestRMFailover.testRMWebAppRedirect -- Key: YARN-3601 URL: https://issues.apache.org/jira/browse/YARN-3601 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager, webapp Environment: Red Hat Enterprise Linux Workstation release 6.5 (Santiago) Reporter: Weiwei Yang Assignee: Weiwei Yang Priority: Critical Labels: test Fix For: 2.7.1 Attachments: YARN-3601.001.patch This test case was not working since the commit from YARN-2605. It failed with NPE exception. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3601) Fix UT TestRMFailover.testRMWebAppRedirect
[ https://issues.apache.org/jira/browse/YARN-3601?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14552592#comment-14552592 ] Hudson commented on YARN-3601: -- SUCCESS: Integrated in Hadoop-Mapreduce-trunk #2149 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2149/]) YARN-3601. Fix UT TestRMFailover.testRMWebAppRedirect. Contributed by Weiwei Yang (xgong: rev 5009ad4a7f712fc578b461ecec53f7f97eaaed0c) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/TestRMFailover.java * hadoop-yarn-project/CHANGES.txt Fix UT TestRMFailover.testRMWebAppRedirect -- Key: YARN-3601 URL: https://issues.apache.org/jira/browse/YARN-3601 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager, webapp Environment: Red Hat Enterprise Linux Workstation release 6.5 (Santiago) Reporter: Weiwei Yang Assignee: Weiwei Yang Priority: Critical Labels: test Fix For: 2.7.1 Attachments: YARN-3601.001.patch This test case was not working since the commit from YARN-2605. It failed with NPE exception. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3051) [Storage abstraction] Create backing storage read interface for ATS readers
[ https://issues.apache.org/jira/browse/YARN-3051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14552534#comment-14552534 ] Varun Saxena commented on YARN-3051: Well, I am still stuck on trying to get the attribute set via HttpServer2#setAttribute in WebServices class. Will update patch once that is done. [Storage abstraction] Create backing storage read interface for ATS readers --- Key: YARN-3051 URL: https://issues.apache.org/jira/browse/YARN-3051 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Affects Versions: YARN-2928 Reporter: Sangjin Lee Assignee: Varun Saxena Attachments: YARN-3051.wip.02.YARN-2928.patch, YARN-3051.wip.patch, YARN-3051_temp.patch Per design in YARN-2928, create backing storage read interface that can be implemented by multiple backing storage implementations. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
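For context on the pattern being debugged in the comment above: an attribute placed on the embedded HttpServer2 is normally read back through the servlet context injected into a JAX-RS resource. The sketch below is hedged; the attribute key and resource class are hypothetical, while HttpServer2#setAttribute and ServletContext#getAttribute are the real APIs involved.
{code}
// Hedged sketch of the HttpServer2 attribute hand-off discussed above.
// The attribute key and resource class are hypothetical.
import javax.servlet.ServletContext;
import javax.ws.rs.GET;
import javax.ws.rs.Path;
import javax.ws.rs.core.Context;

import org.apache.hadoop.http.HttpServer2;

public class ReaderWebServicesSketch {
  static final String STORAGE_ATTR = "timeline.reader.storage"; // hypothetical key

  // Server side: stash the backing-storage reader on the web server.
  static void publish(HttpServer2 server, Object storageReader) {
    server.setAttribute(STORAGE_ATTR, storageReader);
  }

  // JAX-RS side: pull it back out of the servlet context per request.
  @Path("/ws/v2/timeline")
  public static class TimelineReaderResource {
    @Context
    private ServletContext ctx;

    @GET
    public String ping() {
      Object storageReader = ctx.getAttribute(STORAGE_ATTR);
      return storageReader == null ? "no reader attached" : "ready";
    }
  }
}
{code}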
[jira] [Commented] (YARN-2821) Distributed shell app master becomes unresponsive sometimes
[ https://issues.apache.org/jira/browse/YARN-2821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14552554#comment-14552554 ] Hudson commented on YARN-2821: -- SUCCESS: Integrated in Hadoop-Mapreduce-trunk-Java8 #201 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/201/]) YARN-2821. Fixed a problem that DistributedShell AM may hang if restarted. Contributed by Varun Vasudev (jianhe: rev 7438966586f1896ab3e8b067d47a4af28a894106) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/src/main/java/org/apache/hadoop/yarn/applications/distributedshell/ApplicationMaster.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/pom.xml * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/src/test/java/org/apache/hadoop/yarn/applications/distributedshell/TestDSAppMaster.java * hadoop-yarn-project/CHANGES.txt Distributed shell app master becomes unresponsive sometimes --- Key: YARN-2821 URL: https://issues.apache.org/jira/browse/YARN-2821 Project: Hadoop YARN Issue Type: Bug Components: applications/distributed-shell Affects Versions: 2.5.1 Reporter: Varun Vasudev Assignee: Varun Vasudev Fix For: 2.8.0 Attachments: YARN-2821.002.patch, YARN-2821.003.patch, YARN-2821.004.patch, YARN-2821.005.patch, apache-yarn-2821.0.patch, apache-yarn-2821.1.patch We've noticed that once in a while the distributed shell app master becomes unresponsive and is eventually killed by the RM. snippet of the logs - {noformat} 14/11/04 18:21:37 INFO distributedshell.ApplicationMaster: appattempt_1415123350094_0017_01 received 0 previous attempts' running containers on AM registration. 14/11/04 18:21:37 INFO distributedshell.ApplicationMaster: Requested container ask: Capability[memory:10, vCores:1]Priority[0] 14/11/04 18:21:37 INFO distributedshell.ApplicationMaster: Requested container ask: Capability[memory:10, vCores:1]Priority[0] 14/11/04 18:21:37 INFO distributedshell.ApplicationMaster: Requested container ask: Capability[memory:10, vCores:1]Priority[0] 14/11/04 18:21:37 INFO distributedshell.ApplicationMaster: Requested container ask: Capability[memory:10, vCores:1]Priority[0] 14/11/04 18:21:37 INFO distributedshell.ApplicationMaster: Requested container ask: Capability[memory:10, vCores:1]Priority[0] 14/11/04 18:21:38 INFO impl.AMRMClientImpl: Received new token for : onprem-tez2:45454 14/11/04 18:21:38 INFO distributedshell.ApplicationMaster: Got response from RM for container ask, allocatedCnt=1 14/11/04 18:21:38 INFO distributedshell.ApplicationMaster: Launching shell command on a new container., containerId=container_1415123350094_0017_01_02, containerNode=onprem-tez2:45454, containerNodeURI=onprem-tez2:50060, containerResourceMemory1024, containerResourceVirtualCores1 14/11/04 18:21:38 INFO distributedshell.ApplicationMaster: Setting up container launch container for containerid=container_1415123350094_0017_01_02 14/11/04 18:21:39 INFO impl.NMClientAsyncImpl: Processing Event EventType: START_CONTAINER for Container container_1415123350094_0017_01_02 14/11/04 18:21:39 INFO impl.ContainerManagementProtocolProxy: Opening proxy : onprem-tez2:45454 14/11/04 18:21:39 INFO impl.NMClientAsyncImpl: Processing Event EventType: QUERY_CONTAINER for Container container_1415123350094_0017_01_02 14/11/04 18:21:39 INFO impl.ContainerManagementProtocolProxy: Opening proxy : onprem-tez2:45454 14/11/04 18:21:39 INFO 
impl.AMRMClientImpl: Received new token for : onprem-tez3:45454 14/11/04 18:21:39 INFO impl.AMRMClientImpl: Received new token for : onprem-tez4:45454 14/11/04 18:21:39 INFO distributedshell.ApplicationMaster: Got response from RM for container ask, allocatedCnt=3 14/11/04 18:21:39 INFO distributedshell.ApplicationMaster: Launching shell command on a new container., containerId=container_1415123350094_0017_01_03, containerNode=onprem-tez2:45454, containerNodeURI=onprem-tez2:50060, containerResourceMemory1024, containerResourceVirtualCores1 14/11/04 18:21:39 INFO distributedshell.ApplicationMaster: Launching shell command on a new container., containerId=container_1415123350094_0017_01_04, containerNode=onprem-tez3:45454, containerNodeURI=onprem-tez3:50060, containerResourceMemory1024, containerResourceVirtualCores1 14/11/04 18:21:39 INFO distributedshell.ApplicationMaster: Launching shell command on a new container., containerId=container_1415123350094_0017_01_05, containerNode=onprem-tez4:45454, containerNodeURI=onprem-tez4:50060, containerResourceMemory1024, containerResourceVirtualCores1
[jira] [Commented] (YARN-3685) NodeManager unnecessarily knows about classpath-jars due to Windows limitations
[ https://issues.apache.org/jira/browse/YARN-3685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14552680#comment-14552680 ] Chris Nauroth commented on YARN-3685: - [~vinodkv], thanks for the notification. I was not aware of this design goal at the time of YARN-316. Perhaps it's possible to move the classpath jar generation to the MR client or AM. It's not immediately obvious to me which of those 2 choices is better. We'd need to change the manifest to use relative paths in the Class-Path attribute instead of absolute paths. (The client and AM are not aware of the exact layout of the NodeManager's {{yarn.nodemanager.local-dirs}}, so the client can't predict the absolute paths at time of container launch.) There is one piece of logic that I don't see how to handle though. Some classpath entries are defined in terms of environment variables. These environment variables are expanded at the NodeManager via the container launch scripts. This was true of Linux even before YARN-316, so in that sense, YARN did already have some classpath logic indirectly. Environment variables cannot be used inside a manifest's Class-Path, so for Windows, NodeManager expands the environment variables before populating Class-Path. It would be incorrect to do the environment variable expansion at the MR client, because it might be running with different configuration than the NodeManager. I suppose if the AM did the expansion, then that would work in most cases, but it creates an assumption that the AM container is running with configuration that matches all NodeManagers in the cluster. I don't believe that assumption exists today. If we do move classpath handling out of the NodeManager, then it would be a backwards-incompatible change, and so it could not be shipped in the 2.x release line. NodeManager unnecessarily knows about classpath-jars due to Windows limitations --- Key: YARN-3685 URL: https://issues.apache.org/jira/browse/YARN-3685 Project: Hadoop YARN Issue Type: Sub-task Reporter: Vinod Kumar Vavilapalli Assignee: Vinod Kumar Vavilapalli Found this while looking at cleaning up ContainerExecutor via YARN-3648, making it a sub-task. YARN *should not* know about classpaths. Our original design modeled around this. But when we added windows suppport, due to classpath issues, we ended up breaking this abstraction via YARN-316. We should clean this up. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
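To make the classpath-jar discussion above concrete, here is a small sketch of how a manifest-only jar with a Class-Path attribute can be built with the JDK. As the comment points out, the entries must be literal relative or absolute paths, because environment variable references are not expanded inside a manifest. The file names below are illustrative, not the paths YARN actually generates.
{code}
// Sketch: build a "classpath jar" whose manifest carries the Class-Path.
// Entries are plain paths; ${ENV_VAR}-style references would NOT be expanded.
import java.io.FileOutputStream;
import java.io.IOException;
import java.util.jar.Attributes;
import java.util.jar.JarOutputStream;
import java.util.jar.Manifest;

public class ClasspathJarSketch {
  public static void main(String[] args) throws IOException {
    Manifest manifest = new Manifest();
    Attributes attrs = manifest.getMainAttributes();
    attrs.put(Attributes.Name.MANIFEST_VERSION, "1.0");
    // Relative entries keep the jar usable wherever the container is localized.
    attrs.put(Attributes.Name.CLASS_PATH, "lib/job.jar lib/guava.jar conf/");

    try (JarOutputStream jar =
        new JarOutputStream(new FileOutputStream("classpath.jar"), manifest)) {
      // No entries needed; the manifest alone carries the classpath.
    }
  }
}
{code}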
[jira] [Updated] (YARN-3686) CapacityScheduler should trim default_node_label_expression
[ https://issues.apache.org/jira/browse/YARN-3686?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sunil G updated YARN-3686: -- Attachment: 0002-YARN-3686.patch Uploading another patch covering a negative scenario. CapacityScheduler should trim default_node_label_expression --- Key: YARN-3686 URL: https://issues.apache.org/jira/browse/YARN-3686 Project: Hadoop YARN Issue Type: Sub-task Components: api, client, resourcemanager Reporter: Wangda Tan Assignee: Sunil G Priority: Critical Attachments: 0001-YARN-3686.patch, 0002-YARN-3686.patch We should trim default_node_label_expression for queue before using it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3467) Expose allocatedMB, allocatedVCores, and runningContainers metrics on running Applications in RM Web UI
[ https://issues.apache.org/jira/browse/YARN-3467?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14552699#comment-14552699 ] Anubhav Dhoot commented on YARN-3467: - Attaching the ApplicationAttempt page. It does show the number of running containers. But it does not show actual allocated resources overall for the application attempt. Expose allocatedMB, allocatedVCores, and runningContainers metrics on running Applications in RM Web UI --- Key: YARN-3467 URL: https://issues.apache.org/jira/browse/YARN-3467 Project: Hadoop YARN Issue Type: New Feature Components: webapp, yarn Affects Versions: 2.5.0 Reporter: Anthony Rojas Assignee: Anubhav Dhoot Priority: Minor Attachments: ApplicationAttemptPage.png The YARN REST API can report on the following properties: *allocatedMB*: The sum of memory in MB allocated to the application's running containers *allocatedVCores*: The sum of virtual cores allocated to the application's running containers *runningContainers*: The number of containers currently running for the application Currently, the RM Web UI does not report on these items (at least I couldn't find any entries within the Web UI). It would be useful for YARN Application and Resource troubleshooting to have these properties and their corresponding values exposed on the RM WebUI. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3681) yarn cmd says could not find main class 'queue' in windows
[ https://issues.apache.org/jira/browse/YARN-3681?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14552700#comment-14552700 ] Craig Welch commented on YARN-3681: --- [~varun_saxena] the patch you had doesn't apply properly for me. I've uploaded a patch which does the same things, which does apply, and which I've had the opportunity to test. @xgong, can you take a look at this one (.0.patch)? Thanks. yarn cmd says could not find main class 'queue' in windows Key: YARN-3681 URL: https://issues.apache.org/jira/browse/YARN-3681 Project: Hadoop YARN Issue Type: Bug Components: yarn Affects Versions: 2.7.0 Environment: Windows Only Reporter: Sumana Sathish Assignee: Varun Saxena Priority: Blocker Labels: windows, yarn-client Attachments: YARN-3681.0.patch, YARN-3681.01.patch, yarncmd.png Attached the screenshot of the command prompt in windows running yarn queue command. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3686) CapacityScheduler should trim default_node_label_expression
[ https://issues.apache.org/jira/browse/YARN-3686?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14552711#comment-14552711 ] Wangda Tan commented on YARN-3686: -- [~sunilg], thanks for working on this, comments: - I think you can try to add to {{org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.normalizeNodeLabelExpressionInRequest(ResourceRequest, QueueInfo)}}, which needs trim node-label-expression as well - Actually this is a regression, in 2.6 queue's node label expression with spaces can setup without any issue. It's better to add test to make sure 1. spaces in resource request will be trimmed 2. spaces in queue configuration (default-node-label-expression) will be trimmed. CapacityScheduler should trim default_node_label_expression --- Key: YARN-3686 URL: https://issues.apache.org/jira/browse/YARN-3686 Project: Hadoop YARN Issue Type: Sub-task Components: api, client, resourcemanager Reporter: Wangda Tan Assignee: Sunil G Priority: Critical Attachments: 0001-YARN-3686.patch, 0002-YARN-3686.patch We should trim default_node_label_expression for queue before using it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
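The fix being discussed is essentially a defensive trim before the label expression is used. A minimal, hedged sketch of that idea follows; the helper name is hypothetical and this is not the actual SchedulerUtils change.
{code}
// Hedged sketch: trim a node-label expression before validating/using it.
// The helper name is hypothetical; it only illustrates the trimming step.
public final class NodeLabelExpressionUtil {
  private NodeLabelExpressionUtil() {}

  public static String normalize(String labelExpression) {
    if (labelExpression == null) {
      return null;
    }
    String trimmed = labelExpression.trim();
    // Treat an all-whitespace expression the same as "no label".
    return trimmed.isEmpty() ? "" : trimmed;
  }
}
{code}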
[jira] [Commented] (YARN-2005) Blacklisting support for scheduling AMs
[ https://issues.apache.org/jira/browse/YARN-2005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14552669#comment-14552669 ] Anubhav Dhoot commented on YARN-2005: - Assigning to myself as I am starting work on this. [~sunilg] let me know if you have made progress on this already. Blacklisting support for scheduling AMs --- Key: YARN-2005 URL: https://issues.apache.org/jira/browse/YARN-2005 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Affects Versions: 0.23.10, 2.4.0 Reporter: Jason Lowe It would be nice if the RM supported blacklisting a node for an AM launch after the same node fails a configurable number of AM attempts. This would be similar to the blacklisting support for scheduling task attempts in the MapReduce AM but for scheduling AM attempts on the RM side. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (YARN-2005) Blacklisting support for scheduling AMs
[ https://issues.apache.org/jira/browse/YARN-2005?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anubhav Dhoot reassigned YARN-2005: --- Assignee: Anubhav Dhoot Blacklisting support for scheduling AMs --- Key: YARN-2005 URL: https://issues.apache.org/jira/browse/YARN-2005 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Affects Versions: 0.23.10, 2.4.0 Reporter: Jason Lowe Assignee: Anubhav Dhoot It would be nice if the RM supported blacklisting a node for an AM launch after the same node fails a configurable number of AM attempts. This would be similar to the blacklisting support for scheduling task attempts in the MapReduce AM but for scheduling AM attempts on the RM side. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3467) Expose allocatedMB, allocatedVCores, and runningContainers metrics on running Applications in RM Web UI
[ https://issues.apache.org/jira/browse/YARN-3467?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anubhav Dhoot updated YARN-3467: Attachment: ApplicationAttemptPage.png Expose allocatedMB, allocatedVCores, and runningContainers metrics on running Applications in RM Web UI --- Key: YARN-3467 URL: https://issues.apache.org/jira/browse/YARN-3467 Project: Hadoop YARN Issue Type: New Feature Components: webapp, yarn Affects Versions: 2.5.0 Reporter: Anthony Rojas Assignee: Anubhav Dhoot Priority: Minor Attachments: ApplicationAttemptPage.png The YARN REST API can report on the following properties: *allocatedMB*: The sum of memory in MB allocated to the application's running containers *allocatedVCores*: The sum of virtual cores allocated to the application's running containers *runningContainers*: The number of containers currently running for the application Currently, the RM Web UI does not report on these items (at least I couldn't find any entries within the Web UI). It would be useful for YARN Application and Resource troubleshooting to have these properties and their corresponding values exposed on the RM WebUI. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-3691) Limit number of reservations for an app
Arun Suresh created YARN-3691: - Summary: Limit number of reservations for an app Key: YARN-3691 URL: https://issues.apache.org/jira/browse/YARN-3691 Project: Hadoop YARN Issue Type: Bug Components: fairscheduler Reporter: Arun Suresh Currently, It is possible to reserve resource for an app on all nodes. Limiting this to possibly just a number of nodes (or a ratio of the total cluster size) would improve utilization of the cluster and will reduce the possibility of starving other apps. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3681) yarn cmd says could not find main class 'queue' in windows
[ https://issues.apache.org/jira/browse/YARN-3681?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14552710#comment-14552710 ] Hadoop QA commented on YARN-3681: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:red}-1{color} | patch | 0m 0s | The patch command could not apply the patch during dryrun. | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12734163/YARN-3681.0.patch | | Optional Tests | | | git revision | trunk / 4aa730c | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8026/console | This message was automatically generated. yarn cmd says could not find main class 'queue' in windows Key: YARN-3681 URL: https://issues.apache.org/jira/browse/YARN-3681 Project: Hadoop YARN Issue Type: Bug Components: yarn Affects Versions: 2.7.0 Environment: Windows Only Reporter: Sumana Sathish Assignee: Varun Saxena Priority: Blocker Labels: windows, yarn-client Attachments: YARN-3681.0.patch, YARN-3681.01.patch, yarncmd.png Attached the screenshot of the command prompt in windows running yarn queue command. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (YARN-3691) Limit number of reservations for an app
[ https://issues.apache.org/jira/browse/YARN-3691?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arun Suresh reassigned YARN-3691: - Assignee: Arun Suresh Limit number of reservations for an app --- Key: YARN-3691 URL: https://issues.apache.org/jira/browse/YARN-3691 Project: Hadoop YARN Issue Type: Bug Components: fairscheduler Reporter: Arun Suresh Assignee: Arun Suresh Currently, It is possible to reserve resource for an app on all nodes. Limiting this to possibly just a number of nodes (or a ratio of the total cluster size) would improve utilization of the cluster and will reduce the possibility of starving other apps. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3647) RMWebServices api's should use updated api from CommonNodeLabelsManager to get NodeLabel object
[ https://issues.apache.org/jira/browse/YARN-3647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14552619#comment-14552619 ] Sunil G commented on YARN-3647: --- Test case failure and findbugs error are not related to this patch. RMWebServices api's should use updated api from CommonNodeLabelsManager to get NodeLabel object --- Key: YARN-3647 URL: https://issues.apache.org/jira/browse/YARN-3647 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.6.0 Reporter: Sunil G Assignee: Sunil G Attachments: 0001-YARN-3647.patch, 0002-YARN-3647.patch After YARN-3579, RMWebServices apis can use the updated version of apis in CommonNodeLabelsManager which gives full NodeLabel object instead of creating NodeLabel object from plain label name. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3675) FairScheduler: RM quits when node removal races with continousscheduling on the same node
[ https://issues.apache.org/jira/browse/YARN-3675?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anubhav Dhoot updated YARN-3675: Attachment: YARN-3675.002.patch Fixed checkstyle issue FairScheduler: RM quits when node removal races with continousscheduling on the same node - Key: YARN-3675 URL: https://issues.apache.org/jira/browse/YARN-3675 Project: Hadoop YARN Issue Type: Bug Components: fairscheduler Reporter: Anubhav Dhoot Assignee: Anubhav Dhoot Attachments: YARN-3675.001.patch, YARN-3675.002.patch With continuous scheduling, scheduling can be done on a node thats just removed causing errors like below. {noformat} 12:28:53.782 AM FATAL org.apache.hadoop.yarn.server.resourcemanager.ResourceManager Error in handling event type APP_ATTEMPT_REMOVED to the scheduler java.lang.NullPointerException at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt.unreserve(FSAppAttempt.java:469) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.completedContainer(FairScheduler.java:815) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.removeApplicationAttempt(FairScheduler.java:763) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:1217) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:111) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:684) at java.lang.Thread.run(Thread.java:745) 12:28:53.783 AMINFO org.apache.hadoop.yarn.server.resourcemanager.ResourceManager Exiting, bbye.. {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
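The stack trace above points at a reservation being cleaned up on a node that has already been removed. A common way to harden such a path is a defensive null check before dereferencing the node, roughly as sketched below with hypothetical names; this is a sketch of the idea, not the actual YARN-3675 patch.
{code}
// Hedged sketch of guarding against a node that was removed while a
// reservation on it was still being cleaned up. All names are hypothetical.
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class ReservationCleanupSketch {
  private final Map<String, Object> liveNodes = new ConcurrentHashMap<>();

  public void unreserveIfStillPresent(String nodeId) {
    Object node = liveNodes.get(nodeId);
    if (node == null) {
      // Node was removed by a concurrent NODE_REMOVED event; skip quietly
      // instead of dereferencing null and killing the dispatcher thread.
      return;
    }
    // ... normal unreserve bookkeeping against 'node' goes here ...
  }
}
{code}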
[jira] [Updated] (YARN-3681) yarn cmd says could not find main class 'queue' in windows
[ https://issues.apache.org/jira/browse/YARN-3681?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Craig Welch updated YARN-3681: -- Attachment: YARN-3681.0.patch yarn cmd says could not find main class 'queue' in windows Key: YARN-3681 URL: https://issues.apache.org/jira/browse/YARN-3681 Project: Hadoop YARN Issue Type: Bug Components: yarn Affects Versions: 2.7.0 Environment: Windows Only Reporter: Sumana Sathish Assignee: Varun Saxena Priority: Blocker Labels: windows, yarn-client Attachments: YARN-3681.0.patch, YARN-3681.01.patch, yarncmd.png Attached the screenshot of the command prompt in windows running yarn queue command. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3691) FairScheduler: Limit number of reservations for a container
[ https://issues.apache.org/jira/browse/YARN-3691?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14553001#comment-14553001 ] Karthik Kambatla commented on YARN-3691: Shouldn't the number of reservations be per container and not per application? If an app is looking to get resources for 10 containers, it should be able to make reservations independently for each container. FairScheduler: Limit number of reservations for a container --- Key: YARN-3691 URL: https://issues.apache.org/jira/browse/YARN-3691 Project: Hadoop YARN Issue Type: Bug Components: fairscheduler Reporter: Arun Suresh Assignee: Arun Suresh Currently, it is possible to reserve resources for an app on all nodes. Limiting this to possibly just a number of nodes (or a ratio of the total cluster size) would improve utilization of the cluster and will reduce the possibility of starving other apps. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2918) Don't fail RM if queue's configured labels are not existed in cluster-node-labels
[ https://issues.apache.org/jira/browse/YARN-2918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14553158#comment-14553158 ] Hudson commented on YARN-2918: -- FAILURE: Integrated in Hadoop-trunk-Commit #7875 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/7875/]) Move YARN-2918 from 2.8.0 to 2.7.1 (wangda: rev 03f897fd1a3779251023bae358207069b89addbf) * hadoop-yarn-project/CHANGES.txt Don't fail RM if queue's configured labels are not existed in cluster-node-labels - Key: YARN-2918 URL: https://issues.apache.org/jira/browse/YARN-2918 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Rohith Assignee: Wangda Tan Fix For: 2.8.0, 2.7.1 Attachments: YARN-2918.1.patch, YARN-2918.2.patch, YARN-2918.3.patch Currently, if admin setup labels on queues {{queue-path.accessible-node-labels = ...}}. And the label is not added to RM, queue's initialization will fail and RM will fail too: {noformat} 2014-12-03 20:11:50,126 FATAL org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error starting ResourceManager ... Caused by: java.io.IOException: NodeLabelManager doesn't include label = x, please check. at org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.checkIfLabelInClusterNodeLabels(SchedulerUtils.java:287) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.AbstractCSQueue.init(AbstractCSQueue.java:109) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.init(LeafQueue.java:120) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.parseQueue(CapacityScheduler.java:567) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.parseQueue(CapacityScheduler.java:587) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.initializeQueues(CapacityScheduler.java:462) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.initScheduler(CapacityScheduler.java:294) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.serviceInit(CapacityScheduler.java:324) at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) {noformat} This is not a good user experience, we should stop fail RM so that admin can configure queue/labels in following steps: - Configure queue (with label) - Start RM - Add labels to RM - Submit applications Now admin has to: - Configure queue (without label) - Start RM - Add labels to RM - Refresh queue's config (with label) - Submit applications -- This message was sent by Atlassian JIRA (v6.3.4#6332)
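The behavioral change requested here is to downgrade the hard startup failure into a warning, so the queue can come up before the label is added to the cluster. A rough, hedged sketch of that validation idea follows; the real check lives in the CapacityScheduler queue setup code, and this is not the actual patch.
{code}
// Hedged sketch: validate a queue's configured labels without aborting RM
// startup when a label is not (yet) in the cluster's node-label store.
// Names are illustrative; only the log-instead-of-throw idea is the point.
import java.util.Set;
import java.util.logging.Logger;

public class QueueLabelValidationSketch {
  private static final Logger LOG =
      Logger.getLogger(QueueLabelValidationSketch.class.getName());

  public static void checkLabels(String queue, Set<String> configured,
      Set<String> clusterLabels) {
    for (String label : configured) {
      if (!clusterLabels.contains(label)) {
        // Previously this threw IOException and failed RM startup; a warning
        // lets the admin add the label after the RM is already running.
        LOG.warning("Queue " + queue + " references label '" + label
            + "' that is not in the cluster node labels yet.");
      }
    }
  }
}
{code}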
[jira] [Updated] (YARN-3411) [Storage implementation] explore the native HBase write schema for storage
[ https://issues.apache.org/jira/browse/YARN-3411?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vrushali C updated YARN-3411: - Attachment: YARN-3411-YARN-2928.007.patch Uploading YARN-3411-YARN-2928.007.patch. I think I have addressed everyone's comments. I have been going up and down scrolling on this jira page since yesterday and I hope I have not missed out on any comment. [~gtCarrera9] I have not yet moved the test data into TestTimelineWriterImpl since it has almost a similar information setup for timeline entity but with more cases. I can modify it later. I have tested the HBase writer with Sangjin's driver code as well. [Storage implementation] explore the native HBase write schema for storage -- Key: YARN-3411 URL: https://issues.apache.org/jira/browse/YARN-3411 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Sangjin Lee Assignee: Vrushali C Priority: Critical Attachments: ATSv2BackendHBaseSchemaproposal.pdf, YARN-3411-YARN-2928.001.patch, YARN-3411-YARN-2928.002.patch, YARN-3411-YARN-2928.003.patch, YARN-3411-YARN-2928.004.patch, YARN-3411-YARN-2928.005.patch, YARN-3411-YARN-2928.006.patch, YARN-3411-YARN-2928.007.patch, YARN-3411.poc.2.txt, YARN-3411.poc.3.txt, YARN-3411.poc.4.txt, YARN-3411.poc.5.txt, YARN-3411.poc.6.txt, YARN-3411.poc.7.txt, YARN-3411.poc.txt There is work that's in progress to implement the storage based on a Phoenix schema (YARN-3134). In parallel, we would like to explore an implementation based on a native HBase schema for the write path. Such a schema does not exclude using Phoenix, especially for reads and offline queries. Once we have basic implementations of both options, we could evaluate them in terms of performance, scalability, usability, etc. and make a call. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3411) [Storage implementation] explore the native HBase write schema for storage
[ https://issues.apache.org/jira/browse/YARN-3411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14553186#comment-14553186 ] Li Lu commented on YARN-3411: - Hi [~vrushalic], sure, don't worry about the test code clean up for now. I'll try it locally. [Storage implementation] explore the native HBase write schema for storage -- Key: YARN-3411 URL: https://issues.apache.org/jira/browse/YARN-3411 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Sangjin Lee Assignee: Vrushali C Priority: Critical Attachments: ATSv2BackendHBaseSchemaproposal.pdf, YARN-3411-YARN-2928.001.patch, YARN-3411-YARN-2928.002.patch, YARN-3411-YARN-2928.003.patch, YARN-3411-YARN-2928.004.patch, YARN-3411-YARN-2928.005.patch, YARN-3411-YARN-2928.006.patch, YARN-3411-YARN-2928.007.patch, YARN-3411.poc.2.txt, YARN-3411.poc.3.txt, YARN-3411.poc.4.txt, YARN-3411.poc.5.txt, YARN-3411.poc.6.txt, YARN-3411.poc.7.txt, YARN-3411.poc.txt There is work that's in progress to implement the storage based on a Phoenix schema (YARN-3134). In parallel, we would like to explore an implementation based on a native HBase schema for the write path. Such a schema does not exclude using Phoenix, especially for reads and offline queries. Once we have basic implementations of both options, we could evaluate them in terms of performance, scalability, usability, etc. and make a call. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-314) Schedulers should allow resource requests of different sizes at the same priority and location
[ https://issues.apache.org/jira/browse/YARN-314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14553015#comment-14553015 ] Karthik Kambatla commented on YARN-314: --- I am essentially proposing an efficient way to index the pending requests across multiple axes. Each of these indices is captured by a map. The only reason to colocate them is to not disperse this indexing (mapping) logic across multiple classes. We should be able to quickly look up all requests for an app for reporting etc., and also look up all node-local requests across applications at schedule time without having to iterate through all the applications. The maps could be:
- App, Priority, Locality, ResourceRequest
- Locality (node/rack), Priority, App, ResourceRequest
The current {{AppSchedulingInfo}} could stay as is and use the former map to get the corresponding requests. Schedulers should allow resource requests of different sizes at the same priority and location -- Key: YARN-314 URL: https://issues.apache.org/jira/browse/YARN-314 Project: Hadoop YARN Issue Type: Sub-task Components: scheduler Affects Versions: 2.0.2-alpha Reporter: Sandy Ryza Attachments: yarn-314-prelim.patch Currently, resource requests for the same container and locality are expected to all be the same size. While it doesn't look like it's needed for apps currently, and it can be circumvented by specifying different priorities if absolutely necessary, it seems to me that the ability to request containers with different resource requirements at the same priority level should be there for the future and for completeness' sake. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
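A rough sketch of the two indices in that proposal is shown below. The class and member names are invented for illustration and are not taken from the preliminary patch; the point is simply that both views reference the same pending {{ResourceRequest}} objects.
{code:java}
import java.util.Collections;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import org.apache.hadoop.yarn.api.records.ApplicationId;
import org.apache.hadoop.yarn.api.records.Priority;
import org.apache.hadoop.yarn.api.records.ResourceRequest;

/** Two views over the same pending requests, kept in one place. */
public class PendingRequestIndex {
  // App -> Priority -> resource name (node/rack/*) -> request
  private final Map<ApplicationId, Map<Priority, Map<String, ResourceRequest>>>
      byApp = new ConcurrentHashMap<>();
  // Resource name (node/rack) -> Priority -> App -> request
  private final Map<String, Map<Priority, Map<ApplicationId, ResourceRequest>>>
      byLocation = new ConcurrentHashMap<>();

  public void add(ApplicationId app, ResourceRequest req) {
    byApp.computeIfAbsent(app, a -> new ConcurrentHashMap<>())
        .computeIfAbsent(req.getPriority(), p -> new ConcurrentHashMap<>())
        .put(req.getResourceName(), req);
    byLocation.computeIfAbsent(req.getResourceName(), n -> new ConcurrentHashMap<>())
        .computeIfAbsent(req.getPriority(), p -> new ConcurrentHashMap<>())
        .put(app, req);
  }

  /** All requests an app has pending, e.g. for reporting. */
  public Map<Priority, Map<String, ResourceRequest>> requestsFor(ApplicationId app) {
    return byApp.containsKey(app) ? byApp.get(app) : Collections.emptyMap();
  }
}
{code}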
[jira] [Updated] (YARN-2556) Tool to measure the performance of the timeline server
[ https://issues.apache.org/jira/browse/YARN-2556?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chang Li updated YARN-2556: --- Attachment: YARN-2556.10.patch Add JobHistoryFileReplayMapper mapper Tool to measure the performance of the timeline server -- Key: YARN-2556 URL: https://issues.apache.org/jira/browse/YARN-2556 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Jonathan Eagles Assignee: Chang Li Labels: BB2015-05-TBR Attachments: YARN-2556-WIP.patch, YARN-2556-WIP.patch, YARN-2556.1.patch, YARN-2556.10.patch, YARN-2556.2.patch, YARN-2556.3.patch, YARN-2556.4.patch, YARN-2556.5.patch, YARN-2556.6.patch, YARN-2556.7.patch, YARN-2556.8.patch, YARN-2556.9.patch, YARN-2556.patch, yarn2556.patch, yarn2556.patch, yarn2556_wip.patch We need to be able to understand the capacity model for the timeline server to give users the tools they need to deploy a timeline server with the correct capacity. I propose we create a mapreduce job that can measure timeline server write and read performance. Transactions per second, I/O for both read and write would be a good start. This could be done as an example or test job that could be tied into gridmix. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
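As a rough illustration of what the write-path measurement looks like, the standalone loop below posts synthetic entities through the ATS v1 {{TimelineClient}} and reports transactions per second. This is not the mapper in the attached patch; the entity type, count, and payload are arbitrary.
{code:java}
import org.apache.hadoop.yarn.api.records.timeline.TimelineEntity;
import org.apache.hadoop.yarn.client.api.TimelineClient;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class TimelineWritePerf {
  public static void main(String[] args) throws Exception {
    YarnConfiguration conf = new YarnConfiguration();
    TimelineClient client = TimelineClient.createTimelineClient();
    client.init(conf);
    client.start();
    int numEntities = 1000;                       // arbitrary test size
    long start = System.currentTimeMillis();
    for (int i = 0; i < numEntities; i++) {
      TimelineEntity entity = new TimelineEntity();
      entity.setEntityType("PERF_TEST");
      entity.setEntityId("entity_" + i);
      entity.setStartTime(System.currentTimeMillis());
      entity.addOtherInfo("payload", "synthetic-value-" + i);
      client.putEntities(entity);                 // one write per transaction
    }
    long elapsedMs = System.currentTimeMillis() - start;
    System.out.println("write TPS = " + (numEntities * 1000.0 / elapsedMs));
    client.stop();
  }
}
{code}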
[jira] [Commented] (YARN-3388) Allocation in LeafQueue could get stuck because DRF calculator isn't well supported when computing user-limit
[ https://issues.apache.org/jira/browse/YARN-3388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14553165#comment-14553165 ] Nathan Roberts commented on YARN-3388: -- Thanks [~leftnoteasy] for the comments. I agree 2b is the way to go. I will upload a new patch soon. Allocation in LeafQueue could get stuck because DRF calculator isn't well supported when computing user-limit - Key: YARN-3388 URL: https://issues.apache.org/jira/browse/YARN-3388 Project: Hadoop YARN Issue Type: Bug Components: capacityscheduler Affects Versions: 2.6.0 Reporter: Nathan Roberts Assignee: Nathan Roberts Attachments: YARN-3388-v0.patch, YARN-3388-v1.patch, YARN-3388-v2.patch When there are multiple active users in a queue, it should be possible for those users to make use of capacity up-to max_capacity (or close). The resources should be fairly distributed among the active users in the queue. This works pretty well when there is a single resource being scheduled. However, when there are multiple resources the situation gets more complex and the current algorithm tends to get stuck at Capacity. Example illustrated in subsequent comment. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3686) CapacityScheduler should trim default_node_label_expression
[ https://issues.apache.org/jira/browse/YARN-3686?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14553000#comment-14553000 ] Hadoop QA commented on YARN-3686: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 14m 29s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:red}-1{color} | tests included | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. | | {color:green}+1{color} | javac | 7m 31s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 38s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 22s | The applied patch does not increase the total number of release audit warnings. | | {color:green}+1{color} | checkstyle | 0m 24s | There were no new checkstyle issues. | | {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 38s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 33s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 1m 16s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | yarn tests | 50m 20s | Tests passed in hadoop-yarn-server-resourcemanager. | | | | 86m 14s | | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12734160/0002-YARN-3686.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / 4aa730c | | hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/8029/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8029/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf906.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8029/console | This message was automatically generated. CapacityScheduler should trim default_node_label_expression --- Key: YARN-3686 URL: https://issues.apache.org/jira/browse/YARN-3686 Project: Hadoop YARN Issue Type: Sub-task Components: api, client, resourcemanager Reporter: Wangda Tan Assignee: Sunil G Priority: Critical Attachments: 0001-YARN-3686.patch, 0002-YARN-3686.patch We should trim default_node_label_expression for queue before using it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
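The trim itself is small; a minimal sketch using the plain {{Configuration}} API is shown below. The fully spelled-out property name and the helper class are for illustration only; the real change presumably lands in the CapacityScheduler's own configuration accessor.
{code:java}
import org.apache.hadoop.conf.Configuration;

public class NodeLabelExpressionTrim {
  private static final String PREFIX = "yarn.scheduler.capacity.";

  /** Reads a queue's default node label expression and trims stray whitespace. */
  static String getDefaultNodeLabelExpression(Configuration conf, String queuePath) {
    String expr = conf.get(PREFIX + queuePath + ".default-node-label-expression");
    // " x " and "x" should resolve to the same label
    return expr == null ? null : expr.trim();
  }
}
{code}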
[jira] [Updated] (YARN-3681) yarn cmd says could not find main class 'queue' in windows
[ https://issues.apache.org/jira/browse/YARN-3681?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Craig Welch updated YARN-3681: -- Attachment: YARN-3681.branch-2.0.patch Here is one for branch-2 yarn cmd says could not find main class 'queue' in windows Key: YARN-3681 URL: https://issues.apache.org/jira/browse/YARN-3681 Project: Hadoop YARN Issue Type: Bug Components: yarn Affects Versions: 2.7.0 Environment: Windows Only Reporter: Sumana Sathish Assignee: Varun Saxena Priority: Blocker Labels: windows, yarn-client Attachments: YARN-3681.0.patch, YARN-3681.01.patch, YARN-3681.1.patch, YARN-3681.branch-2.0.patch, yarncmd.png Attached the screenshot of the command prompt in windows running yarn queue command. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3691) FairScheduler: Limit number of reservations for a container
[ https://issues.apache.org/jira/browse/YARN-3691?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karthik Kambatla updated YARN-3691: --- Summary: FairScheduler: Limit number of reservations for a container (was: Limit number of reservations for an app) FairScheduler: Limit number of reservations for a container --- Key: YARN-3691 URL: https://issues.apache.org/jira/browse/YARN-3691 Project: Hadoop YARN Issue Type: Bug Components: fairscheduler Reporter: Arun Suresh Assignee: Arun Suresh Currently, it is possible to reserve resources for an app on all nodes. Limiting this to just a number of nodes (or a ratio of the total cluster size) would improve cluster utilization and reduce the possibility of starving other apps. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (YARN-3691) FairScheduler: Limit number of reservations for a container
[ https://issues.apache.org/jira/browse/YARN-3691?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14553001#comment-14553001 ] Karthik Kambatla edited comment on YARN-3691 at 5/20/15 8:09 PM: - The number of reservations should be per container and not per application? If an app is looking to get resources for 10 containers, it should be able to make reservations independently for each container. was (Author: kasha): The number of reservations should be per component and not per application? If an app is looking to get resources for 10 containers, it should be able to make reservations independently for each container. FairScheduler: Limit number of reservations for a container --- Key: YARN-3691 URL: https://issues.apache.org/jira/browse/YARN-3691 Project: Hadoop YARN Issue Type: Bug Components: fairscheduler Reporter: Arun Suresh Assignee: Arun Suresh Currently, it is possible to reserve resources for an app on all nodes. Limiting this to just a number of nodes (or a ratio of the total cluster size) would improve cluster utilization and reduce the possibility of starving other apps. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
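The per-container idea could be sketched as a counter that is consulted before placing another reservation. The names below ({{maxReservedNodes}}, {{ReservationLimiter}}) are hypothetical; priority is used as the key here only because the FairScheduler tracks an app attempt's reservations per priority, and this is not the code from the eventual patch.
{code:java}
import java.util.HashMap;
import java.util.Map;
import org.apache.hadoop.yarn.api.records.Priority;

/** Tracks how many nodes currently hold a reservation for each priority. */
public class ReservationLimiter {
  private final int maxReservedNodes;   // a fixed count, or derived from a cluster-size ratio
  private final Map<Priority, Integer> reservationsFor = new HashMap<>();

  public ReservationLimiter(int maxReservedNodes) {
    this.maxReservedNodes = maxReservedNodes;
  }

  /** Returns true if one more reservation may be placed for this priority. */
  public synchronized boolean canReserve(Priority priority) {
    return reservationsFor.getOrDefault(priority, 0) < maxReservedNodes;
  }

  public synchronized void onReserve(Priority priority) {
    reservationsFor.merge(priority, 1, Integer::sum);
  }

  public synchronized void onUnreserve(Priority priority) {
    // callers are expected to pair this with onReserve, keeping counts >= 0
    reservationsFor.merge(priority, -1, Integer::sum);
  }
}
{code}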
[jira] [Commented] (YARN-3681) yarn cmd says could not find main class 'queue' in windows
[ https://issues.apache.org/jira/browse/YARN-3681?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14553189#comment-14553189 ] Xuan Gong commented on YARN-3681: - Committed into trunk/branch-2/branch-2.7. Thanks, craig and varun yarn cmd says could not find main class 'queue' in windows Key: YARN-3681 URL: https://issues.apache.org/jira/browse/YARN-3681 Project: Hadoop YARN Issue Type: Bug Components: yarn Affects Versions: 2.7.0 Environment: Windows Only Reporter: Sumana Sathish Assignee: Varun Saxena Priority: Blocker Labels: windows, yarn-client Fix For: 2.7.1 Attachments: YARN-3681.0.patch, YARN-3681.01.patch, YARN-3681.1.patch, YARN-3681.branch-2.0.patch, yarncmd.png Attached the screenshot of the command prompt in windows running yarn queue command. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2918) Don't fail RM if queue's configured labels are not existed in cluster-node-labels
[ https://issues.apache.org/jira/browse/YARN-2918?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-2918: - Fix Version/s: 2.7.1 Don't fail RM if queue's configured labels are not existed in cluster-node-labels - Key: YARN-2918 URL: https://issues.apache.org/jira/browse/YARN-2918 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Rohith Assignee: Wangda Tan Fix For: 2.8.0, 2.7.1 Attachments: YARN-2918.1.patch, YARN-2918.2.patch, YARN-2918.3.patch Currently, if an admin configures labels on queues via {{queue-path.accessible-node-labels = ...}} and a label has not yet been added to the RM, the queue's initialization fails and the RM fails to start:
{noformat}
2014-12-03 20:11:50,126 FATAL org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error starting ResourceManager
...
Caused by: java.io.IOException: NodeLabelManager doesn't include label = x, please check.
 at org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.checkIfLabelInClusterNodeLabels(SchedulerUtils.java:287)
 at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.AbstractCSQueue.init(AbstractCSQueue.java:109)
 at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.init(LeafQueue.java:120)
 at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.parseQueue(CapacityScheduler.java:567)
 at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.parseQueue(CapacityScheduler.java:587)
 at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.initializeQueues(CapacityScheduler.java:462)
 at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.initScheduler(CapacityScheduler.java:294)
 at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.serviceInit(CapacityScheduler.java:324)
 at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
{noformat}
This is not a good user experience; we should stop failing the RM so that the admin can configure queues and labels in the following steps:
- Configure queue (with label)
- Start RM
- Add labels to RM
- Submit applications
Today the admin has to:
- Configure queue (without label)
- Start RM
- Add labels to RM
- Refresh queue's config (with label)
- Submit applications
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2918) Don't fail RM if queue's configured labels are not existed in cluster-node-labels
[ https://issues.apache.org/jira/browse/YARN-2918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14553086#comment-14553086 ] Wangda Tan commented on YARN-2918: -- Back-ported this patch to 2.7.1, updating fix version. Don't fail RM if queue's configured labels are not existed in cluster-node-labels - Key: YARN-2918 URL: https://issues.apache.org/jira/browse/YARN-2918 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Rohith Assignee: Wangda Tan Fix For: 2.8.0, 2.7.1 Attachments: YARN-2918.1.patch, YARN-2918.2.patch, YARN-2918.3.patch Currently, if an admin configures labels on queues via {{queue-path.accessible-node-labels = ...}} and a label has not yet been added to the RM, the queue's initialization fails and the RM fails to start:
{noformat}
2014-12-03 20:11:50,126 FATAL org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error starting ResourceManager
...
Caused by: java.io.IOException: NodeLabelManager doesn't include label = x, please check.
 at org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.checkIfLabelInClusterNodeLabels(SchedulerUtils.java:287)
 at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.AbstractCSQueue.init(AbstractCSQueue.java:109)
 at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.init(LeafQueue.java:120)
 at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.parseQueue(CapacityScheduler.java:567)
 at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.parseQueue(CapacityScheduler.java:587)
 at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.initializeQueues(CapacityScheduler.java:462)
 at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.initScheduler(CapacityScheduler.java:294)
 at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.serviceInit(CapacityScheduler.java:324)
 at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
{noformat}
This is not a good user experience; we should stop failing the RM so that the admin can configure queues and labels in the following steps:
- Configure queue (with label)
- Start RM
- Add labels to RM
- Submit applications
Today the admin has to:
- Configure queue (without label)
- Start RM
- Add labels to RM
- Refresh queue's config (with label)
- Submit applications
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3681) yarn cmd says could not find main class 'queue' in windows
[ https://issues.apache.org/jira/browse/YARN-3681?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14553159#comment-14553159 ] Hudson commented on YARN-3681: -- FAILURE: Integrated in Hadoop-trunk-Commit #7875 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/7875/]) YARN-3681. yarn cmd says could not find main class 'queue' in windows. (xgong: rev 5774f6b1e577ee64bde8c7c1e39f404b9e651176) * hadoop-yarn-project/hadoop-yarn/bin/yarn.cmd * hadoop-yarn-project/CHANGES.txt yarn cmd says could not find main class 'queue' in windows Key: YARN-3681 URL: https://issues.apache.org/jira/browse/YARN-3681 Project: Hadoop YARN Issue Type: Bug Components: yarn Affects Versions: 2.7.0 Environment: Windows Only Reporter: Sumana Sathish Assignee: Varun Saxena Priority: Blocker Labels: windows, yarn-client Attachments: YARN-3681.0.patch, YARN-3681.01.patch, YARN-3681.1.patch, yarncmd.png Attached the screenshot of the command prompt in windows running yarn queue command. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3681) yarn cmd says could not find main class 'queue' in windows
[ https://issues.apache.org/jira/browse/YARN-3681?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14553114#comment-14553114 ] Xuan Gong commented on YARN-3681: - Using {{git apply -p0 --whitespace=fix}} applies the patch. The patch looks good to me. +1, will commit. yarn cmd says could not find main class 'queue' in windows Key: YARN-3681 URL: https://issues.apache.org/jira/browse/YARN-3681 Project: Hadoop YARN Issue Type: Bug Components: yarn Affects Versions: 2.7.0 Environment: Windows Only Reporter: Sumana Sathish Assignee: Varun Saxena Priority: Blocker Labels: windows, yarn-client Attachments: YARN-3681.0.patch, YARN-3681.01.patch, YARN-3681.1.patch, yarncmd.png Attached the screenshot of the command prompt in windows running yarn queue command. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3675) FairScheduler: RM quits when node removal races with continousscheduling on the same node
[ https://issues.apache.org/jira/browse/YARN-3675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14553151#comment-14553151 ] Hadoop QA commented on YARN-3675: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 14m 34s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 1 new or modified test files. | | {color:green}+1{color} | javac | 7m 31s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 35s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 23s | The applied patch does not increase the total number of release audit warnings. | | {color:green}+1{color} | checkstyle | 0m 46s | There were no new checkstyle issues. | | {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 32s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 33s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 1m 16s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:red}-1{color} | yarn tests | 50m 4s | Tests failed in hadoop-yarn-server-resourcemanager. | | | | 86m 17s | | \\ \\ || Reason || Tests || | Failed unit tests | hadoop.yarn.server.resourcemanager.TestRMRestart | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12734207/YARN-3675.003.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / 4aa730c | | hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/8030/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8030/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf906.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8030/console | This message was automatically generated. FairScheduler: RM quits when node removal races with continousscheduling on the same node - Key: YARN-3675 URL: https://issues.apache.org/jira/browse/YARN-3675 Project: Hadoop YARN Issue Type: Bug Components: fairscheduler Reporter: Anubhav Dhoot Assignee: Anubhav Dhoot Attachments: YARN-3675.001.patch, YARN-3675.002.patch, YARN-3675.003.patch With continuous scheduling, scheduling can be done on a node thats just removed causing errors like below. 
{noformat}
12:28:53.782 AM FATAL org.apache.hadoop.yarn.server.resourcemanager.ResourceManager Error in handling event type APP_ATTEMPT_REMOVED to the scheduler
java.lang.NullPointerException
 at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt.unreserve(FSAppAttempt.java:469)
 at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.completedContainer(FairScheduler.java:815)
 at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.removeApplicationAttempt(FairScheduler.java:763)
 at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:1217)
 at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:111)
 at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:684)
 at java.lang.Thread.run(Thread.java:745)
12:28:53.783 AM INFO org.apache.hadoop.yarn.server.resourcemanager.ResourceManager Exiting, bbye..
{noformat}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
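One defensive shape for closing this race (not necessarily what the attached patches do) is to re-check that the node still exists before completing or unreserving on it. The sketch below uses plain Java types standing in for the scheduler's node map and container handling.
{code:java}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/**
 * Sketch of the guard only: the scheduler keeps its nodes in a map that a
 * NODE_REMOVED event can mutate concurrently, so any path that completes or
 * unreserves a container must tolerate the node having disappeared.
 */
public class NodeRemovalGuard {
  private final Map<String, Object> nodes = new ConcurrentHashMap<>(); // nodeId -> node

  public void completedContainer(String nodeId, String containerId) {
    Object node = nodes.get(nodeId);
    if (node == null) {
      // The node was removed between the reservation and this event; bail out
      // instead of dereferencing null (the NPE reported in this JIRA).
      System.out.println("Skipping completion of " + containerId
          + ": node " + nodeId + " no longer exists");
      return;
    }
    // ... proceed with the normal unreserve/complete handling on 'node' ...
  }
}
{code}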
[jira] [Commented] (YARN-3681) yarn cmd says could not find main class 'queue' in windows
[ https://issues.apache.org/jira/browse/YARN-3681?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14552736#comment-14552736 ] Hadoop QA commented on YARN-3681: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:red}-1{color} | patch | 0m 0s | The patch command could not apply the patch during dryrun. | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12734165/YARN-3681.1.patch | | Optional Tests | | | git revision | trunk / 4aa730c | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8027/console | This message was automatically generated. yarn cmd says could not find main class 'queue' in windows Key: YARN-3681 URL: https://issues.apache.org/jira/browse/YARN-3681 Project: Hadoop YARN Issue Type: Bug Components: yarn Affects Versions: 2.7.0 Environment: Windows Only Reporter: Sumana Sathish Assignee: Varun Saxena Priority: Blocker Labels: windows, yarn-client Attachments: YARN-3681.0.patch, YARN-3681.01.patch, YARN-3681.1.patch, yarncmd.png Attached the screenshot of the command prompt in windows running yarn queue command. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-3692) Allow REST API to set a user generated message when killing an application
Rajat Jain created YARN-3692: Summary: Allow REST API to set a user generated message when killing an application Key: YARN-3692 URL: https://issues.apache.org/jira/browse/YARN-3692 Project: Hadoop YARN Issue Type: Improvement Reporter: Rajat Jain Currently YARN's REST API supports killing an application without setting a diagnostic message. It would be good to provide that support. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
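To make the request concrete, here is a client-side sketch. The existing kill endpoint ({{PUT /ws/v1/cluster/apps/{appid}/state}} with a JSON body containing {{state}}) is real; the extra {{diagnostics}} field is exactly the hypothetical addition this JIRA asks for, and the RM address and application id below are made up.
{code:java}
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

public class KillAppWithMessage {
  public static void main(String[] args) throws Exception {
    String rm = "http://rm-host:8088";                          // assumed RM web address
    String appId = "application_1412191664217_0001";            // example application id
    URL url = new URL(rm + "/ws/v1/cluster/apps/" + appId + "/state");
    HttpURLConnection conn = (HttpURLConnection) url.openConnection();
    conn.setRequestMethod("PUT");
    conn.setDoOutput(true);
    conn.setRequestProperty("Content-Type", "application/json");
    // "state" is what the current API accepts; "diagnostics" is the proposed,
    // not-yet-existing field carrying the user-supplied kill message.
    String body = "{\"state\":\"KILLED\",\"diagnostics\":\"killed by on-call: runaway job\"}";
    try (OutputStream out = conn.getOutputStream()) {
      out.write(body.getBytes(StandardCharsets.UTF_8));
    }
    System.out.println("RM responded with HTTP " + conn.getResponseCode());
  }
}
{code}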
[jira] [Updated] (YARN-3675) FairScheduler: RM quits when node removal races with continousscheduling on the same node
[ https://issues.apache.org/jira/browse/YARN-3675?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anubhav Dhoot updated YARN-3675: Attachment: YARN-3675.003.patch Removed spurious changes and changed visibility of attemptScheduling FairScheduler: RM quits when node removal races with continousscheduling on the same node - Key: YARN-3675 URL: https://issues.apache.org/jira/browse/YARN-3675 Project: Hadoop YARN Issue Type: Bug Components: fairscheduler Reporter: Anubhav Dhoot Assignee: Anubhav Dhoot Attachments: YARN-3675.001.patch, YARN-3675.002.patch, YARN-3675.003.patch With continuous scheduling, scheduling can be done on a node that's just been removed, causing errors like the one below.
{noformat}
12:28:53.782 AM FATAL org.apache.hadoop.yarn.server.resourcemanager.ResourceManager Error in handling event type APP_ATTEMPT_REMOVED to the scheduler
java.lang.NullPointerException
 at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt.unreserve(FSAppAttempt.java:469)
 at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.completedContainer(FairScheduler.java:815)
 at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.removeApplicationAttempt(FairScheduler.java:763)
 at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:1217)
 at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:111)
 at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:684)
 at java.lang.Thread.run(Thread.java:745)
12:28:53.783 AM INFO org.apache.hadoop.yarn.server.resourcemanager.ResourceManager Exiting, bbye..
{noformat}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2355) MAX_APP_ATTEMPTS_ENV may no longer be a useful env var for a container
[ https://issues.apache.org/jira/browse/YARN-2355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14552958#comment-14552958 ] Hadoop QA commented on YARN-2355: - \\ \\ | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 14m 38s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 1 new or modified test files. | | {color:green}+1{color} | javac | 7m 32s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 39s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 23s | The applied patch does not increase the total number of release audit warnings. | | {color:green}+1{color} | checkstyle | 1m 45s | There were no new checkstyle issues. | | {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 35s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 33s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 2m 39s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | yarn tests | 0m 26s | Tests passed in hadoop-yarn-api. | | {color:green}+1{color} | yarn tests | 50m 1s | Tests passed in hadoop-yarn-server-resourcemanager. | | | | 89m 14s | | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12734179/YARN-2355.001.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / 4aa730c | | hadoop-yarn-api test log | https://builds.apache.org/job/PreCommit-YARN-Build/8028/artifact/patchprocess/testrun_hadoop-yarn-api.txt | | hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/8028/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8028/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf905.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8028/console | This message was automatically generated. MAX_APP_ATTEMPTS_ENV may no longer be a useful env var for a container -- Key: YARN-2355 URL: https://issues.apache.org/jira/browse/YARN-2355 Project: Hadoop YARN Issue Type: Bug Reporter: Zhijie Shen Assignee: Darrell Taylor Labels: newbie Attachments: YARN-2355.001.patch After YARN-2074, YARN-614 and YARN-611, the application cannot judge whether it has the chance to try based on MAX_APP_ATTEMPTS_ENV alone. We should be able to notify the application of the up-to-date remaining retry quota. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3467) Expose allocatedMB, allocatedVCores, and runningContainers metrics on running Applications in RM Web UI
[ https://issues.apache.org/jira/browse/YARN-3467?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14552959#comment-14552959 ] Karthik Kambatla commented on YARN-3467: We should add this information to ApplicationAttempt page, and also preferably to the RM Web UI. I have heard asks for both number of containers and allocated resources on the RM applications page, so people can sort applications by that. Expose allocatedMB, allocatedVCores, and runningContainers metrics on running Applications in RM Web UI --- Key: YARN-3467 URL: https://issues.apache.org/jira/browse/YARN-3467 Project: Hadoop YARN Issue Type: New Feature Components: webapp, yarn Affects Versions: 2.5.0 Reporter: Anthony Rojas Assignee: Anubhav Dhoot Priority: Minor Attachments: ApplicationAttemptPage.png The YARN REST API can report on the following properties: *allocatedMB*: The sum of memory in MB allocated to the application's running containers *allocatedVCores*: The sum of virtual cores allocated to the application's running containers *runningContainers*: The number of containers currently running for the application Currently, the RM Web UI does not report on these items (at least I couldn't find any entries within the Web UI). It would be useful for YARN Application and Resource troubleshooting to have these properties and their corresponding values exposed on the RM WebUI. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
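For context, the three fields named above are already exposed per application by the RM REST API, so the UI change is essentially surfacing what a call like the sketch below returns. The RM address is an assumption, and the raw JSON is printed rather than parsed to keep the example short.
{code:java}
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

public class RunningAppMetrics {
  public static void main(String[] args) throws Exception {
    // Each element of apps.app[] in the response carries allocatedMB,
    // allocatedVCores and runningContainers for RUNNING applications.
    URL url = new URL("http://rm-host:8088/ws/v1/cluster/apps?states=RUNNING");
    HttpURLConnection conn = (HttpURLConnection) url.openConnection();
    conn.setRequestProperty("Accept", "application/json");
    try (BufferedReader in = new BufferedReader(
        new InputStreamReader(conn.getInputStream(), StandardCharsets.UTF_8))) {
      String line;
      while ((line = in.readLine()) != null) {
        System.out.println(line);   // parse with any JSON library to pick out the fields
      }
    }
  }
}
{code}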
[jira] [Updated] (YARN-3681) yarn cmd says could not find main class 'queue' in windows
[ https://issues.apache.org/jira/browse/YARN-3681?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Craig Welch updated YARN-3681: -- Attachment: YARN-3681.1.patch Oh the irony, neither did my own. Updated to one which does. yarn cmd says could not find main class 'queue' in windows Key: YARN-3681 URL: https://issues.apache.org/jira/browse/YARN-3681 Project: Hadoop YARN Issue Type: Bug Components: yarn Affects Versions: 2.7.0 Environment: Windows Only Reporter: Sumana Sathish Assignee: Varun Saxena Priority: Blocker Labels: windows, yarn-client Attachments: YARN-3681.0.patch, YARN-3681.01.patch, YARN-3681.1.patch, yarncmd.png Attached the screenshot of the command prompt in windows running yarn queue command. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3675) FairScheduler: RM quits when node removal races with continousscheduling on the same node
[ https://issues.apache.org/jira/browse/YARN-3675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14552841#comment-14552841 ] Hadoop QA commented on YARN-3675: - \\ \\ | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 14m 44s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 1 new or modified test files. | | {color:green}+1{color} | javac | 7m 41s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 38s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 22s | The applied patch does not increase the total number of release audit warnings. | | {color:green}+1{color} | checkstyle | 0m 47s | There were no new checkstyle issues. | | {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 33s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 33s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 1m 16s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | yarn tests | 50m 51s | Tests passed in hadoop-yarn-server-resourcemanager. | | | | 87m 29s | | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12734156/YARN-3675.002.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / 4aa730c | | hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/8025/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8025/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf902.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8025/console | This message was automatically generated. FairScheduler: RM quits when node removal races with continousscheduling on the same node - Key: YARN-3675 URL: https://issues.apache.org/jira/browse/YARN-3675 Project: Hadoop YARN Issue Type: Bug Components: fairscheduler Reporter: Anubhav Dhoot Assignee: Anubhav Dhoot Attachments: YARN-3675.001.patch, YARN-3675.002.patch With continuous scheduling, scheduling can be done on a node thats just removed causing errors like below. 
{noformat}
12:28:53.782 AM FATAL org.apache.hadoop.yarn.server.resourcemanager.ResourceManager Error in handling event type APP_ATTEMPT_REMOVED to the scheduler
java.lang.NullPointerException
 at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt.unreserve(FSAppAttempt.java:469)
 at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.completedContainer(FairScheduler.java:815)
 at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.removeApplicationAttempt(FairScheduler.java:763)
 at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:1217)
 at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:111)
 at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:684)
 at java.lang.Thread.run(Thread.java:745)
12:28:53.783 AM INFO org.apache.hadoop.yarn.server.resourcemanager.ResourceManager Exiting, bbye..
{noformat}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3681) yarn cmd says could not find main class 'queue' in windows
[ https://issues.apache.org/jira/browse/YARN-3681?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14552738#comment-14552738 ] Varun Saxena commented on YARN-3681: [~cwelch], it has to do with line endings. I have to run {{unix2dos}} to convert line endings for Jenkins to accept it. Windows batch files patches do not always apply depending on settings of line endings done by the user. I think my patch did not apply for you because of that reason. yarn cmd says could not find main class 'queue' in windows Key: YARN-3681 URL: https://issues.apache.org/jira/browse/YARN-3681 Project: Hadoop YARN Issue Type: Bug Components: yarn Affects Versions: 2.7.0 Environment: Windows Only Reporter: Sumana Sathish Assignee: Varun Saxena Priority: Blocker Labels: windows, yarn-client Attachments: YARN-3681.0.patch, YARN-3681.01.patch, YARN-3681.1.patch, yarncmd.png Attached the screenshot of the command prompt in windows running yarn queue command. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2355) MAX_APP_ATTEMPTS_ENV may no longer be a useful env var for a container
[ https://issues.apache.org/jira/browse/YARN-2355?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Darrell Taylor updated YARN-2355: - Attachment: YARN-2355.001.patch MAX_APP_ATTEMPTS_ENV may no longer be a useful env var for a container -- Key: YARN-2355 URL: https://issues.apache.org/jira/browse/YARN-2355 Project: Hadoop YARN Issue Type: Bug Reporter: Zhijie Shen Assignee: Darrell Taylor Labels: newbie Attachments: YARN-2355.001.patch After YARN-2074, YARN-614 and YARN-611, the application cannot judge whether it has the chance to try based on MAX_APP_ATTEMPTS_ENV alone. We should be able to notify the application of the up-to-date remaining retry quota. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-314) Schedulers should allow resource requests of different sizes at the same priority and location
[ https://issues.apache.org/jira/browse/YARN-314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14552800#comment-14552800 ] Wangda Tan commented on YARN-314: - [~kasha], actually I'm not quite sure about this proposal. What's the benefit of putting all apps' requests together compared to holding one data structure per app? Is there a concrete use case? Schedulers should allow resource requests of different sizes at the same priority and location -- Key: YARN-314 URL: https://issues.apache.org/jira/browse/YARN-314 Project: Hadoop YARN Issue Type: Sub-task Components: scheduler Affects Versions: 2.0.2-alpha Reporter: Sandy Ryza Attachments: yarn-314-prelim.patch Currently, resource requests for the same container and locality are expected to all be the same size. While it doesn't look like it's needed for apps currently, and it can be circumvented by specifying different priorities if absolutely necessary, it seems to me that the ability to request containers with different resource requirements at the same priority level should be there for the future and for completeness' sake. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2408) Resource Request REST API for YARN
[ https://issues.apache.org/jira/browse/YARN-2408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14552849#comment-14552849 ] Renan DelValle commented on YARN-2408: -- [~leftnoteasy], thanks for taking a look at the patch, really appreciate it. 1) I agree, the original patch I had was very verbose so I shrunk down the amount of data being transferred by clustering resource requests together. Seems to be the best alternative to keeping original ResourceRequest structures. 2) I will take a look at that and implement it that way. (Thank you for pointing me in the right direction). On the resource-by-label inclusion, do you think it would be better to wait until it is patched into the trunk in order to make the process easier? Resource Request REST API for YARN -- Key: YARN-2408 URL: https://issues.apache.org/jira/browse/YARN-2408 Project: Hadoop YARN Issue Type: New Feature Components: webapp Reporter: Renan DelValle Labels: features Attachments: YARN-2408-6.patch I’m proposing a new REST API for YARN which exposes a snapshot of the Resource Requests that exist inside of the Scheduler. My motivation behind this new feature is to allow external software to monitor the amount of resources being requested to gain more insightful information into cluster usage than is already provided. The API can also be used by external software to detect a starved application and alert the appropriate users and/or sys admin so that the problem may be remedied. Here is the proposed API (a JSON counterpart is also available):
{code:xml}
<resourceRequests>
  <MB>7680</MB>
  <VCores>7</VCores>
  <appMaster>
    <applicationId>application_1412191664217_0001</applicationId>
    <applicationAttemptId>appattempt_1412191664217_0001_01</applicationAttemptId>
    <queueName>default</queueName>
    <totalMB>6144</totalMB>
    <totalVCores>6</totalVCores>
    <numResourceRequests>3</numResourceRequests>
    <requests>
      <request>
        <MB>1024</MB>
        <VCores>1</VCores>
        <numContainers>6</numContainers>
        <relaxLocality>true</relaxLocality>
        <priority>20</priority>
        <resourceNames>
          <resourceName>localMachine</resourceName>
          <resourceName>/default-rack</resourceName>
          <resourceName>*</resourceName>
        </resourceNames>
      </request>
    </requests>
  </appMaster>
  <appMaster>
    ...
  </appMaster>
</resourceRequests>
{code}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3051) [Storage abstraction] Create backing storage read interface for ATS readers
[ https://issues.apache.org/jira/browse/YARN-3051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Saxena updated YARN-3051: --- Attachment: YARN-3051-YARN-2928.03.patch [Storage abstraction] Create backing storage read interface for ATS readers --- Key: YARN-3051 URL: https://issues.apache.org/jira/browse/YARN-3051 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Affects Versions: YARN-2928 Reporter: Sangjin Lee Assignee: Varun Saxena Attachments: YARN-3051-YARN-2928.03.patch, YARN-3051.wip.02.YARN-2928.patch, YARN-3051.wip.patch, YARN-3051_temp.patch Per design in YARN-2928, create backing storage read interface that can be implemented by multiple backing storage implementations. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3609) Move load labels from storage from serviceInit to serviceStart to make it works with RM HA case.
[ https://issues.apache.org/jira/browse/YARN-3609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14553377#comment-14553377 ] Hadoop QA commented on YARN-3609: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:red}-1{color} | patch | 0m 0s | The patch command could not apply the patch during dryrun. | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12734279/YARN-3609.3.branch-2.7.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / 8966d42 | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8035/console | This message was automatically generated. Move load labels from storage from serviceInit to serviceStart to make it works with RM HA case. Key: YARN-3609 URL: https://issues.apache.org/jira/browse/YARN-3609 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Wangda Tan Assignee: Wangda Tan Attachments: YARN-3609.1.preliminary.patch, YARN-3609.2.patch, YARN-3609.3.branch-2.7.patch, YARN-3609.3.patch RMNodeLabelsManager currently loads labels during serviceInit, but RMActiveService.start() is what is invoked when an RM HA transition happens. We haven't done this before because queue initialization also happens in serviceInit and we need to make sure labels are added to the system before queues are initialized; after YARN-2918, we should be able to do this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
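The mechanical part of the change can be pictured with the Hadoop service lifecycle: whatever reads labels from the store moves from {{serviceInit}} to {{serviceStart}}, which (per the description above) is what runs again when the RM transitions to active. A bare sketch follows; {{LabelLoadingService}} and {{loadLabelsFromStore}} are stand-ins, not the real RMNodeLabelsManager code.
{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.service.AbstractService;

public class LabelLoadingService extends AbstractService {
  public LabelLoadingService() {
    super("LabelLoadingService");
  }

  @Override
  protected void serviceInit(Configuration conf) throws Exception {
    // Configuration-driven setup only; per this JIRA, label loading should
    // no longer happen here.
    super.serviceInit(conf);
  }

  @Override
  protected void serviceStart() throws Exception {
    // start() is what the active services go through on an HA transition,
    // so loading labels here makes them available in the HA case as well.
    loadLabelsFromStore();
    super.serviceStart();
  }

  private void loadLabelsFromStore() {
    // stand-in for the manager's recovery/initialization logic
  }
}
{code}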
[jira] [Commented] (YARN-2556) Tool to measure the performance of the timeline server
[ https://issues.apache.org/jira/browse/YARN-2556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14553284#comment-14553284 ] Hadoop QA commented on YARN-2556: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 6m 53s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 6 new or modified test files. | | {color:green}+1{color} | javac | 9m 47s | There were no new javac warning messages. | | {color:green}+1{color} | release audit | 0m 31s | The applied patch does not increase the total number of release audit warnings. | | {color:green}+1{color} | checkstyle | 0m 19s | There were no new checkstyle issues. | | {color:red}-1{color} | whitespace | 0m 0s | The patch has 1 line(s) that end in whitespace. Use git apply --whitespace=fix. | | {color:green}+1{color} | install | 2m 2s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 39s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 0m 51s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:red}-1{color} | mapreduce tests | 99m 47s | Tests failed in hadoop-mapreduce-client-jobclient. | | | | 120m 54s | | \\ \\ || Reason || Tests || | Timed out tests | org.apache.hadoop.mapred.TestMerge | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12734234/YARN-2556.10.patch | | Optional Tests | javac unit findbugs checkstyle | | git revision | trunk / 03f897f | | whitespace | https://builds.apache.org/job/PreCommit-YARN-Build/8031/artifact/patchprocess/whitespace.txt | | hadoop-mapreduce-client-jobclient test log | https://builds.apache.org/job/PreCommit-YARN-Build/8031/artifact/patchprocess/testrun_hadoop-mapreduce-client-jobclient.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8031/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf903.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8031/console | This message was automatically generated. Tool to measure the performance of the timeline server -- Key: YARN-2556 URL: https://issues.apache.org/jira/browse/YARN-2556 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Jonathan Eagles Assignee: Chang Li Labels: BB2015-05-TBR Attachments: YARN-2556-WIP.patch, YARN-2556-WIP.patch, YARN-2556.1.patch, YARN-2556.10.patch, YARN-2556.2.patch, YARN-2556.3.patch, YARN-2556.4.patch, YARN-2556.5.patch, YARN-2556.6.patch, YARN-2556.7.patch, YARN-2556.8.patch, YARN-2556.9.patch, YARN-2556.patch, yarn2556.patch, yarn2556.patch, yarn2556_wip.patch We need to be able to understand the capacity model for the timeline server to give users the tools they need to deploy a timeline server with the correct capacity. I propose we create a mapreduce job that can measure timeline server write and read performance. Transactions per second, I/O for both read and write would be a good start. This could be done as an example or test job that could be tied into gridmix. -- This message was sent by Atlassian JIRA (v6.3.4#6332)