[jira] [Updated] (YARN-1201) TestAMAuthorization fails with local hostname cannot be resolved
[ https://issues.apache.org/jira/browse/YARN-1201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-1201: - Attachment: YARN-1201.patch Handle wrapped exception case > TestAMAuthorization fails with local hostname cannot be resolved > > > Key: YARN-1201 > URL: https://issues.apache.org/jira/browse/YARN-1201 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.1.0-beta > Environment: SUSE Linux Enterprise Server 11 (x86_64) >Reporter: Nemon Lou >Assignee: Wangda Tan >Priority: Minor > Attachments: YARN-1201.patch, YARN-1201.patch, YARN-1201.patch, > YARN-1201.patch > > > When hostname is 158-1-131-10, TestAMAuthorization fails. > {code} > Running org.apache.hadoop.yarn.server.resourcemanager.TestAMAuthorization > Tests run: 4, Failures: 0, Errors: 2, Skipped: 0, Time elapsed: 14.034 sec > <<< FAILURE! - in > org.apache.hadoop.yarn.server.resourcemanager.TestAMAuthorization > testUnauthorizedAccess[0](org.apache.hadoop.yarn.server.resourcemanager.TestAMAuthorization) > Time elapsed: 3.952 sec <<< ERROR! > java.lang.NullPointerException: null > at > org.apache.hadoop.yarn.server.resourcemanager.TestAMAuthorization.testUnauthorizedAccess(TestAMAuthorization.java:284) > testUnauthorizedAccess[1](org.apache.hadoop.yarn.server.resourcemanager.TestAMAuthorization) > Time elapsed: 3.116 sec <<< ERROR! > java.lang.NullPointerException: null > at > org.apache.hadoop.yarn.server.resourcemanager.TestAMAuthorization.testUnauthorizedAccess(TestAMAuthorization.java:284) > Results : > Tests in error: > TestAMAuthorization.testUnauthorizedAccess:284 NullPointer > TestAMAuthorization.testUnauthorizedAccess:284 NullPointer > Tests run: 4, Failures: 0, Errors: 2, Skipped: 0 > {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
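The attachment note above says the patch now handles a wrapped exception case. As a hedged illustration (class and method names here are hypothetical, not taken from the actual YARN-1201 patch), one common way to handle such a case is to walk the cause chain looking for the underlying `UnknownHostException` that an unresolvable hostname like 158-1-131-10 produces:

```java
// Hedged sketch: detect an UnknownHostException buried inside wrapper
// exceptions by walking the cause chain. Illustrative only; not the
// actual YARN-1201 patch.
import java.net.UnknownHostException;

public class WrappedExceptionCheck {
    /** Returns true if any cause in the chain is an UnknownHostException. */
    static boolean causedByUnknownHost(Throwable t) {
        for (Throwable c = t; c != null; c = c.getCause()) {
            if (c instanceof UnknownHostException) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Simulate the failure mode: the resolution error arrives wrapped
        // in two layers of runtime exceptions.
        Throwable wrapped = new RuntimeException(
            new IllegalStateException(new UnknownHostException("158-1-131-10")));
        System.out.println(causedByUnknownHost(wrapped)); // prints: true
    }
}
```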
[jira] [Commented] (YARN-1201) TestAMAuthorization fails with local hostname cannot be resolved
[ https://issues.apache.org/jira/browse/YARN-1201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13987525#comment-13987525 ] Hadoop QA commented on YARN-1201: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12643011/YARN-1201.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/3678//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3678//console This message is automatically generated. 

[jira] [Commented] (YARN-1963) Support priorities across applications within the same queue
[ https://issues.apache.org/jira/browse/YARN-1963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13987553#comment-13987553 ] Rohith commented on YARN-1963: -- Adding to Sunil's thoughts, the priority of jobs could also be displayed in the RM web UI. > Support priorities across applications within the same queue > - > > Key: YARN-1963 > URL: https://issues.apache.org/jira/browse/YARN-1963 > Project: Hadoop YARN > Issue Type: New Feature > Components: api, resourcemanager >Reporter: Arun C Murthy >Assignee: Sunil G > > It will be very useful to support priorities among applications within the > same queue, particularly in production scenarios. It allows for finer-grained > controls without having to force admins to create a multitude of queues, plus > allows existing applications to continue using existing queues which are > usually part of institutional memory. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2010) RM can't transition to active if it can't recover an app attempt
[ https://issues.apache.org/jira/browse/YARN-2010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13987590#comment-13987590 ] Rohith commented on YARN-2010: -- For applications that completed before the RM started in secure mode, clientTokenMasterKey is null. After starting in secure mode, recovery of those apps fails because clientTokenMasterKey is null. While recovering an application, the RM should be able to decide whether the application ran in secure or non-secure mode; this can be done by checking clientTokenMasterKey for null. > RM can't transition to active if it can't recover an app attempt > > > Key: YARN-2010 > URL: https://issues.apache.org/jira/browse/YARN-2010 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.3.0 >Reporter: bc Wong > > If the RM fails to recover an app attempt, it won't come up. We should make > it more resilient. > Specifically, the underlying error is that the app was submitted before > Kerberos security got turned on. Makes sense for the app to fail in this > case. But YARN should still start. 
> {noformat} > 2014-04-11 11:56:37,216 WARN org.apache.hadoop.ha.ActiveStandbyElector: > Exception handling the winning of election > org.apache.hadoop.ha.ServiceFailedException: RM could not transition to > Active > at > org.apache.hadoop.yarn.server.resourcemanager.EmbeddedElectorService.becomeActive(EmbeddedElectorService.java:118) > > at > org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:804) > > at > org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:415) > > at > org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599) > at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498) > Caused by: org.apache.hadoop.ha.ServiceFailedException: Error when > transitioning to Active mode > at > org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:274) > > at > org.apache.hadoop.yarn.server.resourcemanager.EmbeddedElectorService.becomeActive(EmbeddedElectorService.java:116) > > ... 4 more > Caused by: org.apache.hadoop.service.ServiceStateException: > org.apache.hadoop.yarn.exceptions.YarnException: > java.lang.IllegalArgumentException: Missing argument > at > org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:59) > > at org.apache.hadoop.service.AbstractService.start(AbstractService.java:204) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:811) > > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:842) > > at > org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:265) > > ... 
5 more > Caused by: org.apache.hadoop.yarn.exceptions.YarnException: > java.lang.IllegalArgumentException: Missing argument > at > org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recoverApplication(RMAppManager.java:372) > > at > org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.submitApplication(RMAppManager.java:273) > > at > org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recover(RMAppManager.java:406) > > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.recover(ResourceManager.java:1000) > > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:462) > > at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193) > ... 8 more > Caused by: java.lang.IllegalArgumentException: Missing argument > at javax.crypto.spec.SecretKeySpec.(SecretKeySpec.java:93) > at > org.apache.hadoop.security.token.SecretManager.createSecretKey(SecretManager.java:188) > > at > org.apache.hadoop.yarn.server.resourcemanager.security.ClientToAMTokenSecretManagerInRM.registerMasterKey(ClientToAMTokenSecretManagerInRM.java:49) > > at > org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.recoverAppAttemptCredentials(RMAppAttemptImpl.java:711) > > at > org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.recover(RMAppAttemptImpl.java:689) > > at > org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.recover(RMAppImpl.java:663) > > at > org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recoverApplication(RMAppManager.java:369) > > ... 13 more > {noformat} -- This message was sent by Atlassian JIRA (v6.2#6252)
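Rohith's suggestion above boils down to a null check before registering the recovered master key. A minimal, hedged sketch of that guard (the class and method names here are illustrative, not the actual YARN-2010 patch):

```java
// Hedged sketch of the suggested fix: skip registering the client-to-AM
// master key during recovery when clientTokenMasterKey is null, which
// indicates the app was submitted before security was enabled. Passing a
// null key on to SecretKeySpec is what triggers the
// IllegalArgumentException ("Missing argument") in the stack trace.
public class RecoveryGuard {
    static boolean shouldRegisterMasterKey(byte[] clientTokenMasterKey,
                                           boolean securityEnabled) {
        // A null or empty key means the attempt ran in non-secure mode.
        return securityEnabled && clientTokenMasterKey != null
            && clientTokenMasterKey.length > 0;
    }

    public static void main(String[] args) {
        // App submitted before Kerberos was turned on: key is null, skip it.
        System.out.println(shouldRegisterMasterKey(null, true));          // false
        // App submitted in secure mode: key present, register it.
        System.out.println(shouldRegisterMasterKey(new byte[]{1}, true)); // true
    }
}
```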
[jira] [Assigned] (YARN-2010) RM can't transition to active if it can't recover an app attempt
[ https://issues.apache.org/jira/browse/YARN-2010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rohith reassigned YARN-2010: Assignee: Rohith -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1696) Document RM HA
[ https://issues.apache.org/jira/browse/YARN-1696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13987597#comment-13987597 ] Hudson commented on YARN-1696: -- FAILURE: Integrated in Hadoop-Yarn-trunk #557 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/557/]) YARN-1696. Added documentation for ResourceManager fail-over. Contributed by Karthik Kambatla, Masatake Iwasaki, Tsuyoshi OZAWA. (vinodkv: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1591416) * /hadoop/common/trunk/hadoop-project/src/site/site.xml * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/apt/ResourceManagerHA.apt.vm * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/resources/images * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/resources/images/rm-ha-overview.png > Document RM HA > -- > > Key: YARN-1696 > URL: https://issues.apache.org/jira/browse/YARN-1696 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Affects Versions: 2.3.0 >Reporter: Karthik Kambatla >Assignee: Tsuyoshi OZAWA >Priority: Blocker > Fix For: 2.4.1 > > Attachments: YARN-1676.5.patch, YARN-1696-3.patch, YARN-1696.2.patch, > YARN-1696.4.patch, YARN-1696.6.patch, rm-ha-overview.png, rm-ha-overview.svg, > yarn-1696-1.patch > > > Add documentation for RM HA. Marking this a blocker for 2.4 as this is > required to call RM HA Stable and ready for public consumption. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2010) RM can't transition to active if it can't recover an app attempt
[ https://issues.apache.org/jira/browse/YARN-2010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rohith updated YARN-2010: - Attachment: YARN-2010.patch Uploading a patch without a test. I'm thinking about how to write the test: does the complete flow need to be considered, or is it enough to call only RMAppAttempt.recoveryApplication()? -- This message was sent by Atlassian JIRA (v6.2#6252)
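The root cause in the YARN-2010 stack trace is the JDK's `SecretKeySpec` constructor rejecting a null key with exactly the "Missing argument" message seen there. A minimal, self-contained reproduction with the JDK class itself:

```java
// Reproduces the root cause from the YARN-2010 stack trace: constructing
// a SecretKeySpec from a null key material throws
// IllegalArgumentException("Missing argument").
import javax.crypto.spec.SecretKeySpec;

public class SecretKeySpecDemo {
    public static void main(String[] args) {
        try {
            // Recovery passes the stored clientTokenMasterKey here; when the
            // app predates security being enabled, that key is null.
            new SecretKeySpec((byte[]) null, "HmacSHA1");
            System.out.println("no exception");
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage()); // prints: Missing argument
        }
    }
}
```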
[jira] [Commented] (YARN-1696) Document RM HA
[ https://issues.apache.org/jira/browse/YARN-1696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13987695#comment-13987695 ] Hudson commented on YARN-1696: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #1774 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1774/]) YARN-1696. Added documentation for ResourceManager fail-over. Contributed by Karthik Kambatla, Masatake Iwasaki, Tsuyoshi OZAWA. (vinodkv: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1591416) -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1696) Document RM HA
[ https://issues.apache.org/jira/browse/YARN-1696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13987690#comment-13987690 ] Hudson commented on YARN-1696: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #1748 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1748/]) YARN-1696. Added documentation for ResourceManager fail-over. Contributed by Karthik Kambatla, Masatake Iwasaki, Tsuyoshi OZAWA. (vinodkv: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1591416) -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2000) Fix ordering of starting services inside the RM
[ https://issues.apache.org/jira/browse/YARN-2000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13987760#comment-13987760 ] Tsuyoshi OZAWA commented on YARN-2000: -- Hi [~jianhe], do you mind finishing YARN-1474? These JIRAs can be conflicted. > Fix ordering of starting services inside the RM > --- > > Key: YARN-2000 > URL: https://issues.apache.org/jira/browse/YARN-2000 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Jian He >Assignee: Jian He > > The order of starting services in RM would be: > - Recovery of the app/attempts > - Start the scheduler and add scheduler app/attempts > - Start ResourceTrackerService and re-populate the containers in scheduler > based on the containers info from NMs > - ApplicationMasterService either don’t start or start but block until all > the previous NMs registers. > Other than these, there are other services like ClientRMService, Webapps > which we need to think about the order too. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2000) Fix ordering of starting services inside the RM
[ https://issues.apache.org/jira/browse/YARN-2000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13987764#comment-13987764 ] Tsuyoshi OZAWA commented on YARN-2000: -- typoed: s/do you mind finishing/do you mind if you wait for finishing/ > Fix ordering of starting services inside the RM > --- > > Key: YARN-2000 > URL: https://issues.apache.org/jira/browse/YARN-2000 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Jian He >Assignee: Jian He > > The order of starting services in RM would be: > - Recovery of the app/attempts > - Start the scheduler and add scheduler app/attempts > - Start ResourceTrackerService and re-populate the containers in scheduler > based on the containers info from NMs > - ApplicationMasterService either don’t start or start but block until all > the previous NMs registers. > Other than these, there are other services like ClientRMService, Webapps > which we need to think about the order too. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-1874) Cleanup: Move RMActiveServices out of ResourceManager into its own file
[ https://issues.apache.org/jira/browse/YARN-1874?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsuyoshi OZAWA updated YARN-1874: - Issue Type: Improvement (was: Bug) > Cleanup: Move RMActiveServices out of ResourceManager into its own file > --- > > Key: YARN-1874 > URL: https://issues.apache.org/jira/browse/YARN-1874 > Project: Hadoop YARN > Issue Type: Improvement > Components: resourcemanager >Reporter: Karthik Kambatla >Assignee: Tsuyoshi OZAWA > Attachments: YARN-1874.1.patch > > > As [~vinodkv] noticed on YARN-1867, ResourceManager is hard to maintain. We > should move RMActiveServices out to make it more manageable. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1874) Cleanup: Move RMActiveServices out of ResourceManager into its own file
[ https://issues.apache.org/jira/browse/YARN-1874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13987767#comment-13987767 ] Tsuyoshi OZAWA commented on YARN-1874: -- I'll fix the patch to pass tests. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2011) Typo in TestLeafQueue
[ https://issues.apache.org/jira/browse/YARN-2011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsuyoshi OZAWA updated YARN-2011: - Issue Type: Test (was: Bug) > Typo in TestLeafQueue > - > > Key: YARN-2011 > URL: https://issues.apache.org/jira/browse/YARN-2011 > Project: Hadoop YARN > Issue Type: Test >Affects Versions: 2.4.0 >Reporter: Chen He >Assignee: Chen He >Priority: Trivial > Attachments: YARN-2011.patch > > > a.assignContainers(clusterResource, node_0); > assertEquals(2*GB, a.getUsedResources().getMemory()); > assertEquals(2*GB, app_0.getCurrentConsumption().getMemory()); > assertEquals(0*GB, app_1.getCurrentConsumption().getMemory()); > assertEquals(0*GB, app_0.getHeadroom().getMemory()); // User limit = 2G > assertEquals(0*GB, app_0.getHeadroom().getMemory()); // User limit = 2G > // Again one to user_0 since he hasn't exceeded user limit yet > a.assignContainers(clusterResource, node_0); > assertEquals(3*GB, a.getUsedResources().getMemory()); > assertEquals(2*GB, app_0.getCurrentConsumption().getMemory()); > assertEquals(1*GB, app_1.getCurrentConsumption().getMemory()); > assertEquals(0*GB, app_0.getHeadroom().getMemory()); // 3G - 2G > assertEquals(0*GB, app_0.getHeadroom().getMemory()); // 3G - 2G -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2011) Typo in TestLeafQueue
[ https://issues.apache.org/jira/browse/YARN-2011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsuyoshi OZAWA updated YARN-2011: - Target Version/s: 2.5.0 (was: 2.4.1) -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-1861) Both RM stuck in standby mode when automatic failover is enabled
[ https://issues.apache.org/jira/browse/YARN-1861?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsuyoshi OZAWA updated YARN-1861: - Attachment: YARN-1861.3.patch Updated the patch so that it does not introduce a warning, by adding a SuppressWarnings annotation. [~xgong], sorry if you mind my cutting in, but this JIRA is a blocker for the 2.4.1 release and we should fix it as soon as possible. > Both RM stuck in standby mode when automatic failover is enabled > > > Key: YARN-1861 > URL: https://issues.apache.org/jira/browse/YARN-1861 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Affects Versions: 2.4.0 >Reporter: Arpit Gupta >Assignee: Xuan Gong >Priority: Blocker > Attachments: YARN-1861.2.patch, YARN-1861.3.patch, yarn-1861-1.patch > > > In our HA tests we noticed that the tests got stuck because both RM's got > into standby state and no one became active. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-1701) Improve default paths of timeline store and generic history store
[ https://issues.apache.org/jira/browse/YARN-1701?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsuyoshi OZAWA updated YARN-1701: - Attachment: YARN-1701.3.patch > Improve default paths of timeline store and generic history store > - > > Key: YARN-1701 > URL: https://issues.apache.org/jira/browse/YARN-1701 > Project: Hadoop YARN > Issue Type: Sub-task >Affects Versions: 2.4.0 >Reporter: Gera Shegalov >Assignee: Gera Shegalov >Priority: Blocker > Attachments: YARN-1701.3.patch, YARN-1701.v01.patch, > YARN-1701.v02.patch > > > When I enable AHS via yarn.ahs.enabled, the app history is still not visible > in AHS webUI. This is due to NullApplicationHistoryStore as > yarn.resourcemanager.history-writer.class. It would be good to have just one > key to enable basic functionality. > yarn.ahs.fs-history-store.uri uses {code}${hadoop.log.dir}{code}, which is > local file system location. However, FileSystemApplicationHistoryStore uses > DFS by default. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1701) Improve default paths of timeline store and generic history store
[ https://issues.apache.org/jira/browse/YARN-1701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13987887#comment-13987887 ] Tsuyoshi OZAWA commented on YARN-1701: -- Updated the patch based on Zhijie's idea, which looks reasonable to me. [~jira.shegalov], sorry for cutting in; I updated the patch because this issue is a blocker for the 2.4.1 release. Please feel free to take it back. > Improve default paths of timeline store and generic history store > - > > Key: YARN-1701 > URL: https://issues.apache.org/jira/browse/YARN-1701 > Project: Hadoop YARN > Issue Type: Sub-task >Affects Versions: 2.4.0 >Reporter: Gera Shegalov >Assignee: Gera Shegalov >Priority: Blocker > Attachments: YARN-1701.3.patch, YARN-1701.v01.patch, > YARN-1701.v02.patch > > > When I enable AHS via yarn.ahs.enabled, the app history is still not visible > in AHS webUI. This is due to NullApplicationHistoryStore as > yarn.resourcemanager.history-writer.class. It would be good to have just one > key to enable basic functionality. > yarn.ahs.fs-history-store.uri uses {code}${hadoop.log.dir}{code}, which is > local file system location. However, FileSystemApplicationHistoryStore uses > DFS by default. -- This message was sent by Atlassian JIRA (v6.2#6252)
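The mismatch discussed in this thread (a local ${hadoop.log.dir} default feeding a store that expects DFS) can be worked around by pointing the history-store URI at HDFS explicitly. A minimal yarn-site.xml sketch, assuming the property names quoted in the issue; the hdfs://namenode:8020/yarn/ahs path is a hypothetical example, not a shipped default:

```xml
<!-- Sketch only: property names are taken from the issue text;
     the HDFS path below is a hypothetical example. -->
<property>
  <name>yarn.ahs.enabled</name>
  <value>true</value>
</property>
<property>
  <name>yarn.ahs.fs-history-store.uri</name>
  <!-- point at DFS instead of the local ${hadoop.log.dir} default -->
  <value>hdfs://namenode:8020/yarn/ahs</value>
</property>
```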
[jira] [Commented] (YARN-1861) Both RM stuck in standby mode when automatic failover is enabled
[ https://issues.apache.org/jira/browse/YARN-1861?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13987906#comment-13987906 ] Hadoop QA commented on YARN-1861: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12643061/YARN-1861.3.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 1 new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/3679//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/3679//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3679//console This message is automatically generated. 
[jira] [Updated] (YARN-2016) Yarn getApplicationRequest start time range is not honored
[ https://issues.apache.org/jira/browse/YARN-2016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Venkat Ranganathan updated YARN-2016: - Attachment: YarnTest.java > Yarn getApplicationRequest start time range is not honored > -- > > Key: YARN-2016 > URL: https://issues.apache.org/jira/browse/YARN-2016 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.4.0 >Reporter: Venkat Ranganathan > Attachments: YarnTest.java > > > When we query for the previous applications by creating an instance of > GetApplicationsRequest and setting the start time range and application tag, > we see that the start range provided is not honored and all applications with > the tag are returned > Attaching a reproducer. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (YARN-2016) Yarn getApplicationRequest start time range is not honored
Venkat Ranganathan created YARN-2016: Summary: Yarn getApplicationRequest start time range is not honored Key: YARN-2016 URL: https://issues.apache.org/jira/browse/YARN-2016 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.4.0 Reporter: Venkat Ranganathan Attachments: YarnTest.java When we query for the previous applications by creating an instance of GetApplicationsRequest and setting the start time range and application tag, we see that the start range provided is not honored and all applications with the tag are returned Attaching a reproducer. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1701) Improve default paths of timeline store and generic history store
[ https://issues.apache.org/jira/browse/YARN-1701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13987920#comment-13987920 ] Hadoop QA commented on YARN-1701: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12643069/YARN-1701.3.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/3680//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3680//console This message is automatically generated. 
[jira] [Commented] (YARN-2016) Yarn getApplicationRequest start time range is not honored
[ https://issues.apache.org/jira/browse/YARN-2016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13987922#comment-13987922 ] Venkat Ranganathan commented on YARN-2016: -- I have briefly discussed this with [~vinodkv] and [~djp] has an idea on the fix > Yarn getApplicationRequest start time range is not honored > -- > > Key: YARN-2016 > URL: https://issues.apache.org/jira/browse/YARN-2016 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.4.0 >Reporter: Venkat Ranganathan > Attachments: YarnTest.java > > > When we query for the previous applications by creating an instance of > GetApplicationsRequest and setting the start time range and application tag, > we see that the start range provided is not honored and all applications with > the tag are returned > Attaching a reproducer. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-1861) Both RM stuck in standby mode when automatic failover is enabled
[ https://issues.apache.org/jira/browse/YARN-1861?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong updated YARN-1861: Attachment: YARN-1861.4.patch > Both RM stuck in standby mode when automatic failover is enabled > > > Key: YARN-1861 > URL: https://issues.apache.org/jira/browse/YARN-1861 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Affects Versions: 2.4.0 >Reporter: Arpit Gupta >Assignee: Xuan Gong >Priority: Blocker > Attachments: YARN-1861.2.patch, YARN-1861.3.patch, YARN-1861.4.patch, > yarn-1861-1.patch > > > In our HA tests we noticed that the tests got stuck because both RM's got > into standby state and no one became active. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1861) Both RM stuck in standby mode when automatic failover is enabled
[ https://issues.apache.org/jira/browse/YARN-1861?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13987972#comment-13987972 ] Xuan Gong commented on YARN-1861: - [~ozawa] Thanks. Uploaded a new patch based on the latest trunk and fixed the findbugs -1. > Both RM stuck in standby mode when automatic failover is enabled > > > Key: YARN-1861 > URL: https://issues.apache.org/jira/browse/YARN-1861 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Affects Versions: 2.4.0 >Reporter: Arpit Gupta >Assignee: Xuan Gong >Priority: Blocker > Attachments: YARN-1861.2.patch, YARN-1861.3.patch, YARN-1861.4.patch, > yarn-1861-1.patch > > > In our HA tests we noticed that the tests got stuck because both RM's got > into standby state and no one became active. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1945) Adding description for each pool in Fair Scheduler Page from fair-scheduler.xml
[ https://issues.apache.org/jira/browse/YARN-1945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13987971#comment-13987971 ] Siqi Li commented on YARN-1945: --- [~sjlee0] do you know what that -1 javac means? > Adding description for each pool in Fair Scheduler Page from > fair-scheduler.xml > --- > > Key: YARN-1945 > URL: https://issues.apache.org/jira/browse/YARN-1945 > Project: Hadoop YARN > Issue Type: Improvement >Affects Versions: 2.3.0 >Reporter: Siqi Li >Assignee: Siqi Li > Attachments: YARN-1945.v2.patch, YARN-1945.v3.patch > > -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1945) Adding description for each pool in Fair Scheduler Page from fair-scheduler.xml
[ https://issues.apache.org/jira/browse/YARN-1945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13988000#comment-13988000 ] Sangjin Lee commented on YARN-1945: --- You got more java compiler warnings than the trunk baseline. You might want to look at the warnings: https://builds.apache.org/job/PreCommit-YARN-Build/3672//artifact/trunk/patchprocess/diffJavacWarnings.txt > Adding description for each pool in Fair Scheduler Page from > fair-scheduler.xml > --- > > Key: YARN-1945 > URL: https://issues.apache.org/jira/browse/YARN-1945 > Project: Hadoop YARN > Issue Type: Improvement >Affects Versions: 2.3.0 >Reporter: Siqi Li >Assignee: Siqi Li > Attachments: YARN-1945.v2.patch, YARN-1945.v3.patch > > -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-1857) CapacityScheduler headroom doesn't account for other AM's running
[ https://issues.apache.org/jira/browse/YARN-1857?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chen He updated YARN-1857: -- Attachment: YARN-1857.patch > CapacityScheduler headroom doesn't account for other AM's running > - > > Key: YARN-1857 > URL: https://issues.apache.org/jira/browse/YARN-1857 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacityscheduler >Affects Versions: 2.3.0 >Reporter: Thomas Graves >Assignee: Chen He > Attachments: YARN-1857.patch > > > Its possible to get an application to hang forever (or a long time) in a > cluster with multiple users. The reason why is that the headroom sent to the > application is based on the user limit but it doesn't account for other > Application masters using space in that queue. So the headroom (user limit - > user consumed) can be > 0 even though the cluster is 100% full because the > other space is being used by application masters from other users. > For instance if you have a cluster with 1 queue, user limit is 100%, you have > multiple users submitting applications. One very large application by user 1 > starts up, runs most of its maps and starts running reducers. other users try > to start applications and get their application masters started but not > tasks. The very large application then gets to the point where it has > consumed the rest of the cluster resources with all reduces. But at this > point it needs to still finish a few maps. The headroom being sent to this > application is only based on the user limit (which is 100% of the cluster > capacity) its using lets say 95% of the cluster for reduces and then other 5% > is being used by other users running application masters. The MRAppMaster > thinks it still has 5% so it doesn't know that it should kill a reduce in > order to run a map. > This can happen in other scenarios also. 
Generally in a large cluster with > multiple queues this shouldn't cause a hang forever but it could cause the > application to take much longer. -- This message was sent by Atlassian JIRA (v6.2#6252)
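The arithmetic in the description above can be made concrete. A self-contained sketch (illustrative names, not actual YARN scheduler code) contrasting headroom derived only from the user limit with headroom capped by what is actually free in the cluster:

```java
// Illustrative sketch, not YARN code: shows why headroom computed only
// from the user limit can overstate what the scheduler can really give.
public class HeadroomSketch {
    // Naive headroom: user limit minus this user's consumption,
    // ignoring resources held by other users' application masters.
    static int naiveHeadroom(int userLimit, int userConsumed) {
        return Math.max(userLimit - userConsumed, 0);
    }

    // Headroom additionally capped by actual free cluster capacity.
    static int cappedHeadroom(int userLimit, int userConsumed,
                              int clusterCapacity, int clusterConsumed) {
        int free = Math.max(clusterCapacity - clusterConsumed, 0);
        return Math.min(naiveHeadroom(userLimit, userConsumed), free);
    }

    public static void main(String[] args) {
        // Scenario from the issue: 100-unit cluster, user limit 100%,
        // user 1 holds 95 units of reducers, other users' AMs hold 5.
        int cluster = 100, userLimit = 100, userUsed = 95, othersUsed = 5;
        System.out.println("naive  = "
                + naiveHeadroom(userLimit, userUsed));            // 5
        System.out.println("capped = "
                + cappedHeadroom(userLimit, userUsed,
                                 cluster, userUsed + othersUsed)); // 0
    }
}
```

With the 95%/5% split from the description, the naive form reports 5 units of headroom while nothing is actually free, which is why the MRAppMaster never concludes it must kill a reducer to run a map.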
[jira] [Commented] (YARN-1861) Both RM stuck in standby mode when automatic failover is enabled
[ https://issues.apache.org/jira/browse/YARN-1861?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13988071#comment-13988071 ] Hadoop QA commented on YARN-1861: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12643078/YARN-1861.4.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:red}-1 javac{color}. The applied patch generated 1279 javac compiler warnings (more than the trunk's current 1278 warnings). {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/3681//testReport/ Javac warnings: https://builds.apache.org/job/PreCommit-YARN-Build/3681//artifact/trunk/patchprocess/diffJavacWarnings.txt Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3681//console This message is automatically generated. 
[jira] [Commented] (YARN-1857) CapacityScheduler headroom doesn't account for other AM's running
[ https://issues.apache.org/jira/browse/YARN-1857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13988096#comment-13988096 ] Hadoop QA commented on YARN-1857: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12643084/YARN-1857.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/3682//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3682//console This message is automatically generated. > CapacityScheduler headroom doesn't account for other AM's running > - > > Key: YARN-1857 > URL: https://issues.apache.org/jira/browse/YARN-1857 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacityscheduler >Affects Versions: 2.3.0 >Reporter: Thomas Graves >Assignee: Chen He > Attachments: YARN-1857.patch > > > Its possible to get an application to hang forever (or a long time) in a > cluster with multiple users. 
[jira] [Commented] (YARN-1872) TestDistributedShell occasionally fails in trunk
[ https://issues.apache.org/jira/browse/YARN-1872?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13988106#comment-13988106 ] Tsuyoshi OZAWA commented on YARN-1872: -- As a workaround for the 2.4.1 release, +1 (non-binding) for [~zhiguohong]'s patch. [~zjshen], [~ste...@apache.org], how about solving the issue fundamentally in YARN-1902 against the 2.5.0 release, as Hong mentioned? What do you think? > TestDistributedShell occasionally fails in trunk > > > Key: YARN-1872 > URL: https://issues.apache.org/jira/browse/YARN-1872 > Project: Hadoop YARN > Issue Type: Test >Reporter: Ted Yu >Assignee: Hong Zhiguo >Priority: Blocker > Attachments: TestDistributedShell.out, YARN-1872.patch > > > From https://builds.apache.org/job/Hadoop-Yarn-trunk/520/console : > TestDistributedShell#testDSShellWithCustomLogPropertyFile failed and > TestDistributedShell#testDSShell timed out. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1857) CapacityScheduler headroom doesn't account for other AM's running
[ https://issues.apache.org/jira/browse/YARN-1857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13988124#comment-13988124 ] Chen He commented on YARN-1857: --- The TestRMRestart successfully passed on my laptop. I think this failure may not be related to my patch. > CapacityScheduler headroom doesn't account for other AM's running > - > > Key: YARN-1857 > URL: https://issues.apache.org/jira/browse/YARN-1857 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacityscheduler >Affects Versions: 2.3.0 >Reporter: Thomas Graves >Assignee: Chen He > Attachments: YARN-1857.patch > > > Its possible to get an application to hang forever (or a long time) in a > cluster with multiple users. The reason why is that the headroom sent to the > application is based on the user limit but it doesn't account for other > Application masters using space in that queue. So the headroom (user limit - > user consumed) can be > 0 even though the cluster is 100% full because the > other space is being used by application masters from other users. > For instance if you have a cluster with 1 queue, user limit is 100%, you have > multiple users submitting applications. One very large application by user 1 > starts up, runs most of its maps and starts running reducers. other users try > to start applications and get their application masters started but not > tasks. The very large application then gets to the point where it has > consumed the rest of the cluster resources with all reduces. But at this > point it needs to still finish a few maps. The headroom being sent to this > application is only based on the user limit (which is 100% of the cluster > capacity) its using lets say 95% of the cluster for reduces and then other 5% > is being used by other users running application masters. The MRAppMaster > thinks it still has 5% so it doesn't know that it should kill a reduce in > order to run a map. > This can happen in other scenarios also. 
[jira] [Commented] (YARN-1989) Adding shell scripts to launch multiple servers on localhost
[ https://issues.apache.org/jira/browse/YARN-1989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13988155#comment-13988155 ] Gera Shegalov commented on YARN-1989: - Hi [~iwasakims], I have recently documented this idea for my team: http://gerashegalov.github.io/running-multiple-hadoop-nodes-on-the-same-OS/ > Adding shell scripts to launch multiple servers on localhost > > > Key: YARN-1989 > URL: https://issues.apache.org/jira/browse/YARN-1989 > Project: Hadoop YARN > Issue Type: New Feature >Reporter: Masatake Iwasaki >Priority: Minor > > Adding shell scripts to launch multiple servers on localhost for test and > debug. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-1945) Adding description for each pool in Fair Scheduler Page from fair-scheduler.xml
[ https://issues.apache.org/jira/browse/YARN-1945?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siqi Li updated YARN-1945: -- Attachment: (was: YARN-1945.v3.patch) > Adding description for each pool in Fair Scheduler Page from > fair-scheduler.xml > --- > > Key: YARN-1945 > URL: https://issues.apache.org/jira/browse/YARN-1945 > Project: Hadoop YARN > Issue Type: Improvement >Affects Versions: 2.3.0 >Reporter: Siqi Li >Assignee: Siqi Li > Attachments: YARN-1945.v2.patch, YARN-1945.v3.patch > > -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-1945) Adding description for each pool in Fair Scheduler Page from fair-scheduler.xml
[ https://issues.apache.org/jira/browse/YARN-1945?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siqi Li updated YARN-1945: -- Attachment: YARN-1945.v3.patch > Adding description for each pool in Fair Scheduler Page from > fair-scheduler.xml > --- > > Key: YARN-1945 > URL: https://issues.apache.org/jira/browse/YARN-1945 > Project: Hadoop YARN > Issue Type: Improvement >Affects Versions: 2.3.0 >Reporter: Siqi Li >Assignee: Siqi Li > Attachments: YARN-1945.v2.patch, YARN-1945.v3.patch > > -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-1945) Adding description for each pool in Fair Scheduler Page from fair-scheduler.xml
[ https://issues.apache.org/jira/browse/YARN-1945?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siqi Li updated YARN-1945: -- Attachment: YARN-1945.v4.patch > Adding description for each pool in Fair Scheduler Page from > fair-scheduler.xml > --- > > Key: YARN-1945 > URL: https://issues.apache.org/jira/browse/YARN-1945 > Project: Hadoop YARN > Issue Type: Improvement >Affects Versions: 2.3.0 >Reporter: Siqi Li >Assignee: Siqi Li > Attachments: YARN-1945.v2.patch, YARN-1945.v3.patch, > YARN-1945.v4.patch > > -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1945) Adding description for each pool in Fair Scheduler Page from fair-scheduler.xml
[ https://issues.apache.org/jira/browse/YARN-1945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13988217#comment-13988217 ] Hadoop QA commented on YARN-1945: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12643108/YARN-1945.v4.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:red}-1 javac{color}. The patch appears to cause the build to fail. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3683//console This message is automatically generated. > Adding description for each pool in Fair Scheduler Page from > fair-scheduler.xml > --- > > Key: YARN-1945 > URL: https://issues.apache.org/jira/browse/YARN-1945 > Project: Hadoop YARN > Issue Type: Improvement >Affects Versions: 2.3.0 >Reporter: Siqi Li >Assignee: Siqi Li > Attachments: YARN-1945.v2.patch, YARN-1945.v3.patch, > YARN-1945.v4.patch > > -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-1864) Fair Scheduler Dynamic Hierarchical User Queues
[ https://issues.apache.org/jira/browse/YARN-1864?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashwin Shankar updated YARN-1864: - Attachment: YARN-1864-v5.txt Pre-commit build isn't kicking off for some reason, resubmitting patch. > Fair Scheduler Dynamic Hierarchical User Queues > --- > > Key: YARN-1864 > URL: https://issues.apache.org/jira/browse/YARN-1864 > Project: Hadoop YARN > Issue Type: New Feature > Components: scheduler >Reporter: Ashwin Shankar > Labels: scheduler > Attachments: YARN-1864-v1.txt, YARN-1864-v2.txt, YARN-1864-v3.txt, > YARN-1864-v4.txt, YARN-1864-v5.txt > > > In Fair Scheduler, we want to be able to create user queues under any parent > queue in the hierarchy. For eg. say user1 submits a job to a parent queue > called root.allUserQueues, we want to be able to create a new queue called > root.allUserQueues.user1 and run user1's job in it. Any further jobs submitted > by this user to root.allUserQueues will run in this newly created > root.allUserQueues.user1. > This is very similar to the 'user-as-default' feature in Fair Scheduler, which > creates user queues under the root queue. But we want the ability to create user > queues under ANY parent queue. > Why do we want this? > 1. Preemption: these dynamically created user queues can preempt each other > if their fair share is not met, so there is fairness among users. > User queues can also preempt other non-user leaf queues if below fair share. > 2. Allocation to user queues: we want all the user queries (ad hoc) to consume > only a fraction of resources in the shared cluster. With this > feature, we could do that by giving a fair share to the parent user queue, > which is then redistributed to all the dynamically created user queues. -- This message was sent by Atlassian JIRA (v6.2#6252)
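The behavior requested in YARN-1864 could be sketched as a fair-scheduler.xml allocation file along these lines. This is a hypothetical illustration of the proposal, not the patch's committed syntax: the "type" attribute, the "nestedUserQueue" rule name, and the weight value are all assumptions for the example.

```xml
<?xml version="1.0"?>
<!-- Hypothetical sketch of the requested feature; element and rule
     names are illustrative, not final syntax. -->
<allocations>
  <!-- parent queue whose fair share is redistributed among the
       dynamically created per-user child queues -->
  <queue name="allUserQueues" type="parent">
    <weight>2.0</weight>
  </queue>
  <queuePlacementPolicy>
    <!-- jobs submitted to root.allUserQueues would land in
         root.allUserQueues.<username>, created on demand -->
    <rule name="nestedUserQueue">
      <rule name="specified" create="false"/>
    </rule>
    <rule name="default"/>
  </queuePlacementPolicy>
</allocations>
```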
[jira] [Updated] (YARN-1861) Both RM stuck in standby mode when automatic failover is enabled
[ https://issues.apache.org/jira/browse/YARN-1861?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong updated YARN-1861: Attachment: YARN-1861.5.patch Fixed the -1 javadoc warning. > Both RM stuck in standby mode when automatic failover is enabled > > > Key: YARN-1861 > URL: https://issues.apache.org/jira/browse/YARN-1861 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Affects Versions: 2.4.0 >Reporter: Arpit Gupta >Assignee: Xuan Gong >Priority: Blocker > Attachments: YARN-1861.2.patch, YARN-1861.3.patch, YARN-1861.4.patch, > YARN-1861.5.patch, yarn-1861-1.patch > > > In our HA tests we noticed that the tests got stuck because both RM's got > into standby state and no one became active. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-1945) Adding description for each pool in Fair Scheduler Page from fair-scheduler.xml
[ https://issues.apache.org/jira/browse/YARN-1945?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siqi Li updated YARN-1945: -- Attachment: YARN-1945.v5.patch > Adding description for each pool in Fair Scheduler Page from > fair-scheduler.xml > --- > > Key: YARN-1945 > URL: https://issues.apache.org/jira/browse/YARN-1945 > Project: Hadoop YARN > Issue Type: Improvement >Affects Versions: 2.3.0 >Reporter: Siqi Li >Assignee: Siqi Li > Attachments: YARN-1945.v2.patch, YARN-1945.v3.patch, > YARN-1945.v4.patch, YARN-1945.v5.patch > > -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Reopened] (YARN-1868) YARN status web ui does not show correctly in IE 11
[ https://issues.apache.org/jira/browse/YARN-1868?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chuan Liu reopened YARN-1868: - Reopening this one because the IE configuration is on by default and thus impacts the user experience on Windows. > YARN status web ui does not show correctly in IE 11 > --- > > Key: YARN-1868 > URL: https://issues.apache.org/jira/browse/YARN-1868 > Project: Hadoop YARN > Issue Type: Bug > Components: webapp >Affects Versions: 3.0.0 >Reporter: Chuan Liu > Attachments: YARN_status.png > > > The YARN status web ui does not show correctly in IE 11. The drop down menu > for app entries are not shown. Also the navigation menu displays incorrectly. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-941) RM Should have a way to update the tokens it has for a running application
[ https://issues.apache.org/jira/browse/YARN-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13988299#comment-13988299 ] Anubhav Dhoot commented on YARN-941: One option is to make the token expiration time configurable in the request API for the ResourceManager and other token secret managers. This would allow each long-running application to request a longer max lifetime for its tokens during startup. There can be caps per user/group that limit the max lifetime that can be requested. > RM Should have a way to update the tokens it has for a running application > -- > > Key: YARN-941 > URL: https://issues.apache.org/jira/browse/YARN-941 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Robert Joseph Evans > > When an application is submitted to the RM it includes with it a set of > tokens that the RM will renew on behalf of the application, that will be > passed to the AM when the application is launched, and will be used when > launching the application to access HDFS to download files on behalf of the > application. > For long lived applications/services these tokens can expire, and then the > tokens that the AM has will be invalid, and the tokens that the RM had will > also not work to launch a new AM. > We need to provide an API that will allow the RM to replace the current > tokens for this application with a new set. To avoid any real race issues, I > think this API should be something that the AM calls, so that the client can > connect to the AM with a new set of tokens it got using kerberos, then the AM > can inform the RM of the new set of tokens and quickly update its tokens > internally to use these new ones. -- This message was sent by Atlassian JIRA (v6.2#6252)
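The cap idea above can be sketched in a few lines. This is a hypothetical illustration only — no such API exists in the ResourceManager, and the names (`effectiveLifetime`, `perUserCapMs`) are invented for the sketch: a requested token lifetime is clamped by a per-user cap, falling back to a cluster-wide default maximum.

```java
import java.util.Map;

// Hypothetical sketch of the proposed cap logic (not a real RM API):
// the granted lifetime is the smaller of the request and the user's cap.
public class TokenLifetimeCap {
    // Assumed cluster-wide default maximum: 7 days in milliseconds.
    static final long DEFAULT_MAX_LIFETIME_MS = 7L * 24 * 60 * 60 * 1000;

    static long effectiveLifetime(long requestedMs, String user,
                                  Map<String, Long> perUserCapMs) {
        // Per-user/group cap, else the cluster default.
        long cap = perUserCapMs.getOrDefault(user, DEFAULT_MAX_LIFETIME_MS);
        // Never grant more than the cap, and never more than was asked for.
        return Math.min(requestedMs, cap);
    }
}
```

A long-running application would request a large lifetime at startup and receive whatever its cap allows.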
[jira] [Commented] (YARN-1906) TestRMRestart#testQueueMetricsOnRMRestart fails intermittently on trunk and branch2
[ https://issues.apache.org/jira/browse/YARN-1906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13988356#comment-13988356 ] Wangda Tan commented on YARN-1906: -- I just encountered this issue while working on YARN-1201 too. The assertion failure code snippet: {code} while (loadedApp1.getAppAttempts().size() != 2) { Thread.sleep(200); } attempt1 = loadedApp1.getCurrentAppAttempt(); attemptId1 = attempt1.getAppAttemptId(); rm2.waitForState(attemptId1, RMAppAttemptState.SCHEDULED); assertQueueMetrics(qm2, 1, 1, 0, 0); {code} And in assertQueueMetrics(), the following assertion fails: {code} Assert.assertEquals(qm.getAppsSubmitted(), appsSubmitted + appsSubmittedCarryOn); {code} +1 to [~zjshen]'s suggestion; we should add a message to the assertion. > TestRMRestart#testQueueMetricsOnRMRestart fails intermittently on trunk and > branch2 > --- > > Key: YARN-1906 > URL: https://issues.apache.org/jira/browse/YARN-1906 > Project: Hadoop YARN > Issue Type: Test >Affects Versions: 2.4.0 >Reporter: Mit Desai >Assignee: Mit Desai > Attachments: YARN-1906.patch, YARN-1906.patch > > > Here is the output of the failure > {noformat} > testQueueMetricsOnRMRestart(org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart) > Time elapsed: 9.757 sec <<< FAILURE! > java.lang.AssertionError: expected:<2> but was:<1> > at org.junit.Assert.fail(Assert.java:93) > at org.junit.Assert.failNotEquals(Assert.java:647) > at org.junit.Assert.assertEquals(Assert.java:128) > at org.junit.Assert.assertEquals(Assert.java:472) > at org.junit.Assert.assertEquals(Assert.java:456) > at > org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart.assertQueueMetrics(TestRMRestart.java:1735) > at > org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart.testQueueMetricsOnRMRestart(TestRMRestart.java:1706) > {noformat} -- This message was sent by Atlassian JIRA (v6.2#6252)
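The suggestion to add a message can be sketched as below. This is an illustrative stand-in, not the actual TestRMRestart code — the helper mimics JUnit's `assertEquals(String, long, long)` so a bare `expected:<2> but was:<1>` becomes self-describing.

```java
// Illustrative sketch: a JUnit-style assertEquals that carries a message.
// The names (assertQueueMetrics, appsSubmitted) mirror the test discussed
// above but this is not the real TestRMRestart code.
public class AssertWithMessage {
    static void assertEquals(String message, long expected, long actual) {
        if (expected != actual) {
            throw new AssertionError(
                message + " expected:<" + expected + "> but was:<" + actual + ">");
        }
    }

    // With a message, an intermittent failure names the metric that diverged.
    static void assertQueueMetrics(long appsSubmitted, long expectedSubmitted) {
        assertEquals("appsSubmitted", expectedSubmitted, appsSubmitted);
    }

    public static void main(String[] args) {
        assertQueueMetrics(2, 2); // passes silently
        try {
            assertQueueMetrics(1, 2); // fails with a descriptive message
        } catch (AssertionError e) {
            System.out.println(e.getMessage()); // appsSubmitted expected:<2> but was:<1>
        }
    }
}
```

Note that in JUnit the expected value goes first; the quoted assertion passes `qm.getAppsSubmitted()` in the expected slot, which also makes the failure output harder to read.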
[jira] [Updated] (YARN-1201) TestAMAuthorization fails with local hostname cannot be resolved
[ https://issues.apache.org/jira/browse/YARN-1201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-1201: - Attachment: YARN-1201.patch Resubmitting the patch because the last Jenkins build failed due to TestRMRestart, which is tracked by YARN-1906. > TestAMAuthorization fails with local hostname cannot be resolved > > > Key: YARN-1201 > URL: https://issues.apache.org/jira/browse/YARN-1201 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.1.0-beta > Environment: SUSE Linux Enterprise Server 11 (x86_64) >Reporter: Nemon Lou >Assignee: Wangda Tan >Priority: Minor > Attachments: YARN-1201.patch, YARN-1201.patch, YARN-1201.patch, > YARN-1201.patch, YARN-1201.patch > > > When hostname is 158-1-131-10, TestAMAuthorization fails. > {code} > Running org.apache.hadoop.yarn.server.resourcemanager.TestAMAuthorization > Tests run: 4, Failures: 0, Errors: 2, Skipped: 0, Time elapsed: 14.034 sec > <<< FAILURE! - in > org.apache.hadoop.yarn.server.resourcemanager.TestAMAuthorization > testUnauthorizedAccess[0](org.apache.hadoop.yarn.server.resourcemanager.TestAMAuthorization) > Time elapsed: 3.952 sec <<< ERROR! > java.lang.NullPointerException: null > at > org.apache.hadoop.yarn.server.resourcemanager.TestAMAuthorization.testUnauthorizedAccess(TestAMAuthorization.java:284) > testUnauthorizedAccess[1](org.apache.hadoop.yarn.server.resourcemanager.TestAMAuthorization) > Time elapsed: 3.116 sec <<< ERROR! > java.lang.NullPointerException: null > at > org.apache.hadoop.yarn.server.resourcemanager.TestAMAuthorization.testUnauthorizedAccess(TestAMAuthorization.java:284) > Results : > Tests in error: > TestAMAuthorization.testUnauthorizedAccess:284 NullPointer > TestAMAuthorization.testUnauthorizedAccess:284 NullPointer > Tests run: 4, Failures: 0, Errors: 2, Skipped: 0 > {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1864) Fair Scheduler Dynamic Hierarchical User Queues
[ https://issues.apache.org/jira/browse/YARN-1864?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13988380#comment-13988380 ] Hadoop QA commented on YARN-1864: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12643115/YARN-1864-v5.txt against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 4 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.TestClientRMService org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/3686//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3686//console This message is automatically generated. 
> Fair Scheduler Dynamic Hierarchical User Queues > --- > > Key: YARN-1864 > URL: https://issues.apache.org/jira/browse/YARN-1864 > Project: Hadoop YARN > Issue Type: New Feature > Components: scheduler >Reporter: Ashwin Shankar > Labels: scheduler > Attachments: YARN-1864-v1.txt, YARN-1864-v2.txt, YARN-1864-v3.txt, > YARN-1864-v4.txt, YARN-1864-v5.txt > > > In Fair Scheduler, we want to be able to create user queues under any parent > queue in the hierarchy. For example, say user1 submits a job to a parent queue > called root.allUserQueues; we want to be able to create a new queue called > root.allUserQueues.user1 and run user1's job in it. Any further jobs submitted > by this user to root.allUserQueues will be run in this newly created > root.allUserQueues.user1. > This is very similar to the 'user-as-default' feature in Fair Scheduler, which > creates user queues under the root queue. But we want the ability to create user > queues under ANY parent queue. > Why do we want this? > 1. Preemption: these dynamically created user queues can preempt each other > if their fair share is not met, so there is fairness among users. > User queues can also preempt other non-user leaf queues if below fair > share. > 2. Allocation to user queues: we want all the user queries (ad hoc) to consume > only a fraction of the resources in the shared cluster. With this > feature, we could do that by giving a fair share to the parent user queue, > which is then redistributed to all the dynamically created user queues. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1945) Adding description for each pool in Fair Scheduler Page from fair-scheduler.xml
[ https://issues.apache.org/jira/browse/YARN-1945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13988382#comment-13988382 ] Hadoop QA commented on YARN-1945: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12643125/YARN-1945.v5.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.TestClientRMService {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/3684//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3684//console This message is automatically generated. 
> Adding description for each pool in Fair Scheduler Page from > fair-scheduler.xml > --- > > Key: YARN-1945 > URL: https://issues.apache.org/jira/browse/YARN-1945 > Project: Hadoop YARN > Issue Type: Improvement >Affects Versions: 2.3.0 >Reporter: Siqi Li >Assignee: Siqi Li > Attachments: YARN-1945.v2.patch, YARN-1945.v3.patch, > YARN-1945.v4.patch, YARN-1945.v5.patch > > -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1861) Both RM stuck in standby mode when automatic failover is enabled
[ https://issues.apache.org/jira/browse/YARN-1861?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13988388#comment-13988388 ] Hadoop QA commented on YARN-1861: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12643124/YARN-1861.5.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.TestClientRMService {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/3685//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3685//console This message is automatically generated. 
> Both RM stuck in standby mode when automatic failover is enabled > > > Key: YARN-1861 > URL: https://issues.apache.org/jira/browse/YARN-1861 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Affects Versions: 2.4.0 >Reporter: Arpit Gupta >Assignee: Xuan Gong >Priority: Blocker > Attachments: YARN-1861.2.patch, YARN-1861.3.patch, YARN-1861.4.patch, > YARN-1861.5.patch, yarn-1861-1.patch > > > In our HA tests we noticed that the tests got stuck because both RM's got > into standby state and no one became active. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (YARN-2017) Merge common code in schedulers
Jian He created YARN-2017: - Summary: Merge common code in schedulers Key: YARN-2017 URL: https://issues.apache.org/jira/browse/YARN-2017 Project: Hadoop YARN Issue Type: Sub-task Reporter: Jian He Assignee: Jian He A bunch of the same code is repeated among the schedulers, e.g. between FiCaSchedulerNode and FSSchedulerNode. It would be good to merge and share it in a common base class. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Assigned] (YARN-1868) YARN status web ui does not show correctly in IE 11
[ https://issues.apache.org/jira/browse/YARN-1868?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chuan Liu reassigned YARN-1868: --- Assignee: Chuan Liu > YARN status web ui does not show correctly in IE 11 > --- > > Key: YARN-1868 > URL: https://issues.apache.org/jira/browse/YARN-1868 > Project: Hadoop YARN > Issue Type: Bug > Components: webapp >Affects Versions: 3.0.0 >Reporter: Chuan Liu >Assignee: Chuan Liu > Attachments: YARN_status.png > > > The YARN status web ui does not show correctly in IE 11. The drop down menu > for app entries are not shown. Also the navigation menu displays incorrectly. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-1868) YARN status web ui does not show correctly in IE 11
[ https://issues.apache.org/jira/browse/YARN-1868?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chuan Liu updated YARN-1868: Attachment: YARN-1868.patch Attach a patch demonstrating the fix. I took the fix from the following link: http://social.msdn.microsoft.com/Forums/ie/en-US/acf1e236-715b-4feb-8132-f88e8b6652c5/how-to-overrde-compatibility-mode-for-intranet-site-when-display-intranet-sites-in-compatibility If the fix is acceptable, I will add unit tests as well. > YARN status web ui does not show correctly in IE 11 > --- > > Key: YARN-1868 > URL: https://issues.apache.org/jira/browse/YARN-1868 > Project: Hadoop YARN > Issue Type: Bug > Components: webapp >Affects Versions: 3.0.0 >Reporter: Chuan Liu >Assignee: Chuan Liu > Attachments: YARN-1868.patch, YARN_status.png > > > The YARN status web ui does not show correctly in IE 11. The drop down menu > for app entries are not shown. Also the navigation menu displays incorrectly. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2018) TestClientRMService.testTokenRenewalWrongUser fails occasionally
[ https://issues.apache.org/jira/browse/YARN-2018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13988446#comment-13988446 ] Tsuyoshi OZAWA commented on YARN-2018: -- Stack trace: {quote} java.lang.AssertionError: null at org.junit.Assert.fail(Assert.java:86) at org.junit.Assert.assertTrue(Assert.java:41) at org.junit.Assert.assertTrue(Assert.java:52) at org.apache.hadoop.yarn.server.resourcemanager.TestClientRMService$4.run(TestClientRMService.java:481) at org.apache.hadoop.yarn.server.resourcemanager.TestClientRMService$4.run(TestClientRMService.java:474) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:396) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1606) at org.apache.hadoop.yarn.server.resourcemanager.TestClientRMService.testTokenRenewalWrongUser(TestClientRMService.java:474) {quote} > TestClientRMService.testTokenRenewalWrongUser fails occasionally > -- > > Key: YARN-2018 > URL: https://issues.apache.org/jira/browse/YARN-2018 > Project: Hadoop YARN > Issue Type: Test >Reporter: Tsuyoshi OZAWA > > The test failure is observed on YARN-1945 and YARN-1861. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (YARN-2018) TestClientRMService.testTokenRenewalWrongUser fails occasionally
Tsuyoshi OZAWA created YARN-2018: Summary: TestClientRMService.testTokenRenewalWrongUser fails occasionally Key: YARN-2018 URL: https://issues.apache.org/jira/browse/YARN-2018 Project: Hadoop YARN Issue Type: Test Reporter: Tsuyoshi OZAWA The test failure is observed on YARN-1945 and YARN-1861. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Assigned] (YARN-2016) Yarn getApplicationRequest start time range is not honored
[ https://issues.apache.org/jira/browse/YARN-2016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junping Du reassigned YARN-2016: Assignee: Junping Du > Yarn getApplicationRequest start time range is not honored > -- > > Key: YARN-2016 > URL: https://issues.apache.org/jira/browse/YARN-2016 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.4.0 >Reporter: Venkat Ranganathan >Assignee: Junping Du > Attachments: YarnTest.java > > > When we query for the previous applications by creating an instance of > GetApplicationsRequest and setting the start time range and application tag, > we see that the start range provided is not honored and all applications with > the tag are returned > Attaching a reproducer. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1861) Both RM stuck in standby mode when automatic failover is enabled
[ https://issues.apache.org/jira/browse/YARN-1861?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13988450#comment-13988450 ] Tsuyoshi OZAWA commented on YARN-1861: -- Thanks for updating the patch, Xuan. The TestClientRMService failure looks unrelated to the change, so I filed it as YARN-2018. I'll try to look at the latest patch. > Both RM stuck in standby mode when automatic failover is enabled > > > Key: YARN-1861 > URL: https://issues.apache.org/jira/browse/YARN-1861 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Affects Versions: 2.4.0 >Reporter: Arpit Gupta >Assignee: Xuan Gong >Priority: Blocker > Attachments: YARN-1861.2.patch, YARN-1861.3.patch, YARN-1861.4.patch, > YARN-1861.5.patch, yarn-1861-1.patch > > > In our HA tests we noticed that the tests got stuck because both RM's got > into standby state and no one became active. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2016) Yarn getApplicationRequest start time range is not honored
[ https://issues.apache.org/jira/browse/YARN-2016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13988449#comment-13988449 ] Junping Du commented on YARN-2016: -- The start time range (and other properties) is not merged from local to proto in PBImpl of getApplicationRequest. Will deliver a patch to fix it soon. > Yarn getApplicationRequest start time range is not honored > -- > > Key: YARN-2016 > URL: https://issues.apache.org/jira/browse/YARN-2016 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.4.0 >Reporter: Venkat Ranganathan >Assignee: Junping Du > Attachments: YarnTest.java > > > When we query for the previous applications by creating an instance of > GetApplicationsRequest and setting the start time range and application tag, > we see that the start range provided is not honored and all applications with > the tag are returned > Attaching a reproducer. -- This message was sent by Atlassian JIRA (v6.2#6252)
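The bug Junping describes follows from the Hadoop PBImpl pattern, where setters update locally cached fields that must be explicitly copied into the protobuf builder before the proto is built. The class below is a simplified, self-contained stand-in (not the real `GetApplicationsRequestPBImpl` or generated protobuf code) showing why a missing merge step silently drops the start-time range.

```java
// Simplified stand-in for the PBImpl pattern: locally cached fields must be
// merged into the builder before the proto is built, otherwise they never
// reach the wire — the bug described above.
public class GetApplicationsRequestSketch {
    // Plain class standing in for the generated protobuf builder.
    static class ProtoBuilder {
        long startBegin = -1, startEnd = -1;
        String build() { return "startRange=[" + startBegin + "," + startEnd + "]"; }
    }

    private Long localStartBegin, localStartEnd; // local cache, as in a PBImpl
    private final ProtoBuilder builder = new ProtoBuilder();

    public void setStartRange(long begin, long end) {
        localStartBegin = begin; // only the local copy is updated here
        localStartEnd = end;
    }

    // The step that was missing: copy local fields into the builder.
    private void mergeLocalToBuilder() {
        if (localStartBegin != null) builder.startBegin = localStartBegin;
        if (localStartEnd != null) builder.startEnd = localStartEnd;
    }

    public String getProto() {
        mergeLocalToBuilder(); // without this call the range stays [-1,-1]
        return builder.build();
    }
}
```

In the real PBImpl, skipping the merge means the RM receives a request with no start-time range, so all applications with the tag come back — exactly the reported symptom.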
[jira] [Commented] (YARN-1864) Fair Scheduler Dynamic Hierarchical User Queues
[ https://issues.apache.org/jira/browse/YARN-1864?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13988458#comment-13988458 ] Ashwin Shankar commented on YARN-1864: -- Both test failures are unrelated to my patch. > Fair Scheduler Dynamic Hierarchical User Queues > --- > > Key: YARN-1864 > URL: https://issues.apache.org/jira/browse/YARN-1864 > Project: Hadoop YARN > Issue Type: New Feature > Components: scheduler >Reporter: Ashwin Shankar > Labels: scheduler > Attachments: YARN-1864-v1.txt, YARN-1864-v2.txt, YARN-1864-v3.txt, > YARN-1864-v4.txt, YARN-1864-v5.txt > > > In Fair Scheduler, we want to be able to create user queues under any parent > queue in the hierarchy. For example, say user1 submits a job to a parent queue > called root.allUserQueues; we want to be able to create a new queue called > root.allUserQueues.user1 and run user1's job in it. Any further jobs submitted > by this user to root.allUserQueues will be run in this newly created > root.allUserQueues.user1. > This is very similar to the 'user-as-default' feature in Fair Scheduler, which > creates user queues under the root queue. But we want the ability to create user > queues under ANY parent queue. > Why do we want this? > 1. Preemption: these dynamically created user queues can preempt each other > if their fair share is not met, so there is fairness among users. > User queues can also preempt other non-user leaf queues if below fair > share. > 2. Allocation to user queues: we want all the user queries (ad hoc) to consume > only a fraction of the resources in the shared cluster. With this > feature, we could do that by giving a fair share to the parent user queue, > which is then redistributed to all the dynamically created user queues. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (YARN-2019) Retrospect on decision of making RM crashed if any exception throw in ZKRMStateStore
Junping Du created YARN-2019: Summary: Retrospect on decision of making RM crashed if any exception throw in ZKRMStateStore Key: YARN-2019 URL: https://issues.apache.org/jira/browse/YARN-2019 Project: Hadoop YARN Issue Type: Bug Reporter: Junping Du Priority: Critical Currently, if anything abnormal happens in ZKRMStateStore, it throws a fatal exception that crashes the RM. As shown in YARN-1924, the cause could be an internal RM HA bug itself rather than a truly fatal condition. We should revisit this decision, as the HA feature is designed to protect the key component, not disrupt it. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1961) Fair scheduler preemption doesn't work for non-leaf queues
[ https://issues.apache.org/jira/browse/YARN-1961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13988518#comment-13988518 ] Sandy Ryza commented on YARN-1961: -- Hey Ashwin, The history is only that we haven't yet added support for that property in parent queues. I agree that it would be a helpful thing to add. > Fair scheduler preemption doesn't work for non-leaf queues > -- > > Key: YARN-1961 > URL: https://issues.apache.org/jira/browse/YARN-1961 > Project: Hadoop YARN > Issue Type: Bug > Components: scheduler >Affects Versions: 2.4.0 >Reporter: Ashwin Shankar > Labels: scheduler > > Setting minResources and minSharePreemptionTimeout to a non-leaf queue > doesn't cause preemption to happen when that non-leaf queue is below > minResources and there are outstanding demands in that non-leaf queue. > Here is an example fs allocation config(partial) : > {code:xml} > > 3072 mb,0 vcores > 30 > > > > > > {code} > With the above configs,preemption doesn't seem to happen if queue abc is > below minShare and it has outstanding unsatisfied demands from apps in its > child queues. Ideally in such cases we would like preemption to kick off and > reclaim resources from other queues(not under queue abc). > Looking at the code it seems like preemption checks for starvation only at > the leaf queue level and not at the parent level. > {code:title=FairScheduler.java|borderStyle=solid} > boolean isStarvedForMinShare(FSLeafQueue sched) > boolean isStarvedForFairShare(FSLeafQueue sched) > {code} > This affects our use case where we have a parent queue with probably a 100 > unconfigured leaf queues under it.We want to give a minshare to the parent > queue to protect all the leaf queues under it,but we cannot do it due to this > bug. -- This message was sent by Atlassian JIRA (v6.2#6252)
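The gap Ashwin points out — `isStarvedForMinShare(FSLeafQueue)` only accepts leaf queues — could be closed by checking starvation at every level of the queue tree. The sketch below is a self-contained illustration of that idea with invented stand-in classes, not FairScheduler's real API or the eventual fix.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch (stand-in classes, not the FairScheduler API):
// starvation is checked at every node of the queue tree, not just leaves,
// so a parent queue below its minShare with outstanding demand is starved.
public class QueueStarvationSketch {
    static class Queue {
        final String name;
        final int minShare, usage, demand; // resources, simplified to ints
        final List<Queue> children = new ArrayList<>();
        Queue(String name, int minShare, int usage, int demand) {
            this.name = name; this.minShare = minShare;
            this.usage = usage; this.demand = demand;
        }
    }

    // Below minShare AND still has unsatisfied demand (from its subtree).
    static boolean isStarvedForMinShare(Queue q) {
        return q.usage < Math.min(q.minShare, q.demand);
    }

    // Walk the whole hierarchy instead of only the leaves.
    static List<String> findStarvedQueues(Queue root) {
        List<String> starved = new ArrayList<>();
        collect(root, starved);
        return starved;
    }

    private static void collect(Queue q, List<String> out) {
        if (isStarvedForMinShare(q)) out.add(q.name);
        for (Queue child : q.children) collect(child, out);
    }
}
```

With such a check, a parent like root.abc (minShare 3072 MB, usage below that, unsatisfied demand in its child queues) would trigger preemption even though none of its individual leaves is configured with a minShare.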
[jira] [Commented] (YARN-1963) Support priorities across applications within the same queue
[ https://issues.apache.org/jira/browse/YARN-1963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13988528#comment-13988528 ] Ravi Prakash commented on YARN-1963: I wonder if it'd be a good idea to percolate the priorities onto the actual containers as well? (I'm thinking (re)nice-ing container processes) ? That way we can submit more jobs than can all fit into memory and take advantage of OS scheduling to pick up the ones with the highest priority? > Support priorities across applications within the same queue > - > > Key: YARN-1963 > URL: https://issues.apache.org/jira/browse/YARN-1963 > Project: Hadoop YARN > Issue Type: New Feature > Components: api, resourcemanager >Reporter: Arun C Murthy >Assignee: Sunil G > > It will be very useful to support priorities among applications within the > same queue, particularly in production scenarios. It allows for finer-grained > controls without having to force admins to create a multitude of queues, plus > allows existing applications to continue using existing queues which are > usually part of institutional memory. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1861) Both RM stuck in standby mode when automatic failover is enabled
[ https://issues.apache.org/jira/browse/YARN-1861?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13988529#comment-13988529 ] Tsuyoshi OZAWA commented on YARN-1861: -- [~xgong] Great work. The test case by Xuan checks whether the fix by Karthik works well by injecting RMFatalEventType.STATE_STORE_FENCED directly. My review comments are as follows: {code} // Transition to standby and reinit active services LOG.info("Transitioning RM to Standby mode"); rm.transitionToStandby(true); +rm.adminService.resetLeaderElection(); return; } catch (Exception e) { {code} We should call rm.adminService.resetLeaderElection() in the finally block. If rm.transitionToStandby() fails while stopping the RM's services, all RMs can get stuck. {code} +int maxWaittingAttempt = 20; +while (maxWaittingAttempt -- > 0) { {code} maxWaittingAttempt should be maxWaitingAttempt. > Both RM stuck in standby mode when automatic failover is enabled > > > Key: YARN-1861 > URL: https://issues.apache.org/jira/browse/YARN-1861 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Affects Versions: 2.4.0 >Reporter: Arpit Gupta >Assignee: Xuan Gong >Priority: Blocker > Attachments: YARN-1861.2.patch, YARN-1861.3.patch, YARN-1861.4.patch, > YARN-1861.5.patch, yarn-1861-1.patch > > > In our HA tests we noticed that the tests got stuck because both RM's got > into standby state and no one became active. -- This message was sent by Atlassian JIRA (v6.2#6252)
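The review point about the finally block can be sketched as follows. These are simplified stand-in classes, not the real RM or AdminService code: the point is that rejoining leader election in finally guarantees it happens even when transitioning to standby throws partway through stopping services.

```java
// Sketch of the review comment (stand-ins, not the real RM classes):
// resetting leader election in a finally block guarantees the RM rejoins
// the election even when transitionToStandby() throws.
public class FailoverSketch {
    boolean electionReset = false;
    private final boolean failTransition;

    FailoverSketch(boolean failTransition) { this.failTransition = failTransition; }

    void transitionToStandby() {
        if (failTransition) throw new RuntimeException("service stop failed");
    }

    void resetLeaderElection() { electionReset = true; }

    void handleFencing() {
        try {
            transitionToStandby();
        } catch (RuntimeException e) {
            // Log and fall through; the finally block still rejoins the election.
            System.err.println("transitionToStandby failed: " + e.getMessage());
        } finally {
            resetLeaderElection(); // runs on both success and failure paths
        }
    }
}
```

If the reset were only on the success path, a failed standby transition would leave this RM out of the election while the other RM is already standby — both stuck, which is the bug this JIRA describes.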
[jira] [Commented] (YARN-1201) TestAMAuthorization fails with local hostname cannot be resolved
[ https://issues.apache.org/jira/browse/YARN-1201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13988536#comment-13988536 ] Junping Du commented on YARN-1201: -- Kicked off Jenkins manually, as resubmitting an old patch won't trigger a Jenkins test. The patch looks good to me overall. However, I think we'd better improve the code below: {code} + return expected.isInstance(e) || ( + e != null && isCause(expected, e.getCause())); {code} if e is null, then it depends on the behavior of isInstance() in the JDK (some old JDK versions will return true for this case; please refer to https://bugs.openjdk.java.net/browse/JDK-4081023, which suggests handling the null case before calling this method). Thus, I think a clearer way to do it is: {code} + return e != null && (expected.isInstance(e) || isCause(expected, e.getCause())); {code} Also, it is better to add some comments on the newly added method. > TestAMAuthorization fails with local hostname cannot be resolved > > > Key: YARN-1201 > URL: https://issues.apache.org/jira/browse/YARN-1201 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.1.0-beta > Environment: SUSE Linux Enterprise Server 11 (x86_64) >Reporter: Nemon Lou >Assignee: Wangda Tan >Priority: Minor > Attachments: YARN-1201.patch, YARN-1201.patch, YARN-1201.patch, > YARN-1201.patch, YARN-1201.patch > > > When hostname is 158-1-131-10, TestAMAuthorization fails. > {code} > Running org.apache.hadoop.yarn.server.resourcemanager.TestAMAuthorization > Tests run: 4, Failures: 0, Errors: 2, Skipped: 0, Time elapsed: 14.034 sec > <<< FAILURE! - in > org.apache.hadoop.yarn.server.resourcemanager.TestAMAuthorization > testUnauthorizedAccess[0](org.apache.hadoop.yarn.server.resourcemanager.TestAMAuthorization) > Time elapsed: 3.952 sec <<< ERROR! 
> java.lang.NullPointerException: null > at > org.apache.hadoop.yarn.server.resourcemanager.TestAMAuthorization.testUnauthorizedAccess(TestAMAuthorization.java:284) > testUnauthorizedAccess[1](org.apache.hadoop.yarn.server.resourcemanager.TestAMAuthorization) > Time elapsed: 3.116 sec <<< ERROR! > java.lang.NullPointerException: null > at > org.apache.hadoop.yarn.server.resourcemanager.TestAMAuthorization.testUnauthorizedAccess(TestAMAuthorization.java:284) > Results : > Tests in error: > TestAMAuthorization.testUnauthorizedAccess:284 NullPointer > TestAMAuthorization.testUnauthorizedAccess:284 NullPointer > Tests run: 4, Failures: 0, Errors: 2, Skipped: 0 > {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
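Junping's suggested null-safe form of the helper can be written out as a self-contained sketch (the class name here is invented; the real method would live in the test code). Checking `e != null` first both avoids relying on `isInstance(null)` behavior and terminates the recursion at the end of the cause chain.

```java
// Null-safe recursive cause check, following the form suggested in the
// review comment above (sketch; not the actual patch):
public class CauseCheck {
    static boolean isCause(Class<? extends Throwable> expected, Throwable e) {
        // The null check short-circuits first, so Class.isInstance is never
        // consulted for null, and the recursion stops when getCause() is null.
        return e != null
            && (expected.isInstance(e) || isCause(expected, e.getCause()));
    }
}
```

A throwable with no cause returns null from `getCause()`, so the recursion always terminates for ordinary cause chains.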
[jira] [Commented] (YARN-1201) TestAMAuthorization fails with local hostname cannot be resolved
[ https://issues.apache.org/jira/browse/YARN-1201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13988537#comment-13988537 ] Hadoop QA commented on YARN-1201: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12643145/YARN-1201.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.TestClientRMService {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/3688//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3688//console This message is automatically generated. 
[jira] [Updated] (YARN-2018) TestClientRMService.testTokenRenewalWrongUser fails occasionally
[ https://issues.apache.org/jira/browse/YARN-2018?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ming Ma updated YARN-2018:

Attachment: YARN-2018.patch

It seems HADOOP-10562 modified the message. Here is the patch.

> TestClientRMService.testTokenRenewalWrongUser fails occasionally
> Key: YARN-2018
> URL: https://issues.apache.org/jira/browse/YARN-2018
> Project: Hadoop YARN
> Issue Type: Test
> Reporter: Tsuyoshi OZAWA
> Attachments: YARN-2018.patch
>
> The test failure is observed on YARN-1945 and YARN-1861.
[jira] [Updated] (YARN-1201) TestAMAuthorization fails with local hostname cannot be resolved
[ https://issues.apache.org/jira/browse/YARN-1201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-1201:

Attachment: YARN-1201.patch

Nice catch [~djp]! Thanks for your comments; I've uploaded a new patch addressing them.
[jira] [Updated] (YARN-1803) Signal container support in nodemanager
[ https://issues.apache.org/jira/browse/YARN-1803?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ming Ma updated YARN-1803:

Attachment: YARN-1803.patch

Here is the patch to support signal container functionality in the node manager.
1. NodeStatusUpdater will send ContainerManagerEventType.SIGNAL_CONTAINERS to ContainerManager when it receives notification from the RM. That will be covered under https://issues.apache.org/jira/browse/YARN-1805.
2. After ContainerManager receives ContainerManagerEventType.SIGNAL_CONTAINERS, it will notify ContainersLauncher via ContainersLauncherEventType.SIGNAL_CONTAINER and eventually deliver the request to ContainerExecutor.
3. ContainerExecutor's signalContainer method is modified to take an OS-independent SignalContainerCommand.
Note: the patch also includes YARN-1897 so that Jenkins can build the patch.

> Signal container support in nodemanager
> Key: YARN-1803
> URL: https://issues.apache.org/jira/browse/YARN-1803
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: nodemanager
> Reporter: Ming Ma
> Assignee: Ming Ma
> Attachments: YARN-1803.patch
>
> It could include the following:
> 1. ContainerManager is able to process a new event type ContainerManagerEventType.SIGNAL_CONTAINERS coming from NodeStatusUpdater and deliver the request to ContainerExecutor.
> 2. Translate the platform-independent signal command to Linux-specific signals. Windows support will be tracked by another task.
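The translation described in step 3 could be sketched as follows. This is an illustrative sketch only: the SignalContainerCommand values and the toLinuxSignal() helper are hypothetical stand-ins, not the actual API in the attached patch.

```java
// Hypothetical mapping of an OS-independent signal command to a Linux
// signal number; the enum values and helper name are illustrative.
public class SignalTranslation {
    enum SignalContainerCommand { OUTPUT_THREAD_DUMP, GRACEFUL_SHUTDOWN, FORCEFUL_SHUTDOWN }

    static int toLinuxSignal(SignalContainerCommand cmd) {
        switch (cmd) {
            case OUTPUT_THREAD_DUMP: return 3;   // SIGQUIT: JVM prints a thread dump
            case GRACEFUL_SHUTDOWN:  return 15;  // SIGTERM: ask the container to exit
            case FORCEFUL_SHUTDOWN:  return 9;   // SIGKILL: terminate immediately
            default: throw new IllegalArgumentException("Unknown command: " + cmd);
        }
    }
}
```

Keeping the command enum platform-neutral is what lets a Windows executor (tracked separately) map the same commands to its own mechanism.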
[jira] [Commented] (YARN-1201) TestAMAuthorization fails with local hostname cannot be resolved
[ https://issues.apache.org/jira/browse/YARN-1201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13988554#comment-13988554 ] Junping Du commented on YARN-1201:

Thanks [~leftnoteasy] for addressing my comments above. A few typos need to be fixed here:
{code}
+ * this because sometimes, a exception will be wrapped to another exception
{code}
should be "an exception"
{code}
+ * So we cannot simply cache AccessControlException by using
{code}
should be "catch". Will +1 once the typos are fixed and the Jenkins result is +1.
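The javadoc being reviewed describes why a plain catch clause is not enough: an AccessControlException can arrive wrapped inside another exception, so the cause chain has to be walked. A hedged sketch of that idea, where isCausedBy() is an illustrative helper rather than the code in YARN-1201.patch:

```java
// Walk a throwable's cause chain looking for a target exception type,
// since the type of interest may be wrapped at any depth.
public class CauseWalker {
    static boolean isCausedBy(Throwable t, Class<? extends Throwable> target) {
        for (Throwable cur = t; cur != null; cur = cur.getCause()) {
            if (target.isInstance(cur)) {
                return true;  // found the target type somewhere in the chain
            }
        }
        return false;  // chain exhausted without a match
    }
}
```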
[jira] [Updated] (YARN-1201) TestAMAuthorization fails with local hostname cannot be resolved
[ https://issues.apache.org/jira/browse/YARN-1201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-1201:

Attachment: YARN-1201.patch

Thanks [~djp], fixed the typos according to your suggestion.
[jira] [Commented] (YARN-1803) Signal container support in nodemanager
[ https://issues.apache.org/jira/browse/YARN-1803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13988557#comment-13988557 ] Hadoop QA commented on YARN-1803:

{color:red}-1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12643173/YARN-1803.patch against trunk revision.

{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 6 new or modified test files.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 javadoc{color}. There were no new javadoc warning messages.
{color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.
{color:red}-1 findbugs{color}. The patch appears to introduce 1 new Findbugs (version 1.3.9) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager.
{color:green}+1 contrib tests{color}. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-YARN-Build/3690//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/3690//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-nodemanager.html
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3690//console

This message is automatically generated.
[jira] [Commented] (YARN-1201) TestAMAuthorization fails with local hostname cannot be resolved
[ https://issues.apache.org/jira/browse/YARN-1201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13988559#comment-13988559 ] Hadoop QA commented on YARN-1201:

{color:red}-1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12643171/YARN-1201.patch against trunk revision.

{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 javadoc{color}. There were no new javadoc warning messages.
{color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:
org.apache.hadoop.yarn.server.resourcemanager.TestClientRMService
{color:green}+1 contrib tests{color}. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-YARN-Build/3691//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3691//console

This message is automatically generated.
[jira] [Commented] (YARN-1201) TestAMAuthorization fails with local hostname cannot be resolved
[ https://issues.apache.org/jira/browse/YARN-1201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13988567#comment-13988567 ] Hadoop QA commented on YARN-1201:

{color:red}-1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12643175/YARN-1201.patch against trunk revision.

{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 javadoc{color}. There were no new javadoc warning messages.
{color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:
org.apache.hadoop.yarn.server.resourcemanager.TestClientRMService
{color:green}+1 contrib tests{color}. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-YARN-Build/3692//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3692//console

This message is automatically generated.
[jira] [Commented] (YARN-1201) TestAMAuthorization fails with local hostname cannot be resolved
[ https://issues.apache.org/jira/browse/YARN-1201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13988570#comment-13988570 ] Junping Du commented on YARN-1201:

+1. Patch looks good to me. The test failure is not related and is tracked in YARN-2018. Will commit it shortly.
[jira] [Commented] (YARN-1861) Both RM stuck in standby mode when automatic failover is enabled
[ https://issues.apache.org/jira/browse/YARN-1861?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13988573#comment-13988573 ] Tsuyoshi OZAWA commented on YARN-1861:

> We should call rm.adminService.resetLeaderElection() in the finally block. If rm.transitionToStandby() fails while stopping RM's services, all RMs can get stuck.

Sorry, I noticed this is wrong. If rm.transitionToStandby() fails, the RM can stay stuck until the ZK server detects the failure. We can call EmbeddedElectorService.stop() in the exception handler to shut down gracefully, but this is just one option.

> Both RM stuck in standby mode when automatic failover is enabled
> Key: YARN-1861
> URL: https://issues.apache.org/jira/browse/YARN-1861
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: resourcemanager
> Affects Versions: 2.4.0
> Reporter: Arpit Gupta
> Assignee: Xuan Gong
> Priority: Blocker
> Attachments: YARN-1861.2.patch, YARN-1861.3.patch, YARN-1861.4.patch, YARN-1861.5.patch, yarn-1861-1.patch
>
> In our HA tests we noticed that the tests got stuck because both RMs got into standby state and no one became active.
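The option discussed in the comment above could look roughly like this. The Elector interface and transitionToStandbySafely() helper are simplified stand-ins, not the real EmbeddedElectorService or RM API:

```java
// Sketch: if the standby transition throws, stop the elector so the RM
// shuts down cleanly instead of hanging in a half-transitioned state.
public class StandbyTransition {
    interface Elector { void stop(); }

    // Returns true when the transition succeeds; on failure, stops the
    // elector before returning control to the caller.
    static boolean transitionToStandbySafely(Runnable transition, Elector elector) {
        try {
            transition.run();
            return true;
        } catch (RuntimeException e) {
            elector.stop();  // shut down gracefully rather than stay stuck
            return false;
        }
    }
}
```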
[jira] [Commented] (YARN-1861) Both RM stuck in standby mode when automatic failover is enabled
[ https://issues.apache.org/jira/browse/YARN-1861?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13988590#comment-13988590 ] Karthik Kambatla commented on YARN-1861:

Please wait until Sunday evening for me to take a look at this.
[jira] [Updated] (YARN-2018) TestClientRMService.testTokenRenewalWrongUser fails occasionally
[ https://issues.apache.org/jira/browse/YARN-2018?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsuyoshi OZAWA updated YARN-2018:

Assignee: Ming Ma
[jira] [Commented] (YARN-2018) TestClientRMService.testTokenRenewalWrongUser fails occasionally
[ https://issues.apache.org/jira/browse/YARN-2018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13988599#comment-13988599 ] Tsuyoshi OZAWA commented on YARN-2018:

+1 (non-binding). Let's wait for Jenkins.
[jira] [Updated] (YARN-2018) TestClientRMService.testTokenRenewalWrongUser fails after HADOOP-10562
[ https://issues.apache.org/jira/browse/YARN-2018?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsuyoshi OZAWA updated YARN-2018:

Summary: TestClientRMService.testTokenRenewalWrongUser fails after HADOOP-10562 (was: TestClientRMService.testTokenRenewalWrongUser fails occasionally)