[jira] [Updated] (YARN-1201) TestAMAuthorization fails with local hostname cannot be resolved

2014-05-02 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-1201:
-

Attachment: YARN-1201.patch

Handle wrapped exception case
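
For context, a hedged sketch of what handling the "wrapped exception case" can look like in a test; the helper below is illustrative only, not the attached patch:

{code}
// Illustrative only: walk the cause chain so the check still matches when
// the expected exception arrives wrapped inside another exception.
private static boolean causedBy(Throwable t, Class<? extends Throwable> type) {
  for (Throwable cause = t; cause != null; cause = cause.getCause()) {
    if (type.isInstance(cause)) {
      return true;
    }
  }
  return false;
}
{code}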

> TestAMAuthorization fails with local hostname cannot be resolved
> 
>
> Key: YARN-1201
> URL: https://issues.apache.org/jira/browse/YARN-1201
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.1.0-beta
> Environment: SUSE Linux Enterprise Server 11 (x86_64)
>Reporter: Nemon Lou
>Assignee: Wangda Tan
>Priority: Minor
> Attachments: YARN-1201.patch, YARN-1201.patch, YARN-1201.patch, 
> YARN-1201.patch
>
>
> When the hostname is 158-1-131-10, TestAMAuthorization fails.
> {code}
> Running org.apache.hadoop.yarn.server.resourcemanager.TestAMAuthorization
> Tests run: 4, Failures: 0, Errors: 2, Skipped: 0, Time elapsed: 14.034 sec 
> <<< FAILURE! - in 
> org.apache.hadoop.yarn.server.resourcemanager.TestAMAuthorization
> testUnauthorizedAccess[0](org.apache.hadoop.yarn.server.resourcemanager.TestAMAuthorization)
>   Time elapsed: 3.952 sec  <<< ERROR!
> java.lang.NullPointerException: null
> at 
> org.apache.hadoop.yarn.server.resourcemanager.TestAMAuthorization.testUnauthorizedAccess(TestAMAuthorization.java:284)
> testUnauthorizedAccess[1](org.apache.hadoop.yarn.server.resourcemanager.TestAMAuthorization)
>   Time elapsed: 3.116 sec  <<< ERROR!
> java.lang.NullPointerException: null
> at 
> org.apache.hadoop.yarn.server.resourcemanager.TestAMAuthorization.testUnauthorizedAccess(TestAMAuthorization.java:284)
> Results :
> Tests in error:
>   TestAMAuthorization.testUnauthorizedAccess:284 NullPointer
>   TestAMAuthorization.testUnauthorizedAccess:284 NullPointer
> Tests run: 4, Failures: 0, Errors: 2, Skipped: 0
> {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1201) TestAMAuthorization fails with local hostname cannot be resolved

2014-05-02 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13987525#comment-13987525
 ] 

Hadoop QA commented on YARN-1201:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12643011/YARN-1201.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

  org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/3678//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3678//console

This message is automatically generated.

> TestAMAuthorization fails with local hostname cannot be resolved
> 
>
> Key: YARN-1201
> URL: https://issues.apache.org/jira/browse/YARN-1201
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.1.0-beta
> Environment: SUSE Linux Enterprise Server 11 (x86_64)
>Reporter: Nemon Lou
>Assignee: Wangda Tan
>Priority: Minor
> Attachments: YARN-1201.patch, YARN-1201.patch, YARN-1201.patch, 
> YARN-1201.patch
>
>
> When the hostname is 158-1-131-10, TestAMAuthorization fails.
> {code}
> Running org.apache.hadoop.yarn.server.resourcemanager.TestAMAuthorization
> Tests run: 4, Failures: 0, Errors: 2, Skipped: 0, Time elapsed: 14.034 sec 
> <<< FAILURE! - in 
> org.apache.hadoop.yarn.server.resourcemanager.TestAMAuthorization
> testUnauthorizedAccess[0](org.apache.hadoop.yarn.server.resourcemanager.TestAMAuthorization)
>   Time elapsed: 3.952 sec  <<< ERROR!
> java.lang.NullPointerException: null
> at 
> org.apache.hadoop.yarn.server.resourcemanager.TestAMAuthorization.testUnauthorizedAccess(TestAMAuthorization.java:284)
> testUnauthorizedAccess[1](org.apache.hadoop.yarn.server.resourcemanager.TestAMAuthorization)
>   Time elapsed: 3.116 sec  <<< ERROR!
> java.lang.NullPointerException: null
> at 
> org.apache.hadoop.yarn.server.resourcemanager.TestAMAuthorization.testUnauthorizedAccess(TestAMAuthorization.java:284)
> Results :
> Tests in error:
>   TestAMAuthorization.testUnauthorizedAccess:284 NullPointer
>   TestAMAuthorization.testUnauthorizedAccess:284 NullPointer
> Tests run: 4, Failures: 0, Errors: 2, Skipped: 0
> {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1963) Support priorities across applications within the same queue

2014-05-02 Thread Rohith (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13987553#comment-13987553
 ] 

Rohith commented on YARN-1963:
--

Adding to Sunil's thoughts, the priority of jobs could also be displayed in the RM web UI.

> Support priorities across applications within the same queue 
> -
>
> Key: YARN-1963
> URL: https://issues.apache.org/jira/browse/YARN-1963
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: api, resourcemanager
>Reporter: Arun C Murthy
>Assignee: Sunil G
>
> It will be very useful to support priorities among applications within the 
> same queue, particularly in production scenarios. It allows for finer-grained 
> controls without having to force admins to create a multitude of queues, plus 
> allows existing applications to continue using existing queues which are 
> usually part of institutional memory.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2010) RM can't transition to active if it can't recover an app attempt

2014-05-02 Thread Rohith (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13987590#comment-13987590
 ] 

Rohith commented on YARN-2010:
--

For applications that completed before the cluster was started in secure mode, 
clientTokenMasterKey is null. After starting in secure mode, recovery of those 
apps fails because clientTokenMasterKey is null. While recovering an 
application, the RM should be able to decide whether the recovering application 
ran in secure or non-secure mode. This is possible by checking 
clientTokenMasterKey for null.
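
A hedged sketch of that null check, with the method and constant names taken from the stack trace and state-store code referenced below (an illustration, not the attached patch):

{code}
// Sketch: only register the client-to-AM master key when one was actually
// persisted, i.e. the application originally ran with security enabled.
private void recoverAppAttemptCredentials(Credentials appAttemptTokens) {
  if (appAttemptTokens == null) {
    return;
  }
  byte[] clientTokenMasterKey = appAttemptTokens
      .getSecretKey(RMStateStore.AM_CLIENT_TOKEN_MASTER_KEY_NAME);
  if (clientTokenMasterKey != null) {
    // Apps recovered from a non-secure run have no key; skip registration.
    rmContext.getClientToAMTokenSecretManager()
        .registerMasterKey(getAppAttemptId(), clientTokenMasterKey);
  }
}
{code}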

> RM can't transition to active if it can't recover an app attempt
> 
>
> Key: YARN-2010
> URL: https://issues.apache.org/jira/browse/YARN-2010
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.3.0
>Reporter: bc Wong
>
> If the RM fails to recover an app attempt, it won't come up. We should make 
> it more resilient.
> Specifically, the underlying error is that the app was submitted before 
> Kerberos security got turned on. Makes sense for the app to fail in this 
> case. But YARN should still start.
> {noformat}
> 2014-04-11 11:56:37,216 WARN org.apache.hadoop.ha.ActiveStandbyElector: 
> Exception handling the winning of election 
> org.apache.hadoop.ha.ServiceFailedException: RM could not transition to 
> Active 
> at 
> org.apache.hadoop.yarn.server.resourcemanager.EmbeddedElectorService.becomeActive(EmbeddedElectorService.java:118)
>  
> at 
> org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:804)
>  
> at 
> org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:415)
>  
> at 
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599) 
> at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498) 
> Caused by: org.apache.hadoop.ha.ServiceFailedException: Error when 
> transitioning to Active mode 
> at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:274)
>  
> at 
> org.apache.hadoop.yarn.server.resourcemanager.EmbeddedElectorService.becomeActive(EmbeddedElectorService.java:116)
>  
> ... 4 more 
> Caused by: org.apache.hadoop.service.ServiceStateException: 
> org.apache.hadoop.yarn.exceptions.YarnException: 
> java.lang.IllegalArgumentException: Missing argument 
> at 
> org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:59)
>  
> at org.apache.hadoop.service.AbstractService.start(AbstractService.java:204) 
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:811)
>  
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:842)
>  
> at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:265)
>  
> ... 5 more 
> Caused by: org.apache.hadoop.yarn.exceptions.YarnException: 
> java.lang.IllegalArgumentException: Missing argument 
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recoverApplication(RMAppManager.java:372)
>  
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.submitApplication(RMAppManager.java:273)
>  
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recover(RMAppManager.java:406)
>  
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.recover(ResourceManager.java:1000)
>  
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:462)
>  
> at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193) 
> ... 8 more 
> Caused by: java.lang.IllegalArgumentException: Missing argument 
> at javax.crypto.spec.SecretKeySpec.<init>(SecretKeySpec.java:93) 
> at 
> org.apache.hadoop.security.token.SecretManager.createSecretKey(SecretManager.java:188)
>  
> at 
> org.apache.hadoop.yarn.server.resourcemanager.security.ClientToAMTokenSecretManagerInRM.registerMasterKey(ClientToAMTokenSecretManagerInRM.java:49)
>  
> at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.recoverAppAttemptCredentials(RMAppAttemptImpl.java:711)
>  
> at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.recover(RMAppAttemptImpl.java:689)
>  
> at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.recover(RMAppImpl.java:663)
>  
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recoverApplication(RMAppManager.java:369)
>  
> ... 13 more 
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Assigned] (YARN-2010) RM can't transition to active if it can't recover an app attempt

2014-05-02 Thread Rohith (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohith reassigned YARN-2010:


Assignee: Rohith

> RM can't transition to active if it can't recover an app attempt
> 
>
> Key: YARN-2010
> URL: https://issues.apache.org/jira/browse/YARN-2010
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.3.0
>Reporter: bc Wong
>Assignee: Rohith
>
> If the RM fails to recover an app attempt, it won't come up. We should make 
> it more resilient.
> Specifically, the underlying error is that the app was submitted before 
> Kerberos security got turned on. Makes sense for the app to fail in this 
> case. But YARN should still start.
> {noformat}
> 2014-04-11 11:56:37,216 WARN org.apache.hadoop.ha.ActiveStandbyElector: 
> Exception handling the winning of election 
> org.apache.hadoop.ha.ServiceFailedException: RM could not transition to 
> Active 
> at 
> org.apache.hadoop.yarn.server.resourcemanager.EmbeddedElectorService.becomeActive(EmbeddedElectorService.java:118)
>  
> at 
> org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:804)
>  
> at 
> org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:415)
>  
> at 
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599) 
> at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498) 
> Caused by: org.apache.hadoop.ha.ServiceFailedException: Error when 
> transitioning to Active mode 
> at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:274)
>  
> at 
> org.apache.hadoop.yarn.server.resourcemanager.EmbeddedElectorService.becomeActive(EmbeddedElectorService.java:116)
>  
> ... 4 more 
> Caused by: org.apache.hadoop.service.ServiceStateException: 
> org.apache.hadoop.yarn.exceptions.YarnException: 
> java.lang.IllegalArgumentException: Missing argument 
> at 
> org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:59)
>  
> at org.apache.hadoop.service.AbstractService.start(AbstractService.java:204) 
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:811)
>  
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:842)
>  
> at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:265)
>  
> ... 5 more 
> Caused by: org.apache.hadoop.yarn.exceptions.YarnException: 
> java.lang.IllegalArgumentException: Missing argument 
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recoverApplication(RMAppManager.java:372)
>  
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.submitApplication(RMAppManager.java:273)
>  
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recover(RMAppManager.java:406)
>  
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.recover(ResourceManager.java:1000)
>  
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:462)
>  
> at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193) 
> ... 8 more 
> Caused by: java.lang.IllegalArgumentException: Missing argument 
> at javax.crypto.spec.SecretKeySpec.<init>(SecretKeySpec.java:93) 
> at 
> org.apache.hadoop.security.token.SecretManager.createSecretKey(SecretManager.java:188)
>  
> at 
> org.apache.hadoop.yarn.server.resourcemanager.security.ClientToAMTokenSecretManagerInRM.registerMasterKey(ClientToAMTokenSecretManagerInRM.java:49)
>  
> at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.recoverAppAttemptCredentials(RMAppAttemptImpl.java:711)
>  
> at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.recover(RMAppAttemptImpl.java:689)
>  
> at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.recover(RMAppImpl.java:663)
>  
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recoverApplication(RMAppManager.java:369)
>  
> ... 13 more 
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1696) Document RM HA

2014-05-02 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13987597#comment-13987597
 ] 

Hudson commented on YARN-1696:
--

FAILURE: Integrated in Hadoop-Yarn-trunk #557 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/557/])
YARN-1696. Added documentation for ResourceManager fail-over. Contributed by 
Karthik Kambatla, Masatake Iwasaki, Tsuyoshi OZAWA. (vinodkv: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1591416)
* /hadoop/common/trunk/hadoop-project/src/site/site.xml
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/apt/ResourceManagerHA.apt.vm
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/resources/images
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/resources/images/rm-ha-overview.png


> Document RM HA
> --
>
> Key: YARN-1696
> URL: https://issues.apache.org/jira/browse/YARN-1696
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Affects Versions: 2.3.0
>Reporter: Karthik Kambatla
>Assignee: Tsuyoshi OZAWA
>Priority: Blocker
> Fix For: 2.4.1
>
> Attachments: YARN-1676.5.patch, YARN-1696-3.patch, YARN-1696.2.patch, 
> YARN-1696.4.patch, YARN-1696.6.patch, rm-ha-overview.png, rm-ha-overview.svg, 
> yarn-1696-1.patch
>
>
> Add documentation for RM HA. Marking this a blocker for 2.4 as this is 
> required to call RM HA Stable and ready for public consumption. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2010) RM can't transition to active if it can't recover an app attempt

2014-05-02 Thread Rohith (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohith updated YARN-2010:
-

Attachment: YARN-2010.patch

Uploading a patch without a test written.

Still thinking about how to write the test: does the complete flow need to be 
considered, or is it enough to call only RMAppAttempt.recoveryApplication()?

> RM can't transition to active if it can't recover an app attempt
> 
>
> Key: YARN-2010
> URL: https://issues.apache.org/jira/browse/YARN-2010
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.3.0
>Reporter: bc Wong
>Assignee: Rohith
> Attachments: YARN-2010.patch
>
>
> If the RM fails to recover an app attempt, it won't come up. We should make 
> it more resilient.
> Specifically, the underlying error is that the app was submitted before 
> Kerberos security got turned on. Makes sense for the app to fail in this 
> case. But YARN should still start.
> {noformat}
> 2014-04-11 11:56:37,216 WARN org.apache.hadoop.ha.ActiveStandbyElector: 
> Exception handling the winning of election 
> org.apache.hadoop.ha.ServiceFailedException: RM could not transition to 
> Active 
> at 
> org.apache.hadoop.yarn.server.resourcemanager.EmbeddedElectorService.becomeActive(EmbeddedElectorService.java:118)
>  
> at 
> org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:804)
>  
> at 
> org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:415)
>  
> at 
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599) 
> at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498) 
> Caused by: org.apache.hadoop.ha.ServiceFailedException: Error when 
> transitioning to Active mode 
> at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:274)
>  
> at 
> org.apache.hadoop.yarn.server.resourcemanager.EmbeddedElectorService.becomeActive(EmbeddedElectorService.java:116)
>  
> ... 4 more 
> Caused by: org.apache.hadoop.service.ServiceStateException: 
> org.apache.hadoop.yarn.exceptions.YarnException: 
> java.lang.IllegalArgumentException: Missing argument 
> at 
> org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:59)
>  
> at org.apache.hadoop.service.AbstractService.start(AbstractService.java:204) 
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:811)
>  
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:842)
>  
> at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:265)
>  
> ... 5 more 
> Caused by: org.apache.hadoop.yarn.exceptions.YarnException: 
> java.lang.IllegalArgumentException: Missing argument 
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recoverApplication(RMAppManager.java:372)
>  
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.submitApplication(RMAppManager.java:273)
>  
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recover(RMAppManager.java:406)
>  
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.recover(ResourceManager.java:1000)
>  
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:462)
>  
> at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193) 
> ... 8 more 
> Caused by: java.lang.IllegalArgumentException: Missing argument 
> at javax.crypto.spec.SecretKeySpec.<init>(SecretKeySpec.java:93) 
> at 
> org.apache.hadoop.security.token.SecretManager.createSecretKey(SecretManager.java:188)
>  
> at 
> org.apache.hadoop.yarn.server.resourcemanager.security.ClientToAMTokenSecretManagerInRM.registerMasterKey(ClientToAMTokenSecretManagerInRM.java:49)
>  
> at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.recoverAppAttemptCredentials(RMAppAttemptImpl.java:711)
>  
> at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.recover(RMAppAttemptImpl.java:689)
>  
> at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.recover(RMAppImpl.java:663)
>  
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recoverApplication(RMAppManager.java:369)
>  
> ... 13 more 
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1696) Document RM HA

2014-05-02 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13987695#comment-13987695
 ] 

Hudson commented on YARN-1696:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #1774 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1774/])
YARN-1696. Added documentation for ResourceManager fail-over. Contributed by 
Karthik Kambatla, Masatake Iwasaki, Tsuyoshi OZAWA. (vinodkv: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1591416)
* /hadoop/common/trunk/hadoop-project/src/site/site.xml
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/apt/ResourceManagerHA.apt.vm
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/resources/images
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/resources/images/rm-ha-overview.png


> Document RM HA
> --
>
> Key: YARN-1696
> URL: https://issues.apache.org/jira/browse/YARN-1696
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Affects Versions: 2.3.0
>Reporter: Karthik Kambatla
>Assignee: Tsuyoshi OZAWA
>Priority: Blocker
> Fix For: 2.4.1
>
> Attachments: YARN-1676.5.patch, YARN-1696-3.patch, YARN-1696.2.patch, 
> YARN-1696.4.patch, YARN-1696.6.patch, rm-ha-overview.png, rm-ha-overview.svg, 
> yarn-1696-1.patch
>
>
> Add documentation for RM HA. Marking this a blocker for 2.4 as this is 
> required to call RM HA Stable and ready for public consumption. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1696) Document RM HA

2014-05-02 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13987690#comment-13987690
 ] 

Hudson commented on YARN-1696:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk #1748 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1748/])
YARN-1696. Added documentation for ResourceManager fail-over. Contributed by 
Karthik Kambatla, Masatake Iwasaki, Tsuyoshi OZAWA. (vinodkv: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1591416)
* /hadoop/common/trunk/hadoop-project/src/site/site.xml
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/apt/ResourceManagerHA.apt.vm
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/resources/images
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/resources/images/rm-ha-overview.png


> Document RM HA
> --
>
> Key: YARN-1696
> URL: https://issues.apache.org/jira/browse/YARN-1696
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Affects Versions: 2.3.0
>Reporter: Karthik Kambatla
>Assignee: Tsuyoshi OZAWA
>Priority: Blocker
> Fix For: 2.4.1
>
> Attachments: YARN-1676.5.patch, YARN-1696-3.patch, YARN-1696.2.patch, 
> YARN-1696.4.patch, YARN-1696.6.patch, rm-ha-overview.png, rm-ha-overview.svg, 
> yarn-1696-1.patch
>
>
> Add documentation for RM HA. Marking this a blocker for 2.4 as this is 
> required to call RM HA Stable and ready for public consumption. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2000) Fix ordering of starting services inside the RM

2014-05-02 Thread Tsuyoshi OZAWA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13987760#comment-13987760
 ] 

Tsuyoshi OZAWA commented on YARN-2000:
--

Hi [~jianhe], do you mind finishing YARN-1474? These JIRAs may conflict.

> Fix ordering of starting services inside the RM
> ---
>
> Key: YARN-2000
> URL: https://issues.apache.org/jira/browse/YARN-2000
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Jian He
>Assignee: Jian He
>
> The order of starting services in the RM would be:
> - Recovery of the apps/attempts
> - Start the scheduler and add the scheduler apps/attempts
> - Start ResourceTrackerService and re-populate the containers in the scheduler 
> based on the container info from NMs 
> - ApplicationMasterService either doesn't start, or starts but blocks until 
> all the previous NMs register.
> Other than these, there are other services, like ClientRMService and the 
> webapps, whose startup order we need to think about too (see the sketch below).
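
A hedged sketch of that ordering using Hadoop's CompositeService, which starts services in the order they are added; the field names below are illustrative, and only the relative order reflects the list above:

{code}
// Illustrative ordering only, not the actual RMActiveServices wiring.
addService(stateStoreRecoveryService);  // 1. recover apps/attempts
addService(scheduler);                  // 2. scheduler, re-add apps/attempts
addService(resourceTrackerService);     // 3. NMs re-register, containers repopulated
addService(applicationMasterService);   // 4. start last, or block until NMs register
addService(clientRMService);            // ordering still to be decided
{code}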



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2000) Fix ordering of starting services inside the RM

2014-05-02 Thread Tsuyoshi OZAWA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13987764#comment-13987764
 ] 

Tsuyoshi OZAWA commented on YARN-2000:
--

typoed: s/do you mind finishing/do you mind if you wait for finishing/

> Fix ordering of starting services inside the RM
> ---
>
> Key: YARN-2000
> URL: https://issues.apache.org/jira/browse/YARN-2000
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Jian He
>Assignee: Jian He
>
> The order of starting services in the RM would be:
> - Recovery of the apps/attempts
> - Start the scheduler and add the scheduler apps/attempts
> - Start ResourceTrackerService and re-populate the containers in the scheduler 
> based on the container info from NMs 
> - ApplicationMasterService either doesn't start, or starts but blocks until 
> all the previous NMs register.
> Other than these, there are other services, like ClientRMService and the 
> webapps, whose startup order we need to think about too.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-1874) Cleanup: Move RMActiveServices out of ResourceManager into its own file

2014-05-02 Thread Tsuyoshi OZAWA (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1874?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsuyoshi OZAWA updated YARN-1874:
-

Issue Type: Improvement  (was: Bug)

> Cleanup: Move RMActiveServices out of ResourceManager into its own file
> ---
>
> Key: YARN-1874
> URL: https://issues.apache.org/jira/browse/YARN-1874
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Reporter: Karthik Kambatla
>Assignee: Tsuyoshi OZAWA
> Attachments: YARN-1874.1.patch
>
>
> As [~vinodkv] noticed on YARN-1867, ResourceManager is hard to maintain. We 
> should move RMActiveServices out to make it more manageable. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1874) Cleanup: Move RMActiveServices out of ResourceManager into its own file

2014-05-02 Thread Tsuyoshi OZAWA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13987767#comment-13987767
 ] 

Tsuyoshi OZAWA commented on YARN-1874:
--

I'll fix the patch to pass tests.

> Cleanup: Move RMActiveServices out of ResourceManager into its own file
> ---
>
> Key: YARN-1874
> URL: https://issues.apache.org/jira/browse/YARN-1874
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Reporter: Karthik Kambatla
>Assignee: Tsuyoshi OZAWA
> Attachments: YARN-1874.1.patch
>
>
> As [~vinodkv] noticed on YARN-1867, ResourceManager is hard to maintain. We 
> should move RMActiveServices out to make it more manageable. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2011) Typo in TestLeafQueue

2014-05-02 Thread Tsuyoshi OZAWA (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsuyoshi OZAWA updated YARN-2011:
-

Issue Type: Test  (was: Bug)

> Typo in TestLeafQueue
> -
>
> Key: YARN-2011
> URL: https://issues.apache.org/jira/browse/YARN-2011
> Project: Hadoop YARN
>  Issue Type: Test
>Affects Versions: 2.4.0
>Reporter: Chen He
>Assignee: Chen He
>Priority: Trivial
> Attachments: YARN-2011.patch
>
>
> a.assignContainers(clusterResource, node_0);
> assertEquals(2*GB, a.getUsedResources().getMemory());
> assertEquals(2*GB, app_0.getCurrentConsumption().getMemory());
> assertEquals(0*GB, app_1.getCurrentConsumption().getMemory());
> assertEquals(0*GB, app_0.getHeadroom().getMemory()); // User limit = 2G
> assertEquals(0*GB, app_0.getHeadroom().getMemory()); // User limit = 2G
> // Again one to user_0 since he hasn't exceeded user limit yet
> a.assignContainers(clusterResource, node_0);
> assertEquals(3*GB, a.getUsedResources().getMemory());
> assertEquals(2*GB, app_0.getCurrentConsumption().getMemory());
> assertEquals(1*GB, app_1.getCurrentConsumption().getMemory());
> assertEquals(0*GB, app_0.getHeadroom().getMemory()); // 3G - 2G
> assertEquals(0*GB, app_0.getHeadroom().getMemory()); // 3G - 2G
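
Presumably the duplicated headroom assertions above are the typo, and the second assertion of each pair was meant to check app_1; a hedged guess at the intended lines (the attached patch is authoritative):

{code}
assertEquals(0*GB, app_0.getHeadroom().getMemory()); // User limit = 2G
assertEquals(0*GB, app_1.getHeadroom().getMemory()); // User limit = 2G
{code}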



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2011) Typo in TestLeafQueue

2014-05-02 Thread Tsuyoshi OZAWA (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsuyoshi OZAWA updated YARN-2011:
-

Target Version/s: 2.5.0  (was: 2.4.1)

> Typo in TestLeafQueue
> -
>
> Key: YARN-2011
> URL: https://issues.apache.org/jira/browse/YARN-2011
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.4.0
>Reporter: Chen He
>Assignee: Chen He
>Priority: Trivial
> Attachments: YARN-2011.patch
>
>
> a.assignContainers(clusterResource, node_0);
> assertEquals(2*GB, a.getUsedResources().getMemory());
> assertEquals(2*GB, app_0.getCurrentConsumption().getMemory());
> assertEquals(0*GB, app_1.getCurrentConsumption().getMemory());
> assertEquals(0*GB, app_0.getHeadroom().getMemory()); // User limit = 2G
> assertEquals(0*GB, app_0.getHeadroom().getMemory()); // User limit = 2G
> // Again one to user_0 since he hasn't exceeded user limit yet
> a.assignContainers(clusterResource, node_0);
> assertEquals(3*GB, a.getUsedResources().getMemory());
> assertEquals(2*GB, app_0.getCurrentConsumption().getMemory());
> assertEquals(1*GB, app_1.getCurrentConsumption().getMemory());
> assertEquals(0*GB, app_0.getHeadroom().getMemory()); // 3G - 2G
> assertEquals(0*GB, app_0.getHeadroom().getMemory()); // 3G - 2G



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-1861) Both RM stuck in standby mode when automatic failover is enabled

2014-05-02 Thread Tsuyoshi OZAWA (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1861?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsuyoshi OZAWA updated YARN-1861:
-

Attachment: YARN-1861.3.patch

Updated the patch so it doesn't introduce a warning, simply by adding a 
SuppressWarnings annotation. [~xgong], sorry if you mind my cutting in, but 
this JIRA is a blocker for the 2.4.1 release and we should fix it as soon as 
possible.

> Both RM stuck in standby mode when automatic failover is enabled
> 
>
> Key: YARN-1861
> URL: https://issues.apache.org/jira/browse/YARN-1861
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Affects Versions: 2.4.0
>Reporter: Arpit Gupta
>Assignee: Xuan Gong
>Priority: Blocker
> Attachments: YARN-1861.2.patch, YARN-1861.3.patch, yarn-1861-1.patch
>
>
> In our HA tests we noticed that the tests got stuck because both RMs went 
> into standby state and neither became active.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-1701) Improve default paths of timeline store and generic history store

2014-05-02 Thread Tsuyoshi OZAWA (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1701?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsuyoshi OZAWA updated YARN-1701:
-

Attachment: YARN-1701.3.patch

> Improve default paths of timeline store and generic history store
> -
>
> Key: YARN-1701
> URL: https://issues.apache.org/jira/browse/YARN-1701
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Affects Versions: 2.4.0
>Reporter: Gera Shegalov
>Assignee: Gera Shegalov
>Priority: Blocker
> Attachments: YARN-1701.3.patch, YARN-1701.v01.patch, 
> YARN-1701.v02.patch
>
>
> When I enable AHS via yarn.ahs.enabled, the app history is still not visible 
> in the AHS web UI. This is because NullApplicationHistoryStore is set as 
> yarn.resourcemanager.history-writer.class. It would be good to have just one 
> key to enable the basic functionality.
> yarn.ahs.fs-history-store.uri uses {code}${hadoop.log.dir}{code}, which is a 
> local file system location. However, FileSystemApplicationHistoryStore uses 
> DFS by default.  
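
For reference, a hedged sketch of setting the keys named above programmatically; the key strings are quoted from the description, while the store path is a placeholder:

{code}
// Illustrative only; key names are taken verbatim from the description.
Configuration conf = new YarnConfiguration();
conf.setBoolean("yarn.ahs.enabled", true);
// Point the FS history store at an explicit path instead of the
// ${hadoop.log.dir}-based local default discussed above.
conf.set("yarn.ahs.fs-history-store.uri", "/tmp/yarn/ahs");
{code}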



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1701) Improve default paths of timeline store and generic history store

2014-05-02 Thread Tsuyoshi OZAWA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13987887#comment-13987887
 ] 

Tsuyoshi OZAWA commented on YARN-1701:
--

Updated the patch based on Zhijie's idea. It looks reasonable to me.

[~jira.shegalov], sorry for cutting in. I updated the patch because this issue 
is a blocker for the 2.4.1 release. Please feel free to take it back.

> Improve default paths of timeline store and generic history store
> -
>
> Key: YARN-1701
> URL: https://issues.apache.org/jira/browse/YARN-1701
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Affects Versions: 2.4.0
>Reporter: Gera Shegalov
>Assignee: Gera Shegalov
>Priority: Blocker
> Attachments: YARN-1701.3.patch, YARN-1701.v01.patch, 
> YARN-1701.v02.patch
>
>
> When I enable AHS via yarn.ahs.enabled, the app history is still not visible 
> in the AHS web UI. This is because NullApplicationHistoryStore is set as 
> yarn.resourcemanager.history-writer.class. It would be good to have just one 
> key to enable the basic functionality.
> yarn.ahs.fs-history-store.uri uses {code}${hadoop.log.dir}{code}, which is a 
> local file system location. However, FileSystemApplicationHistoryStore uses 
> DFS by default.  



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1861) Both RM stuck in standby mode when automatic failover is enabled

2014-05-02 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1861?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13987906#comment-13987906
 ] 

Hadoop QA commented on YARN-1861:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12643061/YARN-1861.3.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:red}-1 findbugs{color}.  The patch appears to introduce 1 new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/3679//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/3679//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3679//console

This message is automatically generated.

> Both RM stuck in standby mode when automatic failover is enabled
> 
>
> Key: YARN-1861
> URL: https://issues.apache.org/jira/browse/YARN-1861
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Affects Versions: 2.4.0
>Reporter: Arpit Gupta
>Assignee: Xuan Gong
>Priority: Blocker
> Attachments: YARN-1861.2.patch, YARN-1861.3.patch, yarn-1861-1.patch
>
>
> In our HA tests we noticed that the tests got stuck because both RMs went 
> into standby state and neither became active.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2016) Yarn getApplicationRequest start time range is not honored

2014-05-02 Thread Venkat Ranganathan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Venkat Ranganathan updated YARN-2016:
-

Attachment: YarnTest.java

> Yarn getApplicationRequest start time range is not honored
> --
>
> Key: YARN-2016
> URL: https://issues.apache.org/jira/browse/YARN-2016
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.4.0
>Reporter: Venkat Ranganathan
> Attachments: YarnTest.java
>
>
> When we query for the previous applications by creating an instance of 
> GetApplicationsRequest and setting the start time range and application tag, 
> we see that the start range provided is not honored and all applications with 
> the tag are returned.
> Attaching a reproducer.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (YARN-2016) Yarn getApplicationRequest start time range is not honored

2014-05-02 Thread Venkat Ranganathan (JIRA)
Venkat Ranganathan created YARN-2016:


 Summary: Yarn getApplicationRequest start time range is not honored
 Key: YARN-2016
 URL: https://issues.apache.org/jira/browse/YARN-2016
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.4.0
Reporter: Venkat Ranganathan
 Attachments: YarnTest.java

When we query for the previous applications by creating an instance of 
GetApplicationsRequest and setting the start time range and application tag, we 
see that the start range provided is not honored and all applications with the 
tag are returned.

Attaching a reproducer.
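
A hedged sketch of the query being described; the attached YarnTest.java is the authoritative reproducer, and the tag, the time bounds, and the rmClient handle (an ApplicationClientProtocol proxy) here are placeholders:

{code}
// Per the report, the start-time range below is effectively ignored and
// every application carrying the tag is returned.
GetApplicationsRequest request = GetApplicationsRequest.newInstance();
request.setStartRange(submitTimeBegin, submitTimeEnd);
request.setApplicationTags(Collections.singleton("my-app-tag"));
List<ApplicationReport> reports =
    rmClient.getApplications(request).getApplicationList();
{code}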





--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1701) Improve default paths of timeline store and generic history store

2014-05-02 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13987920#comment-13987920
 ] 

Hadoop QA commented on YARN-1701:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12643069/YARN-1701.3.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/3680//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3680//console

This message is automatically generated.

> Improve default paths of timeline store and generic history store
> -
>
> Key: YARN-1701
> URL: https://issues.apache.org/jira/browse/YARN-1701
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Affects Versions: 2.4.0
>Reporter: Gera Shegalov
>Assignee: Gera Shegalov
>Priority: Blocker
> Attachments: YARN-1701.3.patch, YARN-1701.v01.patch, 
> YARN-1701.v02.patch
>
>
> When I enable AHS via yarn.ahs.enabled, the app history is still not visible 
> in the AHS web UI. This is because NullApplicationHistoryStore is set as 
> yarn.resourcemanager.history-writer.class. It would be good to have just one 
> key to enable the basic functionality.
> yarn.ahs.fs-history-store.uri uses {code}${hadoop.log.dir}{code}, which is a 
> local file system location. However, FileSystemApplicationHistoryStore uses 
> DFS by default.  



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2016) Yarn getApplicationRequest start time range is not honored

2014-05-02 Thread Venkat Ranganathan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13987922#comment-13987922
 ] 

Venkat Ranganathan commented on YARN-2016:
--

I have briefly discussed this with [~vinodkv], and [~djp] has an idea for the fix.

> Yarn getApplicationRequest start time range is not honored
> --
>
> Key: YARN-2016
> URL: https://issues.apache.org/jira/browse/YARN-2016
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.4.0
>Reporter: Venkat Ranganathan
> Attachments: YarnTest.java
>
>
> When we query for the previous applications by creating an instance of 
> GetApplicationsRequest and setting the start time range and application tag, 
> we see that the start range provided is not honored and all applications with 
> the tag are returned. 
> Attaching a reproducer.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-1861) Both RM stuck in standby mode when automatic failover is enabled

2014-05-02 Thread Xuan Gong (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1861?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuan Gong updated YARN-1861:


Attachment: YARN-1861.4.patch

> Both RM stuck in standby mode when automatic failover is enabled
> 
>
> Key: YARN-1861
> URL: https://issues.apache.org/jira/browse/YARN-1861
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Affects Versions: 2.4.0
>Reporter: Arpit Gupta
>Assignee: Xuan Gong
>Priority: Blocker
> Attachments: YARN-1861.2.patch, YARN-1861.3.patch, YARN-1861.4.patch, 
> yarn-1861-1.patch
>
>
> In our HA tests we noticed that the tests got stuck because both RMs went 
> into standby state and neither became active.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1861) Both RM stuck in standby mode when automatic failover is enabled

2014-05-02 Thread Xuan Gong (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1861?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13987972#comment-13987972
 ] 

Xuan Gong commented on YARN-1861:
-

[~ozawa] Thanks.

Uploaded a new patch, rebased on the latest trunk, that fixes the findbugs -1.

> Both RM stuck in standby mode when automatic failover is enabled
> 
>
> Key: YARN-1861
> URL: https://issues.apache.org/jira/browse/YARN-1861
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Affects Versions: 2.4.0
>Reporter: Arpit Gupta
>Assignee: Xuan Gong
>Priority: Blocker
> Attachments: YARN-1861.2.patch, YARN-1861.3.patch, YARN-1861.4.patch, 
> yarn-1861-1.patch
>
>
> In our HA tests we noticed that the tests got stuck because both RMs went 
> into standby state and neither became active.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1945) Adding description for each pool in Fair Scheduler Page from fair-scheduler.xml

2014-05-02 Thread Siqi Li (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13987971#comment-13987971
 ] 

Siqi Li commented on YARN-1945:
---

[~sjlee0], do you know what that -1 javac means?

> Adding description for each pool in Fair Scheduler Page from 
> fair-scheduler.xml
> ---
>
> Key: YARN-1945
> URL: https://issues.apache.org/jira/browse/YARN-1945
> Project: Hadoop YARN
>  Issue Type: Improvement
>Affects Versions: 2.3.0
>Reporter: Siqi Li
>Assignee: Siqi Li
> Attachments: YARN-1945.v2.patch, YARN-1945.v3.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1945) Adding description for each pool in Fair Scheduler Page from fair-scheduler.xml

2014-05-02 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13988000#comment-13988000
 ] 

Sangjin Lee commented on YARN-1945:
---

You got more java compiler warnings than the trunk baseline. You might want to 
look at the warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/3672//artifact/trunk/patchprocess/diffJavacWarnings.txt

> Adding description for each pool in Fair Scheduler Page from 
> fair-scheduler.xml
> ---
>
> Key: YARN-1945
> URL: https://issues.apache.org/jira/browse/YARN-1945
> Project: Hadoop YARN
>  Issue Type: Improvement
>Affects Versions: 2.3.0
>Reporter: Siqi Li
>Assignee: Siqi Li
> Attachments: YARN-1945.v2.patch, YARN-1945.v3.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-1857) CapacityScheduler headroom doesn't account for other AM's running

2014-05-02 Thread Chen He (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1857?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chen He updated YARN-1857:
--

Attachment: YARN-1857.patch

> CapacityScheduler headroom doesn't account for other AM's running
> -
>
> Key: YARN-1857
> URL: https://issues.apache.org/jira/browse/YARN-1857
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacityscheduler
>Affects Versions: 2.3.0
>Reporter: Thomas Graves
>Assignee: Chen He
> Attachments: YARN-1857.patch
>
>
> It's possible for an application to hang forever (or for a long time) in a 
> cluster with multiple users.  The reason is that the headroom sent to the 
> application is based on the user limit but doesn't account for other 
> application masters using space in that queue.  So the headroom (user limit - 
> user consumed) can be > 0 even though the cluster is 100% full, because the 
> remaining space is being used by application masters from other users.  
> For instance, say you have a cluster with 1 queue, a user limit of 100%, and 
> multiple users submitting applications.  One very large application from user 
> 1 starts up, runs most of its maps, and starts running reducers.  Other users 
> try to start applications and get their application masters started, but no 
> tasks.  The very large application then gets to the point where it has 
> consumed the rest of the cluster resources with reduces, but at this point it 
> still needs to finish a few maps.  The headroom being sent to this 
> application is based only on the user limit (which is 100% of the cluster 
> capacity): it is using, say, 95% of the cluster for reduces, and the other 5% 
> is being used by other users' running application masters.  The MRAppMaster 
> thinks it still has 5%, so it doesn't know that it should kill a reduce in 
> order to run a map.  
> This can happen in other scenarios too.  Generally, in a large cluster with 
> multiple queues this shouldn't cause a permanent hang, but it could cause the 
> application to take much longer.
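
In code terms, the description boils down to something like the following simplified illustration (not the real CapacityScheduler implementation; the variable names are made up):

{code}
// Current behavior: headroom only compares the user's usage to its limit,
// so it can stay > 0 while the cluster is 100% allocated.
int headroom = userLimit - userConsumed;
// Resources held by other users' AMs are invisible here. Capping by what
// is actually free avoids advertising space that does not exist:
int saferHeadroom = Math.min(userLimit - userConsumed,
    clusterTotal - clusterAllocated);
{code}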



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1861) Both RM stuck in standby mode when automatic failover is enabled

2014-05-02 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1861?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13988071#comment-13988071
 ] 

Hadoop QA commented on YARN-1861:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12643078/YARN-1861.4.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

  {color:red}-1 javac{color}.  The applied patch generated 1279 javac 
compiler warnings (more than the trunk's current 1278 warnings).

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/3681//testReport/
Javac warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/3681//artifact/trunk/patchprocess/diffJavacWarnings.txt
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3681//console

This message is automatically generated.

> Both RM stuck in standby mode when automatic failover is enabled
> 
>
> Key: YARN-1861
> URL: https://issues.apache.org/jira/browse/YARN-1861
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Affects Versions: 2.4.0
>Reporter: Arpit Gupta
>Assignee: Xuan Gong
>Priority: Blocker
> Attachments: YARN-1861.2.patch, YARN-1861.3.patch, YARN-1861.4.patch, 
> yarn-1861-1.patch
>
>
> In our HA tests we noticed that the tests got stuck because both RMs went 
> into standby state and neither became active.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1857) CapacityScheduler headroom doesn't account for other AM's running

2014-05-02 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13988096#comment-13988096
 ] 

Hadoop QA commented on YARN-1857:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12643084/YARN-1857.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

  org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/3682//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3682//console

This message is automatically generated.

> CapacityScheduler headroom doesn't account for other AM's running
> -
>
> Key: YARN-1857
> URL: https://issues.apache.org/jira/browse/YARN-1857
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacityscheduler
>Affects Versions: 2.3.0
>Reporter: Thomas Graves
>Assignee: Chen He
> Attachments: YARN-1857.patch
>
>
> It's possible for an application to hang forever (or for a long time) in a 
> cluster with multiple users.  The reason is that the headroom sent to the 
> application is based on the user limit but doesn't account for other 
> application masters using space in that queue.  So the headroom (user limit - 
> user consumed) can be > 0 even though the cluster is 100% full, because the 
> remaining space is being used by application masters from other users.  
> For instance, say you have a cluster with 1 queue, a user limit of 100%, and 
> multiple users submitting applications.  One very large application from user 
> 1 starts up, runs most of its maps, and starts running reducers.  Other users 
> try to start applications and get their application masters started, but no 
> tasks.  The very large application then gets to the point where it has 
> consumed the rest of the cluster resources with reduces, but at this point it 
> still needs to finish a few maps.  The headroom being sent to this 
> application is based only on the user limit (which is 100% of the cluster 
> capacity): it is using, say, 95% of the cluster for reduces, and the other 5% 
> is being used by other users' running application masters.  The MRAppMaster 
> thinks it still has 5%, so it doesn't know that it should kill a reduce in 
> order to run a map.  
> This can happen in other scenarios too.  Generally, in a large cluster with 
> multiple queues this shouldn't cause a permanent hang, but it could cause the 
> application to take much longer.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1872) TestDistributedShell occasionally fails in trunk

2014-05-02 Thread Tsuyoshi OZAWA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1872?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13988106#comment-13988106
 ] 

Tsuyoshi OZAWA commented on YARN-1872:
--

As a workaround for the 2.4.1 release, +1 for [~zhiguohong]'s patch (non-binding).

[~zjshen], [~ste...@apache.org], how about solving the issue properly in 
YARN-1902 against the 2.5.0 release, as Hong mentioned? What do you think?

> TestDistributedShell occasionally fails in trunk
> 
>
> Key: YARN-1872
> URL: https://issues.apache.org/jira/browse/YARN-1872
> Project: Hadoop YARN
>  Issue Type: Test
>Reporter: Ted Yu
>Assignee: Hong Zhiguo
>Priority: Blocker
> Attachments: TestDistributedShell.out, YARN-1872.patch
>
>
> From https://builds.apache.org/job/Hadoop-Yarn-trunk/520/console :
> TestDistributedShell#testDSShellWithCustomLogPropertyFile failed and 
> TestDistributedShell#testDSShell timed out.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1857) CapacityScheduler headroom doesn't account for other AM's running

2014-05-02 Thread Chen He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13988124#comment-13988124
 ] 

Chen He commented on YARN-1857:
---

TestRMRestart passed successfully on my laptop. I think this failure may not be 
related to my patch.

> CapacityScheduler headroom doesn't account for other AM's running
> -
>
> Key: YARN-1857
> URL: https://issues.apache.org/jira/browse/YARN-1857
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacityscheduler
>Affects Versions: 2.3.0
>Reporter: Thomas Graves
>Assignee: Chen He
> Attachments: YARN-1857.patch
>
>
> It's possible to get an application to hang forever (or for a long time) in a 
> cluster with multiple users.  The reason is that the headroom sent to the 
> application is based on the user limit, but it doesn't account for other 
> application masters using space in that queue.  So the headroom (user limit - 
> user consumed) can be > 0 even though the cluster is 100% full, because the 
> remaining space is being used by application masters from other users.  
> For instance, say you have a cluster with 1 queue, the user limit is 100%, and 
> multiple users are submitting applications.  One very large application by user 1 
> starts up, runs most of its maps and starts running reducers. Other users try 
> to start applications and get their application masters started but not 
> tasks.  The very large application then gets to the point where it has 
> consumed the rest of the cluster resources with all reduces.  But at this 
> point it still needs to finish a few maps.  The headroom being sent to this 
> application is only based on the user limit (which is 100% of the cluster 
> capacity); it's using, let's say, 95% of the cluster for reduces, and the other 5% 
> is being used by other users running application masters.  The MRAppMaster 
> thinks it still has 5%, so it doesn't know that it should kill a reduce in 
> order to run a map.  
> This can happen in other scenarios also.  Generally in a large cluster with 
> multiple queues this shouldn't cause a hang forever, but it could make the 
> application take much longer.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1989) Adding shell scripts to launch multiple servers on localhost

2014-05-02 Thread Gera Shegalov (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13988155#comment-13988155
 ] 

Gera Shegalov commented on YARN-1989:
-

Hi [~iwasakims], I have recently documented this idea for my team: 
http://gerashegalov.github.io/running-multiple-hadoop-nodes-on-the-same-OS/

> Adding shell scripts to launch multiple servers on localhost
> 
>
> Key: YARN-1989
> URL: https://issues.apache.org/jira/browse/YARN-1989
> Project: Hadoop YARN
>  Issue Type: New Feature
>Reporter: Masatake Iwasaki
>Priority: Minor
>
> Adding shell scripts to launch multiple servers on localhost for test and 
> debug.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-1945) Adding description for each pool in Fair Scheduler Page from fair-scheduler.xml

2014-05-02 Thread Siqi Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1945?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siqi Li updated YARN-1945:
--

Attachment: (was: YARN-1945.v3.patch)

> Adding description for each pool in Fair Scheduler Page from 
> fair-scheduler.xml
> ---
>
> Key: YARN-1945
> URL: https://issues.apache.org/jira/browse/YARN-1945
> Project: Hadoop YARN
>  Issue Type: Improvement
>Affects Versions: 2.3.0
>Reporter: Siqi Li
>Assignee: Siqi Li
> Attachments: YARN-1945.v2.patch, YARN-1945.v3.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-1945) Adding description for each pool in Fair Scheduler Page from fair-scheduler.xml

2014-05-02 Thread Siqi Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1945?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siqi Li updated YARN-1945:
--

Attachment: YARN-1945.v3.patch

> Adding description for each pool in Fair Scheduler Page from 
> fair-scheduler.xml
> ---
>
> Key: YARN-1945
> URL: https://issues.apache.org/jira/browse/YARN-1945
> Project: Hadoop YARN
>  Issue Type: Improvement
>Affects Versions: 2.3.0
>Reporter: Siqi Li
>Assignee: Siqi Li
> Attachments: YARN-1945.v2.patch, YARN-1945.v3.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-1945) Adding description for each pool in Fair Scheduler Page from fair-scheduler.xml

2014-05-02 Thread Siqi Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1945?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siqi Li updated YARN-1945:
--

Attachment: YARN-1945.v4.patch

> Adding description for each pool in Fair Scheduler Page from 
> fair-scheduler.xml
> ---
>
> Key: YARN-1945
> URL: https://issues.apache.org/jira/browse/YARN-1945
> Project: Hadoop YARN
>  Issue Type: Improvement
>Affects Versions: 2.3.0
>Reporter: Siqi Li
>Assignee: Siqi Li
> Attachments: YARN-1945.v2.patch, YARN-1945.v3.patch, 
> YARN-1945.v4.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1945) Adding description for each pool in Fair Scheduler Page from fair-scheduler.xml

2014-05-02 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13988217#comment-13988217
 ] 

Hadoop QA commented on YARN-1945:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12643108/YARN-1945.v4.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:red}-1 javac{color}.  The patch appears to cause the build to 
fail.

Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3683//console

This message is automatically generated.

> Adding description for each pool in Fair Scheduler Page from 
> fair-scheduler.xml
> ---
>
> Key: YARN-1945
> URL: https://issues.apache.org/jira/browse/YARN-1945
> Project: Hadoop YARN
>  Issue Type: Improvement
>Affects Versions: 2.3.0
>Reporter: Siqi Li
>Assignee: Siqi Li
> Attachments: YARN-1945.v2.patch, YARN-1945.v3.patch, 
> YARN-1945.v4.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-1864) Fair Scheduler Dynamic Hierarchical User Queues

2014-05-02 Thread Ashwin Shankar (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1864?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashwin Shankar updated YARN-1864:
-

Attachment: YARN-1864-v5.txt

The pre-commit build isn't kicking off for some reason, so I'm resubmitting the patch.

> Fair Scheduler Dynamic Hierarchical User Queues
> ---
>
> Key: YARN-1864
> URL: https://issues.apache.org/jira/browse/YARN-1864
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: scheduler
>Reporter: Ashwin Shankar
>  Labels: scheduler
> Attachments: YARN-1864-v1.txt, YARN-1864-v2.txt, YARN-1864-v3.txt, 
> YARN-1864-v4.txt, YARN-1864-v5.txt
>
>
> In Fair Scheduler, we want to be able to create user queues under any parent 
> queue in the hierarchy. E.g., say user1 submits a job to a parent queue 
> called root.allUserQueues; we want to be able to create a new queue called 
> root.allUserQueues.user1 and run user1's job in it. Any further jobs submitted 
> by this user to root.allUserQueues will be run in this newly created 
> root.allUserQueues.user1.
> This is very similar to the 'user-as-default' feature in Fair Scheduler, which 
> creates user queues under the root queue. But we want the ability to create 
> user queues under ANY parent queue.
> Why do we want this?
> 1. Preemption: these dynamically created user queues can preempt each other 
> if their fair share is not met, so there is fairness among users.
> User queues can also preempt other non-user leaf queues if below fair 
> share.
> 2. Allocation to user queues: we want all the user queries (ad hoc) to consume 
> only a fraction of the resources in the shared cluster. With this 
> feature, we could do that by giving a fair share to the parent user queue, 
> which is then redistributed to all the dynamically created user queues.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-1861) Both RM stuck in standby mode when automatic failover is enabled

2014-05-02 Thread Xuan Gong (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1861?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuan Gong updated YARN-1861:


Attachment: YARN-1861.5.patch

Fix the -1 on the Javadoc warning.

> Both RM stuck in standby mode when automatic failover is enabled
> 
>
> Key: YARN-1861
> URL: https://issues.apache.org/jira/browse/YARN-1861
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Affects Versions: 2.4.0
>Reporter: Arpit Gupta
>Assignee: Xuan Gong
>Priority: Blocker
> Attachments: YARN-1861.2.patch, YARN-1861.3.patch, YARN-1861.4.patch, 
> YARN-1861.5.patch, yarn-1861-1.patch
>
>
> In our HA tests we noticed that the tests got stuck because both RMs got 
> into standby state and neither became active.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-1945) Adding description for each pool in Fair Scheduler Page from fair-scheduler.xml

2014-05-02 Thread Siqi Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1945?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siqi Li updated YARN-1945:
--

Attachment: YARN-1945.v5.patch

> Adding description for each pool in Fair Scheduler Page from 
> fair-scheduler.xml
> ---
>
> Key: YARN-1945
> URL: https://issues.apache.org/jira/browse/YARN-1945
> Project: Hadoop YARN
>  Issue Type: Improvement
>Affects Versions: 2.3.0
>Reporter: Siqi Li
>Assignee: Siqi Li
> Attachments: YARN-1945.v2.patch, YARN-1945.v3.patch, 
> YARN-1945.v4.patch, YARN-1945.v5.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Reopened] (YARN-1868) YARN status web ui does not show correctly in IE 11

2014-05-02 Thread Chuan Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1868?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chuan Liu reopened YARN-1868:
-


Reopening this one because the IE configuration is on by default and thus 
impacts the user experience on Windows.

> YARN status web ui does not show correctly in IE 11
> ---
>
> Key: YARN-1868
> URL: https://issues.apache.org/jira/browse/YARN-1868
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: webapp
>Affects Versions: 3.0.0
>Reporter: Chuan Liu
> Attachments: YARN_status.png
>
>
> The YARN status web ui does not show correctly in IE 11. The drop-down menus 
> for app entries are not shown. Also, the navigation menu displays incorrectly.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-941) RM Should have a way to update the tokens it has for a running application

2014-05-02 Thread Anubhav Dhoot (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13988299#comment-13988299
 ] 

Anubhav Dhoot commented on YARN-941:


One option is to make the token expiration time configurable in the request API 
for the ResourceManager and other token secret managers. This would allow each 
long-running application to request a longer maximum lifetime for its tokens 
during startup. There could be per-user/group caps that limit the maximum 
lifetime that can be requested.
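A minimal sketch of the clamping this would imply on the granting side (the 
method and every name in it are hypothetical, purely for illustration):

{code}
/**
 * Hypothetical helper: grant a token lifetime, honoring an explicit request
 * only up to a per-user/group cap. Nothing like this exists in the API today.
 */
static long grantedLifetimeMs(long requestedMs, long defaultMs, long userCapMs) {
  if (requestedMs <= 0) {
    // No explicit request: keep the current fixed default lifetime.
    return defaultMs;
  }
  // A long-running app may ask for more, but never beyond its user's cap.
  return Math.min(requestedMs, userCapMs);
}
{code}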

> RM Should have a way to update the tokens it has for a running application
> --
>
> Key: YARN-941
> URL: https://issues.apache.org/jira/browse/YARN-941
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Robert Joseph Evans
>
> When an application is submitted to the RM it includes with it a set of 
> tokens that the RM will renew on behalf of the application, that will be 
> passed to the AM when the application is launched, and will be used when 
> launching the application to access HDFS to download files on behalf of the 
> application.
> For long lived applications/services these tokens can expire, and then the 
> tokens that the AM has will be invalid, and the tokens that the RM had will 
> also not work to launch a new AM.
> We need to provide an API that will allow the RM to replace the current 
> tokens for this application with a new set.  To avoid any real race issues, I 
> think this API should be something that the AM calls, so that the client can 
> connect to the AM with a new set of tokens it got using kerberos, then the AM 
> can inform the RM of the new set of tokens and quickly update its tokens 
> internally to use these new ones.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1906) TestRMRestart#testQueueMetricsOnRMRestart fails intermittently on trunk and branch2

2014-05-02 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13988356#comment-13988356
 ] 

Wangda Tan commented on YARN-1906:
--

I just encountered this issue too while working on YARN-1201.
The code snippet around the assertion failure:
{code}
while (loadedApp1.getAppAttempts().size() != 2) {
  Thread.sleep(200);
}
attempt1 = loadedApp1.getCurrentAppAttempt();
attemptId1 = attempt1.getAppAttemptId();
rm2.waitForState(attemptId1, RMAppAttemptState.SCHEDULED);
assertQueueMetrics(qm2, 1, 1, 0, 0);
{code}
And in assertQueueMetrics(), the following assertion fails:
{code}
Assert.assertEquals(qm.getAppsSubmitted(),
appsSubmitted + appsSubmittedCarryOn);
{code}
+1 to [~zjshen]'s suggestion; we should add a message to the assertion.
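For illustration, the assertion could carry a message like this (the message 
text and argument order are just a sketch, not the actual patch):

{code}
Assert.assertEquals("apps submitted should include carried-over apps",
    appsSubmitted + appsSubmittedCarryOn, qm.getAppsSubmitted());
{code}

That way an intermittent failure reports which metric diverged instead of a 
bare expected:<2> but was:<1>.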

> TestRMRestart#testQueueMetricsOnRMRestart fails intermittently on trunk and 
> branch2
> ---
>
> Key: YARN-1906
> URL: https://issues.apache.org/jira/browse/YARN-1906
> Project: Hadoop YARN
>  Issue Type: Test
>Affects Versions: 2.4.0
>Reporter: Mit Desai
>Assignee: Mit Desai
> Attachments: YARN-1906.patch, YARN-1906.patch
>
>
> Here is the output of the failure:
> {noformat}
> testQueueMetricsOnRMRestart(org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart)
>   Time elapsed: 9.757 sec  <<< FAILURE!
> java.lang.AssertionError: expected:<2> but was:<1>
>   at org.junit.Assert.fail(Assert.java:93)
>   at org.junit.Assert.failNotEquals(Assert.java:647)
>   at org.junit.Assert.assertEquals(Assert.java:128)
>   at org.junit.Assert.assertEquals(Assert.java:472)
>   at org.junit.Assert.assertEquals(Assert.java:456)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart.assertQueueMetrics(TestRMRestart.java:1735)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart.testQueueMetricsOnRMRestart(TestRMRestart.java:1706)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-1201) TestAMAuthorization fails with local hostname cannot be resolved

2014-05-02 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-1201:
-

Attachment: YARN-1201.patch

Resubmitting the patch because the last Jenkins build failed on TestRMRestart, 
which is tracked by YARN-1906.

> TestAMAuthorization fails with local hostname cannot be resolved
> 
>
> Key: YARN-1201
> URL: https://issues.apache.org/jira/browse/YARN-1201
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.1.0-beta
> Environment: SUSE Linux Enterprise Server 11 (x86_64)
>Reporter: Nemon Lou
>Assignee: Wangda Tan
>Priority: Minor
> Attachments: YARN-1201.patch, YARN-1201.patch, YARN-1201.patch, 
> YARN-1201.patch, YARN-1201.patch
>
>
> When hostname is 158-1-131-10, TestAMAuthorization fails.
> {code}
> Running org.apache.hadoop.yarn.server.resourcemanager.TestAMAuthorization
> Tests run: 4, Failures: 0, Errors: 2, Skipped: 0, Time elapsed: 14.034 sec 
> <<< FAILURE! - in 
> org.apache.hadoop.yarn.server.resourcemanager.TestAMAuthorization
> testUnauthorizedAccess[0](org.apache.hadoop.yarn.server.resourcemanager.TestAMAuthorization)
>   Time elapsed: 3.952 sec  <<< ERROR!
> java.lang.NullPointerException: null
> at 
> org.apache.hadoop.yarn.server.resourcemanager.TestAMAuthorization.testUnauthorizedAccess(TestAMAuthorization.java:284)
> testUnauthorizedAccess[1](org.apache.hadoop.yarn.server.resourcemanager.TestAMAuthorization)
>   Time elapsed: 3.116 sec  <<< ERROR!
> java.lang.NullPointerException: null
> at 
> org.apache.hadoop.yarn.server.resourcemanager.TestAMAuthorization.testUnauthorizedAccess(TestAMAuthorization.java:284)
> Results :
> Tests in error:
>   TestAMAuthorization.testUnauthorizedAccess:284 NullPointer
>   TestAMAuthorization.testUnauthorizedAccess:284 NullPointer
> Tests run: 4, Failures: 0, Errors: 2, Skipped: 0
> {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1864) Fair Scheduler Dynamic Hierarchical User Queues

2014-05-02 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1864?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13988380#comment-13988380
 ] 

Hadoop QA commented on YARN-1864:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12643115/YARN-1864-v5.txt
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 4 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

  
org.apache.hadoop.yarn.server.resourcemanager.TestClientRMService
  org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/3686//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3686//console

This message is automatically generated.

> Fair Scheduler Dynamic Hierarchical User Queues
> ---
>
> Key: YARN-1864
> URL: https://issues.apache.org/jira/browse/YARN-1864
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: scheduler
>Reporter: Ashwin Shankar
>  Labels: scheduler
> Attachments: YARN-1864-v1.txt, YARN-1864-v2.txt, YARN-1864-v3.txt, 
> YARN-1864-v4.txt, YARN-1864-v5.txt
>
>
> In Fair Scheduler, we want to be able to create user queues under any parent 
> queue in the hierarchy. E.g., say user1 submits a job to a parent queue 
> called root.allUserQueues; we want to be able to create a new queue called 
> root.allUserQueues.user1 and run user1's job in it. Any further jobs submitted 
> by this user to root.allUserQueues will be run in this newly created 
> root.allUserQueues.user1.
> This is very similar to the 'user-as-default' feature in Fair Scheduler, which 
> creates user queues under the root queue. But we want the ability to create 
> user queues under ANY parent queue.
> Why do we want this?
> 1. Preemption: these dynamically created user queues can preempt each other 
> if their fair share is not met, so there is fairness among users.
> User queues can also preempt other non-user leaf queues if below fair 
> share.
> 2. Allocation to user queues: we want all the user queries (ad hoc) to consume 
> only a fraction of the resources in the shared cluster. With this 
> feature, we could do that by giving a fair share to the parent user queue, 
> which is then redistributed to all the dynamically created user queues.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1945) Adding description for each pool in Fair Scheduler Page from fair-scheduler.xml

2014-05-02 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13988382#comment-13988382
 ] 

Hadoop QA commented on YARN-1945:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12643125/YARN-1945.v5.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

  
org.apache.hadoop.yarn.server.resourcemanager.TestClientRMService

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/3684//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3684//console

This message is automatically generated.

> Adding description for each pool in Fair Scheduler Page from 
> fair-scheduler.xml
> ---
>
> Key: YARN-1945
> URL: https://issues.apache.org/jira/browse/YARN-1945
> Project: Hadoop YARN
>  Issue Type: Improvement
>Affects Versions: 2.3.0
>Reporter: Siqi Li
>Assignee: Siqi Li
> Attachments: YARN-1945.v2.patch, YARN-1945.v3.patch, 
> YARN-1945.v4.patch, YARN-1945.v5.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1861) Both RM stuck in standby mode when automatic failover is enabled

2014-05-02 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1861?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13988388#comment-13988388
 ] 

Hadoop QA commented on YARN-1861:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12643124/YARN-1861.5.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

  
org.apache.hadoop.yarn.server.resourcemanager.TestClientRMService

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/3685//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3685//console

This message is automatically generated.

> Both RM stuck in standby mode when automatic failover is enabled
> 
>
> Key: YARN-1861
> URL: https://issues.apache.org/jira/browse/YARN-1861
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Affects Versions: 2.4.0
>Reporter: Arpit Gupta
>Assignee: Xuan Gong
>Priority: Blocker
> Attachments: YARN-1861.2.patch, YARN-1861.3.patch, YARN-1861.4.patch, 
> YARN-1861.5.patch, yarn-1861-1.patch
>
>
> In our HA tests we noticed that the tests got stuck because both RMs got 
> into standby state and neither became active.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (YARN-2017) Merge common code in schedulers

2014-05-02 Thread Jian He (JIRA)
Jian He created YARN-2017:
-

 Summary: Merge common code in schedulers
 Key: YARN-2017
 URL: https://issues.apache.org/jira/browse/YARN-2017
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Jian He
Assignee: Jian He


A bunch of the same code is repeated among the schedulers, e.g. between 
FiCaSchedulerNode and FSSchedulerNode. It would be good to merge and share it in 
a common base class.
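A minimal sketch of the direction, assuming the shared state is per-node 
resource bookkeeping (the class and member names are illustrative, not from a 
patch):

{code}
// Hypothetical common base holding bookkeeping that FiCaSchedulerNode and
// FSSchedulerNode currently duplicate.
abstract class SchedulerNodeBase {
  private long availableMb;
  private long usedMb;

  /** Identical allocation accounting, written once instead of twice. */
  public synchronized void allocateResource(long mb) {
    availableMb -= mb;
    usedMb += mb;
  }
}
{code}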



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Assigned] (YARN-1868) YARN status web ui does not show correctly in IE 11

2014-05-02 Thread Chuan Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1868?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chuan Liu reassigned YARN-1868:
---

Assignee: Chuan Liu

> YARN status web ui does not show correctly in IE 11
> ---
>
> Key: YARN-1868
> URL: https://issues.apache.org/jira/browse/YARN-1868
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: webapp
>Affects Versions: 3.0.0
>Reporter: Chuan Liu
>Assignee: Chuan Liu
> Attachments: YARN_status.png
>
>
> The YARN status web ui does not show correctly in IE 11. The drop-down menus 
> for app entries are not shown. Also, the navigation menu displays incorrectly.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-1868) YARN status web ui does not show correctly in IE 11

2014-05-02 Thread Chuan Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1868?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chuan Liu updated YARN-1868:


Attachment: YARN-1868.patch

Attaching a patch demonstrating the fix. I took the fix from the following link:
http://social.msdn.microsoft.com/Forums/ie/en-US/acf1e236-715b-4feb-8132-f88e8b6652c5/how-to-overrde-compatibility-mode-for-intranet-site-when-display-intranet-sites-in-compatibility

If the fix is acceptable, I will add unit tests as well.
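For context, the usual way to opt a page out of IE's intranet compatibility 
mode is an X-UA-Compatible response header; the sketch below shows the general 
approach with the plain servlet API and is not the attached patch:

{code}
import javax.servlet.http.HttpServletResponse;

public final class IeCompatibility {
  /** Ask IE to render with its newest engine instead of compatibility view. */
  public static void addHeader(HttpServletResponse response) {
    response.setHeader("X-UA-Compatible", "IE=edge");
  }
}
{code}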

> YARN status web ui does not show correctly in IE 11
> ---
>
> Key: YARN-1868
> URL: https://issues.apache.org/jira/browse/YARN-1868
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: webapp
>Affects Versions: 3.0.0
>Reporter: Chuan Liu
>Assignee: Chuan Liu
> Attachments: YARN-1868.patch, YARN_status.png
>
>
> The YARN status web ui does not show correctly in IE 11. The drop-down menus 
> for app entries are not shown. Also, the navigation menu displays incorrectly.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2018) TestClientRMService.testTokenRenewalWrongUser fails occasionally

2014-05-02 Thread Tsuyoshi OZAWA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13988446#comment-13988446
 ] 

Tsuyoshi OZAWA commented on YARN-2018:
--

Stack trace:
{quote}
java.lang.AssertionError: null
at org.junit.Assert.fail(Assert.java:86)
at org.junit.Assert.assertTrue(Assert.java:41)
at org.junit.Assert.assertTrue(Assert.java:52)
at 
org.apache.hadoop.yarn.server.resourcemanager.TestClientRMService$4.run(TestClientRMService.java:481)
at 
org.apache.hadoop.yarn.server.resourcemanager.TestClientRMService$4.run(TestClientRMService.java:474)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1606)
at 
org.apache.hadoop.yarn.server.resourcemanager.TestClientRMService.testTokenRenewalWrongUser(TestClientRMService.java:474)
{quote}

> TestClientRMService.testTokenRenewalWrongUser fails occasionally  
> --
>
> Key: YARN-2018
> URL: https://issues.apache.org/jira/browse/YARN-2018
> Project: Hadoop YARN
>  Issue Type: Test
>Reporter: Tsuyoshi OZAWA
>
> The test failure is observed on YARN-1945 and YARN-1861.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (YARN-2018) TestClientRMService.testTokenRenewalWrongUser fails occasionally

2014-05-02 Thread Tsuyoshi OZAWA (JIRA)
Tsuyoshi OZAWA created YARN-2018:


 Summary: TestClientRMService.testTokenRenewalWrongUser fails 
occasionally  
 Key: YARN-2018
 URL: https://issues.apache.org/jira/browse/YARN-2018
 Project: Hadoop YARN
  Issue Type: Test
Reporter: Tsuyoshi OZAWA


The test failure is observed on YARN-1945 and YARN-1861.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Assigned] (YARN-2016) Yarn getApplicationRequest start time range is not honored

2014-05-02 Thread Junping Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du reassigned YARN-2016:


Assignee: Junping Du

> Yarn getApplicationRequest start time range is not honored
> --
>
> Key: YARN-2016
> URL: https://issues.apache.org/jira/browse/YARN-2016
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.4.0
>Reporter: Venkat Ranganathan
>Assignee: Junping Du
> Attachments: YarnTest.java
>
>
> When we query for previous applications by creating an instance of 
> GetApplicationsRequest and setting the start time range and application tag, 
> we see that the start range provided is not honored and all applications with 
> the tag are returned.
> Attaching a reproducer.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1861) Both RM stuck in standby mode when automatic failover is enabled

2014-05-02 Thread Tsuyoshi OZAWA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1861?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13988450#comment-13988450
 ] 

Tsuyoshi OZAWA commented on YARN-1861:
--

Thanks for updating the patch, Xuan. The TestClientRMService failure doesn't 
look related to the change, so I filed it as YARN-2018. I'll try to look at the 
latest patch.

> Both RM stuck in standby mode when automatic failover is enabled
> 
>
> Key: YARN-1861
> URL: https://issues.apache.org/jira/browse/YARN-1861
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Affects Versions: 2.4.0
>Reporter: Arpit Gupta
>Assignee: Xuan Gong
>Priority: Blocker
> Attachments: YARN-1861.2.patch, YARN-1861.3.patch, YARN-1861.4.patch, 
> YARN-1861.5.patch, yarn-1861-1.patch
>
>
> In our HA tests we noticed that the tests got stuck because both RMs got 
> into standby state and neither became active.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2016) Yarn getApplicationRequest start time range is not honored

2014-05-02 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13988449#comment-13988449
 ] 

Junping Du commented on YARN-2016:
--

The start time range (and other properties) is not merged from the local fields 
to the proto in the PBImpl of GetApplicationsRequest. Will deliver a patch to 
fix it soon.
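A minimal sketch of the usual YARN PBImpl merge pattern the fix would follow 
(the range field and proto setter names below are assumptions for illustration):

{code}
// In the PBImpl: locally cached fields must be copied into the builder
// before the proto is built, or they are silently dropped on the wire.
private void mergeLocalToBuilder() {
  if (this.startRange != null) {
    builder.setApplicationStartBegin(this.startRange.getMinimumLong());
    builder.setApplicationStartEnd(this.startRange.getMaximumLong());
  }
  // ... merge the other locally cached properties the same way ...
}

private void mergeLocalToProto() {
  if (viaProto) {
    maybeInitBuilder();
  }
  mergeLocalToBuilder();
  proto = builder.build();
  viaProto = true;
}
{code}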

> Yarn getApplicationRequest start time range is not honored
> --
>
> Key: YARN-2016
> URL: https://issues.apache.org/jira/browse/YARN-2016
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.4.0
>Reporter: Venkat Ranganathan
>Assignee: Junping Du
> Attachments: YarnTest.java
>
>
> When we query for previous applications by creating an instance of 
> GetApplicationsRequest and setting the start time range and application tag, 
> we see that the start range provided is not honored and all applications with 
> the tag are returned.
> Attaching a reproducer.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1864) Fair Scheduler Dynamic Hierarchical User Queues

2014-05-02 Thread Ashwin Shankar (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1864?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13988458#comment-13988458
 ] 

Ashwin Shankar commented on YARN-1864:
--

Neither test failure is related to my patch.

> Fair Scheduler Dynamic Hierarchical User Queues
> ---
>
> Key: YARN-1864
> URL: https://issues.apache.org/jira/browse/YARN-1864
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: scheduler
>Reporter: Ashwin Shankar
>  Labels: scheduler
> Attachments: YARN-1864-v1.txt, YARN-1864-v2.txt, YARN-1864-v3.txt, 
> YARN-1864-v4.txt, YARN-1864-v5.txt
>
>
> In Fair Scheduler, we want to be able to create user queues under any parent 
> queue in the hierarchy. E.g., say user1 submits a job to a parent queue 
> called root.allUserQueues; we want to be able to create a new queue called 
> root.allUserQueues.user1 and run user1's job in it. Any further jobs submitted 
> by this user to root.allUserQueues will be run in this newly created 
> root.allUserQueues.user1.
> This is very similar to the 'user-as-default' feature in Fair Scheduler, which 
> creates user queues under the root queue. But we want the ability to create 
> user queues under ANY parent queue.
> Why do we want this?
> 1. Preemption: these dynamically created user queues can preempt each other 
> if their fair share is not met, so there is fairness among users.
> User queues can also preempt other non-user leaf queues if below fair 
> share.
> 2. Allocation to user queues: we want all the user queries (ad hoc) to consume 
> only a fraction of the resources in the shared cluster. With this 
> feature, we could do that by giving a fair share to the parent user queue, 
> which is then redistributed to all the dynamically created user queues.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (YARN-2019) Retrospect on decision of making RM crashed if any exception throw in ZKRMStateStore

2014-05-02 Thread Junping Du (JIRA)
Junping Du created YARN-2019:


 Summary: Retrospect on decision of making RM crashed if any 
exception throw in ZKRMStateStore
 Key: YARN-2019
 URL: https://issues.apache.org/jira/browse/YARN-2019
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Junping Du
Priority: Critical


Currently, if anything abnormal happens in ZKRMStateStore, it throws a fatal 
exception that crashes the RM. As shown in YARN-1924, the cause could be an 
internal RM HA bug rather than a truly fatal condition. We should revisit some 
decisions here, as the HA feature is designed to protect a key component, not 
to disturb it.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1961) Fair scheduler preemption doesn't work for non-leaf queues

2014-05-02 Thread Sandy Ryza (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13988518#comment-13988518
 ] 

Sandy Ryza commented on YARN-1961:
--

Hey Ashwin,
The history is only that we haven't yet added support for that property in 
parent queues.  I agree that it would be a helpful thing to add.

> Fair scheduler preemption doesn't work for non-leaf queues
> --
>
> Key: YARN-1961
> URL: https://issues.apache.org/jira/browse/YARN-1961
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: scheduler
>Affects Versions: 2.4.0
>Reporter: Ashwin Shankar
>  Labels: scheduler
>
> Setting minResources and minSharePreemptionTimeout to a non-leaf queue 
> doesn't cause preemption to happen when that non-leaf queue is below 
> minResources and there are outstanding demands in that non-leaf queue.
> Here is an example fs allocation config (partial):
> {code:xml}
> <queue name="abc">
>   <minResources>3072 mb,0 vcores</minResources>
>   <minSharePreemptionTimeout>30</minSharePreemptionTimeout>
>   <queue name="childq1"/>
>   <queue name="childq2"/>
>   <queue name="childq3"/>
> </queue>
> {code}
> With the above configs, preemption doesn't seem to happen if queue abc is 
> below minShare and it has outstanding unsatisfied demands from apps in its 
> child queues. Ideally in such cases we would like preemption to kick off and 
> reclaim resources from other queues (not under queue abc).
> Looking at the code it seems like preemption checks for starvation only at 
> the leaf queue level and not at the parent level.
> {code:title=FairScheduler.java|borderStyle=solid}
> boolean isStarvedForMinShare(FSLeafQueue sched)
> boolean isStarvedForFairShare(FSLeafQueue sched)
> {code}
> This affects our use case, where we have a parent queue with perhaps 100 
> unconfigured leaf queues under it. We want to give a minShare to the parent 
> queue to protect all the leaf queues under it, but we cannot do it due to this 
> bug.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1963) Support priorities across applications within the same queue

2014-05-02 Thread Ravi Prakash (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13988528#comment-13988528
 ] 

Ravi Prakash commented on YARN-1963:


I wonder if it'd be a good idea to percolate the priorities onto the actual 
containers as well (I'm thinking of (re)nice-ing container processes)? That way 
we could submit more jobs than can all fit into memory and take advantage of OS 
scheduling to pick up the ones with the highest priority.
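A minimal sketch of that idea, assuming the node manager already knows the 
container's pid (the helper shells out to the standard util-linux renice and is 
purely illustrative):

{code}
import java.io.IOException;

public final class ContainerNicer {
  /** Lower (or raise) the OS scheduling priority of a container process. */
  public static void renice(int pid, int niceness)
      throws IOException, InterruptedException {
    // niceness ranges from -20 (highest priority) to 19 (lowest).
    Process p = new ProcessBuilder(
        "renice", "-n", String.valueOf(niceness), "-p", String.valueOf(pid))
        .inheritIO().start();
    if (p.waitFor() != 0) {
      throw new IOException("renice failed for pid " + pid);
    }
  }
}
{code}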

> Support priorities across applications within the same queue 
> -
>
> Key: YARN-1963
> URL: https://issues.apache.org/jira/browse/YARN-1963
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: api, resourcemanager
>Reporter: Arun C Murthy
>Assignee: Sunil G
>
> It will be very useful to support priorities among applications within the 
> same queue, particularly in production scenarios. It allows for finer-grained 
> controls without having to force admins to create a multitude of queues, plus 
> allows existing applications to continue using existing queues which are 
> usually part of institutional memory.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1861) Both RM stuck in standby mode when automatic failover is enabled

2014-05-02 Thread Tsuyoshi OZAWA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1861?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13988529#comment-13988529
 ] 

Tsuyoshi OZAWA commented on YARN-1861:
--

[~xgong] Great work. The test case by Xuan checks whether the fix by Karthik 
works well by injecting RMFatalEventType.STATE_STORE_FENCED directly.

My review comments are as follows:
{code}
 // Transition to standby and reinit active services
 LOG.info("Transitioning RM to Standby mode");
 rm.transitionToStandby(true);
+rm.adminService.resetLeaderElection();
 return;
   } catch (Exception e) {
{code}

We should call rm.adminService.resetLeaderElection() in the finally block. If 
rm.transitionToStandby() fails while stopping the RM's services, all RMs can 
get stuck.

{code}
+int maxWaittingAttempt = 20;
+while (maxWaittingAttempt -- > 0) {
{code}

maxWaittingAttempt should be maxWaitingAttempt.
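A minimal sketch of the finally-block suggestion (context lines approximated 
from the snippet above):

{code}
try {
  // Transition to standby and reinit active services
  LOG.info("Transitioning RM to Standby mode");
  rm.transitionToStandby(true);
} finally {
  // Rejoin leader election even if stopping the active services threw,
  // so that neither RM is left permanently in standby.
  rm.adminService.resetLeaderElection();
}
{code}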

> Both RM stuck in standby mode when automatic failover is enabled
> 
>
> Key: YARN-1861
> URL: https://issues.apache.org/jira/browse/YARN-1861
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Affects Versions: 2.4.0
>Reporter: Arpit Gupta
>Assignee: Xuan Gong
>Priority: Blocker
> Attachments: YARN-1861.2.patch, YARN-1861.3.patch, YARN-1861.4.patch, 
> YARN-1861.5.patch, yarn-1861-1.patch
>
>
> In our HA tests we noticed that the tests got stuck because both RMs got 
> into standby state and neither became active.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1201) TestAMAuthorization fails with local hostname cannot be resolved

2014-05-02 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13988536#comment-13988536
 ] 

Junping Du commented on YARN-1201:
--

Kicked off Jenkins manually, as resubmitting an old patch won't trigger a 
Jenkins test. The patch looks good to me overall. However, I think we'd better 
improve the code below:
{code}
+ return expected.isInstance(e) ||
+     (e != null && isCause(expected, e.getCause()));
{code}
If e is null, then it depends on the behavior of isInstance(objectB) in the JDK 
(some old JDK versions will return true in this case; please refer to 
https://bugs.openjdk.java.net/browse/JDK-4081023, which suggests that users 
handle the null case before calling this method). Thus, I think a clearer way 
to do it is:
{code}
+ return e != null && (expected.isInstance(e) ||
+     isCause(expected, e.getCause()));
{code}
Also, it would be better to add some comments on the newly added method.
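Putting the pieces together, the whole helper would look roughly like this 
(the signature and javadoc wording are a sketch; only the null-safe expression 
above comes from the review):

{code}
/**
 * Returns true if {@code expected} appears anywhere in the cause chain of
 * {@code e}. Needed because the expected exception (e.g. an
 * AccessControlException) may arrive wrapped in another exception.
 */
private static boolean isCause(Class<? extends Throwable> expected, Throwable e) {
  return e != null &&
      (expected.isInstance(e) || isCause(expected, e.getCause()));
}
{code}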

> TestAMAuthorization fails with local hostname cannot be resolved
> 
>
> Key: YARN-1201
> URL: https://issues.apache.org/jira/browse/YARN-1201
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.1.0-beta
> Environment: SUSE Linux Enterprise Server 11 (x86_64)
>Reporter: Nemon Lou
>Assignee: Wangda Tan
>Priority: Minor
> Attachments: YARN-1201.patch, YARN-1201.patch, YARN-1201.patch, 
> YARN-1201.patch, YARN-1201.patch
>
>
> When hostname is 158-1-131-10, TestAMAuthorization fails.
> {code}
> Running org.apache.hadoop.yarn.server.resourcemanager.TestAMAuthorization
> Tests run: 4, Failures: 0, Errors: 2, Skipped: 0, Time elapsed: 14.034 sec 
> <<< FAILURE! - in 
> org.apache.hadoop.yarn.server.resourcemanager.TestAMAuthorization
> testUnauthorizedAccess[0](org.apache.hadoop.yarn.server.resourcemanager.TestAMAuthorization)
>   Time elapsed: 3.952 sec  <<< ERROR!
> java.lang.NullPointerException: null
> at 
> org.apache.hadoop.yarn.server.resourcemanager.TestAMAuthorization.testUnauthorizedAccess(TestAMAuthorization.java:284)
> testUnauthorizedAccess[1](org.apache.hadoop.yarn.server.resourcemanager.TestAMAuthorization)
>   Time elapsed: 3.116 sec  <<< ERROR!
> java.lang.NullPointerException: null
> at 
> org.apache.hadoop.yarn.server.resourcemanager.TestAMAuthorization.testUnauthorizedAccess(TestAMAuthorization.java:284)
> Results :
> Tests in error:
>   TestAMAuthorization.testUnauthorizedAccess:284 NullPointer
>   TestAMAuthorization.testUnauthorizedAccess:284 NullPointer
> Tests run: 4, Failures: 0, Errors: 2, Skipped: 0
> {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1201) TestAMAuthorization fails with local hostname cannot be resolved

2014-05-02 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13988537#comment-13988537
 ] 

Hadoop QA commented on YARN-1201:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12643145/YARN-1201.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

  
org.apache.hadoop.yarn.server.resourcemanager.TestClientRMService

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/3688//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3688//console

This message is automatically generated.

> TestAMAuthorization fails with local hostname cannot be resolved
> 
>
> Key: YARN-1201
> URL: https://issues.apache.org/jira/browse/YARN-1201
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.1.0-beta
> Environment: SUSE Linux Enterprise Server 11 (x86_64)
>Reporter: Nemon Lou
>Assignee: Wangda Tan
>Priority: Minor
> Attachments: YARN-1201.patch, YARN-1201.patch, YARN-1201.patch, 
> YARN-1201.patch, YARN-1201.patch
>
>
> When hostname is 158-1-131-10, TestAMAuthorization fails.
> {code}
> Running org.apache.hadoop.yarn.server.resourcemanager.TestAMAuthorization
> Tests run: 4, Failures: 0, Errors: 2, Skipped: 0, Time elapsed: 14.034 sec 
> <<< FAILURE! - in 
> org.apache.hadoop.yarn.server.resourcemanager.TestAMAuthorization
> testUnauthorizedAccess[0](org.apache.hadoop.yarn.server.resourcemanager.TestAMAuthorization)
>   Time elapsed: 3.952 sec  <<< ERROR!
> java.lang.NullPointerException: null
> at 
> org.apache.hadoop.yarn.server.resourcemanager.TestAMAuthorization.testUnauthorizedAccess(TestAMAuthorization.java:284)
> testUnauthorizedAccess[1](org.apache.hadoop.yarn.server.resourcemanager.TestAMAuthorization)
>   Time elapsed: 3.116 sec  <<< ERROR!
> java.lang.NullPointerException: null
> at 
> org.apache.hadoop.yarn.server.resourcemanager.TestAMAuthorization.testUnauthorizedAccess(TestAMAuthorization.java:284)
> Results :
> Tests in error:
>   TestAMAuthorization.testUnauthorizedAccess:284 NullPointer
>   TestAMAuthorization.testUnauthorizedAccess:284 NullPointer
> Tests run: 4, Failures: 0, Errors: 2, Skipped: 0
> {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2018) TestClientRMService.testTokenRenewalWrongUser fails occasionally

2014-05-02 Thread Ming Ma (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2018?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ming Ma updated YARN-2018:
--

Attachment: YARN-2018.patch

It seems HADOOP-10562 modified the message. Here is the patch.

> TestClientRMService.testTokenRenewalWrongUser fails occasionally  
> --
>
> Key: YARN-2018
> URL: https://issues.apache.org/jira/browse/YARN-2018
> Project: Hadoop YARN
>  Issue Type: Test
>Reporter: Tsuyoshi OZAWA
> Attachments: YARN-2018.patch
>
>
> The test failure is observed on YARN-1945 and YARN-1861.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-1201) TestAMAuthorization fails with local hostname cannot be resolved

2014-05-02 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-1201:
-

Attachment: YARN-1201.patch

Nice catch, [~djp]! Thanks for your comment; I've uploaded a new patch 
addressing it.

> TestAMAuthorization fails with local hostname cannot be resolved
> 
>
> Key: YARN-1201
> URL: https://issues.apache.org/jira/browse/YARN-1201
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.1.0-beta
> Environment: SUSE Linux Enterprise Server 11 (x86_64)
>Reporter: Nemon Lou
>Assignee: Wangda Tan
>Priority: Minor
> Attachments: YARN-1201.patch, YARN-1201.patch, YARN-1201.patch, 
> YARN-1201.patch, YARN-1201.patch, YARN-1201.patch
>
>
> When hostname is 158-1-131-10, TestAMAuthorization fails.
> {code}
> Running org.apache.hadoop.yarn.server.resourcemanager.TestAMAuthorization
> Tests run: 4, Failures: 0, Errors: 2, Skipped: 0, Time elapsed: 14.034 sec 
> <<< FAILURE! - in 
> org.apache.hadoop.yarn.server.resourcemanager.TestAMAuthorization
> testUnauthorizedAccess[0](org.apache.hadoop.yarn.server.resourcemanager.TestAMAuthorization)
>   Time elapsed: 3.952 sec  <<< ERROR!
> java.lang.NullPointerException: null
> at 
> org.apache.hadoop.yarn.server.resourcemanager.TestAMAuthorization.testUnauthorizedAccess(TestAMAuthorization.java:284)
> testUnauthorizedAccess[1](org.apache.hadoop.yarn.server.resourcemanager.TestAMAuthorization)
>   Time elapsed: 3.116 sec  <<< ERROR!
> java.lang.NullPointerException: null
> at 
> org.apache.hadoop.yarn.server.resourcemanager.TestAMAuthorization.testUnauthorizedAccess(TestAMAuthorization.java:284)
> Results :
> Tests in error:
>   TestAMAuthorization.testUnauthorizedAccess:284 NullPointer
>   TestAMAuthorization.testUnauthorizedAccess:284 NullPointer
> Tests run: 4, Failures: 0, Errors: 2, Skipped: 0
> {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-1803) Signal container support in nodemanager

2014-05-02 Thread Ming Ma (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1803?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ming Ma updated YARN-1803:
--

Attachment: YARN-1803.patch

Here is the patch to support the signal container functionality in the node 
manager.

1. NodeStatusUpdater will send ContainerManagerEventType.SIGNAL_CONTAINERS to 
ContainerManager when it receives a notification from the RM. That will be 
covered under https://issues.apache.org/jira/browse/YARN-1805.
2. After ContainerManager receives ContainerManagerEventType.SIGNAL_CONTAINERS, 
it will notify ContainersLauncher via 
ContainersLauncherEventType.SIGNAL_CONTAINER and eventually deliver the request 
to ContainerExecutor.
3. ContainerExecutor's signalContainer method is modified to take an 
OS-independent SignalContainerCommand (see the sketch below for one possible 
mapping to Linux signals).

Note, the patch also includes YARN-1897 so that Jenkins can build the patch.
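A minimal sketch of how an OS-independent command might map to POSIX signals 
(the command names and the mapping are assumptions for illustration, not taken 
from the patch):

{code}
public final class Signals {
  // Hypothetical platform-independent commands.
  enum SignalContainerCommand {
    OUTPUT_THREAD_DUMP,  // JVMs print a thread dump on this
    GRACEFUL_SHUTDOWN,   // ask the process to exit cleanly
    FORCEFUL_SHUTDOWN    // terminate immediately
  }

  static int toLinuxSignal(SignalContainerCommand cmd) {
    switch (cmd) {
      case OUTPUT_THREAD_DUMP: return 3;   // SIGQUIT
      case GRACEFUL_SHUTDOWN:  return 15;  // SIGTERM
      case FORCEFUL_SHUTDOWN:  return 9;   // SIGKILL
      default: throw new IllegalArgumentException("Unknown command: " + cmd);
    }
  }
}
{code}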

> Signal container support in nodemanager
> ---
>
> Key: YARN-1803
> URL: https://issues.apache.org/jira/browse/YARN-1803
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Reporter: Ming Ma
>Assignee: Ming Ma
> Attachments: YARN-1803.patch
>
>
> It could include the following:
> 1. ContainerManager is able to process a new event type, 
> ContainerManagerEventType.SIGNAL_CONTAINERS, coming from NodeStatusUpdater, and 
> deliver the request to ContainerExecutor.
> 2. Translate the platform-independent signal command to Linux-specific 
> signals. Windows support will be tracked by another task.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1201) TestAMAuthorization fails with local hostname cannot be resolved

2014-05-02 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13988554#comment-13988554
 ] 

Junping Du commented on YARN-1201:
--

Thanks [~leftnoteasy] for addressing my comments above. A few typos need to be 
fixed here:
{code}
+   * this because sometimes, a exception will be wrapped to another exception
{code}
should be "an exception"
{code}
+   * So we cannot simply cache AccessControlException by using
{code}
should be "catch".
Will +1 once the typos are fixed and the Jenkins result is +1.

> TestAMAuthorization fails with local hostname cannot be resolved
> 
>
> Key: YARN-1201
> URL: https://issues.apache.org/jira/browse/YARN-1201
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.1.0-beta
> Environment: SUSE Linux Enterprise Server 11 (x86_64)
>Reporter: Nemon Lou
>Assignee: Wangda Tan
>Priority: Minor
> Attachments: YARN-1201.patch, YARN-1201.patch, YARN-1201.patch, 
> YARN-1201.patch, YARN-1201.patch, YARN-1201.patch
>
>
> When hostname is 158-1-131-10, TestAMAuthorization fails.
> {code}
> Running org.apache.hadoop.yarn.server.resourcemanager.TestAMAuthorization
> Tests run: 4, Failures: 0, Errors: 2, Skipped: 0, Time elapsed: 14.034 sec 
> <<< FAILURE! - in 
> org.apache.hadoop.yarn.server.resourcemanager.TestAMAuthorization
> testUnauthorizedAccess[0](org.apache.hadoop.yarn.server.resourcemanager.TestAMAuthorization)
>   Time elapsed: 3.952 sec  <<< ERROR!
> java.lang.NullPointerException: null
> at 
> org.apache.hadoop.yarn.server.resourcemanager.TestAMAuthorization.testUnauthorizedAccess(TestAMAuthorization.java:284)
> testUnauthorizedAccess[1](org.apache.hadoop.yarn.server.resourcemanager.TestAMAuthorization)
>   Time elapsed: 3.116 sec  <<< ERROR!
> java.lang.NullPointerException: null
> at 
> org.apache.hadoop.yarn.server.resourcemanager.TestAMAuthorization.testUnauthorizedAccess(TestAMAuthorization.java:284)
> Results :
> Tests in error:
>   TestAMAuthorization.testUnauthorizedAccess:284 NullPointer
>   TestAMAuthorization.testUnauthorizedAccess:284 NullPointer
> Tests run: 4, Failures: 0, Errors: 2, Skipped: 0
> {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-1201) TestAMAuthorization fails with local hostname cannot be resolved

2014-05-02 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-1201:
-

Attachment: YARN-1201.patch

Thanks [~djp], fixed the typos according to your suggestions.

> TestAMAuthorization fails with local hostname cannot be resolved
> 
>
> Key: YARN-1201
> URL: https://issues.apache.org/jira/browse/YARN-1201
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.1.0-beta
> Environment: SUSE Linux Enterprise Server 11 (x86_64)
>Reporter: Nemon Lou
>Assignee: Wangda Tan
>Priority: Minor
> Attachments: YARN-1201.patch, YARN-1201.patch, YARN-1201.patch, 
> YARN-1201.patch, YARN-1201.patch, YARN-1201.patch, YARN-1201.patch
>
>
> When hostname is 158-1-131-10, TestAMAuthorization fails.
> {code}
> Running org.apache.hadoop.yarn.server.resourcemanager.TestAMAuthorization
> Tests run: 4, Failures: 0, Errors: 2, Skipped: 0, Time elapsed: 14.034 sec 
> <<< FAILURE! - in 
> org.apache.hadoop.yarn.server.resourcemanager.TestAMAuthorization
> testUnauthorizedAccess[0](org.apache.hadoop.yarn.server.resourcemanager.TestAMAuthorization)
>   Time elapsed: 3.952 sec  <<< ERROR!
> java.lang.NullPointerException: null
> at 
> org.apache.hadoop.yarn.server.resourcemanager.TestAMAuthorization.testUnauthorizedAccess(TestAMAuthorization.java:284)
> testUnauthorizedAccess[1](org.apache.hadoop.yarn.server.resourcemanager.TestAMAuthorization)
>   Time elapsed: 3.116 sec  <<< ERROR!
> java.lang.NullPointerException: null
> at 
> org.apache.hadoop.yarn.server.resourcemanager.TestAMAuthorization.testUnauthorizedAccess(TestAMAuthorization.java:284)
> Results :
> Tests in error:
>   TestAMAuthorization.testUnauthorizedAccess:284 NullPointer
>   TestAMAuthorization.testUnauthorizedAccess:284 NullPointer
> Tests run: 4, Failures: 0, Errors: 2, Skipped: 0
> {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1803) Signal container support in nodemanager

2014-05-02 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13988557#comment-13988557
 ] 

Hadoop QA commented on YARN-1803:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12643173/YARN-1803.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 6 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:red}-1 findbugs{color}.  The patch appears to introduce 1 new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/3690//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/3690//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-nodemanager.html
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3690//console

This message is automatically generated.

> Signal container support in nodemanager
> ---
>
> Key: YARN-1803
> URL: https://issues.apache.org/jira/browse/YARN-1803
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Reporter: Ming Ma
>Assignee: Ming Ma
> Attachments: YARN-1803.patch
>
>
> It could include the following:
> 1. ContainerManager is able to process a new event type 
> ContainerManagerEventType.SIGNAL_CONTAINERS coming from NodeStatusUpdater and 
> deliver the request to ContainerExecutor.
> 2. Translate the platform-independent signal command to Linux-specific 
> signals. Windows support will be tracked by another task.
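As a rough illustration of item 2 above, the translation step might look like the 
sketch below. The enum constants and signal numbers are assumptions for 
illustration only; they are not YARN's actual API.

{code}
// Illustrative sketch only: map a platform-independent signal command to a
// Linux signal number before the ContainerExecutor delivers it. The command
// names and the mapping are assumptions, not the actual YARN API.
public enum PlatformSignal {
  THREAD_DUMP,        // ask the container JVM for a thread dump
  GRACEFUL_SHUTDOWN,  // let the container clean up before exiting
  FORCEFUL_SHUTDOWN;  // terminate the container immediately

  /** Linux-specific mapping; Windows would need its own translation. */
  public int toLinuxSignal() {
    switch (this) {
      case THREAD_DUMP:       return 3;   // SIGQUIT
      case GRACEFUL_SHUTDOWN: return 15;  // SIGTERM
      case FORCEFUL_SHUTDOWN: return 9;   // SIGKILL
      default:
        throw new IllegalArgumentException("Unknown signal: " + this);
    }
  }
}
{code}

Keeping the Linux mapping in one place like this would make it straightforward to 
add a Windows translation as a parallel mapping under the separate task mentioned 
above.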



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1201) TestAMAuthorization fails with local hostname cannot be resolved

2014-05-02 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13988559#comment-13988559
 ] 

Hadoop QA commented on YARN-1201:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12643171/YARN-1201.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

  
org.apache.hadoop.yarn.server.resourcemanager.TestClientRMService

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/3691//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3691//console

This message is automatically generated.

> TestAMAuthorization fails with local hostname cannot be resolved
> 
>
> Key: YARN-1201
> URL: https://issues.apache.org/jira/browse/YARN-1201
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.1.0-beta
> Environment: SUSE Linux Enterprise Server 11 (x86_64)
>Reporter: Nemon Lou
>Assignee: Wangda Tan
>Priority: Minor
> Attachments: YARN-1201.patch, YARN-1201.patch, YARN-1201.patch, 
> YARN-1201.patch, YARN-1201.patch, YARN-1201.patch, YARN-1201.patch
>
>
> When hostname is 158-1-131-10, TestAMAuthorization fails.
> {code}
> Running org.apache.hadoop.yarn.server.resourcemanager.TestAMAuthorization
> Tests run: 4, Failures: 0, Errors: 2, Skipped: 0, Time elapsed: 14.034 sec 
> <<< FAILURE! - in 
> org.apache.hadoop.yarn.server.resourcemanager.TestAMAuthorization
> testUnauthorizedAccess[0](org.apache.hadoop.yarn.server.resourcemanager.TestAMAuthorization)
>   Time elapsed: 3.952 sec  <<< ERROR!
> java.lang.NullPointerException: null
> at 
> org.apache.hadoop.yarn.server.resourcemanager.TestAMAuthorization.testUnauthorizedAccess(TestAMAuthorization.java:284)
> testUnauthorizedAccess[1](org.apache.hadoop.yarn.server.resourcemanager.TestAMAuthorization)
>   Time elapsed: 3.116 sec  <<< ERROR!
> java.lang.NullPointerException: null
> at 
> org.apache.hadoop.yarn.server.resourcemanager.TestAMAuthorization.testUnauthorizedAccess(TestAMAuthorization.java:284)
> Results :
> Tests in error:
>   TestAMAuthorization.testUnauthorizedAccess:284 NullPointer
>   TestAMAuthorization.testUnauthorizedAccess:284 NullPointer
> Tests run: 4, Failures: 0, Errors: 2, Skipped: 0
> {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1201) TestAMAuthorization fails with local hostname cannot be resolved

2014-05-02 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13988567#comment-13988567
 ] 

Hadoop QA commented on YARN-1201:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12643175/YARN-1201.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

  
org.apache.hadoop.yarn.server.resourcemanager.TestClientRMService

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/3692//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3692//console

This message is automatically generated.

> TestAMAuthorization fails with local hostname cannot be resolved
> 
>
> Key: YARN-1201
> URL: https://issues.apache.org/jira/browse/YARN-1201
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.1.0-beta
> Environment: SUSE Linux Enterprise Server 11 (x86_64)
>Reporter: Nemon Lou
>Assignee: Wangda Tan
>Priority: Minor
> Attachments: YARN-1201.patch, YARN-1201.patch, YARN-1201.patch, 
> YARN-1201.patch, YARN-1201.patch, YARN-1201.patch, YARN-1201.patch
>
>
> When hostname is 158-1-131-10, TestAMAuthorization fails.
> {code}
> Running org.apache.hadoop.yarn.server.resourcemanager.TestAMAuthorization
> Tests run: 4, Failures: 0, Errors: 2, Skipped: 0, Time elapsed: 14.034 sec 
> <<< FAILURE! - in 
> org.apache.hadoop.yarn.server.resourcemanager.TestAMAuthorization
> testUnauthorizedAccess[0](org.apache.hadoop.yarn.server.resourcemanager.TestAMAuthorization)
>   Time elapsed: 3.952 sec  <<< ERROR!
> java.lang.NullPointerException: null
> at 
> org.apache.hadoop.yarn.server.resourcemanager.TestAMAuthorization.testUnauthorizedAccess(TestAMAuthorization.java:284)
> testUnauthorizedAccess[1](org.apache.hadoop.yarn.server.resourcemanager.TestAMAuthorization)
>   Time elapsed: 3.116 sec  <<< ERROR!
> java.lang.NullPointerException: null
> at 
> org.apache.hadoop.yarn.server.resourcemanager.TestAMAuthorization.testUnauthorizedAccess(TestAMAuthorization.java:284)
> Results :
> Tests in error:
>   TestAMAuthorization.testUnauthorizedAccess:284 NullPointer
>   TestAMAuthorization.testUnauthorizedAccess:284 NullPointer
> Tests run: 4, Failures: 0, Errors: 2, Skipped: 0
> {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1201) TestAMAuthorization fails with local hostname cannot be resolved

2014-05-02 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13988570#comment-13988570
 ] 

Junping Du commented on YARN-1201:
--

+1. The patch looks good to me. The test failure is unrelated and is tracked in 
YARN-2018.
Will commit it shortly.

> TestAMAuthorization fails with local hostname cannot be resolved
> 
>
> Key: YARN-1201
> URL: https://issues.apache.org/jira/browse/YARN-1201
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.1.0-beta
> Environment: SUSE Linux Enterprise Server 11 (x86_64)
>Reporter: Nemon Lou
>Assignee: Wangda Tan
>Priority: Minor
> Attachments: YARN-1201.patch, YARN-1201.patch, YARN-1201.patch, 
> YARN-1201.patch, YARN-1201.patch, YARN-1201.patch, YARN-1201.patch
>
>
> When hostname is 158-1-131-10, TestAMAuthorization fails.
> {code}
> Running org.apache.hadoop.yarn.server.resourcemanager.TestAMAuthorization
> Tests run: 4, Failures: 0, Errors: 2, Skipped: 0, Time elapsed: 14.034 sec 
> <<< FAILURE! - in 
> org.apache.hadoop.yarn.server.resourcemanager.TestAMAuthorization
> testUnauthorizedAccess[0](org.apache.hadoop.yarn.server.resourcemanager.TestAMAuthorization)
>   Time elapsed: 3.952 sec  <<< ERROR!
> java.lang.NullPointerException: null
> at 
> org.apache.hadoop.yarn.server.resourcemanager.TestAMAuthorization.testUnauthorizedAccess(TestAMAuthorization.java:284)
> testUnauthorizedAccess[1](org.apache.hadoop.yarn.server.resourcemanager.TestAMAuthorization)
>   Time elapsed: 3.116 sec  <<< ERROR!
> java.lang.NullPointerException: null
> at 
> org.apache.hadoop.yarn.server.resourcemanager.TestAMAuthorization.testUnauthorizedAccess(TestAMAuthorization.java:284)
> Results :
> Tests in error:
>   TestAMAuthorization.testUnauthorizedAccess:284 NullPointer
>   TestAMAuthorization.testUnauthorizedAccess:284 NullPointer
> Tests run: 4, Failures: 0, Errors: 2, Skipped: 0
> {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1861) Both RM stuck in standby mode when automatic failover is enabled

2014-05-02 Thread Tsuyoshi OZAWA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1861?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13988573#comment-13988573
 ] 

Tsuyoshi OZAWA commented on YARN-1861:
--

> We should call rm.adminService.resetLeaderElection() in the finally block. If 
> rm.transitionToStandby() fails while stopping RM's services, both RMs can get stuck.

Sorry, I noticed this is wrong. If rm.transitionToStandby() fails, the RM can get 
stuck until the ZK server detects the failure. We could call 
EmbeddedElectorService.stop() in the exception handler to shut down gracefully, 
though that is only one option.
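A minimal sketch of that option follows. The method names come from the comment 
above (rm.transitionToStandby(), EmbeddedElectorService.stop()); the exact 
signatures, visibility, and surrounding structure are assumptions, not the real 
RM code paths.

{code}
// Sketch of the option above, not the actual RM code: if the transition to
// standby fails, stop the embedded elector so this RM leaves the leader
// election gracefully instead of waiting for ZooKeeper to expire its session.
void transitionToStandbyOrQuitElection(ResourceManager rm,
    EmbeddedElectorService elector) throws Exception {
  try {
    rm.transitionToStandby(true);
  } catch (Exception e) {
    // Leave the election gracefully so the other RM can become active
    // immediately, rather than waiting for the ZK session timeout.
    elector.stop();
    throw e;
  }
}
{code}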

> Both RM stuck in standby mode when automatic failover is enabled
> 
>
> Key: YARN-1861
> URL: https://issues.apache.org/jira/browse/YARN-1861
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Affects Versions: 2.4.0
>Reporter: Arpit Gupta
>Assignee: Xuan Gong
>Priority: Blocker
> Attachments: YARN-1861.2.patch, YARN-1861.3.patch, YARN-1861.4.patch, 
> YARN-1861.5.patch, yarn-1861-1.patch
>
>
> In our HA tests we noticed that the tests got stuck because both RM's got 
> into standby state and no one became active.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1861) Both RM stuck in standby mode when automatic failover is enabled

2014-05-02 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1861?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13988590#comment-13988590
 ] 

Karthik Kambatla commented on YARN-1861:


Please hold off on committing this until Sunday evening so I can take a look.

> Both RM stuck in standby mode when automatic failover is enabled
> 
>
> Key: YARN-1861
> URL: https://issues.apache.org/jira/browse/YARN-1861
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Affects Versions: 2.4.0
>Reporter: Arpit Gupta
>Assignee: Xuan Gong
>Priority: Blocker
> Attachments: YARN-1861.2.patch, YARN-1861.3.patch, YARN-1861.4.patch, 
> YARN-1861.5.patch, yarn-1861-1.patch
>
>
> In our HA tests we noticed that the tests got stuck because both RM's got 
> into standby state and no one became active.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2018) TestClientRMService.testTokenRenewalWrongUser fails occasionally

2014-05-02 Thread Tsuyoshi OZAWA (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2018?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsuyoshi OZAWA updated YARN-2018:
-

Assignee: Ming Ma

> TestClientRMService.testTokenRenewalWrongUser fails occasionally  
> --
>
> Key: YARN-2018
> URL: https://issues.apache.org/jira/browse/YARN-2018
> Project: Hadoop YARN
>  Issue Type: Test
>Reporter: Tsuyoshi OZAWA
>Assignee: Ming Ma
> Attachments: YARN-2018.patch
>
>
> The test failure is observed on YARN-1945 and YARN-1861.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2018) TestClientRMService.testTokenRenewalWrongUser fails occasionally

2014-05-02 Thread Tsuyoshi OZAWA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13988599#comment-13988599
 ] 

Tsuyoshi OZAWA commented on YARN-2018:
--

+1 (non-binding). Let's wait for Jenkins.

> TestClientRMService.testTokenRenewalWrongUser fails occasionally  
> --
>
> Key: YARN-2018
> URL: https://issues.apache.org/jira/browse/YARN-2018
> Project: Hadoop YARN
>  Issue Type: Test
>Reporter: Tsuyoshi OZAWA
>Assignee: Ming Ma
> Attachments: YARN-2018.patch
>
>
> The test failure is observed on YARN-1945 and YARN-1861.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2018) TestClientRMService.testTokenRenewalWrongUser fails after HADOOP-10562

2014-05-02 Thread Tsuyoshi OZAWA (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2018?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsuyoshi OZAWA updated YARN-2018:
-

Summary: TestClientRMService.testTokenRenewalWrongUser fails after 
HADOOP-10562  (was: TestClientRMService.testTokenRenewalWrongUser fails 
occasionally)

> TestClientRMService.testTokenRenewalWrongUser fails after HADOOP-10562  
> 
>
> Key: YARN-2018
> URL: https://issues.apache.org/jira/browse/YARN-2018
> Project: Hadoop YARN
>  Issue Type: Test
>Reporter: Tsuyoshi OZAWA
>Assignee: Ming Ma
> Attachments: YARN-2018.patch
>
>
> The test failure is observed on YARN-1945 and YARN-1861.



--
This message was sent by Atlassian JIRA
(v6.2#6252)