[jira] [Commented] (YARN-2408) Resource Request REST API for YARN

2014-08-12 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14093781#comment-14093781
 ] 

Hadoop QA commented on YARN-2408:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12661139/YARN-2408.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/4600//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4600//console

This message is automatically generated.

 Resource Request REST API for YARN
 --

 Key: YARN-2408
 URL: https://issues.apache.org/jira/browse/YARN-2408
 Project: Hadoop YARN
  Issue Type: New Feature
  Components: webapp
Reporter: Renan DelValle
Priority: Minor
  Labels: features
 Attachments: YARN-2408.patch


 I’m proposing a new REST API for YARN which exposes a snapshot of the 
 Resource Requests that exist inside the Scheduler. My motivation behind 
 this new feature is to allow external software to monitor the amount of 
 resources being requested, to gain deeper insight into cluster usage than 
 is already provided. The API can also be used by external software to 
 detect a starved application and alert the appropriate users and/or 
 sysadmin so that the problem may be remedied.
 Here is the proposed API:
 {code:xml}
 <resourceRequests>
   <MB>96256</MB>
   <VCores>94</VCores>
   <appMaster>
     <applicationId>application_</applicationId>
     <applicationAttemptId>appattempt_</applicationAttemptId>
     <queueName>default</queueName>
     <totalPendingMB>96256</totalPendingMB>
     <totalPendingVCores>94</totalPendingVCores>
     <numResourceRequests>3</numResourceRequests>
     <resourceRequests>
       <request>
         <MB>1024</MB>
         <VCores>1</VCores>
         <resourceName>/default-rack</resourceName>
         <numContainers>94</numContainers>
         <relaxLocality>true</relaxLocality>
         <priority>20</priority>
       </request>
       <request>
         <MB>1024</MB>
         <VCores>1</VCores>
         <resourceName>*</resourceName>
         <numContainers>94</numContainers>
         <relaxLocality>true</relaxLocality>
         <priority>20</priority>
       </request>
       <request>
         <MB>1024</MB>
         <VCores>1</VCores>
         <resourceName>master</resourceName>
         <numContainers>94</numContainers>
         <relaxLocality>true</relaxLocality>
         <priority>20</priority>
       </request>
     </resourceRequests>
   </appMaster>
 </resourceRequests>
 {code}
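
 To make the intended consumption pattern concrete, here is a minimal Java sketch of an 
 external monitor polling such a snapshot. It is only an illustration: the endpoint path, 
 host/port, and the starvation threshold are assumptions, not part of a released YARN API.
 {code:java}
import java.io.InputStream;
import java.net.URL;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class PendingRequestMonitor {
  public static void main(String[] args) throws Exception {
    // Hypothetical endpoint modeled on the proposal above; not a committed YARN API.
    URL url = new URL("http://rm-host:8088/ws/v1/cluster/resourceRequests");
    try (InputStream in = url.openStream()) {
      Document doc = DocumentBuilderFactory.newInstance()
          .newDocumentBuilder().parse(in);
      NodeList apps = doc.getElementsByTagName("appMaster");
      for (int i = 0; i < apps.getLength(); i++) {
        Element app = (Element) apps.item(i);
        String appId = text(app, "applicationId");
        long pendingMb = Long.parseLong(text(app, "totalPendingMB"));
        int pendingCores = Integer.parseInt(text(app, "totalPendingVCores"));
        // Crude starvation signal: a large backlog of unsatisfied requests.
        if (pendingMb > 50000 || pendingCores > 50) {
          System.out.println("Possible starvation: " + appId
              + " pendingMB=" + pendingMb + " pendingVCores=" + pendingCores);
        }
      }
    }
  }

  private static String text(Element parent, String tag) {
    return parent.getElementsByTagName(tag).item(0).getTextContent();
  }
}
 {code}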



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2408) Resource Request REST API for YARN

2014-08-12 Thread Zhijie Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen updated YARN-2408:
--

Priority: Major  (was: Minor)

 Resource Request REST API for YARN
 --

 Key: YARN-2408
 URL: https://issues.apache.org/jira/browse/YARN-2408
 Project: Hadoop YARN
  Issue Type: New Feature
  Components: webapp
Reporter: Renan DelValle
  Labels: features
 Attachments: YARN-2408.patch


 I’m proposing a new REST API for YARN which exposes a snapshot of the 
 Resource Requests that exist inside the Scheduler. My motivation behind 
 this new feature is to allow external software to monitor the amount of 
 resources being requested, to gain deeper insight into cluster usage than 
 is already provided. The API can also be used by external software to 
 detect a starved application and alert the appropriate users and/or 
 sysadmin so that the problem may be remedied.
 Here is the proposed API:
 {code:xml}
 <resourceRequests>
   <MB>96256</MB>
   <VCores>94</VCores>
   <appMaster>
     <applicationId>application_</applicationId>
     <applicationAttemptId>appattempt_</applicationAttemptId>
     <queueName>default</queueName>
     <totalPendingMB>96256</totalPendingMB>
     <totalPendingVCores>94</totalPendingVCores>
     <numResourceRequests>3</numResourceRequests>
     <resourceRequests>
       <request>
         <MB>1024</MB>
         <VCores>1</VCores>
         <resourceName>/default-rack</resourceName>
         <numContainers>94</numContainers>
         <relaxLocality>true</relaxLocality>
         <priority>20</priority>
       </request>
       <request>
         <MB>1024</MB>
         <VCores>1</VCores>
         <resourceName>*</resourceName>
         <numContainers>94</numContainers>
         <relaxLocality>true</relaxLocality>
         <priority>20</priority>
       </request>
       <request>
         <MB>1024</MB>
         <VCores>1</VCores>
         <resourceName>master</resourceName>
         <numContainers>94</numContainers>
         <relaxLocality>true</relaxLocality>
         <priority>20</priority>
       </request>
     </resourceRequests>
   </appMaster>
 </resourceRequests>
 {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Assigned] (YARN-2407) Users are not allowed to view their own jobs, denied by JobACLsManager

2014-08-12 Thread Devaraj K (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2407?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Devaraj K reassigned YARN-2407:
---

Assignee: Yu Gao

 Users are not allowed to view their own jobs, denied by JobACLsManager
 --

 Key: YARN-2407
 URL: https://issues.apache.org/jira/browse/YARN-2407
 Project: Hadoop YARN
  Issue Type: Bug
  Components: applications
Affects Versions: 2.4.1
Reporter: Yu Gao
Assignee: Yu Gao
 Attachments: YARN-2407.patch


 I have a Hadoop 2.4.1 cluster with YARN ACLs enabled, and tried to submit jobs as 
 a non-admin user, user1. The job finished successfully, but the running 
 progress was not displayed correctly on the command line, and I got the 
 following in the corresponding ApplicationMaster log:
 INFO [IPC Server handler 0 on 56717] org.apache.hadoop.ipc.Server: IPC Server 
 handler 0 on 56717, call 
 org.apache.hadoop.mapreduce.v2.api.MRClientProtocolPB.getJobReport from 
 9.30.95.26:61024 Call#59 Retry#0
 org.apache.hadoop.security.AccessControlException: User user1 cannot perform 
 operation VIEW_JOB on job_1407456690588_0003
   at 
 org.apache.hadoop.mapreduce.v2.app.client.MRClientService$MRClientProtocolHandler.verifyAndGetJob(MRClientService.java:191)
   at 
 org.apache.hadoop.mapreduce.v2.app.client.MRClientService$MRClientProtocolHandler.getJobReport(MRClientService.java:233)
   at 
 org.apache.hadoop.mapreduce.v2.api.impl.pb.service.MRClientProtocolPBServiceImpl.getJobReport(MRClientProtocolPBServiceImpl.java:122)
   at 
 org.apache.hadoop.yarn.proto.MRClientProtocol$MRClientProtocolService$2.callBlockingMethod(MRClientProtocol.java:275)
   at 
 org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013)
   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009)
   at 
 java.security.AccessController.doPrivileged(AccessController.java:366)
   at javax.security.auth.Subject.doAs(Subject.java:572)
   at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1567)
   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007)
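
 For context, the expected behavior is that a job's owner can always view their own job, 
 regardless of the configured view ACL. A minimal, self-contained sketch of that kind of 
 check follows (plain Java for illustration; the class, method, and set-based ACLs here are 
 assumptions, not the actual JobACLsManager code):
 {code:java}
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class ViewAclCheck {
  /**
   * Returns true if the caller may view the job. The owner and admins are
   * always allowed; otherwise the configured view ACL is consulted.
   */
  static boolean canViewJob(String caller, String jobOwner,
                            Set<String> admins, Set<String> viewAcl,
                            boolean aclsEnabled) {
    if (!aclsEnabled) {
      return true;                        // ACLs disabled: everyone may view
    }
    if (caller.equals(jobOwner) || admins.contains(caller)) {
      return true;                        // owner/admin must never be denied
    }
    return viewAcl.contains("*") || viewAcl.contains(caller);
  }

  public static void main(String[] args) {
    Set<String> admins = new HashSet<>(Arrays.asList("hadoopadmin"));
    Set<String> viewAcl = new HashSet<>();  // empty ACL: only owner and admins
    // The reported bug is equivalent to this returning false for the owner.
    System.out.println(canViewJob("user1", "user1", admins, viewAcl, true));
  }
}
 {code}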



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (YARN-2409) InvalidStateTransitonException in ResourceManager after job recovery

2014-08-12 Thread Nishan Shetty (JIRA)
Nishan Shetty created YARN-2409:
---

 Summary: InvalidStateTransitonException in ResourceManager after 
job recovery
 Key: YARN-2409
 URL: https://issues.apache.org/jira/browse/YARN-2409
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 3.0.0
Reporter: Nishan Shetty


{code}
at java.lang.Thread.run(Thread.java:662)
2014-08-12 07:03:00,839 ERROR 
org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: 
Can't handle this event at current state
org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: 
STATUS_UPDATE at LAUNCHED
at 
org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
at 
org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
at 
org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
at 
org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:697)
at 
org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:105)
at 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:779)
at 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:760)
at 
org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173)
at 
org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106)
at java.lang.Thread.run(Thread.java:662)
2014-08-12 07:03:00,839 ERROR 
org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: 
Can't handle this event at current state
org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: 
CONTAINER_ALLOCATED at LAUNCHED
at 
org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
at 
org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
at 
org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
at 
org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:697)
at 
org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:105)
at 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:779)
at 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:760)
at 
org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173)
at 
org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106)
at java.lang.Thread.run(Thread.java:662)
2014-08-12 07:03:00,839 ERROR org.apache.hadoop.ya
{code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1337) Recover containers upon nodemanager restart

2014-08-12 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14094003#comment-14094003
 ] 

Hudson commented on YARN-1337:
--

FAILURE: Integrated in Hadoop-trunk-Commit #6050 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/6050/])
YARN-1337. Recover containers upon nodemanager restart. (Contributed by Jason 
Lowe) (junping_du: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1617448)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/ContainerExecutor.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/DefaultContainerExecutor.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/LinuxContainerExecutor.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeStatusUpdater.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeStatusUpdaterImpl.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/ContainerManagerImpl.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/container/ContainerImpl.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/launcher/ContainerLaunch.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/launcher/ContainersLauncher.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/launcher/ContainersLauncherEventType.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/launcher/RecoveredContainerLaunch.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/logaggregation/AppLogAggregator.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/logaggregation/AppLogAggregatorImpl.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/logaggregation/LogAggregationService.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/recovery/NMLeveldbStateStoreService.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/recovery/NMNullStateStoreService.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/recovery/NMStateStoreService.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/container-executor.c
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestNodeStatusUpdater.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/BaseContainerManagerTest.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/TestAuxServices.java
* 

[jira] [Commented] (YARN-1337) Recover containers upon nodemanager restart

2014-08-12 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14094018#comment-14094018
 ] 

Hudson commented on YARN-1337:
--

SUCCESS: Integrated in Hadoop-Yarn-trunk #643 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/643/])
YARN-1337. Recover containers upon nodemanager restart. (Contributed by Jason 
Lowe) (junping_du: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1617448)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/ContainerExecutor.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/DefaultContainerExecutor.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/LinuxContainerExecutor.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeStatusUpdater.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeStatusUpdaterImpl.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/ContainerManagerImpl.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/container/ContainerImpl.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/launcher/ContainerLaunch.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/launcher/ContainersLauncher.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/launcher/ContainersLauncherEventType.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/launcher/RecoveredContainerLaunch.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/logaggregation/AppLogAggregator.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/logaggregation/AppLogAggregatorImpl.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/logaggregation/LogAggregationService.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/recovery/NMLeveldbStateStoreService.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/recovery/NMNullStateStoreService.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/recovery/NMStateStoreService.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/container-executor.c
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestNodeStatusUpdater.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/BaseContainerManagerTest.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/TestAuxServices.java
* 

[jira] [Commented] (YARN-2400) TestAMRestart fails intermittently

2014-08-12 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2400?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14094015#comment-14094015
 ] 

Hudson commented on YARN-2400:
--

SUCCESS: Integrated in Hadoop-Yarn-trunk #643 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/643/])
YARN-2400: Addendum fix for TestAMRestart failure. Contributed by Jian He 
(xgong: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1617333)
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/applicationsmanager/TestAMRestart.java


 TestAMRestart fails intermittently
 --

 Key: YARN-2400
 URL: https://issues.apache.org/jira/browse/YARN-2400
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Jian He
Assignee: Jian He
 Fix For: 2.6.0

 Attachments: YARN-2240.2.patch, YARN-2400.1.patch


 java.lang.AssertionError: AppAttempt state is not correct (timedout) 
 expected:<ALLOCATED> but was:<SCHEDULED>
   at org.junit.Assert.fail(Assert.java:88)
   at org.junit.Assert.failNotEquals(Assert.java:743)
   at org.junit.Assert.assertEquals(Assert.java:118)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.MockAM.waitForState(MockAM.java:82)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.MockRM.sendAMLaunched(MockRM.java:417)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.MockRM.launchAM(MockRM.java:579)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.MockRM.launchAndRegisterAM(MockRM.java:586)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRestart.testShouldNotCountFailureToMaxAttemptRetry(TestAMRestart.java:389)



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2138) Cleanup notifyDone* methods in RMStateStore

2014-08-12 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14094019#comment-14094019
 ] 

Hudson commented on YARN-2138:
--

SUCCESS: Integrated in Hadoop-Yarn-trunk #643 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/643/])
YARN-2138. Cleaned up notifyDone* APIs in RMStateStore. Contributed by Varun 
Saxena (jianhe: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1617341)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/RMStateStore.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/RMAppImpl.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/RMAppNewSavedEvent.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/RMAppUpdateSavedEvent.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptImpl.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/event/RMAppAttemptNewSavedEvent.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/event/RMAppAttemptUpdateSavedEvent.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/RMStateStoreTestBase.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/TestRMAppTransitions.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/TestRMAppAttemptTransitions.java


 Cleanup notifyDone* methods in RMStateStore
 ---

 Key: YARN-2138
 URL: https://issues.apache.org/jira/browse/YARN-2138
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Jian He
Assignee: Varun Saxena
 Fix For: 2.6.0

 Attachments: YARN-2138.002.patch, YARN-2138.003.patch, 
 YARN-2138.004.patch, YARN-2138.patch


 The storedException passed into notifyDoneStoringApplication is always null. 
 Similarly for other notifyDone* methods. We can clean up these methods as 
 this control flow path is not used anymore.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-659) RMStateStore's removeApplication APIs should just take an applicationId

2014-08-12 Thread Tsuyoshi OZAWA (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-659?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsuyoshi OZAWA updated YARN-659:


Attachment: YARN-659.5.patch

Rebased on trunk.

 RMStateStore's removeApplication APIs should just take an applicationId
 ---

 Key: YARN-659
 URL: https://issues.apache.org/jira/browse/YARN-659
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Vinod Kumar Vavilapalli
Assignee: Tsuyoshi OZAWA
 Attachments: YARN-659.1.patch, YARN-659.2.patch, YARN-659.3.patch, 
 YARN-659.4.patch, YARN-659.5.patch


 There is no need to pass in the whole state for removal - just an ID should 
 be enough when an app finishes.
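
 A sketch of the intended simplification (the interface and type names below are 
 illustrative, not the exact RMStateStore signatures):
 {code:java}
// Illustrative sketch only -- not the actual RMStateStore API.
interface AppRemovalStore {
  /** Current shape: the caller must hand over the whole saved application state. */
  void removeApplication(SavedAppState fullState);

  /** Proposed shape: the application id alone is enough once the app has finished. */
  void removeApplication(String applicationId);
}

/** Placeholder for the saved state (submission context, attempts, and so on). */
class SavedAppState {
}
 {code}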



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2409) InvalidStateTransitonException in ResourceManager after job recovery

2014-08-12 Thread Eric Payne (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14094073#comment-14094073
 ] 

Eric Payne commented on YARN-2409:
--

[~nishan],

Is the application timeline service (ATS) running when you see these 
exceptions? I have seen this happening when ATS is running, but not when it is 
turned off. It appears there may be an incompatibility between ATS and the 
State Store subsystem in the RM.

 InvalidStateTransitonException in ResourceManager after job recovery
 

 Key: YARN-2409
 URL: https://issues.apache.org/jira/browse/YARN-2409
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 3.0.0
Reporter: Nishan Shetty

 {code}
   at java.lang.Thread.run(Thread.java:662)
 2014-08-12 07:03:00,839 ERROR 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: 
 Can't handle this event at current state
 org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: 
 STATUS_UPDATE at LAUNCHED
   at 
 org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
   at 
 org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
   at 
 org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:697)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:105)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:779)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:760)
   at 
 org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173)
   at 
 org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106)
   at java.lang.Thread.run(Thread.java:662)
 2014-08-12 07:03:00,839 ERROR 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: 
 Can't handle this event at current state
 org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: 
 CONTAINER_ALLOCATED at LAUNCHED
   at 
 org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
   at 
 org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
   at 
 org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:697)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:105)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:779)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:760)
   at 
 org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173)
   at 
 org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106)
   at java.lang.Thread.run(Thread.java:662)
 2014-08-12 07:03:00,839 ERROR org.apache.hadoop.ya
 {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Resolved] (YARN-1352) Recover LogAggregationService upon nodemanager restart

2014-08-12 Thread Jason Lowe (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1352?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Lowe resolved YARN-1352.
--

Resolution: Duplicate

Resolving as log aggregation should be covered by the combination of recovering 
applications in YARN-1354 and containers in YARN-1337.

 Recover LogAggregationService upon nodemanager restart
 --

 Key: YARN-1352
 URL: https://issues.apache.org/jira/browse/YARN-1352
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: nodemanager
Affects Versions: 2.3.0
Reporter: Jason Lowe
Assignee: Jason Lowe

 LogAggregationService state needs to be recovered as part of the 
 work-preserving nodemanager restart feature.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2308) NPE happened when RM restart after CapacityScheduler queue configuration changed

2014-08-12 Thread chang li (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14094178#comment-14094178
 ] 

chang li commented on YARN-2308:


[~wangda] 
I don't think this conf setting should be necessary either, but if I don't include 
it, my unit test somehow will not fail with the NPE.

 NPE happened when RM restart after CapacityScheduler queue configuration 
 changed 
 -

 Key: YARN-2308
 URL: https://issues.apache.org/jira/browse/YARN-2308
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager, scheduler
Affects Versions: 2.6.0
Reporter: Wangda Tan
Assignee: chang li
Priority: Critical
 Attachments: jira2308.patch, jira2308.patch, jira2308.patch


 I encountered an NPE when the RM restarted
 {code}
 2014-07-16 07:22:46,957 FATAL 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in 
 handling event type APP_ATTEMPT_ADDED to the scheduler
 java.lang.NullPointerException
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.addApplicationAttempt(CapacityScheduler.java:566)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:922)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:98)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:594)
 at 
 org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
 at 
 org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:654)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:85)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationEventDispatcher.handle(ResourceManager.java:698)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationEventDispatcher.handle(ResourceManager.java:682)
 at 
 org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173)
 at 
 org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106)
 at java.lang.Thread.run(Thread.java:744)
 {code}
 And the RM will fail to restart.
 This is caused by a queue configuration change: I removed some queues and 
 added new queues. So when the RM restarts, it tries to recover historical 
 applications, and when the queue of any of these applications has been 
 removed, an NPE is raised.
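
 In other words, a recovered attempt can reference a queue that no longer exists, and the 
 scheduler dereferences the missing queue without a check. Below is a minimal sketch of the 
 kind of guard that avoids the NPE during recovery (illustrative plain Java; not the 
 CapacityScheduler code or the attached patch):
 {code:java}
import java.util.HashMap;
import java.util.Map;

public class RecoveryQueueGuard {
  static final Map<String, Object> QUEUES = new HashMap<>(); // queue name -> queue

  /** Returns true if the attempt can be added; rejects it if its queue is gone. */
  static boolean addApplicationAttempt(String appId, String queueName) {
    Object queue = QUEUES.get(queueName);
    if (queue == null) {
      // The queue was removed from the configuration before the RM restarted.
      // Reject the recovered application instead of dereferencing null.
      System.err.println("Rejecting recovered app " + appId
          + ": queue " + queueName + " no longer exists");
      return false;
    }
    // ... the normal submission path would use 'queue' here ...
    return true;
  }

  public static void main(String[] args) {
    QUEUES.put("default", new Object());
    System.out.println(addApplicationAttempt("app_1", "default"));  // true
    System.out.println(addApplicationAttempt("app_2", "removedQ")); // false
  }
}
 {code}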



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1337) Recover containers upon nodemanager restart

2014-08-12 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14094199#comment-14094199
 ] 

Hudson commented on YARN-1337:
--

SUCCESS: Integrated in Hadoop-Mapreduce-trunk #1861 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1861/])
YARN-1337. Recover containers upon nodemanager restart. (Contributed by Jason 
Lowe) (junping_du: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1617448)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/ContainerExecutor.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/DefaultContainerExecutor.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/LinuxContainerExecutor.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeStatusUpdater.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeStatusUpdaterImpl.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/ContainerManagerImpl.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/container/ContainerImpl.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/launcher/ContainerLaunch.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/launcher/ContainersLauncher.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/launcher/ContainersLauncherEventType.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/launcher/RecoveredContainerLaunch.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/logaggregation/AppLogAggregator.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/logaggregation/AppLogAggregatorImpl.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/logaggregation/LogAggregationService.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/recovery/NMLeveldbStateStoreService.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/recovery/NMNullStateStoreService.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/recovery/NMStateStoreService.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/container-executor.c
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestNodeStatusUpdater.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/BaseContainerManagerTest.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/TestAuxServices.java
* 

[jira] [Commented] (YARN-2400) TestAMRestart fails intermittently

2014-08-12 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2400?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14094196#comment-14094196
 ] 

Hudson commented on YARN-2400:
--

SUCCESS: Integrated in Hadoop-Mapreduce-trunk #1861 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1861/])
YARN-2400: Addendum fix for TestAMRestart failure. Contributed by Jian He 
(xgong: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1617333)
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/applicationsmanager/TestAMRestart.java


 TestAMRestart fails intermittently
 --

 Key: YARN-2400
 URL: https://issues.apache.org/jira/browse/YARN-2400
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Jian He
Assignee: Jian He
 Fix For: 2.6.0

 Attachments: YARN-2240.2.patch, YARN-2400.1.patch


 java.lang.AssertionError: AppAttempt state is not correct (timedout) 
 expected:<ALLOCATED> but was:<SCHEDULED>
   at org.junit.Assert.fail(Assert.java:88)
   at org.junit.Assert.failNotEquals(Assert.java:743)
   at org.junit.Assert.assertEquals(Assert.java:118)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.MockAM.waitForState(MockAM.java:82)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.MockRM.sendAMLaunched(MockRM.java:417)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.MockRM.launchAM(MockRM.java:579)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.MockRM.launchAndRegisterAM(MockRM.java:586)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRestart.testShouldNotCountFailureToMaxAttemptRetry(TestAMRestart.java:389)



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2138) Cleanup notifyDone* methods in RMStateStore

2014-08-12 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14094200#comment-14094200
 ] 

Hudson commented on YARN-2138:
--

SUCCESS: Integrated in Hadoop-Mapreduce-trunk #1861 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1861/])
YARN-2138. Cleaned up notifyDone* APIs in RMStateStore. Contributed by Varun 
Saxena (jianhe: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1617341)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/RMStateStore.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/RMAppImpl.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/RMAppNewSavedEvent.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/RMAppUpdateSavedEvent.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptImpl.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/event/RMAppAttemptNewSavedEvent.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/event/RMAppAttemptUpdateSavedEvent.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/RMStateStoreTestBase.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/TestRMAppTransitions.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/TestRMAppAttemptTransitions.java


 Cleanup notifyDone* methods in RMStateStore
 ---

 Key: YARN-2138
 URL: https://issues.apache.org/jira/browse/YARN-2138
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Jian He
Assignee: Varun Saxena
 Fix For: 2.6.0

 Attachments: YARN-2138.002.patch, YARN-2138.003.patch, 
 YARN-2138.004.patch, YARN-2138.patch


 The storedException passed into notifyDoneStoringApplication is always null. 
 Similarly for other notifyDone* methods. We can clean up these methods as 
 this control flow path is not used anymore.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (YARN-2410) Nodemanager ShuffleHandler can easily exhaust file descriptors

2014-08-12 Thread Nathan Roberts (JIRA)
Nathan Roberts created YARN-2410:


 Summary: Nodemanager ShuffleHandler can easily exhaust file 
descriptors
 Key: YARN-2410
 URL: https://issues.apache.org/jira/browse/YARN-2410
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 2.5.0
Reporter: Nathan Roberts
Priority: Critical


The async nature of the ShuffleHandler can cause it to open a huge number of
file descriptors; when it runs out, it crashes.

Scenario:
Job with 6K reduces, slow start set to 0.95, about 40 map outputs per node.
Let's say all 6K reduces hit a node at about the same time asking for their
outputs. Each reducer will ask for all 40 map outputs over a single socket in a
single request (not necessarily all 40 at once, but with coalescing it is
likely to be a large number).

sendMapOutput() will open the file for random reading and then perform an async
transfer of the particular portion of this file. This will theoretically
happen 6000*40=240,000 times, which will run the NM out of file descriptors and
cause it to crash.

The algorithm should be refactored a little to not open the fds until they're
actually needed. 
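
A rough sketch of that deferred-open idea follows (plain Java, not the actual ShuffleHandler
code): queue only the file path and byte range, and open the FileChannel immediately before
the transfer runs, closing it as soon as the send completes.
{code:java}
import java.io.IOException;
import java.nio.channels.Channels;
import java.nio.channels.FileChannel;
import java.nio.channels.WritableByteChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

/** Holds only the location of a map output; no file descriptor until send time. */
public class PendingMapOutput {
  final Path file;
  final long offset;
  final long length;

  PendingMapOutput(String file, long offset, long length) {
    this.file = Paths.get(file);
    this.offset = offset;
    this.length = length;
  }

  /** Open the fd only when the transfer actually runs, and close it right after. */
  void sendTo(WritableByteChannel out) throws IOException {
    try (FileChannel in = FileChannel.open(file, StandardOpenOption.READ)) {
      long sent = 0;
      while (sent < length) {
        sent += in.transferTo(offset + sent, length - sent, out);
      }
    }
  }

  public static void main(String[] args) throws IOException {
    Path tmp = Files.createTempFile("mapout", ".bin");
    Files.write(tmp, "hello map output".getBytes());
    PendingMapOutput pending = new PendingMapOutput(tmp.toString(), 6, 10);
    pending.sendTo(Channels.newChannel(System.out)); // prints "map output"
  }
}
{code}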



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2400) TestAMRestart fails intermittently

2014-08-12 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2400?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14094212#comment-14094212
 ] 

Hudson commented on YARN-2400:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk #1835 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1835/])
YARN-2400: Addendum fix for TestAMRestart failure. Contributed by Jian He 
(xgong: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1617333)
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/applicationsmanager/TestAMRestart.java


 TestAMRestart fails intermittently
 --

 Key: YARN-2400
 URL: https://issues.apache.org/jira/browse/YARN-2400
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Jian He
Assignee: Jian He
 Fix For: 2.6.0

 Attachments: YARN-2240.2.patch, YARN-2400.1.patch


 java.lang.AssertionError: AppAttempt state is not correct (timedout) 
 expected:<ALLOCATED> but was:<SCHEDULED>
   at org.junit.Assert.fail(Assert.java:88)
   at org.junit.Assert.failNotEquals(Assert.java:743)
   at org.junit.Assert.assertEquals(Assert.java:118)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.MockAM.waitForState(MockAM.java:82)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.MockRM.sendAMLaunched(MockRM.java:417)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.MockRM.launchAM(MockRM.java:579)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.MockRM.launchAndRegisterAM(MockRM.java:586)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRestart.testShouldNotCountFailureToMaxAttemptRetry(TestAMRestart.java:389)



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2138) Cleanup notifyDone* methods in RMStateStore

2014-08-12 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14094216#comment-14094216
 ] 

Hudson commented on YARN-2138:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk #1835 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1835/])
YARN-2138. Cleaned up notifyDone* APIs in RMStateStore. Contributed by Varun 
Saxena (jianhe: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1617341)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/RMStateStore.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/RMAppImpl.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/RMAppNewSavedEvent.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/RMAppUpdateSavedEvent.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptImpl.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/event/RMAppAttemptNewSavedEvent.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/event/RMAppAttemptUpdateSavedEvent.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/RMStateStoreTestBase.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/TestRMAppTransitions.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/TestRMAppAttemptTransitions.java


 Cleanup notifyDone* methods in RMStateStore
 ---

 Key: YARN-2138
 URL: https://issues.apache.org/jira/browse/YARN-2138
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Jian He
Assignee: Varun Saxena
 Fix For: 2.6.0

 Attachments: YARN-2138.002.patch, YARN-2138.003.patch, 
 YARN-2138.004.patch, YARN-2138.patch


 The storedException passed into notifyDoneStoringApplication is always null. 
 Similarly for other notifyDone* methods. We can clean up these methods as 
 this control flow path is not used anymore.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-659) RMStateStore's removeApplication APIs should just take an applicationId

2014-08-12 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-659?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14094228#comment-14094228
 ] 

Hadoop QA commented on YARN-659:


{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12661214/YARN-659.5.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 2 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

  
org.apache.hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRestart

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/4601//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4601//console

This message is automatically generated.

 RMStateStore's removeApplication APIs should just take an applicationId
 ---

 Key: YARN-659
 URL: https://issues.apache.org/jira/browse/YARN-659
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Vinod Kumar Vavilapalli
Assignee: Tsuyoshi OZAWA
 Attachments: YARN-659.1.patch, YARN-659.2.patch, YARN-659.3.patch, 
 YARN-659.4.patch, YARN-659.5.patch


 There is no need to pass in the whole state for removal - just an ID should 
 be enough when an app finishes.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2393) Fair Scheduler : Implement static fair share

2014-08-12 Thread Wei Yan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2393?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei Yan updated YARN-2393:
--

Attachment: YARN-2393-2.patch

Updated a patch which addresses Ashwin's comment: recompute the share when a new queue 
is created dynamically.

Discussed with Karthik offline: prefer changing "static fair share" to "steady 
fair share".

 Fair Scheduler : Implement static fair share
 

 Key: YARN-2393
 URL: https://issues.apache.org/jira/browse/YARN-2393
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: fairscheduler
Reporter: Ashwin Shankar
Assignee: Wei Yan
 Attachments: YARN-2393-1.patch, YARN-2393-2.patch


 Static fair share is a fair share allocation considering all (active and inactive) 
 queues. It would be shown on the UI for better predictability of the finish time 
 of applications.
 We would compute the static fair share only when needed, e.g. on queue creation 
 or when a node is added/removed. Please see YARN-2026 for discussions on this.
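
 As a rough illustration of the idea (not the FairScheduler implementation; the 
 weight-proportional formula and names below are assumptions), the steady fair share could 
 be recomputed over all queues, active or not, whenever a queue is created or a node is 
 added/removed:
 {code:java}
import java.util.LinkedHashMap;
import java.util.Map;

public class SteadyFairShareSketch {
  /** Memory (MB) per queue, proportional to its weight over ALL queues. */
  static Map<String, Long> computeSteadyShares(Map<String, Double> queueWeights,
                                               long clusterMemoryMb) {
    double totalWeight = 0;
    for (double w : queueWeights.values()) {
      totalWeight += w;
    }
    Map<String, Long> shares = new LinkedHashMap<>();
    for (Map.Entry<String, Double> e : queueWeights.entrySet()) {
      shares.put(e.getKey(),
          (long) (clusterMemoryMb * e.getValue() / totalWeight));
    }
    return shares;
  }

  public static void main(String[] args) {
    Map<String, Double> weights = new LinkedHashMap<>();
    weights.put("root.default", 1.0);
    weights.put("root.batch", 2.0);  // inactive queues are included as well
    // Recompute on queue creation or node add/remove, e.g. for a 300 GB cluster:
    System.out.println(computeSteadyShares(weights, 300000L));
  }
}
 {code}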



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2393) Fair Scheduler : Implement static fair share

2014-08-12 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14094327#comment-14094327
 ] 

Hadoop QA commented on YARN-2393:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12661234/YARN-2393-2.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 3 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/4602//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4602//console

This message is automatically generated.

 Fair Scheduler : Implement static fair share
 

 Key: YARN-2393
 URL: https://issues.apache.org/jira/browse/YARN-2393
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: fairscheduler
Reporter: Ashwin Shankar
Assignee: Wei Yan
 Attachments: YARN-2393-1.patch, YARN-2393-2.patch


 Static fair share is a fair share allocation considering all (active and inactive) 
 queues. It would be shown on the UI for better predictability of the finish time 
 of applications.
 We would compute the static fair share only when needed, e.g. on queue creation 
 or when a node is added/removed. Please see YARN-2026 for discussions on this.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2373) WebAppUtils Should Use configuration.getPassword for Accessing SSL Passwords

2014-08-12 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14094371#comment-14094371
 ] 

Jian He commented on YARN-2373:
---

Larry, thanks for updating the patch !
LGTM, +1 

 WebAppUtils Should Use configuration.getPassword for Accessing SSL Passwords
 

 Key: YARN-2373
 URL: https://issues.apache.org/jira/browse/YARN-2373
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Larry McCay
 Attachments: YARN-2373.patch, YARN-2373.patch, YARN-2373.patch, 
 YARN-2373.patch, YARN-2373.patch


 As part of HADOOP-10904, this jira represents a change to WebAppUtils to 
 adopt the credential provider API through the new method on 
 Configuration called getPassword.
 This provides an alternative to storing the passwords in clear text within 
 the ssl-server.xml file while maintaining backward compatibility with that 
 behavior.
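
 For readers unfamiliar with the pattern this jira adopts, below is a minimal, 
 hedged sketch of the general Configuration#getPassword usage; it is not the 
 actual WebAppUtils change, and the resource and property names are only 
 illustrative.

{code:java}
// Hedged sketch of the general Configuration#getPassword pattern this jira
// adopts; it is not the actual WebAppUtils change. The resource and
// property names are only illustrative.
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;

public class GetPasswordSketch {
  public static void main(String[] args) throws IOException {
    Configuration sslConf = new Configuration(false);
    sslConf.addResource("ssl-server.xml");

    // getPassword consults any configured credential providers first and
    // falls back to the clear-text value in the config file, which is what
    // keeps the change backward compatible.
    char[] password = sslConf.getPassword("ssl.server.keystore.password");
    if (password != null) {
      System.out.println("password resolved (" + password.length + " chars)");
    }
  }
}
{code}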



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2373) WebAppUtils Should Use configuration.getPassword for Accessing SSL Passwords

2014-08-12 Thread Jian He (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2373?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jian He updated YARN-2373:
--

Assignee: Larry McCay

 WebAppUtils Should Use configuration.getPassword for Accessing SSL Passwords
 

 Key: YARN-2373
 URL: https://issues.apache.org/jira/browse/YARN-2373
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Larry McCay
Assignee: Larry McCay
 Attachments: YARN-2373.patch, YARN-2373.patch, YARN-2373.patch, 
 YARN-2373.patch, YARN-2373.patch


 As part of HADOOP-10904, this jira represents a change to WebAppUtils to 
 adopt the credential provider API through the new method on 
 Configuration called getPassword.
 This provides an alternative to storing the passwords in clear text within 
 the ssl-server.xml file while maintaining backward compatibility with that 
 behavior.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2399) FairScheduler: Merge AppSchedulable and FSSchedulerApp into FSAppAttempt

2014-08-12 Thread Karthik Kambatla (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2399?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Kambatla updated YARN-2399:
---

Attachment: yarn-2399-3.patch

Updated the patch to address Sandy's comments.

As part of these changes, I have renamed runnableAppScheds in FSLeafQueue to 
runnableApps. Same for non-runnable apps. 

 FairScheduler: Merge AppSchedulable and FSSchedulerApp into FSAppAttempt
 

 Key: YARN-2399
 URL: https://issues.apache.org/jira/browse/YARN-2399
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: fairscheduler
Affects Versions: 2.5.0
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla
 Attachments: yarn-2399-1.patch, yarn-2399-2.patch, yarn-2399-3.patch


 FairScheduler has two data structures for an application, making the code 
 hard to track. We should merge these for better maintainability in the 
 long-term. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-1198) Capacity Scheduler headroom calculation does not work as expected

2014-08-12 Thread Craig Welch (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1198?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Craig Welch updated YARN-1198:
--

Attachment: YARN-1198.2.patch

Patch for most items in this jira not covered in subtasks, including:
* update headroom when a container finishes
* propagate headroom changes to all applications for the same user+queue combo
* update headroom when an admin refreshes the queue

Not included:
* "If a new user submits an application to the queue then all applications 
submitted by all users in that queue should be notified of the headroom change", 
because I believe that really should be done in [YARN-1857]
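
A hedged sketch of the user+queue propagation idea above (hypothetical types 
and method names; this is not the attached patch):

{code:java}
// Hedged sketch (not the attached patch) of the propagation idea: when the
// headroom for a user+queue combination changes, every active application
// of that user in that queue sees the same new value. All types and method
// names below are hypothetical stand-ins for the scheduler internals.
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class HeadroomPropagationSketch {

  static final class SchedulerApp {
    final String user;
    long headroomMB;
    SchedulerApp(String user) {
      this.user = user;
    }
  }

  static final class LeafQueue {
    private final long userLimitMB;
    private final Map<String, Long> usedByUserMB = new HashMap<>();
    private final List<SchedulerApp> activeApps;

    LeafQueue(long userLimitMB, List<SchedulerApp> activeApps) {
      this.userLimitMB = userLimitMB;
      this.activeApps = activeApps;
    }

    void containerAllocated(String user, long allocatedMB) {
      usedByUserMB.merge(user, allocatedMB, Long::sum);
      propagateHeadroom(user);
    }

    // Also triggered on container completion and on an admin queue refresh.
    void containerCompleted(String user, long releasedMB) {
      usedByUserMB.merge(user, -releasedMB, Long::sum);
      propagateHeadroom(user);
    }

    // Keep one headroom figure per user per leaf queue so every app owned
    // by that user gets the same picture.
    private void propagateHeadroom(String user) {
      long used = usedByUserMB.getOrDefault(user, 0L);
      long headroom = Math.max(0, userLimitMB - used);
      for (SchedulerApp app : activeApps) {
        if (app.user.equals(user)) {
          app.headroomMB = headroom;
        }
      }
    }
  }

  public static void main(String[] args) {
    SchedulerApp a1 = new SchedulerApp("alice");
    SchedulerApp a2 = new SchedulerApp("alice");
    SchedulerApp b1 = new SchedulerApp("bob");
    LeafQueue queue = new LeafQueue(8192, Arrays.asList(a1, a2, b1));
    queue.containerAllocated("alice", 6144);
    queue.containerCompleted("alice", 2048);
    // Both of alice's apps now see 8192 - 4096 = 4096; bob's is untouched.
    System.out.println(a1.headroomMB + " " + a2.headroomMB + " " + b1.headroomMB);
  }
}
{code}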


 Capacity Scheduler headroom calculation does not work as expected
 -

 Key: YARN-1198
 URL: https://issues.apache.org/jira/browse/YARN-1198
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Omkar Vinit Joshi
Assignee: Omkar Vinit Joshi
 Attachments: YARN-1198.1.patch, YARN-1198.2.patch


 Today headroom calculation (for the app) takes place only when
 * New node is added/removed from the cluster
 * New container is getting assigned to the application.
 However, there are potentially many situations which are not considered in 
 this calculation:
 * If a container finishes, the headroom for that application will change and 
 the AM should be notified accordingly.
 * If a single user has submitted multiple applications (app1 and app2) to the 
 same queue, then:
 ** If app1's container finishes, not only app1's but also app2's AM 
 should be notified about the change in headroom.
 ** Similarly, if a container is assigned to either application (app1/app2), 
 both AMs should be notified about their headroom.
 ** To simplify the whole communication process, it is ideal to keep headroom 
 per user per LeafQueue so that everyone gets the same picture (apps belonging 
 to the same user and submitted to the same queue).
 * If a new user submits an application to the queue, then all applications 
 submitted by all users in that queue should be notified of the headroom 
 change.
 * Also, today headroom is an absolute number (I think it should be normalized, 
 but then that would not be backward compatible).
 * Also, when an admin refreshes a queue, its headroom has to be updated.
 These are all potential bugs in the headroom calculation.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2411) [Capacity Scheduler] support simple user and group mappings to queues

2014-08-12 Thread Allen Wittenauer (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14094402#comment-14094402
 ] 

Allen Wittenauer commented on YARN-2411:


It took me a minute to understand what is meant by 'mappings'... at least, I'm 
assuming mappings == default queue?

 [Capacity Scheduler] support simple user and group mappings to queues
 -

 Key: YARN-2411
 URL: https://issues.apache.org/jira/browse/YARN-2411
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: capacityscheduler
Reporter: Ram Venkatesh

 YARN-2257 has a proposal to extend and share the queue placement rules for 
 the fair scheduler and the capacity scheduler. This is a good long term 
 solution to streamline queue placement of both schedulers but it has core 
 infra work that has to happen first and might require changes to current 
 features in all schedulers along with corresponding configuration changes, if 
 any. 
 I would like to propose a change with a smaller scope in the capacity 
 scheduler that addresses the core use cases for implicitly mapping jobs to 
 queues based on the submitting user and user groups. It will be useful in a 
 number of real-world scenarios and can be migrated over to the unified scheme 
 when YARN-2257 becomes available.
 The proposal is to add two new configuration options:
 yarn.scheduler.capacity.queue-mappings.enable 
 A boolean that controls if queue mappings are enabled, default is false.
 and,
 yarn.scheduler.capacity.queue-mappings
 A string that specifies a list of mappings in the following format:
 map_specifier:source_attribute:queue_name[,map_specifier:source_attribute:queue_name]*
 map_specifier := user (u) | group (g)
 source_attribute := user | group | %user
 queue_name := the name of the mapped queue | %user | %primary_group
 The mappings will be evaluated left to right, and the first valid mapping 
 will be used. If the mapped queue does not exist, or the current user does 
 not have permissions to submit jobs to the mapped queue, the submission will 
 fail.
 Example usages:
 1. user1 is mapped to queue1, group1 is mapped to queue2
 u:user1:queue1,g:group1:queue2
 2. To map users to queues with the same name as the user:
 u:%user:%user
 I am happy to volunteer to take this up.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2373) WebAppUtils Should Use configuration.getPassword for Accessing SSL Passwords

2014-08-12 Thread Larry McCay (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14094405#comment-14094405
 ] 

Larry McCay commented on YARN-2373:
---

Thank you for the very good reviews, [~jianhe]!
Can we get it committed to both trunk and branch-2?

 WebAppUtils Should Use configuration.getPassword for Accessing SSL Passwords
 

 Key: YARN-2373
 URL: https://issues.apache.org/jira/browse/YARN-2373
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Larry McCay
Assignee: Larry McCay
 Attachments: YARN-2373.patch, YARN-2373.patch, YARN-2373.patch, 
 YARN-2373.patch, YARN-2373.patch


 As part of HADOOP-10904, this jira represents a change to WebAppUtils to 
 adopt the credential provider API through the new method on 
 Configuration called getPassword.
 This provides an alternative to storing the passwords in clear text within 
 the ssl-server.xml file while maintaining backward compatibility with that 
 behavior.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2032) Implement a scalable, available TimelineStore using HBase

2014-08-12 Thread Mayank Bansal (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2032?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mayank Bansal updated YARN-2032:


Attachment: YARN-2032-branch2-2.patch

Attaching an updated patch for branch-2.

Thanks,
Mayank

 Implement a scalable, available TimelineStore using HBase
 -

 Key: YARN-2032
 URL: https://issues.apache.org/jira/browse/YARN-2032
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Vinod Kumar Vavilapalli
Assignee: Mayank Bansal
 Attachments: YARN-2032-branch-2-1.patch, YARN-2032-branch2-2.patch


 As discussed on YARN-1530, we should pursue implementing a scalable, 
 available Timeline store using HBase.
 One goal is to reuse most of the code from the LevelDB-based store 
 (YARN-1635).



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2411) [Capacity Scheduler] support simple user and group mappings to queues

2014-08-12 Thread Ram Venkatesh (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2411?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ram Venkatesh updated YARN-2411:


Description: 
YARN-2257 has a proposal to extend and share the queue placement rules for the 
fair scheduler and the capacity scheduler. This is a good long term solution to 
streamline queue placement of both schedulers but it has core infra work that 
has to happen first and might require changes to current features in all 
schedulers along with corresponding configuration changes, if any. 

I would like to propose a change with a smaller scope in the capacity scheduler 
that addresses the core use cases for implicitly mapping jobs that have the 
default queue or no queue specified to specific queues based on the submitting 
user and user groups. It will be useful in a number of real-world scenarios and 
can be migrated over to the unified scheme when YARN-2257 becomes available.

The proposal is to add two new configuration options:

yarn.scheduler.capacity.queue-mappings.enable 
A boolean that controls if queue mappings are enabled, default is false.

and,
yarn.scheduler.capacity.queue-mappings
A string that specifies a list of mappings in the following format:

map_specifier:source_attribute:queue_name[,map_specifier:source_attribute:queue_name]*
map_specifier := user (u) | group (g)
source_attribute := user | group | %user
queue_name := the name of the mapped queue | %user | %primary_group

The mappings will be evaluated left to right, and the first valid mapping will 
be used. If the mapped queue does not exist, or the current user does not have 
permissions to submit jobs to the mapped queue, the submission will fail.

Example usages:
1. user1 is mapped to queue1, group1 is mapped to queue2
u:user1:queue1,g:group1:queue2

2. To map users to queues with the same name as the user:
u:%user:%user

I am happy to volunteer to take this up.
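
To make the mapping format concrete, here is a hedged, hypothetical Java 
sketch of how such a specification string could be parsed and evaluated left 
to right; it is illustrative only and not part of any patch:

{code:java}
// Hypothetical sketch only: parse a queue-mappings string of the form
// map_specifier:source_attribute:queue_name[,...] and resolve the target
// queue for a submission. Class and method names are made up for
// illustration; this is not part of the proposed patch.
import java.util.ArrayList;
import java.util.List;

public class QueueMappingSketch {

  static final class Mapping {
    final String type;    // "u" (user) or "g" (group)
    final String source;  // a user name, a group name, or "%user"
    final String queue;   // a queue name, "%user", or "%primary_group"
    Mapping(String type, String source, String queue) {
      this.type = type;
      this.source = source;
      this.queue = queue;
    }
  }

  // Split "u:user1:queue1,g:group1:queue2" into mapping entries.
  static List<Mapping> parse(String spec) {
    List<Mapping> mappings = new ArrayList<>();
    for (String entry : spec.split(",")) {
      String[] parts = entry.trim().split(":");
      if (parts.length != 3) {
        throw new IllegalArgumentException("Bad mapping: " + entry);
      }
      mappings.add(new Mapping(parts[0], parts[1], parts[2]));
    }
    return mappings;
  }

  // Evaluate left to right; the first matching entry wins, per the proposal.
  static String resolveQueue(List<Mapping> mappings, String user,
                             String primaryGroup) {
    for (Mapping m : mappings) {
      boolean matches =
          ("u".equals(m.type)
              && ("%user".equals(m.source) || user.equals(m.source)))
          || ("g".equals(m.type) && primaryGroup.equals(m.source));
      if (matches) {
        return m.queue.replace("%user", user)
                      .replace("%primary_group", primaryGroup);
      }
    }
    return null; // no mapping matched; keep the queue named in the submission
  }

  public static void main(String[] args) {
    List<Mapping> mappings =
        parse("u:user1:queue1,g:group1:queue2,u:%user:%user");
    System.out.println(resolveQueue(mappings, "user1", "dev"));    // queue1
    System.out.println(resolveQueue(mappings, "alice", "group1")); // queue2
    System.out.println(resolveQueue(mappings, "bob", "other"));    // bob
  }
}
{code}

Whether a non-existent or unauthorized mapped queue fails the submission, as 
the proposal states, would of course be enforced by the scheduler, not by a 
helper like this.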

  was:
YARN-2257 has a proposal to extend and share the queue placement rules for the 
fair scheduler and the capacity scheduler. This is a good long term solution to 
streamline queue placement of both schedulers but it has core infra work that 
has to happen first and might require changes to current features in all 
schedulers along with corresponding configuration changes, if any. 

I would like to propose a change with a smaller scope in the capacity scheduler 
that addresses the core use cases for implicitly mapping jobs to queues based 
on the submitting user and user groups. It will be useful in a number of 
real-world scenarios and can be migrated over to the unified scheme when 
YARN-2257 becomes available.

The proposal is to add two new configuration options:

yarn.scheduler.capacity.queue-mappings.enable 
A boolean that controls if queue mappings are enabled, default is false.

and,
yarn.scheduler.capacity.queue-mappings
A string that specifies a list of mappings in the following format:

map_specifier:source_attribute:queue_name[,map_specifier:source_attribute:queue_name]*
map_specifier := user (u) | group (g)
source_attribute := user | group | %user
queue_name := the name of the mapped queue | %user | %primary_group

The mappings will be evaluated left to right, and the first valid mapping will 
be used. If the mapped queue does not exist, or the current user does not have 
permissions to submit jobs to the mapped queue, the submission will fail.

Example usages:
1. user1 is mapped to queue1, group1 is mapped to queue2
u:user1:queue1,g:group1:queue2

2. To map users to queues with the same name as the user:
u:%user:%user

I am happy to volunteer to take this up.


 [Capacity Scheduler] support simple user and group mappings to queues
 -

 Key: YARN-2411
 URL: https://issues.apache.org/jira/browse/YARN-2411
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: capacityscheduler
Reporter: Ram Venkatesh

 YARN-2257 has a proposal to extend and share the queue placement rules for 
 the fair scheduler and the capacity scheduler. This is a good long term 
 solution to streamline queue placement of both schedulers but it has core 
 infra work that has to happen first and might require changes to current 
 features in all schedulers along with corresponding configuration changes, if 
 any. 
 I would like to propose a change with a smaller scope in the capacity 
 scheduler that addresses the core use cases for implicitly mapping jobs that 
 have the default queue or no queue specified to specific queues based on the 
 submitting user and user groups. It will be useful in a number of real-world 
 scenarios and can be migrated over to the unified scheme when YARN-2257 
 becomes available.
 The proposal is to add two new configuration options:
 yarn.scheduler.capacity.queue-mappings.enable 
 A boolean that controls if queue 

[jira] [Commented] (YARN-415) Capture memory utilization at the app-level for chargeback

2014-08-12 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14094467#comment-14094467
 ] 

Karthik Kambatla commented on YARN-415:
---

Recently, I have been thinking about related improvements and would like to 
take a look at this patch. Can I get a couple of days to review? Thanks. 

 Capture memory utilization at the app-level for chargeback
 --

 Key: YARN-415
 URL: https://issues.apache.org/jira/browse/YARN-415
 Project: Hadoop YARN
  Issue Type: New Feature
  Components: resourcemanager
Affects Versions: 0.23.6
Reporter: Kendall Thrapp
Assignee: Andrey Klochkov
 Attachments: YARN-415--n10.patch, YARN-415--n2.patch, 
 YARN-415--n3.patch, YARN-415--n4.patch, YARN-415--n5.patch, 
 YARN-415--n6.patch, YARN-415--n7.patch, YARN-415--n8.patch, 
 YARN-415--n9.patch, YARN-415.201405311749.txt, YARN-415.201406031616.txt, 
 YARN-415.201406262136.txt, YARN-415.201407042037.txt, 
 YARN-415.201407071542.txt, YARN-415.201407171553.txt, 
 YARN-415.201407172144.txt, YARN-415.201407232237.txt, 
 YARN-415.201407242148.txt, YARN-415.201407281816.txt, 
 YARN-415.201408062232.txt, YARN-415.201408080204.txt, 
 YARN-415.201408092006.txt, YARN-415.patch


 For the purpose of chargeback, I'd like to be able to compute the cost of an
 application in terms of cluster resource usage.  To start out, I'd like to 
 get the memory utilization of an application.  The unit should be MB-seconds 
 or something similar and, from a chargeback perspective, the memory amount 
 should be the memory reserved for the application, as even if the app didn't 
 use all that memory, no one else was able to use it.
 (reserved ram for container 1 * lifetime of container 1) + (reserved ram for
 container 2 * lifetime of container 2) + ... + (reserved ram for container n 
 * lifetime of container n)
 It'd be nice to have this at the app level instead of the job level because:
 1. We'd still be able to get memory usage for jobs that crashed (and wouldn't 
 appear on the job history server).
 2. We'd be able to get memory usage for future non-MR jobs (e.g. Storm).
 This new metric should be available both through the RM UI and RM Web 
 Services REST API.
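
 As a hedged illustration of the MB-seconds formula quoted above (the class 
 and the container record below are hypothetical, not the attached patch):

{code:java}
// Minimal sketch of the MB-seconds accounting quoted above; the simple
// container record and class name are hypothetical, not the YARN-415 patch.
import java.util.Arrays;
import java.util.List;

public class MemorySecondsSketch {

  static final class ContainerUsage {
    final long reservedMB;      // memory reserved for the container
    final long lifetimeSeconds; // from allocation to release
    ContainerUsage(long reservedMB, long lifetimeSeconds) {
      this.reservedMB = reservedMB;
      this.lifetimeSeconds = lifetimeSeconds;
    }
  }

  // (reserved ram for container 1 * lifetime of container 1) + ... +
  // (reserved ram for container n * lifetime of container n)
  static long memoryMBSeconds(List<ContainerUsage> containers) {
    long total = 0;
    for (ContainerUsage c : containers) {
      total += c.reservedMB * c.lifetimeSeconds;
    }
    return total;
  }

  public static void main(String[] args) {
    List<ContainerUsage> app = Arrays.asList(
        new ContainerUsage(2048, 600),    // 2 GB held for 10 minutes
        new ContainerUsage(1024, 1200));  // 1 GB held for 20 minutes
    System.out.println(memoryMBSeconds(app) + " MB-seconds"); // 2457600
  }
}
{code}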



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2317) Update documentation about how to write YARN applications

2014-08-12 Thread Li Lu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2317?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Li Lu updated YARN-2317:


Attachment: YARN-2317-081214.patch

The new patch addresses [~zjshen]'s further comments. Reformatted the FAQ 
section to make the font consistent with the rest of the article. 

 Update documentation about how to write YARN applications
 -

 Key: YARN-2317
 URL: https://issues.apache.org/jira/browse/YARN-2317
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: documentation
Reporter: Li Lu
Assignee: Li Lu
 Fix For: 2.6.0

 Attachments: YARN-2317-071714.patch, YARN-2317-073014-1.patch, 
 YARN-2317-073014.patch, YARN-2317-081114.patch, YARN-2317-081214.patch


 Some information in WritingYarnApplications webpage is out-dated. Need some 
 refresh work on this document to reflect the most recent changes in YARN 
 APIs. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2411) [Capacity Scheduler] support simple user and group mappings to queues

2014-08-12 Thread Ram Venkatesh (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14094446#comment-14094446
 ] 

Ram Venkatesh commented on YARN-2411:
-

Correct - this feature is to map from the default queue to specific queues 
based on the submitting user. I edited the description above to clarify.

 [Capacity Scheduler] support simple user and group mappings to queues
 -

 Key: YARN-2411
 URL: https://issues.apache.org/jira/browse/YARN-2411
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: capacityscheduler
Reporter: Ram Venkatesh

 YARN-2257 has a proposal to extend and share the queue placement rules for 
 the fair scheduler and the capacity scheduler. This is a good long term 
 solution to streamline queue placement of both schedulers but it has core 
 infra work that has to happen first and might require changes to current 
 features in all schedulers along with corresponding configuration changes, if 
 any. 
 I would like to propose a change with a smaller scope in the capacity 
 scheduler that addresses the core use cases for implicitly mapping jobs that 
 have the default queue or no queue specified to specific queues based on the 
 submitting user and user groups. It will be useful in a number of real-world 
 scenarios and can be migrated over to the unified scheme when YARN-2257 
 becomes available.
 The proposal is to add two new configuration options:
 yarn.scheduler.capacity.queue-mappings.enable 
 A boolean that controls if queue mappings are enabled, default is false.
 and,
 yarn.scheduler.capacity.queue-mappings
 A string that specifies a list of mappings in the following format:
 map_specifier:source_attribute:queue_name[,map_specifier:source_attribute:queue_name]*
 map_specifier := user (u) | group (g)
 source_attribute := user | group | %user
 queue_name := the name of the mapped queue | %user | %primary_group
 The mappings will be evaluated left to right, and the first valid mapping 
 will be used. If the mapped queue does not exist, or the current user does 
 not have permissions to submit jobs to the mapped queue, the submission will 
 fail.
 Example usages:
 1. user1 is mapped to queue1, group1 is mapped to queue2
 u:user1:queue1,g:group1:queue2
 2. To map users to queues with the same name as the user:
 u:%user:%user
 I am happy to volunteer to take this up.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2399) FairScheduler: Merge AppSchedulable and FSSchedulerApp into FSAppAttempt

2014-08-12 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14094495#comment-14094495
 ] 

Hadoop QA commented on YARN-2399:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12661252/yarn-2399-3.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 5 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-tools/hadoop-sls 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/4603//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4603//console

This message is automatically generated.

 FairScheduler: Merge AppSchedulable and FSSchedulerApp into FSAppAttempt
 

 Key: YARN-2399
 URL: https://issues.apache.org/jira/browse/YARN-2399
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: fairscheduler
Affects Versions: 2.5.0
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla
 Attachments: yarn-2399-1.patch, yarn-2399-2.patch, yarn-2399-3.patch


 FairScheduler has two data structures for an application, making the code 
 hard to track. We should merge these for better maintainability in the 
 long-term. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2373) WebAppUtils Should Use configuration.getPassword for Accessing SSL Passwords

2014-08-12 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14094512#comment-14094512
 ] 

Hudson commented on YARN-2373:
--

FAILURE: Integrated in Hadoop-trunk-Commit #6052 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/6052/])
YARN-2373. Changed WebAppUtils to use Configuration#getPassword for accessing 
SSL passwords. Contributed by Larry McCay (jianhe: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1617555)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/webapp/util/WebAppUtils.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/webapp/util
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/webapp/util/TestWebAppUtils.java


 WebAppUtils Should Use configuration.getPassword for Accessing SSL Passwords
 

 Key: YARN-2373
 URL: https://issues.apache.org/jira/browse/YARN-2373
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Larry McCay
Assignee: Larry McCay
 Fix For: 2.6.0

 Attachments: YARN-2373.patch, YARN-2373.patch, YARN-2373.patch, 
 YARN-2373.patch, YARN-2373.patch


 As part of HADOOP-10904, this jira represents a change to WebAppUtils to 
 adopt the credential provider API through the new method on 
 Configuration called getPassword.
 This provides an alternative to storing the passwords in clear text within 
 the ssl-server.xml file while maintaining backward compatibility with that 
 behavior.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1198) Capacity Scheduler headroom calculation does not work as expected

2014-08-12 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14094568#comment-14094568
 ] 

Hadoop QA commented on YARN-1198:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12661256/YARN-1198.2.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

  
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestApplicationLimits

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/4604//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4604//console

This message is automatically generated.

 Capacity Scheduler headroom calculation does not work as expected
 -

 Key: YARN-1198
 URL: https://issues.apache.org/jira/browse/YARN-1198
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Omkar Vinit Joshi
Assignee: Craig Welch
 Attachments: YARN-1198.1.patch, YARN-1198.2.patch


 Today headroom calculation (for the app) takes place only when
 * New node is added/removed from the cluster
 * New container is getting assigned to the application.
 However, there are potentially many situations which are not considered in 
 this calculation:
 * If a container finishes, the headroom for that application will change and 
 the AM should be notified accordingly.
 * If a single user has submitted multiple applications (app1 and app2) to the 
 same queue, then:
 ** If app1's container finishes, not only app1's but also app2's AM 
 should be notified about the change in headroom.
 ** Similarly, if a container is assigned to either application (app1/app2), 
 both AMs should be notified about their headroom.
 ** To simplify the whole communication process, it is ideal to keep headroom 
 per user per LeafQueue so that everyone gets the same picture (apps belonging 
 to the same user and submitted to the same queue).
 * If a new user submits an application to the queue, then all applications 
 submitted by all users in that queue should be notified of the headroom 
 change.
 * Also, today headroom is an absolute number (I think it should be normalized, 
 but then that would not be backward compatible).
 * Also, when an admin refreshes a queue, its headroom has to be updated.
 These are all potential bugs in the headroom calculation.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2399) FairScheduler: Merge AppSchedulable and FSSchedulerApp into FSAppAttempt

2014-08-12 Thread Sandy Ryza (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14094598#comment-14094598
 ] 

Sandy Ryza commented on YARN-2399:
--

+1

 FairScheduler: Merge AppSchedulable and FSSchedulerApp into FSAppAttempt
 

 Key: YARN-2399
 URL: https://issues.apache.org/jira/browse/YARN-2399
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: fairscheduler
Affects Versions: 2.5.0
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla
 Attachments: yarn-2399-1.patch, yarn-2399-2.patch, yarn-2399-3.patch


 FairScheduler has two data structures for an application, making the code 
 hard to track. We should merge these for better maintainability in the 
 long-term. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-1198) Capacity Scheduler headroom calculation does not work as expected

2014-08-12 Thread Craig Welch (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1198?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Craig Welch updated YARN-1198:
--

Attachment: YARN-1198.3.patch

Update broken test to reflect intentional change in behavior

 Capacity Scheduler headroom calculation does not work as expected
 -

 Key: YARN-1198
 URL: https://issues.apache.org/jira/browse/YARN-1198
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Omkar Vinit Joshi
Assignee: Craig Welch
 Attachments: YARN-1198.1.patch, YARN-1198.2.patch, YARN-1198.3.patch


 Today headroom calculation (for the app) takes place only when
 * New node is added/removed from the cluster
 * New container is getting assigned to the application.
 However, there are potentially many situations which are not considered in 
 this calculation:
 * If a container finishes, the headroom for that application will change and 
 the AM should be notified accordingly.
 * If a single user has submitted multiple applications (app1 and app2) to the 
 same queue, then:
 ** If app1's container finishes, not only app1's but also app2's AM 
 should be notified about the change in headroom.
 ** Similarly, if a container is assigned to either application (app1/app2), 
 both AMs should be notified about their headroom.
 ** To simplify the whole communication process, it is ideal to keep headroom 
 per user per LeafQueue so that everyone gets the same picture (apps belonging 
 to the same user and submitted to the same queue).
 * If a new user submits an application to the queue, then all applications 
 submitted by all users in that queue should be notified of the headroom 
 change.
 * Also, today headroom is an absolute number (I think it should be normalized, 
 but then that would not be backward compatible).
 * Also, when an admin refreshes a queue, its headroom has to be updated.
 These are all potential bugs in the headroom calculation.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-415) Capture memory utilization at the app-level for chargeback

2014-08-12 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14094669#comment-14094669
 ] 

Jian He commented on YARN-415:
--

Karthik, sure, thanks for reviewing the patch.

 Capture memory utilization at the app-level for chargeback
 --

 Key: YARN-415
 URL: https://issues.apache.org/jira/browse/YARN-415
 Project: Hadoop YARN
  Issue Type: New Feature
  Components: resourcemanager
Affects Versions: 0.23.6
Reporter: Kendall Thrapp
Assignee: Andrey Klochkov
 Attachments: YARN-415--n10.patch, YARN-415--n2.patch, 
 YARN-415--n3.patch, YARN-415--n4.patch, YARN-415--n5.patch, 
 YARN-415--n6.patch, YARN-415--n7.patch, YARN-415--n8.patch, 
 YARN-415--n9.patch, YARN-415.201405311749.txt, YARN-415.201406031616.txt, 
 YARN-415.201406262136.txt, YARN-415.201407042037.txt, 
 YARN-415.201407071542.txt, YARN-415.201407171553.txt, 
 YARN-415.201407172144.txt, YARN-415.201407232237.txt, 
 YARN-415.201407242148.txt, YARN-415.201407281816.txt, 
 YARN-415.201408062232.txt, YARN-415.201408080204.txt, 
 YARN-415.201408092006.txt, YARN-415.patch


 For the purpose of chargeback, I'd like to be able to compute the cost of an
 application in terms of cluster resource usage.  To start out, I'd like to 
 get the memory utilization of an application.  The unit should be MB-seconds 
 or something similar and, from a chargeback perspective, the memory amount 
 should be the memory reserved for the application, as even if the app didn't 
 use all that memory, no one else was able to use it.
 (reserved ram for container 1 * lifetime of container 1) + (reserved ram for
 container 2 * lifetime of container 2) + ... + (reserved ram for container n 
 * lifetime of container n)
 It'd be nice to have this at the app level instead of the job level because:
 1. We'd still be able to get memory usage for jobs that crashed (and wouldn't 
 appear on the job history server).
 2. We'd be able to get memory usage for future non-MR jobs (e.g. Storm).
 This new metric should be available both through the RM UI and RM Web 
 Services REST API.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2317) Update documentation about how to write YARN applications

2014-08-12 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14094674#comment-14094674
 ] 

Hadoop QA commented on YARN-2317:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12661272/YARN-2317-081214.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+0 tests included{color}.  The patch appears to be a 
documentation patch that doesn't require tests.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in .

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/4605//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4605//console

This message is automatically generated.

 Update documentation about how to write YARN applications
 -

 Key: YARN-2317
 URL: https://issues.apache.org/jira/browse/YARN-2317
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: documentation
Reporter: Li Lu
Assignee: Li Lu
 Fix For: 2.6.0

 Attachments: YARN-2317-071714.patch, YARN-2317-073014-1.patch, 
 YARN-2317-073014.patch, YARN-2317-081114.patch, YARN-2317-081214.patch


 Some information in WritingYarnApplications webpage is out-dated. Need some 
 refresh work on this document to reflect the most recent changes in YARN 
 APIs. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2032) Implement a scalable, available TimelineStore using HBase

2014-08-12 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14094694#comment-14094694
 ] 

Hadoop QA commented on YARN-2032:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12661260/YARN-2032-branch2-2.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 2 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice:

  org.apache.hadoop.yarn.server.timeline.TestHBaseTimelineStore

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/4606//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4606//console

This message is automatically generated.

 Implement a scalable, available TimelineStore using HBase
 -

 Key: YARN-2032
 URL: https://issues.apache.org/jira/browse/YARN-2032
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Vinod Kumar Vavilapalli
Assignee: Mayank Bansal
 Attachments: YARN-2032-branch-2-1.patch, YARN-2032-branch2-2.patch


 As discussed on YARN-1530, we should pursue implementing a scalable, 
 available Timeline store using HBase.
 One goal is to reuse most of the code from the LevelDB-based store 
 (YARN-1635).



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2317) Update documentation about how to write YARN applications

2014-08-12 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14094712#comment-14094712
 ] 

Zhijie Shen commented on YARN-2317:
---

+1 for the latest patch. I'll commit it.

 Update documentation about how to write YARN applications
 -

 Key: YARN-2317
 URL: https://issues.apache.org/jira/browse/YARN-2317
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: documentation
Reporter: Li Lu
Assignee: Li Lu
 Fix For: 2.6.0

 Attachments: YARN-2317-071714.patch, YARN-2317-073014-1.patch, 
 YARN-2317-073014.patch, YARN-2317-081114.patch, YARN-2317-081214.patch


 Some information in WritingYarnApplications webpage is out-dated. Need some 
 refresh work on this document to reflect the most recent changes in YARN 
 APIs. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1198) Capacity Scheduler headroom calculation does not work as expected

2014-08-12 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14094750#comment-14094750
 ] 

Hadoop QA commented on YARN-1198:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12661296/YARN-1198.3.patch
  against trunk revision .

{color:red}-1 patch{color}.  Trunk compilation may be broken.

Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4607//console

This message is automatically generated.

 Capacity Scheduler headroom calculation does not work as expected
 -

 Key: YARN-1198
 URL: https://issues.apache.org/jira/browse/YARN-1198
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Omkar Vinit Joshi
Assignee: Craig Welch
 Attachments: YARN-1198.1.patch, YARN-1198.2.patch, YARN-1198.3.patch


 Today headroom calculation (for the app) takes place only when
 * New node is added/removed from the cluster
 * New container is getting assigned to the application.
 However, there are potentially many situations which are not considered in 
 this calculation:
 * If a container finishes, the headroom for that application will change and 
 the AM should be notified accordingly.
 * If a single user has submitted multiple applications (app1 and app2) to the 
 same queue, then:
 ** If app1's container finishes, not only app1's but also app2's AM 
 should be notified about the change in headroom.
 ** Similarly, if a container is assigned to either application (app1/app2), 
 both AMs should be notified about their headroom.
 ** To simplify the whole communication process, it is ideal to keep headroom 
 per user per LeafQueue so that everyone gets the same picture (apps belonging 
 to the same user and submitted to the same queue).
 * If a new user submits an application to the queue, then all applications 
 submitted by all users in that queue should be notified of the headroom 
 change.
 * Also, today headroom is an absolute number (I think it should be normalized, 
 but then that would not be backward compatible).
 * Also, when an admin refreshes a queue, its headroom has to be updated.
 These are all potential bugs in the headroom calculation.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-925) Augment HistoryStorage Reader Interface to Support Filters When Getting Applications

2014-08-12 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14094763#comment-14094763
 ] 

Karthik Kambatla commented on YARN-925:
---

Folks, I just realized a part of this went into 2.5.0 and the remaining is yet 
to be committed. Can we please open a follow-up JIRA to finish this, so we 
don't have the same JIRA against multiple releases? If people working on this 
are okay with that, please mark this as resolved with fixversion 2.5.0. Thanks. 

 Augment HistoryStorage Reader Interface to Support Filters When Getting 
 Applications
 

 Key: YARN-925
 URL: https://issues.apache.org/jira/browse/YARN-925
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Mayank Bansal
Assignee: Shinichi Yamashita
 Fix For: YARN-321

 Attachments: YARN-925-1.patch, YARN-925-2.patch, YARN-925-3.patch, 
 YARN-925-4.patch, YARN-925-5.patch, YARN-925-6.patch, YARN-925-7.patch, 
 YARN-925-8.patch, YARN-925-9.patch


 We need to allow filter parameters for getApplications, pushing filtering to 
 the implementations of the interface. The implementations know best how to 
 optimize filtering. 
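
 As a hedged, hypothetical sketch of what "pushing filtering to the 
 implementations of the interface" could look like (this is not the actual 
 history reader API; all names are made up for illustration):

{code:java}
// Hedged, hypothetical sketch; this is not the actual history reader API.
// It only illustrates passing filter parameters down so each store
// implementation can optimize the filtering itself.
import java.util.Collection;

public class HistoryFilterSketch {

  /** A minimal bag of filter parameters the caller passes down. */
  public static final class AppFilter {
    final String user;         // null means any user
    final String queue;        // null means any queue
    final long startedAfterMs; // 0 means no lower bound on start time
    public AppFilter(String user, String queue, long startedAfterMs) {
      this.user = user;
      this.queue = queue;
      this.startedAfterMs = startedAfterMs;
    }
  }

  /** Reader interface that accepts the filter instead of returning everything. */
  public interface FilteringHistoryReader {
    // A LevelDB- or HBase-backed store can translate the filter into its own
    // scans or indexes rather than having callers filter a full listing.
    Collection<String> getApplications(AppFilter filter);
  }
}
{code}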



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2277) Add Cross-Origin support to the ATS REST API

2014-08-12 Thread Jonathan Eagles (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Eagles updated YARN-2277:
--

Attachment: YARN-2277-v7.patch

 Add Cross-Origin support to the ATS REST API
 

 Key: YARN-2277
 URL: https://issues.apache.org/jira/browse/YARN-2277
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Jonathan Eagles
Assignee: Jonathan Eagles
 Attachments: YARN-2277-CORS.patch, YARN-2277-JSONP.patch, 
 YARN-2277-v2.patch, YARN-2277-v3.patch, YARN-2277-v3.patch, 
 YARN-2277-v4.patch, YARN-2277-v5.patch, YARN-2277-v6.patch, YARN-2277-v7.patch


 As the Application Timeline Server does not provide a built-in UI, it may 
 make sense to enable JSONP or CORS REST API capabilities to allow a remote 
 UI to access the data directly via JavaScript without the browser's 
 cross-site restrictions coming into play.
 An example client might be
 http://api.jquery.com/jQuery.getJSON/ 
 This can alleviate the need to create a local proxy cache.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2277) Add Cross-Origin support to the ATS REST API

2014-08-12 Thread Jonathan Eagles (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14094772#comment-14094772
 ] 

Jonathan Eagles commented on YARN-2277:
---

[~zjshen], addressed your issues. As for the chain.doFilter comment, it is the 
expected order for Filters, and I believe it matches the order the jetty CORS 
filter uses.
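
For context on the ordering point, here is a hedged sketch of the usual 
servlet Filter shape for CORS, where the headers are written before 
chain.doFilter() hands the request on; this is not the attached patch itself.

{code:java}
// Hedged sketch of the filter-ordering point, not the attached patch: a
// CORS filter typically writes its response headers first and then hands
// the request on via chain.doFilter(), so the protected resource runs
// after the headers are in place. Allowed origins/methods are hard-coded
// here purely for illustration.
import java.io.IOException;
import javax.servlet.Filter;
import javax.servlet.FilterChain;
import javax.servlet.FilterConfig;
import javax.servlet.ServletException;
import javax.servlet.ServletRequest;
import javax.servlet.ServletResponse;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

public class SimpleCrossOriginFilter implements Filter {

  @Override
  public void init(FilterConfig config) {
    // A real filter would read allowed origins, methods and headers here.
  }

  @Override
  public void doFilter(ServletRequest req, ServletResponse res,
      FilterChain chain) throws IOException, ServletException {
    HttpServletRequest request = (HttpServletRequest) req;
    HttpServletResponse response = (HttpServletResponse) res;

    String origin = request.getHeader("Origin");
    if (origin != null) {
      response.setHeader("Access-Control-Allow-Origin", origin);
      response.setHeader("Access-Control-Allow-Methods", "GET, OPTIONS");
    }
    // Continue the chain only after the CORS headers are set.
    chain.doFilter(req, res);
  }

  @Override
  public void destroy() {
  }
}
{code}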



 Add Cross-Origin support to the ATS REST API
 

 Key: YARN-2277
 URL: https://issues.apache.org/jira/browse/YARN-2277
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Jonathan Eagles
Assignee: Jonathan Eagles
 Attachments: YARN-2277-CORS.patch, YARN-2277-JSONP.patch, 
 YARN-2277-v2.patch, YARN-2277-v3.patch, YARN-2277-v3.patch, 
 YARN-2277-v4.patch, YARN-2277-v5.patch, YARN-2277-v6.patch, YARN-2277-v7.patch


 As the Application Timeline Server does not provide a built-in UI, it may 
 make sense to enable JSONP or CORS REST API capabilities to allow a remote 
 UI to access the data directly via JavaScript without the browser's 
 cross-site restrictions coming into play.
 An example client might be
 http://api.jquery.com/jQuery.getJSON/ 
 This can alleviate the need to create a local proxy cache.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2317) Update documentation about how to write YARN applications

2014-08-12 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14094786#comment-14094786
 ] 

Hudson commented on YARN-2317:
--

FAILURE: Integrated in Hadoop-trunk-Commit #6053 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/6053/])
YARN-2317. Updated the document about how to write YARN applications. 
Contributed by Li Lu. (zjshen: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1617594)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/apt/WritingYarnApplications.apt.vm


 Update documentation about how to write YARN applications
 -

 Key: YARN-2317
 URL: https://issues.apache.org/jira/browse/YARN-2317
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: documentation
Reporter: Li Lu
Assignee: Li Lu
 Fix For: 2.6.0

 Attachments: YARN-2317-071714.patch, YARN-2317-073014-1.patch, 
 YARN-2317-073014.patch, YARN-2317-081114.patch, YARN-2317-081214.patch


 Some information in WritingYarnApplications webpage is out-dated. Need some 
 refresh work on this document to reflect the most recent changes in YARN 
 APIs. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2399) FairScheduler: Merge AppSchedulable and FSSchedulerApp into FSAppAttempt

2014-08-12 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14094784#comment-14094784
 ] 

Hudson commented on YARN-2399:
--

FAILURE: Integrated in Hadoop-trunk-Commit #6053 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/6053/])
YARN-2399. FairScheduler: Merge AppSchedulable and FSSchedulerApp into 
FSAppAttempt. (kasha) (kasha: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1617600)
* 
/hadoop/common/trunk/hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/scheduler/FairSchedulerMetrics.java
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSAppAttempt.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSLeafQueue.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSQueue.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSSchedulerNode.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FifoAppComparator.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/MaxRunningAppsEnforcer.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/NewAppWeightBooster.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/Schedulable.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/WeightAdjuster.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/dao/FairSchedulerInfo.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/dao/FairSchedulerLeafQueueInfo.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FakeSchedulable.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFSAppAttempt.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFSLeafQueue.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFairScheduler.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestMaxRunningAppsEnforcer.java


 FairScheduler: Merge AppSchedulable and FSSchedulerApp into FSAppAttempt
 

 Key: YARN-2399
 URL: https://issues.apache.org/jira/browse/YARN-2399
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: fairscheduler
Affects Versions: 2.5.0
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla
 Attachments: yarn-2399-1.patch, yarn-2399-2.patch, yarn-2399-3.patch


 FairScheduler has two data structures for an application, making the code 
 hard to track. We should merge these for better maintainability in the 
 long-term. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2399) FairScheduler: Merge AppSchedulable and FSSchedulerApp into FSAppAttempt

2014-08-12 Thread Arpit Agarwal (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14094788#comment-14094788
 ] 

Arpit Agarwal commented on YARN-2399:
-

Hi, looks like this was committed and it broke trunk compilation. Can someone 
familiar with the patch please take a look?

 FairScheduler: Merge AppSchedulable and FSSchedulerApp into FSAppAttempt
 

 Key: YARN-2399
 URL: https://issues.apache.org/jira/browse/YARN-2399
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: fairscheduler
Affects Versions: 2.5.0
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla
 Attachments: yarn-2399-1.patch, yarn-2399-2.patch, yarn-2399-3.patch


 FairScheduler has two data structures for an application, making the code 
 hard to track. We should merge these for better maintainability in the 
 long-term. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2399) FairScheduler: Merge AppSchedulable and FSSchedulerApp into FSAppAttempt

2014-08-12 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14094815#comment-14094815
 ] 

Karthik Kambatla commented on YARN-2399:


Sorry about that. Looking into it. I did a bunch of tests before committing it, 
not sure what it could be. 

 FairScheduler: Merge AppSchedulable and FSSchedulerApp into FSAppAttempt
 

 Key: YARN-2399
 URL: https://issues.apache.org/jira/browse/YARN-2399
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: fairscheduler
Affects Versions: 2.5.0
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla
 Attachments: yarn-2399-1.patch, yarn-2399-2.patch, yarn-2399-3.patch


 FairScheduler has two data structures for an application, making the code 
 hard to track. We should merge these for better maintainability in the 
 long-term. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-925) Augment HistoryStorage Reader Interface to Support Filters When Getting Applications

2014-08-12 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14094825#comment-14094825
 ] 

Zhijie Shen commented on YARN-925:
--

bq. Folks, I just realized a part of this went into 2.5.0 and the remaining is 
yet to be committed.

AFAIK, YARN-925 is part of 2.4. I reopened the ticket as we may want to improve 
the reader interface. Apparently, that was not able to be done within the 2.4 
window.

bq. If people working on this are okay with that, please mark this as resolved 
with fixversion 2.5.0.

Let's mark it as fixed for 2.4, and move the follow-up discussion to a new 
Jira.

 Augment HistoryStorage Reader Interface to Support Filters When Getting 
 Applications
 

 Key: YARN-925
 URL: https://issues.apache.org/jira/browse/YARN-925
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Mayank Bansal
Assignee: Shinichi Yamashita
 Fix For: YARN-321

 Attachments: YARN-925-1.patch, YARN-925-2.patch, YARN-925-3.patch, 
 YARN-925-4.patch, YARN-925-5.patch, YARN-925-6.patch, YARN-925-7.patch, 
 YARN-925-8.patch, YARN-925-9.patch


 We need to allow filter parameters for getApplications, pushing filtering to 
 the implementations of the interface. The implementations know best how to 
 optimize filtering. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (YARN-2412) Augment HistoryStorage Reader Interface to Support Filters When Getting Applications

2014-08-12 Thread Zhijie Shen (JIRA)
Zhijie Shen created YARN-2412:
-

 Summary: Augment HistoryStorage Reader Interface to Support 
Filters When Getting Applications
 Key: YARN-2412
 URL: https://issues.apache.org/jira/browse/YARN-2412
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Zhijie Shen
Assignee: Shinichi Yamashita






--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2412) Augment HistoryStorage Reader Interface to Support Filters When Getting Applications

2014-08-12 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14094840#comment-14094840
 ] 

Zhijie Shen commented on YARN-2412:
---

Per the discussion on YARN-925, let's continue the discussion and the work on 
this Jira. Assigning the ticket to [~sinchii], as he was working on it before.

 Augment HistoryStorage Reader Interface to Support Filters When Getting 
 Applications
 

 Key: YARN-2412
 URL: https://issues.apache.org/jira/browse/YARN-2412
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Zhijie Shen
Assignee: Shinichi Yamashita





--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2412) Augment HistoryStorage Reader Interface to Support Filters When Getting Applications

2014-08-12 Thread Zhijie Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2412?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen updated YARN-2412:
--

Issue Type: Sub-task  (was: Bug)
Parent: YARN-321

 Augment HistoryStorage Reader Interface to Support Filters When Getting 
 Applications
 

 Key: YARN-2412
 URL: https://issues.apache.org/jira/browse/YARN-2412
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Zhijie Shen
Assignee: Shinichi Yamashita





--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-925) HistoryStorage Reader Interface for Application History Server

2014-08-12 Thread Zhijie Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-925?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen updated YARN-925:
-

Assignee: Mayank Bansal  (was: Shinichi Yamashita)
 Summary: HistoryStorage Reader Interface for Application History Server  
(was: Augment HistoryStorage Reader Interface to Support Filters When Getting 
Applications)

 HistoryStorage Reader Interface for Application History Server
 --

 Key: YARN-925
 URL: https://issues.apache.org/jira/browse/YARN-925
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Mayank Bansal
Assignee: Mayank Bansal
 Fix For: YARN-321, 2.4.0

 Attachments: YARN-925-1.patch, YARN-925-2.patch, YARN-925-3.patch, 
 YARN-925-4.patch, YARN-925-5.patch, YARN-925-6.patch, YARN-925-7.patch, 
 YARN-925-8.patch, YARN-925-9.patch


 We need to allow filter parameters for getApplications, pushing filtering to 
 the implementations of the interface. The implementations should know the 
 best about optimizing filtering. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2412) Augment HistoryStorage Reader Interface to Support Filters When Getting Applications

2014-08-12 Thread Zhijie Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2412?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen updated YARN-2412:
--

Description: 
https://issues.apache.org/jira/browse/YARN-925?focusedCommentId=13800402&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13800402

 Augment HistoryStorage Reader Interface to Support Filters When Getting 
 Applications
 

 Key: YARN-2412
 URL: https://issues.apache.org/jira/browse/YARN-2412
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Zhijie Shen
Assignee: Shinichi Yamashita

 https://issues.apache.org/jira/browse/YARN-925?focusedCommentId=13800402&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13800402



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2399) FairScheduler: Merge AppSchedulable and FSSchedulerApp into FSAppAttempt

2014-08-12 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14094845#comment-14094845
 ] 

Karthik Kambatla commented on YARN-2399:


Thanks for catching this, Arpit. Just fixed it. I messed up at commit time and 
left a few files that should have been deleted. It turns out this happened when I 
did svn up to resolve the CHANGES.txt conflict. 

 FairScheduler: Merge AppSchedulable and FSSchedulerApp into FSAppAttempt
 

 Key: YARN-2399
 URL: https://issues.apache.org/jira/browse/YARN-2399
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: fairscheduler
Affects Versions: 2.5.0
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla
 Attachments: yarn-2399-1.patch, yarn-2399-2.patch, yarn-2399-3.patch


 FairScheduler has two data structures for an application, making the code 
 hard to track. We should merge these for better maintainability in the 
 long-term. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2399) FairScheduler: Merge AppSchedulable and FSSchedulerApp into FSAppAttempt

2014-08-12 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14094859#comment-14094859
 ] 

Hudson commented on YARN-2399:
--

FAILURE: Integrated in Hadoop-trunk-Commit #6054 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/6054/])
YARN-2399. Delete old versions of files. FairScheduler: Merge AppSchedulable 
and FSSchedulerApp into FSAppAttempt. (kasha) (kasha: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1617619)
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/AppSchedulable.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSSchedulerApp.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFSSchedulerApp.java


 FairScheduler: Merge AppSchedulable and FSSchedulerApp into FSAppAttempt
 

 Key: YARN-2399
 URL: https://issues.apache.org/jira/browse/YARN-2399
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: fairscheduler
Affects Versions: 2.5.0
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla
 Fix For: 2.6.0

 Attachments: yarn-2399-1.patch, yarn-2399-2.patch, yarn-2399-3.patch


 FairScheduler has two data structures for an application, making the code 
 hard to track. We should merge these for better maintainability in the 
 long-term. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2399) FairScheduler: Merge AppSchedulable and FSSchedulerApp into FSAppAttempt

2014-08-12 Thread Arpit Agarwal (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14094864#comment-14094864
 ] 

Arpit Agarwal commented on YARN-2399:
-

Thanks Karthik! Verified the build break is fixed.

 FairScheduler: Merge AppSchedulable and FSSchedulerApp into FSAppAttempt
 

 Key: YARN-2399
 URL: https://issues.apache.org/jira/browse/YARN-2399
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: fairscheduler
Affects Versions: 2.5.0
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla
 Fix For: 2.6.0

 Attachments: yarn-2399-1.patch, yarn-2399-2.patch, yarn-2399-3.patch


 FairScheduler has two data structures for an application, making the code 
 hard to track. We should merge these for better maintainability in the 
 long-term. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (YARN-2413) capacity scheduler will overallocate vcores

2014-08-12 Thread Allen Wittenauer (JIRA)
Allen Wittenauer created YARN-2413:
--

 Summary: capacity scheduler will overallocate vcores
 Key: YARN-2413
 URL: https://issues.apache.org/jira/browse/YARN-2413
 Project: Hadoop YARN
  Issue Type: Bug
  Components: scheduler
Affects Versions: 2.2.0, 3.0.0
Reporter: Allen Wittenauer
Priority: Critical


It doesn't appear that the capacity scheduler is properly allocation vcores 
when making scheduling decisions, which results in overallocation of CPU 
resources.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2413) capacity scheduler will overallocate vcores

2014-08-12 Thread Allen Wittenauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2413?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer updated YARN-2413:
---

Description: It doesn't appear that the capacity scheduler is properly 
allocating vcores when making scheduling decisions, which may result in 
overallocation of CPU resources.  (was: It doesn't appear that the capacity 
scheduler is properly allocation vcores when making scheduling decisions, which 
results in overallocation of CPU resources.)

 capacity scheduler will overallocate vcores
 ---

 Key: YARN-2413
 URL: https://issues.apache.org/jira/browse/YARN-2413
 Project: Hadoop YARN
  Issue Type: Bug
  Components: scheduler
Affects Versions: 3.0.0, 2.2.0
Reporter: Allen Wittenauer
Priority: Critical

 It doesn't appear that the capacity scheduler is properly allocating vcores 
 when making scheduling decisions, which may result in overallocation of CPU 
 resources.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2413) capacity scheduler will overallocate vcores

2014-08-12 Thread Allen Wittenauer (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14094873#comment-14094873
 ] 

Allen Wittenauer commented on YARN-2413:


I might have missed something, but there appears to be a very bad bug here.  
Given the following settings:

{code}
  mapreduce.map.cpu.vcores: 10
  yarn.scheduler.maximum-allocation-vcores: 220
  mapreduce.reduce.cpu.vcores: 10
  yarn.app.mapreduce.am.resource.cpu-vcores: 10
  yarn.scheduler.minimum-allocation-vcores: 10
  yarn.nodemanager.resource.cpu-vcores: 221
{code}

The resource manager is only allocating 1 vcore per container:

{code}
2014-08-12 15:49:31,809 INFO 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerNode: Assigned 
container container_1407883573269_0001_01_01 of capacity memory:2048, 
vCores:1 on host 10.248.3.50:8041, which has 1 containers, memory:2048, 
vCores:1 used and memory:2050, vCores:220 available after allocation
{code}

... which is clearly wrong, as the number of vcores requested should have been 
10 and the remaining available vcores should have been 211.

 capacity scheduler will overallocate vcores
 ---

 Key: YARN-2413
 URL: https://issues.apache.org/jira/browse/YARN-2413
 Project: Hadoop YARN
  Issue Type: Bug
  Components: scheduler
Affects Versions: 3.0.0, 2.2.0
Reporter: Allen Wittenauer
Priority: Critical

 It doesn't appear that the capacity scheduler is properly allocation vcores 
 when making scheduling decisions, which results in overallocation of CPU 
 resources.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-659) RMStateStore's removeApplication APIs should just take an applicationId

2014-08-12 Thread Tsuyoshi OZAWA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-659?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14094882#comment-14094882
 ] 

Tsuyoshi OZAWA commented on YARN-659:
-

The test failure is not related to this patch, and it's filed as YARN-2365.

 RMStateStore's removeApplication APIs should just take an applicationId
 ---

 Key: YARN-659
 URL: https://issues.apache.org/jira/browse/YARN-659
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Vinod Kumar Vavilapalli
Assignee: Tsuyoshi OZAWA
 Attachments: YARN-659.1.patch, YARN-659.2.patch, YARN-659.3.patch, 
 YARN-659.4.patch, YARN-659.5.patch


 There is no need to give in the whole state for removal - just an ID should 
 be enough when an app finishes.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2413) capacity scheduler will overallocate vcores

2014-08-12 Thread Sandy Ryza (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14094885#comment-14094885
 ] 

Sandy Ryza commented on YARN-2413:
--

The capacity scheduler truncates all vcore requests to 0 if the 
DominantResourceCalculator is not used. I think in this case it also doesn't 
make an effort to respect node vcore capacities at all.
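
Not part of this JIRA, but for anyone who wants vcores enforced today, switching the 
resource calculator to DRC is the usual knob. A minimal sketch (the property name is 
the standard capacity-scheduler.xml one; setting it on a plain 
org.apache.hadoop.conf.Configuration here is just for illustration):

{code}
// sketch: make the Capacity Scheduler use the DominantResourceCalculator so
// vcore requests and node vcore capacities are taken into account
Configuration conf = new Configuration();
conf.set("yarn.scheduler.capacity.resource-calculator",
    "org.apache.hadoop.yarn.util.resource.DominantResourceCalculator");
{code}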

 capacity scheduler will overallocate vcores
 ---

 Key: YARN-2413
 URL: https://issues.apache.org/jira/browse/YARN-2413
 Project: Hadoop YARN
  Issue Type: Bug
  Components: scheduler
Affects Versions: 3.0.0, 2.2.0
Reporter: Allen Wittenauer
Priority: Critical

 It doesn't appear that the capacity scheduler is properly allocating vcores 
 when making scheduling decisions, which may result in overallocation of CPU 
 resources.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2277) Add Cross-Origin support to the ATS REST API

2014-08-12 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14094891#comment-14094891
 ] 

Hadoop QA commented on YARN-2277:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12661315/YARN-2277-v7.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 2 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/4608//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4608//console

This message is automatically generated.

 Add Cross-Origin support to the ATS REST API
 

 Key: YARN-2277
 URL: https://issues.apache.org/jira/browse/YARN-2277
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Jonathan Eagles
Assignee: Jonathan Eagles
 Attachments: YARN-2277-CORS.patch, YARN-2277-JSONP.patch, 
 YARN-2277-v2.patch, YARN-2277-v3.patch, YARN-2277-v3.patch, 
 YARN-2277-v4.patch, YARN-2277-v5.patch, YARN-2277-v6.patch, YARN-2277-v7.patch


 As the Application Timeline Server does not provide a built-in UI, it may 
 make sense to enable JSONP or CORS REST API capabilities to allow a remote 
 UI to access the data directly via JavaScript without cross-site browser 
 restrictions coming into play.
 Example client may be like
 http://api.jquery.com/jQuery.getJSON/ 
 This can alleviate the need to create a local proxy cache.
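
For illustration only (this is not the YARN-2277 patch itself), the CORS approach 
boils down to a servlet filter that adds the Access-Control-Allow-Origin header to 
the ATS REST responses; the wildcard origin below is a placeholder policy:

{code}
import java.io.IOException;
import javax.servlet.*;
import javax.servlet.http.HttpServletResponse;

// minimal sketch of a cross-origin filter, not the implementation under review
public class SimpleCrossOriginFilter implements Filter {
  @Override
  public void init(FilterConfig filterConfig) {}

  @Override
  public void doFilter(ServletRequest request, ServletResponse response,
      FilterChain chain) throws IOException, ServletException {
    // allow any origin to read the response; a real filter would make this configurable
    ((HttpServletResponse) response)
        .setHeader("Access-Control-Allow-Origin", "*");
    chain.doFilter(request, response);
  }

  @Override
  public void destroy() {}
}
{code}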



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2413) capacity scheduler will overallocate vcores

2014-08-12 Thread Allen Wittenauer (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14094897#comment-14094897
 ] 

Allen Wittenauer commented on YARN-2413:


What we're seeing with the default settings (as opposed to the fabricated 
numbers above... they just help make the problem evident) is that hundreds of 
containers can get allocated on the same node because the capacity scheduler isn't 
taking the core count into consideration at all.  This obviously leads to 
massive performance breakdowns, especially in a failure scenario where 
multiple NMs die.

 capacity scheduler will overallocate vcores
 ---

 Key: YARN-2413
 URL: https://issues.apache.org/jira/browse/YARN-2413
 Project: Hadoop YARN
  Issue Type: Bug
  Components: scheduler
Affects Versions: 3.0.0, 2.2.0
Reporter: Allen Wittenauer
Priority: Critical

 It doesn't appear that the capacity scheduler is properly allocating vcores 
 when making scheduling decisions, which may result in overallocation of CPU 
 resources.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2414) RM web UI: app page will crash if app is failed before any attempt has been created

2014-08-12 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14094900#comment-14094900
 ] 

Zhijie Shen commented on YARN-2414:
---

The following code in AppBlock assumes that the attempt is not null:
{code}
setTitle(join("Application ", aid));

RMAppMetrics appMerics = rmApp.getRMAppMetrics();
RMAppAttemptMetrics attemptMetrics =
rmApp.getCurrentAppAttempt().getRMAppAttemptMetrics();
{code}
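
Not the eventual fix, just a minimal sketch of the kind of null guard AppBlock 
would need, using only the calls shown in the snippet above:

{code}
// sketch: tolerate apps that failed before any attempt was created
RMAppAttempt currentAttempt = rmApp.getCurrentAppAttempt();
RMAppAttemptMetrics attemptMetrics =
    currentAttempt == null ? null : currentAttempt.getRMAppAttemptMetrics();
// the rest of render() would then have to handle attemptMetrics == null
{code}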

 RM web UI: app page will crash if app is failed before any attempt has been 
 created
 ---

 Key: YARN-2414
 URL: https://issues.apache.org/jira/browse/YARN-2414
 Project: Hadoop YARN
  Issue Type: Bug
  Components: webapp
Reporter: Zhijie Shen

 {code}
 2014-08-12 16:45:13,573 ERROR org.apache.hadoop.yarn.webapp.Dispatcher: error 
 handling URI: /cluster/app/application_1407887030038_0001
 java.lang.reflect.InvocationTargetException
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
   at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
   at java.lang.reflect.Method.invoke(Method.java:597)
   at org.apache.hadoop.yarn.webapp.Dispatcher.service(Dispatcher.java:153)
   at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
   at 
 com.google.inject.servlet.ServletDefinition.doService(ServletDefinition.java:263)
   at 
 com.google.inject.servlet.ServletDefinition.service(ServletDefinition.java:178)
   at 
 com.google.inject.servlet.ManagedServletPipeline.service(ManagedServletPipeline.java:91)
   at 
 com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:62)
   at 
 com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:900)
   at 
 com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:834)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.webapp.RMWebAppFilter.doFilter(RMWebAppFilter.java:84)
   at 
 com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:795)
   at 
 com.google.inject.servlet.FilterDefinition.doFilter(FilterDefinition.java:163)
   at 
 com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:58)
   at 
 com.google.inject.servlet.ManagedFilterPipeline.dispatch(ManagedFilterPipeline.java:118)
   at com.google.inject.servlet.GuiceFilter.doFilter(GuiceFilter.java:113)
   at 
 org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
   at 
 org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter.doFilter(StaticUserWebFilter.java:109)
   at 
 org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
   at 
 org.apache.hadoop.security.authentication.server.AuthenticationFilter.doFilter(AuthenticationFilter.java:460)
   at 
 org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
   at 
 org.apache.hadoop.http.HttpServer2$QuotingInputFilter.doFilter(HttpServer2.java:1191)
   at 
 org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
   at org.apache.hadoop.http.NoCacheFilter.doFilter(NoCacheFilter.java:45)
   at 
 org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
   at org.apache.hadoop.http.NoCacheFilter.doFilter(NoCacheFilter.java:45)
   at 
 org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
   at 
 org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
   at 
 org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
   at 
 org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
   at 
 org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
   at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
   at 
 org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
   at 
 org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
   at org.mortbay.jetty.Server.handle(Server.java:326)
   at 
 org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
   at 
 org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:928)
   at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:549)
   at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212)
   at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
   at 
 

[jira] [Commented] (YARN-2413) capacity scheduler will overallocate vcores

2014-08-12 Thread Sandy Ryza (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14094899#comment-14094899
 ] 

Sandy Ryza commented on YARN-2413:
--

I believe this is the expected behavior (i.e. Capacity Scheduler by default 
doesn't use vcores in scheduling).

 capacity scheduler will overallocate vcores
 ---

 Key: YARN-2413
 URL: https://issues.apache.org/jira/browse/YARN-2413
 Project: Hadoop YARN
  Issue Type: Bug
  Components: scheduler
Affects Versions: 3.0.0, 2.2.0
Reporter: Allen Wittenauer
Priority: Critical

 It doesn't appear that the capacity scheduler is properly allocating vcores 
 when making scheduling decisions, which may result in overallocation of CPU 
 resources.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2414) RM web UI: app page will crash if app is failed before any attempt has been created

2014-08-12 Thread Zhijie Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2414?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen updated YARN-2414:
--

Component/s: webapp

 RM web UI: app page will crash if app is failed before any attempt has been 
 created
 ---

 Key: YARN-2414
 URL: https://issues.apache.org/jira/browse/YARN-2414
 Project: Hadoop YARN
  Issue Type: Bug
  Components: webapp
Reporter: Zhijie Shen

 {code}
 2014-08-12 16:45:13,573 ERROR org.apache.hadoop.yarn.webapp.Dispatcher: error 
 handling URI: /cluster/app/application_1407887030038_0001
 java.lang.reflect.InvocationTargetException
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
   at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
   at java.lang.reflect.Method.invoke(Method.java:597)
   at org.apache.hadoop.yarn.webapp.Dispatcher.service(Dispatcher.java:153)
   at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
   at 
 com.google.inject.servlet.ServletDefinition.doService(ServletDefinition.java:263)
   at 
 com.google.inject.servlet.ServletDefinition.service(ServletDefinition.java:178)
   at 
 com.google.inject.servlet.ManagedServletPipeline.service(ManagedServletPipeline.java:91)
   at 
 com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:62)
   at 
 com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:900)
   at 
 com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:834)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.webapp.RMWebAppFilter.doFilter(RMWebAppFilter.java:84)
   at 
 com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:795)
   at 
 com.google.inject.servlet.FilterDefinition.doFilter(FilterDefinition.java:163)
   at 
 com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:58)
   at 
 com.google.inject.servlet.ManagedFilterPipeline.dispatch(ManagedFilterPipeline.java:118)
   at com.google.inject.servlet.GuiceFilter.doFilter(GuiceFilter.java:113)
   at 
 org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
   at 
 org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter.doFilter(StaticUserWebFilter.java:109)
   at 
 org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
   at 
 org.apache.hadoop.security.authentication.server.AuthenticationFilter.doFilter(AuthenticationFilter.java:460)
   at 
 org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
   at 
 org.apache.hadoop.http.HttpServer2$QuotingInputFilter.doFilter(HttpServer2.java:1191)
   at 
 org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
   at org.apache.hadoop.http.NoCacheFilter.doFilter(NoCacheFilter.java:45)
   at 
 org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
   at org.apache.hadoop.http.NoCacheFilter.doFilter(NoCacheFilter.java:45)
   at 
 org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
   at 
 org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
   at 
 org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
   at 
 org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
   at 
 org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
   at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
   at 
 org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
   at 
 org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
   at org.mortbay.jetty.Server.handle(Server.java:326)
   at 
 org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
   at 
 org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:928)
   at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:549)
   at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212)
   at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
   at 
 org.mortbay.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:410)
   at 
 org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)
 Caused by: java.lang.NullPointerException
   at 
 org.apache.hadoop.yarn.server.resourcemanager.webapp.AppBlock.render(AppBlock.java:116)
   at 
 

[jira] [Created] (YARN-2414) RM web UI: app page will crash if app is failed before any attempt has been created

2014-08-12 Thread Zhijie Shen (JIRA)
Zhijie Shen created YARN-2414:
-

 Summary: RM web UI: app page will crash if app is failed before 
any attempt has been created
 Key: YARN-2414
 URL: https://issues.apache.org/jira/browse/YARN-2414
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Zhijie Shen


{code}
2014-08-12 16:45:13,573 ERROR org.apache.hadoop.yarn.webapp.Dispatcher: error 
handling URI: /cluster/app/application_1407887030038_0001
java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.yarn.webapp.Dispatcher.service(Dispatcher.java:153)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
at 
com.google.inject.servlet.ServletDefinition.doService(ServletDefinition.java:263)
at 
com.google.inject.servlet.ServletDefinition.service(ServletDefinition.java:178)
at 
com.google.inject.servlet.ManagedServletPipeline.service(ManagedServletPipeline.java:91)
at 
com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:62)
at 
com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:900)
at 
com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:834)
at 
org.apache.hadoop.yarn.server.resourcemanager.webapp.RMWebAppFilter.doFilter(RMWebAppFilter.java:84)
at 
com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:795)
at 
com.google.inject.servlet.FilterDefinition.doFilter(FilterDefinition.java:163)
at 
com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:58)
at 
com.google.inject.servlet.ManagedFilterPipeline.dispatch(ManagedFilterPipeline.java:118)
at com.google.inject.servlet.GuiceFilter.doFilter(GuiceFilter.java:113)
at 
org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
at 
org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter.doFilter(StaticUserWebFilter.java:109)
at 
org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
at 
org.apache.hadoop.security.authentication.server.AuthenticationFilter.doFilter(AuthenticationFilter.java:460)
at 
org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
at 
org.apache.hadoop.http.HttpServer2$QuotingInputFilter.doFilter(HttpServer2.java:1191)
at 
org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
at org.apache.hadoop.http.NoCacheFilter.doFilter(NoCacheFilter.java:45)
at 
org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
at org.apache.hadoop.http.NoCacheFilter.doFilter(NoCacheFilter.java:45)
at 
org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
at 
org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
at 
org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
at 
org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
at 
org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
at 
org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
at 
org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
at org.mortbay.jetty.Server.handle(Server.java:326)
at 
org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
at 
org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:928)
at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:549)
at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212)
at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
at 
org.mortbay.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:410)
at 
org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)
Caused by: java.lang.NullPointerException
at 
org.apache.hadoop.yarn.server.resourcemanager.webapp.AppBlock.render(AppBlock.java:116)
at 
org.apache.hadoop.yarn.webapp.view.HtmlBlock.render(HtmlBlock.java:67)
at 
org.apache.hadoop.yarn.webapp.view.HtmlBlock.renderPartial(HtmlBlock.java:77)
at org.apache.hadoop.yarn.webapp.View.render(View.java:235)
at 

[jira] [Commented] (YARN-2413) capacity scheduler will overallocate vcores

2014-08-12 Thread Sandy Ryza (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14094903#comment-14094903
 ] 

Sandy Ryza commented on YARN-2413:
--

I don't have an opinion on whether we should keep this as the default behavior; 
I just wanted to clarify that this is the expected behavior.

 capacity scheduler will overallocate vcores
 ---

 Key: YARN-2413
 URL: https://issues.apache.org/jira/browse/YARN-2413
 Project: Hadoop YARN
  Issue Type: Bug
  Components: scheduler
Affects Versions: 3.0.0, 2.2.0
Reporter: Allen Wittenauer
Priority: Critical

 It doesn't appear that the capacity scheduler is properly allocating vcores 
 when making scheduling decisions, which may result in overallocation of CPU 
 resources.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-1198) Capacity Scheduler headroom calculation does not work as expected

2014-08-12 Thread Craig Welch (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1198?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Craig Welch updated YARN-1198:
--

Attachment: YARN-1198.4.patch

Redid the patch against current trunk and am triggering a rebuild because something 
went wrong with the last one.

 Capacity Scheduler headroom calculation does not work as expected
 -

 Key: YARN-1198
 URL: https://issues.apache.org/jira/browse/YARN-1198
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Omkar Vinit Joshi
Assignee: Craig Welch
 Attachments: YARN-1198.1.patch, YARN-1198.2.patch, YARN-1198.3.patch, 
 YARN-1198.4.patch


 Today headroom calculation (for the app) takes place only when
 * New node is added/removed from the cluster
 * New container is getting assigned to the application.
 However there are potentially lot of situations which are not considered for 
 this calculation
 * If a container finishes then headroom for that application will change and 
 should be notified to the AM accordingly.
 * If a single user has submitted multiple applications (app1 and app2) to the 
 same queue then
 ** If app1's container finishes then not only app1's but also app2's AM 
 should be notified about the change in headroom.
 ** Similarly if a container is assigned to any applications app1/app2 then 
 both AM should be notified about their headroom.
 ** To simplify the whole communication process it is ideal to keep headroom 
 per User per LeafQueue so that everyone gets the same picture (apps belonging 
 to same user and submitted in same queue).
 * If a new user submits an application to the queue then all applications 
 submitted by all users in that queue should be notified of the headroom 
 change.
 * Also today headroom is an absolute number ( I think it should be normalized 
 but then this is going to be not backward compatible..)
 * Also  when admin user refreshes queue headroom has to be updated.
 These all are the potential bugs in headroom calculations



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Comment Edited] (YARN-2413) capacity scheduler will overallocate vcores

2014-08-12 Thread Allen Wittenauer (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14094922#comment-14094922
 ] 

Allen Wittenauer edited comment on YARN-2413 at 8/12/14 11:59 PM:
--

From an operations perspective, this is not expected behavior at all.  Worse,  
it appears the only place this is truly documented is in 
capacity-scheduler.xml .


was (Author: aw):
From an operations perspective, this is not expected behavior at all.  Worse,  
it appears the only place this is documented is truly documented is in 
capacity-scheduler.xml .

 capacity scheduler will overallocate vcores
 ---

 Key: YARN-2413
 URL: https://issues.apache.org/jira/browse/YARN-2413
 Project: Hadoop YARN
  Issue Type: Bug
  Components: scheduler
Affects Versions: 3.0.0, 2.2.0
Reporter: Allen Wittenauer
Priority: Critical

 It doesn't appear that the capacity scheduler is properly allocating vcores 
 when making scheduling decisions, which may result in overallocation of CPU 
 resources.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2413) capacity scheduler will overallocate vcores

2014-08-12 Thread Allen Wittenauer (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14094922#comment-14094922
 ] 

Allen Wittenauer commented on YARN-2413:


From an operations perspective, this is not expected behavior at all.  Worse,  
it appears the only place this is truly documented is in 
capacity-scheduler.xml.

 capacity scheduler will overallocate vcores
 ---

 Key: YARN-2413
 URL: https://issues.apache.org/jira/browse/YARN-2413
 Project: Hadoop YARN
  Issue Type: Bug
  Components: scheduler
Affects Versions: 3.0.0, 2.2.0
Reporter: Allen Wittenauer
Priority: Critical

 It doesn't appear that the capacity scheduler is properly allocating vcores 
 when making scheduling decisions, which may result in overallocation of CPU 
 resources.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2308) NPE happened when RM restart after CapacityScheduler queue configuration changed

2014-08-12 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14094924#comment-14094924
 ] 

Zhijie Shen commented on YARN-2308:
---

Investigated the problem: when an app is submitted to a non-existing queue, the 
app is going to be rejected by CS. This works fine in a normal submission, 
because addAppAttempt happens after the RMApp enters ACCEPTED, by which point 
addApp has already executed successfully. However, in recovery mode, 
addAppAttempt is triggered independently of the result of addApp. At that moment 
the app doesn't exist in CS, as it has been rejected, while addAppAttempt 
assumes it should exist, which results in an NPE.

The fix makes sense to me. Some additional comments:

bq. + conf.setBoolean(YarnConfiguration.RM_WORK_PRESERVING_RECOVERY_ENABLED, 
true);

It should be true to imitate the failure case in the description, right? 
According to AttemptRecoveredTransition, if isWorkPreservingRecoveryEnabled = 
true, AppAttemptAddedSchedulerEvent will not be scheduled. However, whether 
AppAttemptAddedSchedulerEvent is scheduled or not, the app should eventually get 
rejected, shouldn't it? What was the test failure when 
isWorkPreservingRecoveryEnabled = false?
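
For context, a minimal illustration of where the NPE comes from and the kind of 
guard that would avoid it. This is not the patch under review, just a sketch; the 
surrounding names approximate CapacityScheduler#addApplicationAttempt:

{code}
// sketch: during recovery the app may have been rejected (its queue was removed),
// so this lookup can return null and must be guarded
SchedulerApplication<FiCaSchedulerApp> application =
    applications.get(applicationAttemptId.getApplicationId());
if (application == null) {
  // hypothetical guard, not the actual fix
  LOG.warn("Skipping attempt " + applicationAttemptId
      + " because its application is unknown to the scheduler");
  return;
}
{code}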

 NPE happened when RM restart after CapacityScheduler queue configuration 
 changed 
 -

 Key: YARN-2308
 URL: https://issues.apache.org/jira/browse/YARN-2308
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager, scheduler
Affects Versions: 2.6.0
Reporter: Wangda Tan
Assignee: chang li
Priority: Critical
 Attachments: jira2308.patch, jira2308.patch, jira2308.patch


 I encountered a NPE when RM restart
 {code}
 2014-07-16 07:22:46,957 FATAL 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in 
 handling event type APP_ATTEMPT_ADDED to the scheduler
 java.lang.NullPointerException
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.addApplicationAttempt(CapacityScheduler.java:566)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:922)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:98)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:594)
 at 
 org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
 at 
 org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:654)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:85)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationEventDispatcher.handle(ResourceManager.java:698)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationEventDispatcher.handle(ResourceManager.java:682)
 at 
 org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173)
 at 
 org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106)
 at java.lang.Thread.run(Thread.java:744)
 {code}
 And RM will be failed to restart.
 This is caused by queue configuration changed, I removed some queues and 
 added new queues. So when RM restarts, it tries to recover history 
 applications, and when any of queues of these applications removed, NPE will 
 be raised.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2413) capacity scheduler will overallocate vcores

2014-08-12 Thread Allen Wittenauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2413?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer updated YARN-2413:
---

Component/s: documentation

 capacity scheduler will overallocate vcores
 ---

 Key: YARN-2413
 URL: https://issues.apache.org/jira/browse/YARN-2413
 Project: Hadoop YARN
  Issue Type: Bug
  Components: documentation, scheduler
Affects Versions: 3.0.0, 2.2.0
Reporter: Allen Wittenauer
Priority: Critical

 It doesn't appear that the capacity scheduler is properly allocating vcores 
 when making scheduling decisions, which may result in overallocation of CPU 
 resources.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Resolved] (YARN-1585) Provide a way to format the RMStateStore

2014-08-12 Thread Karthik Kambatla (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1585?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Kambatla resolved YARN-1585.


  Resolution: Duplicate
Target Version/s:   (was: )

 Provide a way to format the RMStateStore
 

 Key: YARN-1585
 URL: https://issues.apache.org/jira/browse/YARN-1585
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: resourcemanager
Affects Versions: 2.2.0
Reporter: Karthik Kambatla

 Admins should be able to format the RMStateStore. 
 At the very least, we need this for ZKRMStateStore. Formatting the store 
 requires changing the ACLs followed by the removal of znode structure; the 
 ACL changes for fencing should be transparent to the user.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1370) Fair scheduler to re-populate container allocation state

2014-08-12 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14094950#comment-14094950
 ] 

Karthik Kambatla commented on YARN-1370:


TestAMRestart passes for me locally. 

+1. Checking this in. 

 Fair scheduler to re-populate container allocation state
 

 Key: YARN-1370
 URL: https://issues.apache.org/jira/browse/YARN-1370
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Bikas Saha
Assignee: Anubhav Dhoot
 Attachments: YARN-1370.001.patch


 YARN-1367 and YARN-1368 enable the NM to tell the RM about currently running 
 containers and the RM will pass this information to the schedulers along with 
 the node information. The schedulers are currently already informed about 
 previously running apps when the app data is recovered from the store. The 
 scheduler is expected to be able to repopulate its allocation state from the 
 above 2 sources of information.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2229) ContainerId can overflow with RM restart

2014-08-12 Thread Tsuyoshi OZAWA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14094957#comment-14094957
 ] 

Tsuyoshi OZAWA commented on YARN-2229:
--

Thanks for your review, Jian. OK, it sounds reasonable to me.

 ContainerId can overflow with RM restart
 

 Key: YARN-2229
 URL: https://issues.apache.org/jira/browse/YARN-2229
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Tsuyoshi OZAWA
Assignee: Tsuyoshi OZAWA
 Attachments: YARN-2229.1.patch, YARN-2229.10.patch, 
 YARN-2229.10.patch, YARN-2229.11.patch, YARN-2229.2.patch, YARN-2229.2.patch, 
 YARN-2229.3.patch, YARN-2229.4.patch, YARN-2229.5.patch, YARN-2229.6.patch, 
 YARN-2229.7.patch, YARN-2229.8.patch, YARN-2229.9.patch


 On YARN-2052, we changed containerId format: upper 10 bits are for epoch, 
 lower 22 bits are for sequence number of Ids. This is for preserving 
 semantics of {{ContainerId#getId()}}, {{ContainerId#toString()}}, 
 {{ContainerId#compareTo()}}, {{ContainerId#equals}}, and 
 {{ConverterUtils#toContainerId}}. One concern is epoch can overflow after RM 
 restarts 1024 times.
 To avoid the problem, its better to make containerId long. We need to define 
 the new format of container Id with preserving backward compatibility on this 
 JIRA.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-659) RMStateStore's removeApplication APIs should just take an applicationId

2014-08-12 Thread Tsuyoshi OZAWA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-659?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14094958#comment-14094958
 ] 

Tsuyoshi OZAWA commented on YARN-659:
-

[~jianhe] [~kkambatl] could you take a look, please? One design concern is that 
this change increases the load on ZKRMStateStore when {{removeApplication}} is 
called.

 RMStateStore's removeApplication APIs should just take an applicationId
 ---

 Key: YARN-659
 URL: https://issues.apache.org/jira/browse/YARN-659
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Vinod Kumar Vavilapalli
Assignee: Tsuyoshi OZAWA
 Attachments: YARN-659.1.patch, YARN-659.2.patch, YARN-659.3.patch, 
 YARN-659.4.patch, YARN-659.5.patch


 There is no need to give in the whole state for removal - just an ID should 
 be enough when an app finishes.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1198) Capacity Scheduler headroom calculation does not work as expected

2014-08-12 Thread Craig Welch (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14094997#comment-14094997
 ] 

Craig Welch commented on YARN-1198:
---

So, looking at this a bit more holistically - it appears to me that the 
cumulative effect of the changes in this jira and its subtasks is that any 
change in utilization by any application in the queue potentially affects the 
headroom of all of the applications in the queue (really, any change anywhere 
in the cluster when you consider [YARN-2008], but putting that aside for the 
moment...). The current approach (.4 patch) may do the trick, but I wonder if 
it wouldn't be better to tweak things a bit in the following way.

Given that:
* an application's headroom is effectively its user's headroom for the 
application's queue (the user-in-queue headroom)
* the user-in-queue headroom is effectively a generic per-user headroom in the 
queue (an identical slicing for all users based on how many are active, 
combined with the user limit factor) minus what that user is already using 
across all applications (already tracked in User)
* any change which impacts this causes a headroom recalculation for an 
application in the queue, but may affect them all

then, when recalculating headroom on any event, we could generate one generic 
queue-per-user value and then iterate all the applications in the queue, 
adjusting each application's headroom to a per-user value which is simply the 
generic queue-per-user headroom minus that user's used resources.

Which is to say: any time we recalculate the headroom, we want to recalculate 
it for all users in the queue and apply the change to all applications in the 
queue. I believe the simplest and most efficient way to do that is to generate 
a generic queue headroom, apply the generic per-user logic, then iterate the 
applications and set each application's headroom to its user's value (the same 
for all of that user's applications, calculated once per user: the generic 
value minus that user's used resources). A rough sketch of this idea follows.
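
A minimal sketch of that recalculation loop; the helper and accessor names are 
invented for illustration, not the real LeafQueue/User APIs:

{code}
// Illustrative only: compute one generic per-user headroom for the queue,
// then fan it out to every application, once per user.
Resource queueHeadroom = computeQueueHeadroom(clusterResource);
for (User user : getActiveUsers()) {
  Resource userHeadroom = Resources.subtract(
      applyUserLimit(queueHeadroom, user),   // identical slicing + user limit factor
      user.getConsumedResources());          // what the user already uses
  for (FiCaSchedulerApp app : user.getApplications()) {
    app.setHeadroom(userHeadroom);           // same value for all of this user's apps
  }
}
{code}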

 Capacity Scheduler headroom calculation does not work as expected
 -

 Key: YARN-1198
 URL: https://issues.apache.org/jira/browse/YARN-1198
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Omkar Vinit Joshi
Assignee: Craig Welch
 Attachments: YARN-1198.1.patch, YARN-1198.2.patch, YARN-1198.3.patch, 
 YARN-1198.4.patch


 Today headroom calculation (for the app) takes place only when
 * New node is added/removed from the cluster
 * New container is getting assigned to the application.
 However there are potentially lot of situations which are not considered for 
 this calculation
 * If a container finishes then headroom for that application will change and 
 should be notified to the AM accordingly.
 * If a single user has submitted multiple applications (app1 and app2) to the 
 same queue then
 ** If app1's container finishes then not only app1's but also app2's AM 
 should be notified about the change in headroom.
 ** Similarly if a container is assigned to any applications app1/app2 then 
 both AM should be notified about their headroom.
 ** To simplify the whole communication process it is ideal to keep headroom 
 per User per LeafQueue so that everyone gets the same picture (apps belonging 
 to same user and submitted in same queue).
 * If a new user submits an application to the queue then all applications 
 submitted by all users in that queue should be notified of the headroom 
 change.
 * Also today headroom is an absolute number ( I think it should be normalized 
 but then this is going to be not backward compatible..)
 * Also  when admin user refreshes queue headroom has to be updated.
 These all are the potential bugs in headroom calculations



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2378) Adding support for moving apps between queues in Capacity Scheduler

2014-08-12 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14094998#comment-14094998
 ] 

Jian He commented on YARN-2378:
---

[~subru], thanks for the patch! Some comments:
- We may ignore Move at the NEW_SAVING state, because client app submission is 
generally not considered successful until the app is saved; see 
YarnClient#submitApplication.
{code}
.addTransition(RMAppState.NEW_SAVING, RMAppState.NEW_SAVING,
RMAppEventType.MOVE, new RMAppMoveTransition())
{code}
- use AbstractYarnScheduler#getApplicationAttempt
{code}
FiCaSchedulerApp app =
(FiCaSchedulerApp) applications.get(appId).getCurrentAppAttempt();
{code}
- getCheckLeafQueue: how about renaming it to getAndCheckLeafQueue?

- ParentQueue#addApplication: it seems that moving an app from one leafQueue to 
another within the same parent queue will cause numApplications of the 
parentQueue to increase. (Can you add a test for this if I'm right?)

- Containers are not re-reserved in CapacityScheduler, but are re-reserved in 
SchedulerApplicationAttempt. Should we re-reserve the containers?
{code}
for (Map<NodeId, RMContainer> map : reservedContainers.values()) {
  for (RMContainer reservedContainer : map.values()) {
Resource resource = reservedContainer.getReservedResource();
oldMetrics.unreserveResource(user, resource);
newMetrics.reserveResource(user, resource);
  }
}
{code}
- Concerned that accessing the parent queue while holding the childQueue's lock 
could cause a deadlock; probably use a synchronized block to protect just the 
metrics update (see the sketch after the code below).
{code}
  synchronized public void detachContainer(Resource clusterResource,
  FiCaSchedulerApp application, RMContainer rmContainer) {
if (application != null) {
  releaseResource(clusterResource, application, rmContainer.getContainer()
  .getResource());
  LOG.info("movedContainer" + " container=" + rmContainer.getContainer()
      + " resource=" + rmContainer.getContainer().getResource()
      + " queueMoveOut=" + this + " usedCapacity=" + getUsedCapacity()
      + " absoluteUsedCapacity=" + getAbsoluteUsedCapacity() + " used="
      + usedResources + " cluster=" + clusterResource);
  // Inform the parent queue
  getParent().detachContainer(clusterResource, application, rmContainer);
}
  }
{code}
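
A rough illustration of that locking suggestion (not the actual patch; whether 
this is actually safe depends on the parent queue's own locking):

{code}
// sketch: hold the child queue's monitor only around the local metrics update
public void detachContainer(Resource clusterResource,
    FiCaSchedulerApp application, RMContainer rmContainer) {
  if (application == null) {
    return;
  }
  synchronized (this) {
    releaseResource(clusterResource, application,
        rmContainer.getContainer().getResource());
  }
  // inform the parent queue without holding this queue's monitor
  getParent().detachContainer(clusterResource, application, rmContainer);
}
{code}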

 Adding support for moving apps between queues in Capacity Scheduler
 ---

 Key: YARN-2378
 URL: https://issues.apache.org/jira/browse/YARN-2378
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: capacityscheduler
Reporter: Subramaniam Venkatraman Krishnan
Assignee: Subramaniam Venkatraman Krishnan
  Labels: capacity-scheduler
 Attachments: YARN-2378.patch, YARN-2378.patch, YARN-2378.patch


 As discussed with [~leftnoteasy] and [~jianhe], we are breaking up YARN-1707 
 to smaller patches for manageability. This JIRA will address adding support 
 for moving apps between queues in Capacity Scheduler.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1198) Capacity Scheduler headroom calculation does not work as expected

2014-08-12 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14095004#comment-14095004
 ] 

Jason Lowe commented on YARN-1198:
--

I'd like to avoid iterating all the applications in the queue, as we do too 
much of that already.  Wouldn't it be more efficient to have the applications 
reference a common headroom object if they truly share the same headroom?  Off 
the top of my head I'm thinking of some headroom object that applications could 
reference that in turn contained the immutable Resource reference representing 
the headroom.  If we can lookup these headroom objects per-user-per-queue and 
assign the same headroom object to each application in a queue for the same 
user then we only have to iterate the number of users in the queue rather than 
the number of applications in the queue.  One gotcha is we'd have to fix up the 
headroom object for an application that moves between queues (which is possible 
in the FairScheduler today and soon in the CapacityScheduler).
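
Something like the following, purely as a sketch of that idea (the class and 
method names are invented for illustration):

{code}
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.util.resource.Resources;

// hypothetical shared holder, one instance per (user, queue)
public class UserHeadroom {
  private volatile Resource headroom = Resources.none();

  public Resource getHeadroom() {
    return headroom;
  }

  public void setHeadroom(Resource newHeadroom) {
    // the Resource itself is treated as immutable; only the reference is swapped
    this.headroom = newHeadroom;
  }
}
{code}

All of a user's applications in a queue would hold the same instance, so one 
update per user is visible to every app; an application moved to another queue 
would just be re-pointed at the destination queue's holder.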


 Capacity Scheduler headroom calculation does not work as expected
 -

 Key: YARN-1198
 URL: https://issues.apache.org/jira/browse/YARN-1198
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Omkar Vinit Joshi
Assignee: Craig Welch
 Attachments: YARN-1198.1.patch, YARN-1198.2.patch, YARN-1198.3.patch, 
 YARN-1198.4.patch


 Today headroom calculation (for the app) takes place only when
 * New node is added/removed from the cluster
 * New container is getting assigned to the application.
 However there are potentially lot of situations which are not considered for 
 this calculation
 * If a container finishes then headroom for that application will change and 
 should be notified to the AM accordingly.
 * If a single user has submitted multiple applications (app1 and app2) to the 
 same queue then
 ** If app1's container finishes then not only app1's but also app2's AM 
 should be notified about the change in headroom.
 ** Similarly if a container is assigned to any applications app1/app2 then 
 both AM should be notified about their headroom.
 ** To simplify the whole communication process it is ideal to keep headroom 
 per User per LeafQueue so that everyone gets the same picture (apps belonging 
 to same user and submitted in same queue).
 * If a new user submits an application to the queue then all applications 
 submitted by all users in that queue should be notified of the headroom 
 change.
 * Also today headroom is an absolute number ( I think it should be normalized 
 but then this is going to be not backward compatible..)
 * Also  when admin user refreshes queue headroom has to be updated.
 These all are the potential bugs in headroom calculations



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (YARN-2415) Mark MiniYARNCluster as Public Stable

2014-08-12 Thread Hari Shreedharan (JIRA)
Hari Shreedharan created YARN-2415:
--

 Summary: Mark MiniYARNCluster as Public Stable
 Key: YARN-2415
 URL: https://issues.apache.org/jira/browse/YARN-2415
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Hari Shreedharan


The MR/HDFS equivalents are available for applications to use in tests, but the 
YARN Mini cluster is not. It would be really useful to test applications that 
are written to run on YARN (like Spark)



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Assigned] (YARN-2415) Mark MiniYARNCluster as Public Stable

2014-08-12 Thread Karthik Kambatla (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2415?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Kambatla reassigned YARN-2415:
--

Assignee: Karthik Kambatla

 Mark MiniYARNCluster as Public Stable
 -

 Key: YARN-2415
 URL: https://issues.apache.org/jira/browse/YARN-2415
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Hari Shreedharan
Assignee: Karthik Kambatla

 The MR/HDFS equivalents are available for applications to use in tests, but 
 the YARN Mini cluster is not. It would be really useful for testing 
 applications that are written to run on YARN (like Spark).



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2415) Expose MiniYARNCluster for use outside of YARN

2014-08-12 Thread Karthik Kambatla (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2415?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Kambatla updated YARN-2415:
---

  Component/s: client
 Target Version/s: 2.6.0
Affects Version/s: 2.5.0
   Issue Type: New Feature  (was: Bug)
  Summary: Expose MiniYARNCluster for use outside of YARN  (was: 
Mark MiniYARNCluster as Public Stable)

 Expose MiniYARNCluster for use outside of YARN
 --

 Key: YARN-2415
 URL: https://issues.apache.org/jira/browse/YARN-2415
 Project: Hadoop YARN
  Issue Type: New Feature
  Components: client
Affects Versions: 2.5.0
Reporter: Hari Shreedharan
Assignee: Karthik Kambatla

 The MR/HDFS equivalents are available for applications to use in tests, but 
 the YARN Mini cluster is not. It would be really useful for testing 
 applications that are written to run on YARN (like Spark).



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2308) NPE happened when RM restart after CapacityScheduler queue configuration changed

2014-08-12 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14095017#comment-14095017
 ] 

Jian He commented on YARN-2308:
---

Rejecting an app that's in the ACCEPTED state doesn't seem semantically right to me.
{code}
+.addTransition(RMAppState.ACCEPTED, RMAppState.FINAL_SAVING,
+RMAppEventType.APP_REJECTED,
+new FinalSavingTransition(new AppRejectedTransition(), 
RMAppState.FAILED))
{code}

I think we should catch the exception in the following code and return FAILED directly.
{code}
  // Add application to scheduler synchronously to guarantee scheduler
  // knows applications before AM or NM re-registers.
  app.scheduler.handle(new AppAddedSchedulerEvent(app.applicationId,
app.submissionContext.getQueue(), app.user, true));
{code}

 NPE happened when RM restart after CapacityScheduler queue configuration 
 changed 
 -

 Key: YARN-2308
 URL: https://issues.apache.org/jira/browse/YARN-2308
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager, scheduler
Affects Versions: 2.6.0
Reporter: Wangda Tan
Assignee: chang li
Priority: Critical
 Attachments: jira2308.patch, jira2308.patch, jira2308.patch


 I encountered an NPE when the RM restarted:
 {code}
 2014-07-16 07:22:46,957 FATAL 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in 
 handling event type APP_ATTEMPT_ADDED to the scheduler
 java.lang.NullPointerException
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.addApplicationAttempt(CapacityScheduler.java:566)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:922)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:98)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:594)
 at 
 org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
 at 
 org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:654)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:85)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationEventDispatcher.handle(ResourceManager.java:698)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationEventDispatcher.handle(ResourceManager.java:682)
 at 
 org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173)
 at 
 org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106)
 at java.lang.Thread.run(Thread.java:744)
 {code}
 And the RM then fails to restart.
 This is caused by a queue configuration change: I removed some queues and 
 added new queues. So when the RM restarts, it tries to recover previous 
 applications, and when the queue of any of these applications has been 
 removed, an NPE is raised.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1370) Fair scheduler to re-populate container allocation state

2014-08-12 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14095032#comment-14095032
 ] 

Hudson commented on YARN-1370:
--

FAILURE: Integrated in Hadoop-trunk-Commit #6055 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/6055/])
YARN-1370. Fair scheduler to re-populate container allocation state. (Anubhav 
Dhoot via kasha) (kasha: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1617645)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestWorkPreservingRMRestart.java


 Fair scheduler to re-populate container allocation state
 

 Key: YARN-1370
 URL: https://issues.apache.org/jira/browse/YARN-1370
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Bikas Saha
Assignee: Anubhav Dhoot
 Fix For: 2.6.0

 Attachments: YARN-1370.001.patch


 YARN-1367 and YARN-1368 enable the NM to tell the RM about currently running 
 containers and the RM will pass this information to the schedulers along with 
 the node information. The schedulers are currently already informed about 
 previously running apps when the app data is recovered from the store. The 
 scheduler is expected to be able to repopulate its allocation state from the 
 above 2 sources of information.
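
As a rough, hedged illustration of what repopulating that state involves (a 
simplified stand-in, not the actual FairScheduler change):
{code}
import java.util.List;

import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.server.api.protocolrecords.NMContainerStatus;
import org.apache.hadoop.yarn.util.resource.Resources;

// Illustrative only: the shape of the recovery hook a scheduler needs --
// rebuild usage from the containers each NM reports as still running when it
// re-registers after an RM restart.
class ContainerRecoverySketch {
  static Resource totalRecoveredResource(List<NMContainerStatus> statuses) {
    Resource total = Resources.createResource(0, 0);
    for (NMContainerStatus status : statuses) {
      // A real scheduler would also recreate the RMContainer and charge the
      // queue/app/node accounting here, not just sum the resources.
      Resources.addTo(total, status.getAllocatedResource());
    }
    return total;
  }
}
{code}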



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2308) NPE happened when RM restart after CapacityScheduler queue configuration changed

2014-08-12 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14095036#comment-14095036
 ] 

Zhijie Shen commented on YARN-2308:
---

Actually, the app is rejected at AppAddedSchedulerEvent, but as I mentioned 
above, AppAttemptAddedSchedulerEvent is scheduled regardless of whether the app 
is added to the CS or not. In fact, under recovery mode, the RMApp will enter 
ACCEPTED regardless of whether the app is added or not as well.

The thorough fix might be moving a recovered app to another state, waiting for 
the event from the CS to move it to ACCEPTED, and then recovering the attempts, 
including scheduling AppAttemptAddedSchedulerEvent. My feeling is that this is 
overkill if we only want to fix this single race condition. Thoughts?

bq.  I think set RM_WORK_PRESERVING_RECOVERY_ENABLED=true in test should be 
enough for this fix.

RM_WORK_PRESERVING_RECOVERY_ENABLED=true reflects the failure case in the 
description, but I'm wondering why, with RM_WORK_PRESERVING_RECOVERY_ENABLED=false, 
the test is going to fail. The app will be rejected anyway, won't it?

 NPE happened when RM restart after CapacityScheduler queue configuration 
 changed 
 -

 Key: YARN-2308
 URL: https://issues.apache.org/jira/browse/YARN-2308
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager, scheduler
Affects Versions: 2.6.0
Reporter: Wangda Tan
Assignee: chang li
Priority: Critical
 Attachments: jira2308.patch, jira2308.patch, jira2308.patch


 I encountered an NPE when the RM restarted:
 {code}
 2014-07-16 07:22:46,957 FATAL 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in 
 handling event type APP_ATTEMPT_ADDED to the scheduler
 java.lang.NullPointerException
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.addApplicationAttempt(CapacityScheduler.java:566)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:922)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:98)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:594)
 at 
 org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
 at 
 org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:654)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:85)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationEventDispatcher.handle(ResourceManager.java:698)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationEventDispatcher.handle(ResourceManager.java:682)
 at 
 org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173)
 at 
 org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106)
 at java.lang.Thread.run(Thread.java:744)
 {code}
 And the RM then fails to restart.
 This is caused by a queue configuration change: I removed some queues and 
 added new queues. So when the RM restarts, it tries to recover previous 
 applications, and when the queue of any of these applications has been 
 removed, an NPE is raised.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2414) RM web UI: app page will crash if app is failed before any attempt has been created

2014-08-12 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14095039#comment-14095039
 ] 

Wangda Tan commented on YARN-2414:
--

Assigned it to myself; I will post a patch soon.
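
A minimal sketch of the kind of null guard such a patch will likely need when 
rendering the app page (the helper below is a simplified stand-in, not the 
actual AppBlock code):
{code}
import org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMApp;
import org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttempt;

class AppPageGuardExample {
  /**
   * Render-side guard: an app that failed before any attempt was created has
   * no current attempt, so the page must tolerate a null here instead of
   * dereferencing it and throwing an NPE.
   */
  static String describeCurrentAttempt(RMApp app) {
    RMAppAttempt attempt = app.getCurrentAppAttempt();
    if (attempt == null) {
      return "No application attempt has been created yet.";
    }
    return attempt.getAppAttemptId().toString();
  }
}
{code}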

 RM web UI: app page will crash if app is failed before any attempt has been 
 created
 ---

 Key: YARN-2414
 URL: https://issues.apache.org/jira/browse/YARN-2414
 Project: Hadoop YARN
  Issue Type: Bug
  Components: webapp
Reporter: Zhijie Shen
Assignee: Wangda Tan

 {code}
 2014-08-12 16:45:13,573 ERROR org.apache.hadoop.yarn.webapp.Dispatcher: error 
 handling URI: /cluster/app/application_1407887030038_0001
 java.lang.reflect.InvocationTargetException
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
   at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
   at java.lang.reflect.Method.invoke(Method.java:597)
   at org.apache.hadoop.yarn.webapp.Dispatcher.service(Dispatcher.java:153)
   at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
   at 
 com.google.inject.servlet.ServletDefinition.doService(ServletDefinition.java:263)
   at 
 com.google.inject.servlet.ServletDefinition.service(ServletDefinition.java:178)
   at 
 com.google.inject.servlet.ManagedServletPipeline.service(ManagedServletPipeline.java:91)
   at 
 com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:62)
   at 
 com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:900)
   at 
 com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:834)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.webapp.RMWebAppFilter.doFilter(RMWebAppFilter.java:84)
   at 
 com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:795)
   at 
 com.google.inject.servlet.FilterDefinition.doFilter(FilterDefinition.java:163)
   at 
 com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:58)
   at 
 com.google.inject.servlet.ManagedFilterPipeline.dispatch(ManagedFilterPipeline.java:118)
   at com.google.inject.servlet.GuiceFilter.doFilter(GuiceFilter.java:113)
   at 
 org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
   at 
 org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter.doFilter(StaticUserWebFilter.java:109)
   at 
 org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
   at 
 org.apache.hadoop.security.authentication.server.AuthenticationFilter.doFilter(AuthenticationFilter.java:460)
   at 
 org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
   at 
 org.apache.hadoop.http.HttpServer2$QuotingInputFilter.doFilter(HttpServer2.java:1191)
   at 
 org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
   at org.apache.hadoop.http.NoCacheFilter.doFilter(NoCacheFilter.java:45)
   at 
 org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
   at org.apache.hadoop.http.NoCacheFilter.doFilter(NoCacheFilter.java:45)
   at 
 org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
   at 
 org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
   at 
 org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
   at 
 org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
   at 
 org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
   at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
   at 
 org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
   at 
 org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
   at org.mortbay.jetty.Server.handle(Server.java:326)
   at 
 org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
   at 
 org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:928)
   at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:549)
   at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212)
   at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
   at 
 org.mortbay.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:410)
   at 
 org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)
 Caused by: java.lang.NullPointerException
   at 
 

[jira] [Assigned] (YARN-2414) RM web UI: app page will crash if app is failed before any attempt has been created

2014-08-12 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2414?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan reassigned YARN-2414:


Assignee: Wangda Tan

 RM web UI: app page will crash if app is failed before any attempt has been 
 created
 ---

 Key: YARN-2414
 URL: https://issues.apache.org/jira/browse/YARN-2414
 Project: Hadoop YARN
  Issue Type: Bug
  Components: webapp
Reporter: Zhijie Shen
Assignee: Wangda Tan

 {code}
 2014-08-12 16:45:13,573 ERROR org.apache.hadoop.yarn.webapp.Dispatcher: error 
 handling URI: /cluster/app/application_1407887030038_0001
 java.lang.reflect.InvocationTargetException
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
   at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
   at java.lang.reflect.Method.invoke(Method.java:597)
   at org.apache.hadoop.yarn.webapp.Dispatcher.service(Dispatcher.java:153)
   at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
   at 
 com.google.inject.servlet.ServletDefinition.doService(ServletDefinition.java:263)
   at 
 com.google.inject.servlet.ServletDefinition.service(ServletDefinition.java:178)
   at 
 com.google.inject.servlet.ManagedServletPipeline.service(ManagedServletPipeline.java:91)
   at 
 com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:62)
   at 
 com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:900)
   at 
 com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:834)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.webapp.RMWebAppFilter.doFilter(RMWebAppFilter.java:84)
   at 
 com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:795)
   at 
 com.google.inject.servlet.FilterDefinition.doFilter(FilterDefinition.java:163)
   at 
 com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:58)
   at 
 com.google.inject.servlet.ManagedFilterPipeline.dispatch(ManagedFilterPipeline.java:118)
   at com.google.inject.servlet.GuiceFilter.doFilter(GuiceFilter.java:113)
   at 
 org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
   at 
 org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter.doFilter(StaticUserWebFilter.java:109)
   at 
 org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
   at 
 org.apache.hadoop.security.authentication.server.AuthenticationFilter.doFilter(AuthenticationFilter.java:460)
   at 
 org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
   at 
 org.apache.hadoop.http.HttpServer2$QuotingInputFilter.doFilter(HttpServer2.java:1191)
   at 
 org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
   at org.apache.hadoop.http.NoCacheFilter.doFilter(NoCacheFilter.java:45)
   at 
 org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
   at org.apache.hadoop.http.NoCacheFilter.doFilter(NoCacheFilter.java:45)
   at 
 org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
   at 
 org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
   at 
 org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
   at 
 org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
   at 
 org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
   at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
   at 
 org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
   at 
 org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
   at org.mortbay.jetty.Server.handle(Server.java:326)
   at 
 org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
   at 
 org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:928)
   at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:549)
   at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212)
   at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
   at 
 org.mortbay.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:410)
   at 
 org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)
 Caused by: java.lang.NullPointerException
   at 
 org.apache.hadoop.yarn.server.resourcemanager.webapp.AppBlock.render(AppBlock.java:116)
   at 
 

[jira] [Commented] (YARN-2308) NPE happened when RM restart after CapacityScheduler queue configuration changed

2014-08-12 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14095040#comment-14095040
 ] 

Jian He commented on YARN-2308:
---

bq. RMApp will enter ACCEPTED regardless of whether the app is added or not as well.
That's what I meant. The RMApp can choose to enter the FAILED state directly, 
with no need to add an attempt any more.

 NPE happened when RM restart after CapacityScheduler queue configuration 
 changed 
 -

 Key: YARN-2308
 URL: https://issues.apache.org/jira/browse/YARN-2308
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager, scheduler
Affects Versions: 2.6.0
Reporter: Wangda Tan
Assignee: chang li
Priority: Critical
 Attachments: jira2308.patch, jira2308.patch, jira2308.patch


 I encountered an NPE when the RM restarted:
 {code}
 2014-07-16 07:22:46,957 FATAL 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in 
 handling event type APP_ATTEMPT_ADDED to the scheduler
 java.lang.NullPointerException
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.addApplicationAttempt(CapacityScheduler.java:566)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:922)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:98)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:594)
 at 
 org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
 at 
 org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:654)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:85)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationEventDispatcher.handle(ResourceManager.java:698)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationEventDispatcher.handle(ResourceManager.java:682)
 at 
 org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173)
 at 
 org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106)
 at java.lang.Thread.run(Thread.java:744)
 {code}
 And the RM then fails to restart.
 This is caused by a queue configuration change: I removed some queues and 
 added new queues. So when the RM restarts, it tries to recover previous 
 applications, and when the queue of any of these applications has been 
 removed, an NPE is raised.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1372) Ensure all completed containers are reported to the AMs across RM restart

2014-08-12 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14095049#comment-14095049
 ] 

Jian He commented on YARN-1372:
---

bq. maybe we can just use NodeHeartbeatResponse#getContainersToCleanup to 
notify NM to remove the containers from the context. 
I was thinking of reusing getContainersToCleanup to notify the NM to remove 
containers, i.e. not cleaning up the containers until they are pulled by the 
AM. On second thought, this is not a good solution, so please disregard this 
comment. Sorry for the confusion.


Comments on the patch:
bq. Second patch uploaded that adds expiration to the entries in NM
How about tying the NM container lifecycle to the application itself? i.e. 
clean all containers in the context for each application in 
response.getApplicationsToCleanup(). A rough sketch of this idea follows the 
comments below.

- Is it possible for the DECOMMISSIONED/LOST states to receive the new event?
- In RMAppAttemptImpl, why add a new previousJustFinishedContainers?
- RMContainerImpl already has the nodeId, so many of the changes related to 
adding the NodeId are not needed, e.g. the scheduler changes and the 
RMContainerRecoverEvent changes.
- Please add tests too.
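
As mentioned above, a rough sketch of tying cleanup to the application on the 
NM side (class and method names are placeholders, not from the attached 
patches; it assumes the NM keeps the reported completed-container statuses 
keyed by ContainerId):
{code}
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

import org.apache.hadoop.yarn.api.records.ApplicationId;
import org.apache.hadoop.yarn.api.records.ContainerId;
import org.apache.hadoop.yarn.api.records.ContainerStatus;

// Hypothetical NM-side store of completed-container statuses that were
// reported to the RM but may still need to be re-sent after an RM restart.
class ReportedCompletedContainers {
  private final Map<ContainerId, ContainerStatus> reported =
      new ConcurrentHashMap<ContainerId, ContainerStatus>();

  void remember(ContainerStatus status) {
    reported.put(status.getContainerId(), status);
  }

  /**
   * Drop every remembered container belonging to a finished application,
   * driven by NodeHeartbeatResponse#getApplicationsToCleanup(), so the
   * entries live exactly as long as their application does.
   */
  void cleanupForApplications(List<ApplicationId> finishedApps) {
    for (Iterator<Map.Entry<ContainerId, ContainerStatus>> it =
        reported.entrySet().iterator(); it.hasNext();) {
      ContainerId cid = it.next().getKey();
      if (finishedApps.contains(
          cid.getApplicationAttemptId().getApplicationId())) {
        it.remove();
      }
    }
  }

  /** Everything to re-send on re-registration after an RM restart. */
  List<ContainerStatus> getAll() {
    return new ArrayList<ContainerStatus>(reported.values());
  }
}
{code}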

 Ensure all completed containers are reported to the AMs across RM restart
 -

 Key: YARN-1372
 URL: https://issues.apache.org/jira/browse/YARN-1372
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Bikas Saha
Assignee: Anubhav Dhoot
 Attachments: YARN-1372.prelim.patch, YARN-1372.prelim2.patch


 Currently the NM informs the RM about completed containers and then removes 
 those containers from the RM notification list. The RM passes on that 
 completed container information to the AM and the AM pulls this data. If the 
 RM dies before the AM pulls this data then the AM may not be able to get this 
 information again. To fix this, NM should maintain a separate list of such 
 completed container notifications sent to the RM. After the AM has pulled the 
 containers from the RM then the RM will inform the NM about it and the NM can 
 remove the completed container from the new list. Upon re-register with the 
 RM (after RM restart) the NM should send the entire list of completed 
 containers to the RM along with any other containers that completed while the 
 RM was dead. This ensures that the RM can inform the AM's about all completed 
 containers. Some container completions may be reported more than once since 
 the AM may have pulled the container but the RM may die before notifying the 
 NM about the pull.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2308) NPE happened when RM restart after CapacityScheduler queue configuration changed

2014-08-12 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14095050#comment-14095050
 ] 

Wangda Tan commented on YARN-2308:
--

bq. I think we should catch the exception in the following code and return 
FAILED directly.
Currently the CapacityScheduler creates an RMAppRejectedEvent when it finds 
that the queue does not exist while recovering or submitting an application:
{code}
if (queue == null) {
  String message = "Application " + applicationId +
      " submitted by user " + user + " to unknown queue: " + queueName;
  this.rmContext.getDispatcher().getEventHandler()
      .handle(new RMAppRejectedEvent(applicationId, message));
  return;
}
{code}
We cannot catch exception here, because now exception throw:
{code}
  // Add application to scheduler synchronously to guarantee scheduler
  // knows applications before AM or NM re-registers.
  app.scheduler.handle(new AppAddedSchedulerEvent(app.applicationId,
app.submissionContext.getQueue(), app.user, true));
{code}

bq. That's what I meant. The RMApp can choose to enter the FAILED state 
directly, with no need to add an attempt any more.
It will not add an attempt here, because the app will be rejected directly.

bq. RM_WORK_PRESERVING_RECOVERY_ENABLED=true reflects the failure case in the 
description, but I'm wondering why, with RM_WORK_PRESERVING_RECOVERY_ENABLED=false, 
the test is going to fail. The app will be rejected anyway, won't it?
I've tried this again locally and it passes. Setting 
RM_WORK_PRESERVING_RECOVERY_ENABLED=false is enough to cover what we want to 
verify.
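
For reference, a minimal sketch of how a test could set that up (a simplified 
stand-in for the actual test configuration, using the existing 
YarnConfiguration constants):
{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

class RecoveryTestConfExample {
  static Configuration newTestConf(boolean workPreserving) {
    Configuration conf = new YarnConfiguration();
    // Enable RM recovery, then pick work-preserving vs. non-work-preserving.
    conf.setBoolean(YarnConfiguration.RECOVERY_ENABLED, true);
    conf.setBoolean(YarnConfiguration.RM_WORK_PRESERVING_RECOVERY_ENABLED,
        workPreserving);
    return conf;
  }
}
{code}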

 NPE happened when RM restart after CapacityScheduler queue configuration 
 changed 
 -

 Key: YARN-2308
 URL: https://issues.apache.org/jira/browse/YARN-2308
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager, scheduler
Affects Versions: 2.6.0
Reporter: Wangda Tan
Assignee: chang li
Priority: Critical
 Attachments: jira2308.patch, jira2308.patch, jira2308.patch


 I encountered an NPE when the RM restarted:
 {code}
 2014-07-16 07:22:46,957 FATAL 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in 
 handling event type APP_ATTEMPT_ADDED to the scheduler
 java.lang.NullPointerException
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.addApplicationAttempt(CapacityScheduler.java:566)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:922)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:98)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:594)
 at 
 org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
 at 
 org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:654)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:85)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationEventDispatcher.handle(ResourceManager.java:698)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationEventDispatcher.handle(ResourceManager.java:682)
 at 
 org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173)
 at 
 org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106)
 at java.lang.Thread.run(Thread.java:744)
 {code}
 And the RM then fails to restart.
 This is caused by a queue configuration change: I removed some queues and 
 added new queues. So when the RM restarts, it tries to recover previous 
 applications, and when the queue of any of these applications has been 
 removed, an NPE is raised.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2308) NPE happened when RM restart after CapacityScheduler queue configuration changed

2014-08-12 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14095055#comment-14095055
 ] 

Wangda Tan commented on YARN-2308:
--

Typo:
bq. We cannot catch exception here, because now exception throw:
Should be We cannot catch exception here, because *no* exception throw:


 NPE happened when RM restart after CapacityScheduler queue configuration 
 changed 
 -

 Key: YARN-2308
 URL: https://issues.apache.org/jira/browse/YARN-2308
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager, scheduler
Affects Versions: 2.6.0
Reporter: Wangda Tan
Assignee: chang li
Priority: Critical
 Attachments: jira2308.patch, jira2308.patch, jira2308.patch


 I encountered an NPE when the RM restarted:
 {code}
 2014-07-16 07:22:46,957 FATAL 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in 
 handling event type APP_ATTEMPT_ADDED to the scheduler
 java.lang.NullPointerException
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.addApplicationAttempt(CapacityScheduler.java:566)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:922)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:98)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:594)
 at 
 org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
 at 
 org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:654)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:85)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationEventDispatcher.handle(ResourceManager.java:698)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationEventDispatcher.handle(ResourceManager.java:682)
 at 
 org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173)
 at 
 org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106)
 at java.lang.Thread.run(Thread.java:744)
 {code}
 And the RM then fails to restart.
 This is caused by a queue configuration change: I removed some queues and 
 added new queues. So when the RM restarts, it tries to recover previous 
 applications, and when the queue of any of these applications has been 
 removed, an NPE is raised.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1198) Capacity Scheduler headroom calculation does not work as expected

2014-08-12 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14095075#comment-14095075
 ] 

Hadoop QA commented on YARN-1198:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12661339/YARN-1198.4.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 2 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/4609//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4609//console

This message is automatically generated.

 Capacity Scheduler headroom calculation does not work as expected
 -

 Key: YARN-1198
 URL: https://issues.apache.org/jira/browse/YARN-1198
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Omkar Vinit Joshi
Assignee: Craig Welch
 Attachments: YARN-1198.1.patch, YARN-1198.2.patch, YARN-1198.3.patch, 
 YARN-1198.4.patch


 Today the headroom calculation (for the app) takes place only when:
 * a node is added to or removed from the cluster
 * a new container is assigned to the application.
 However, there are potentially a lot of situations which are not considered in 
 this calculation:
 * If a container finishes, then the headroom for that application changes and 
 the AM should be notified accordingly.
 * If a single user has submitted multiple applications (app1 and app2) to the 
 same queue, then:
 ** If app1's container finishes, then not only app1's but also app2's AM 
 should be notified about the change in headroom.
 ** Similarly, if a container is assigned to either application (app1/app2), 
 then both AMs should be notified about their headroom.
 ** To simplify the whole communication process, it is ideal to keep the 
 headroom per user per LeafQueue so that everyone gets the same picture (apps 
 belonging to the same user and submitted to the same queue).
 * If a new user submits an application to the queue, then all applications 
 submitted by all users in that queue should be notified of the headroom 
 change.
 * Also, today the headroom is an absolute number (I think it should be 
 normalized, but then that would not be backward compatible).
 * Also, when the admin refreshes the queues, the headroom has to be updated.
 These are all potential bugs in the headroom calculation.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2308) NPE happened when RM restart after CapacityScheduler queue configuration changed

2014-08-12 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14095092#comment-14095092
 ] 

Jian He commented on YARN-2308:
---

bq. We cannot catch exception here, because no exception throw
Though we could throw an exception if isAppRecovering is true, that would 
complicate the logic a bit. We can add comments to the newly added transition 
to explain the scenario.
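
For example, the transition added by the patch (quoted earlier in this thread) 
could carry a comment along these lines, shown here as a fragment in the same 
builder style as the earlier snippet:
{code}
// App rejection can arrive while the app is already ACCEPTED: during recovery
// the RMApp moves to ACCEPTED before the scheduler processes the
// AppAddedSchedulerEvent, and if the app's queue was removed the scheduler
// rejects it via an APP_REJECTED event instead of throwing. Handle that here
// by saving state and finishing as FAILED.
.addTransition(RMAppState.ACCEPTED, RMAppState.FINAL_SAVING,
    RMAppEventType.APP_REJECTED,
    new FinalSavingTransition(new AppRejectedTransition(),
        RMAppState.FAILED))
{code}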

 NPE happened when RM restart after CapacityScheduler queue configuration 
 changed 
 -

 Key: YARN-2308
 URL: https://issues.apache.org/jira/browse/YARN-2308
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager, scheduler
Affects Versions: 2.6.0
Reporter: Wangda Tan
Assignee: chang li
Priority: Critical
 Attachments: jira2308.patch, jira2308.patch, jira2308.patch


 I encountered an NPE when the RM restarted:
 {code}
 2014-07-16 07:22:46,957 FATAL 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in 
 handling event type APP_ATTEMPT_ADDED to the scheduler
 java.lang.NullPointerException
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.addApplicationAttempt(CapacityScheduler.java:566)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:922)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:98)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:594)
 at 
 org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
 at 
 org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:654)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:85)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationEventDispatcher.handle(ResourceManager.java:698)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationEventDispatcher.handle(ResourceManager.java:682)
 at 
 org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173)
 at 
 org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106)
 at java.lang.Thread.run(Thread.java:744)
 {code}
 And the RM then fails to restart.
 This is caused by a queue configuration change: I removed some queues and 
 added new queues. So when the RM restarts, it tries to recover previous 
 applications, and when the queue of any of these applications has been 
 removed, an NPE is raised.



--
This message was sent by Atlassian JIRA
(v6.2#6252)