date:20140813


[ 
https://issues.apache.org/jira/browse/YARN-1458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14095261#comment-14095261
 ] 

Hadoop QA commented on YARN-1458:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12616677/YARN-1458.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

  
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairScheduler

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/4610//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4610//console

This message is automatically generated.

 In Fair Scheduler, size based weight can cause update thread to hold lock 
 indefinitely
 --

 Key: YARN-1458
 URL: https://issues.apache.org/jira/browse/YARN-1458
 Project: Hadoop YARN
  Issue Type: Bug
  Components: scheduler
Affects Versions: 2.2.0
 Environment: Centos 2.6.18-238.19.1.el5 X86_64
 hadoop2.2.0
Reporter: qingwu.fu
  Labels: patch
 Fix For: 2.2.1

 Attachments: YARN-1458.patch

   Original Estimate: 408h
  Remaining Estimate: 408h

 The ResourceManager$SchedulerEventDispatcher$EventProcessor blocked when 
 clients submitted lots of jobs. It is not easy to reproduce; we ran the test 
 cluster for days to reproduce it. The output of the jstack command on the 
 ResourceManager pid:
 {code}
  ResourceManager Event Processor prio=10 tid=0x2aaab0c5f000 nid=0x5dd3 
 waiting for monitor entry [0x43aa9000]
java.lang.Thread.State: BLOCKED (on object monitor)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.removeApplication(FairScheduler.java:671)
 - waiting to lock 0x00070026b6e0 (a 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:1023)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:112)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:440)
 at java.lang.Thread.run(Thread.java:744)
 ……
 FairSchedulerUpdateThread daemon prio=10 tid=0x2aaab0a2c800 nid=0x5dc8 
 runnable [0x433a2000]
java.lang.Thread.State: RUNNABLE
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.getAppWeight(FairScheduler.java:545)
 - locked 0x00070026b6e0 (a 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.AppSchedulable.getWeights(AppSchedulable.java:129)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.policies.ComputeFairShares.computeShare(ComputeFairShares.java:143)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.policies.ComputeFairShares.resourceUsedWithWeightToResourceRatio(ComputeFairShares.java:131)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.policies.ComputeFairShares.computeShares(ComputeFairShares.java:102)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.policies.FairSharePolicy.computeShares(FairSharePolicy.java:119)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSLeafQueue.recomputeShares(FSLeafQueue.java:100)
 at

[jira] [Commented] (YARN-2136) RMStateStore can explicitly handle store/update events when fenced


[ 
https://issues.apache.org/jira/browse/YARN-2136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14095293#comment-14095293
 ] 

Varun Saxena commented on YARN-2136:


[~jianhe], what I was talking about was the fact that certain events may be 
queued up in the dispatcher event queue while this RM is the master. But when 
we close the state store while switching to standby (on fencing), we wait until 
all these events have been processed (the queue is drained). Marking the state 
store as FENCED may avoid these events being processed. 

That is, let us say I have 5 events in the queue and the 3rd event leads to 
fencing. Then the last 2 events will also be processed before the switch to 
standby completes.

What are your views on the need to handle this case? 

 RMStateStore can explicitly handle store/update events when fenced
 --

 Key: YARN-2136
 URL: https://issues.apache.org/jira/browse/YARN-2136
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Jian He

 RMStateStore can choose to handle/ignore store/update events upfront instead 
 of invoking more ZK operations if state store is at fenced state. 
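
A minimal sketch of the behaviour discussed above, assuming a single-threaded drain loop: once the store is marked fenced, any later store/update events are skipped up front rather than applied. The class `FencedStoreSketch`, the `FENCE` event name and the string events are hypothetical stand-ins for illustration, not the RMStateStore API.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

// Hypothetical stand-in for a state store; not the real RMStateStore API.
class FencedStoreSketch {
    enum State { ACTIVE, FENCED }

    private State state = State.ACTIVE;
    private final Queue<String> pending = new ArrayDeque<>();
    private final List<String> applied = new ArrayList<>();

    void enqueue(String event) {
        pending.add(event);
    }

    // Drain the queue. Once fenced, remaining events are ignored up front.
    void drain() {
        String event;
        while ((event = pending.poll()) != null) {
            if (state == State.FENCED) {
                continue; // skip without touching the backing store
            }
            if (event.equals("FENCE")) { // hypothetical event name
                state = State.FENCED;
                continue;
            }
            applied.add(event);
        }
    }

    List<String> applied() {
        return applied;
    }

    State state() {
        return state;
    }

    public static void main(String[] args) {
        FencedStoreSketch store = new FencedStoreSketch();
        // Five queued events; the third one leads to fencing.
        for (String e : new String[] {"store-1", "update-2", "FENCE", "store-4", "update-5"}) {
            store.enqueue(e);
        }
        store.drain();
        System.out.println(store.applied());
    }
}
```

With five queued events where the third leads to fencing, only the first two are applied; the last two are dropped while the queue drains.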



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Assigned] (YARN-160) nodemanagers should obtain cpu/memory values from underlying OS

2014-08-13 Thread Varun Vasudev (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-160?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Vasudev reassigned YARN-160:
--

Assignee: Varun Vasudev

 nodemanagers should obtain cpu/memory values from underlying OS
 ---

 Key: YARN-160
 URL: https://issues.apache.org/jira/browse/YARN-160
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: nodemanager
Affects Versions: 2.0.3-alpha
Reporter: Alejandro Abdelnur
Assignee: Varun Vasudev
 Fix For: 2.5.0


 As mentioned in YARN-2
 *NM memory and CPU configs*
 Currently these values come from the config of the NM; we should be 
 able to obtain them from the OS (i.e., in the case of Linux, from 
 /proc/meminfo and /proc/cpuinfo). As this is highly OS dependent, we should have 
 an interface that obtains this information. In addition, implementations of 
 this interface should be able to specify a mem/cpu offset (an amount of mem/cpu 
 not to be made available as a YARN resource); this would allow reserving mem/cpu 
 for the OS and other services outside of YARN containers.
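
A rough sketch of what such an interface could look like, assuming an x86 Linux layout of /proc/meminfo (a `MemTotal:` line in kB) and /proc/cpuinfo (one `processor` line per logical CPU). The names `NodeResourceProbe` and `LinuxProbeSketch` and the offset parameters are hypothetical, chosen for illustration only.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;

// Hypothetical interface; the name and methods are illustrative, not YARN's.
interface NodeResourceProbe {
    long memoryMb();
    int cpuCores();
}

class LinuxProbeSketch implements NodeResourceProbe {
    private final List<String> meminfo;  // lines of /proc/meminfo
    private final List<String> cpuinfo;  // lines of /proc/cpuinfo
    private final long reservedMemMb;    // offset kept back for the OS
    private final int reservedCores;     // offset kept back for the OS

    LinuxProbeSketch(List<String> meminfo, List<String> cpuinfo,
                     long reservedMemMb, int reservedCores) {
        this.meminfo = meminfo;
        this.cpuinfo = cpuinfo;
        this.reservedMemMb = reservedMemMb;
        this.reservedCores = reservedCores;
    }

    // Reads the real files; only meaningful on Linux.
    static LinuxProbeSketch fromProc(long reservedMemMb, int reservedCores) throws IOException {
        return new LinuxProbeSketch(
            Files.readAllLines(Paths.get("/proc/meminfo")),
            Files.readAllLines(Paths.get("/proc/cpuinfo")),
            reservedMemMb, reservedCores);
    }

    @Override
    public long memoryMb() {
        for (String line : meminfo) {
            if (line.startsWith("MemTotal:")) {
                // Expected shape: "MemTotal:       16326428 kB"
                String[] parts = line.trim().split("\\s+");
                long totalMb = Long.parseLong(parts[1]) / 1024;
                return Math.max(0L, totalMb - reservedMemMb);
            }
        }
        throw new IllegalStateException("MemTotal not found");
    }

    @Override
    public int cpuCores() {
        int count = 0;
        for (String line : cpuinfo) {
            if (line.startsWith("processor")) {
                count++;
            }
        }
        return Math.max(0, count - reservedCores);
    }

    public static void main(String[] args) {
        // Sample lines so the sketch runs anywhere; use fromProc(...) on Linux.
        NodeResourceProbe probe = new LinuxProbeSketch(
            Arrays.asList("MemTotal:        8388608 kB"),
            Arrays.asList("processor\t: 0", "processor\t: 1"),
            1024, 1);
        System.out.println(probe.memoryMb() + " MB, " + probe.cpuCores() + " cores");
    }
}
```

Taking the file contents as constructor arguments keeps the parsing separate from the OS access, so another platform could supply its own implementation of the same interface.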




[jira] [Assigned] (YARN-2409) InvalidStateTransitonException in ResourceManager after job recovery


 [ 
https://issues.apache.org/jira/browse/YARN-2409?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohith reassigned YARN-2409:


Assignee: Rohith

 InvalidStateTransitonException in ResourceManager after job recovery
 

 Key: YARN-2409
 URL: https://issues.apache.org/jira/browse/YARN-2409
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 3.0.0
Reporter: Nishan Shetty
Assignee: Rohith

 {code}
   at java.lang.Thread.run(Thread.java:662)
 2014-08-12 07:03:00,839 ERROR 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: 
 Can't handle this event at current state
 org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: 
 STATUS_UPDATE at LAUNCHED
   at 
 org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
   at 
 org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
   at 
 org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:697)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:105)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:779)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:760)
   at 
 org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173)
   at 
 org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106)
   at java.lang.Thread.run(Thread.java:662)
 2014-08-12 07:03:00,839 ERROR 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: 
 Can't handle this event at current state
 org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: 
 CONTAINER_ALLOCATED at LAUNCHED
   at 
 org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
   at 
 org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
   at 
 org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:697)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:105)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:779)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:760)
   at 
 org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173)
   at 
 org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106)
   at java.lang.Thread.run(Thread.java:662)
 2014-08-12 07:03:00,839 ERROR org.apache.hadoop.ya
 {code}




[jira] [Commented] (YARN-2317) Update documentation about how to write YARN applications


[ 
https://issues.apache.org/jira/browse/YARN-2317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14095381#comment-14095381
 ] 

Hudson commented on YARN-2317:
--

FAILURE: Integrated in Hadoop-Yarn-trunk #644 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/644/])
YARN-2317. Updated the document about how to write YARN applications. 
Contributed by Li Lu. (zjshen: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1617594)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/apt/WritingYarnApplications.apt.vm


 Update documentation about how to write YARN applications
 -

 Key: YARN-2317
 URL: https://issues.apache.org/jira/browse/YARN-2317
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: documentation
Reporter: Li Lu
Assignee: Li Lu
 Fix For: 2.6.0

 Attachments: YARN-2317-071714.patch, YARN-2317-073014-1.patch, 
 YARN-2317-073014.patch, YARN-2317-081114.patch, YARN-2317-081214.patch


 Some information on the WritingYarnApplications webpage is outdated. The 
 document needs a refresh to reflect the most recent changes in YARN 
 APIs. 




[jira] [Commented] (YARN-1370) Fair scheduler to re-populate container allocation state


[ 
https://issues.apache.org/jira/browse/YARN-1370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14095382#comment-14095382
 ] 

Hudson commented on YARN-1370:
--

FAILURE: Integrated in Hadoop-Yarn-trunk #644 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/644/])
YARN-1370. Fair scheduler to re-populate container allocation state. (Anubhav 
Dhoot via kasha) (kasha: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1617645)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestWorkPreservingRMRestart.java


 Fair scheduler to re-populate container allocation state
 

 Key: YARN-1370
 URL: https://issues.apache.org/jira/browse/YARN-1370
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Bikas Saha
Assignee: Anubhav Dhoot
 Fix For: 2.6.0

 Attachments: YARN-1370.001.patch


 YARN-1367 and YARN-1368 enable the NM to tell the RM about currently running 
 containers and the RM will pass this information to the schedulers along with 
 the node information. The schedulers are currently already informed about 
 previously running apps when the app data is recovered from the store. The 
 scheduler is expected to be able to repopulate its allocation state from the 
 above 2 sources of information.




[jira] [Commented] (YARN-2373) WebAppUtils Should Use configuration.getPassword for Accessing SSL Passwords


[ 
https://issues.apache.org/jira/browse/YARN-2373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14095378#comment-14095378
 ] 

Hudson commented on YARN-2373:
--

FAILURE: Integrated in Hadoop-Yarn-trunk #644 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/644/])
YARN-2373. Changed WebAppUtils to use Configuration#getPassword for accessing 
SSL passwords. Contributed by Larry McCay (jianhe: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1617555)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/webapp/util/WebAppUtils.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/webapp/util
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/webapp/util/TestWebAppUtils.java


 WebAppUtils Should Use configuration.getPassword for Accessing SSL Passwords
 

 Key: YARN-2373
 URL: https://issues.apache.org/jira/browse/YARN-2373
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Larry McCay
Assignee: Larry McCay
 Fix For: 2.6.0

 Attachments: YARN-2373.patch, YARN-2373.patch, YARN-2373.patch, 
 YARN-2373.patch, YARN-2373.patch


 As part of HADOOP-10904, this jira represents a change to WebAppUtils to 
 adopt the credential provider API through the new method on 
 Configuration called getPassword.
 This provides an alternative to storing the passwords in clear text within 
 the ssl-server.xml file while maintaining backward compatibility with that 
 behavior.
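
The lookup order described above can be sketched as follows. This is only a model of the idea and does not call Hadoop's Configuration class: the class `PasswordLookupSketch` and its two maps are hypothetical, with one map standing in for a credential provider and the other for clear-text values such as those in ssl-server.xml.

```java
import java.util.HashMap;
import java.util.Map;

// Stand-in for the lookup order described above; not Hadoop's Configuration.
class PasswordLookupSketch {
    private final Map<String, char[]> credentialStore;  // hypothetical provider contents
    private final Map<String, String> clearTextConfig;  // e.g. values from ssl-server.xml

    PasswordLookupSketch(Map<String, char[]> credentialStore,
                         Map<String, String> clearTextConfig) {
        this.credentialStore = credentialStore;
        this.clearTextConfig = clearTextConfig;
    }

    // Prefer the credential store; fall back to clear text for compatibility.
    char[] getPassword(String name) {
        char[] fromProvider = credentialStore.get(name);
        if (fromProvider != null) {
            return fromProvider.clone();
        }
        String fromConfig = clearTextConfig.get(name);
        return fromConfig == null ? null : fromConfig.toCharArray();
    }

    public static void main(String[] args) {
        Map<String, char[]> creds = new HashMap<>();
        creds.put("ssl.server.keystore.password", "from-provider".toCharArray());
        Map<String, String> conf = new HashMap<>();
        conf.put("ssl.server.truststore.password", "from-config");

        PasswordLookupSketch lookup = new PasswordLookupSketch(creds, conf);
        System.out.println(new String(lookup.getPassword("ssl.server.keystore.password")));
        System.out.println(new String(lookup.getPassword("ssl.server.truststore.password")));
    }
}
```

Returning `char[]` rather than `String` lets a caller overwrite the password in memory after use.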




[jira] [Updated] (YARN-868) YarnClient should set the service address in tokens returned by getRMDelegationToken()


 [ 
https://issues.apache.org/jira/browse/YARN-868?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Saxena updated YARN-868:
--

Attachment: YARN-868.patch

 YarnClient should set the service address in tokens returned by 
 getRMDelegationToken()
 --

 Key: YARN-868
 URL: https://issues.apache.org/jira/browse/YARN-868
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Hitesh Shah
Assignee: Varun Saxena
 Attachments: YARN-868.patch


 Either the client should set this information into the token or the client 
 layer should expose an api that returns the service address.




[jira] [Commented] (YARN-2409) InvalidStateTransitonException in ResourceManager after job recovery


[ 
https://issues.apache.org/jira/browse/YARN-2409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14095466#comment-14095466
 ] 

Rohith commented on YARN-2409:
--

I looked into the issue (got logs from [~nishan] offline). There is a problem 
in transitioning to standby. While resetting the dispatcher, a new dispatcher 
is created and assigned in place of the existing one, but the current 
dispatcher is never stopped. This causes *a leak of 1 thread, i.e. the 
AsyncDispatcher event handler, on each standby transition.*

The InvalidStateTransitonException errors are due to processing of queued 
events from rmDispatcher. This is a corner case that can occur because of the 
ActiveStandbyElector callback and the internal RM transition to standby (I 
reproduced it).
Consider RM1 in the Active state. 
If the RM transitions to standby because of STATE_STORE_FENCED, it first *a) 
transitions the RM to standby* and then *b) notifies the elector.*
If, before the elector is notified, the elector asks the RM to become Active 
(because ZK is unstable and this RM gets the lock first), then the following 
flow occurs:
1. rm.transitionToStandby(true);  ---  From RMFatalEventDispatcher.handle()
2. AdminService.transitionToActive(); --- From Elector
3. rm.adminService.resetLeaderElection(); --- From 
RMFatalEventDispatcher.handle()

The problem with the above flow is as follows. Say App1 with AppAttempt1 is 
launched and running, and AppAttempt1 has put its status update events into 
the rmDispatcher queue.
Say STATUS_UPDATE --- STATE_STORE_FENCED.
a) The 3 steps above occurred while processing STATE_STORE_FENCED. This means 
the RM has transitioned ACTIVE -> STANDBY -> ACTIVE, recovering the running 
application attempt AppAttempt1.
b) Since the rmDispatcher thread is not stopped while transitioning to standby, 
it processes the STATUS_UPDATE event (rmContext already has the recovered app, 
so it will not be null), which causes InvalidStateTransitonException.
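
To illustrate the leak described in this comment, here is a small sketch under stated assumptions: `MiniDispatcher` and `StandbyTransitionSketch` are hypothetical miniatures, not YARN's AsyncDispatcher or ResourceManager. It contrasts replacing the dispatcher without stopping the old one (the old handler thread stays alive) with stopping it first (the thread exits and queued events are dropped).

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical miniature of an async dispatcher; not YARN's AsyncDispatcher.
class MiniDispatcher {
    private final BlockingQueue<Runnable> queue = new LinkedBlockingQueue<>();
    private final Thread handler;
    private volatile boolean stopped = false;

    MiniDispatcher() {
        handler = new Thread(() -> {
            while (!stopped) {
                try {
                    Runnable event = queue.take();
                    if (!stopped) {
                        event.run();
                    }
                } catch (InterruptedException e) {
                    return; // stop() interrupts the blocked take()
                }
            }
        }, "MiniDispatcher event handler");
        handler.setDaemon(true);
        handler.start();
    }

    void dispatch(Runnable event) {
        queue.add(event);
    }

    // Stop the handler thread and drop whatever is still queued.
    void stop() {
        stopped = true;
        queue.clear();
        handler.interrupt();
        try {
            handler.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    boolean handlerAlive() {
        return handler.isAlive();
    }
}

class StandbyTransitionSketch {
    MiniDispatcher dispatcher = new MiniDispatcher();

    // Leaky reset: the old handler thread keeps running.
    MiniDispatcher resetWithoutStop() {
        MiniDispatcher old = dispatcher;
        dispatcher = new MiniDispatcher();
        return old;
    }

    // Reset that stops the old dispatcher before replacing it.
    MiniDispatcher resetWithStop() {
        MiniDispatcher old = dispatcher;
        old.stop();
        dispatcher = new MiniDispatcher();
        return old;
    }

    public static void main(String[] args) {
        StandbyTransitionSketch rm = new StandbyTransitionSketch();
        MiniDispatcher leaked = rm.resetWithoutStop();
        System.out.println("old handler alive after leaky reset: " + leaked.handlerAlive());
        leaked.stop();
        MiniDispatcher stoppedOne = rm.resetWithStop();
        System.out.println("old handler alive after stop: " + stoppedOne.handlerAlive());
    }
}
```

Each call to `resetWithoutStop()` leaves one more blocked handler thread behind, which mirrors the one-thread-per-transition leak reported above.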

 InvalidStateTransitonException in ResourceManager after job recovery
 

 Key: YARN-2409
 URL: https://issues.apache.org/jira/browse/YARN-2409
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 3.0.0
Reporter: Nishan Shetty
Assignee: Rohith


[jira] [Commented] (YARN-1915) ClientToAMTokenMasterKey should be provided to AM at launch time

2014-08-13 Thread Daryn Sharp (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-1915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14095471#comment-14095471
 ] 

Daryn Sharp commented on YARN-1915:
---

+1 But since I had a hand in the design, we should get a 2nd vote.

 ClientToAMTokenMasterKey should be provided to AM at launch time
 

 Key: YARN-1915
 URL: https://issues.apache.org/jira/browse/YARN-1915
 Project: Hadoop YARN
  Issue Type: Sub-task
Affects Versions: 2.2.0
Reporter: Hitesh Shah
Assignee: Jason Lowe
Priority: Critical
 Attachments: YARN-1915.patch, YARN-1915v2.patch


 Currently, the AM receives the key as part of registration. This introduces a 
 race where a client can connect to the AM when the AM has not received the 
 key. 
 Current Flow:
 1) AM needs to start the client listening service in order to get host:port 
 and send it to the RM as part of registration
 2) RM gets the port info in register() and transitions the app to RUNNING. 
 Responds back with client secret to AM.
 3) User asks RM for client token. Gets it and pings the AM. AM hasn't 
 received client secret from RM and so RPC itself rejects the request.
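
The race in the flow above can be modelled with a very small sketch. It is deliberately simplified: the real mechanism involves tokens derived from a master key, whereas this hypothetical `AmClientServiceSketch` just compares a shared secret. It shows that a listener started without the secret rejects clients until the registration response arrives, while a listener given the secret at launch has no such window.

```java
import java.security.MessageDigest;

// Hypothetical model of the AM's client-facing service; names are illustrative.
class AmClientServiceSketch {
    private volatile byte[] clientSecret; // null until the AM knows the key

    // Pass null to model the current flow, or the key to model launch-time delivery.
    AmClientServiceSketch(byte[] secretAtLaunch) {
        this.clientSecret = secretAtLaunch;
    }

    // Current flow: the secret only arrives with the registration response.
    void onRegisterResponse(byte[] secret) {
        this.clientSecret = secret;
    }

    // A client request is accepted only if the AM already holds the secret.
    boolean accept(byte[] presented) {
        byte[] expected = clientSecret;
        return expected != null && MessageDigest.isEqual(expected, presented);
    }

    public static void main(String[] args) {
        byte[] key = {1, 2, 3};

        AmClientServiceSketch current = new AmClientServiceSketch(null);
        System.out.println("before register response: " + current.accept(key));
        current.onRegisterResponse(key);
        System.out.println("after register response: " + current.accept(key));

        AmClientServiceSketch atLaunch = new AmClientServiceSketch(key);
        System.out.println("key provided at launch: " + atLaunch.accept(key));
    }
}
```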




[jira] [Updated] (YARN-2409) InvalidStateTransitonException in ResourceManager after job recovery


 [ 
https://issues.apache.org/jira/browse/YARN-2409?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohith updated YARN-2409:
-

Attachment: YARN-2409.patch

Attached the patch. Please review.

I have verified patch for
1. Thread Leak : Switching manually many times using admin command.
2. Queued up events are not processed once RM moves to StandBy.

 InvalidStateTransitonException in ResourceManager after job recovery
 

 Key: YARN-2409
 URL: https://issues.apache.org/jira/browse/YARN-2409
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 3.0.0
Reporter: Nishan Shetty
Assignee: Rohith
 Attachments: YARN-2409.patch






[jira] [Updated] (YARN-2409) InvalidStateTransitonException in ResourceManager after job recovery


 [ 
https://issues.apache.org/jira/browse/YARN-2409?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohith updated YARN-2409:
-

Attachment: (was: YARN-2409.patch)

 InvalidStateTransitonException in ResourceManager after job recovery
 

 Key: YARN-2409
 URL: https://issues.apache.org/jira/browse/YARN-2409
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 3.0.0
Reporter: Nishan Shetty
Assignee: Rohith
 Attachments: YARN-2409.patch






[jira] [Updated] (YARN-2409) InvalidStateTransitonException in ResourceManager after job recovery


 [ 
https://issues.apache.org/jira/browse/YARN-2409?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohith updated YARN-2409:
-

Attachment: YARN-2409.patch

 InvalidStateTransitonException in ResourceManager after job recovery
 

 Key: YARN-2409
 URL: https://issues.apache.org/jira/browse/YARN-2409
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 3.0.0
Reporter: Nishan Shetty
Assignee: Rohith
 Attachments: YARN-2409.patch


 {code}
   at java.lang.Thread.run(Thread.java:662)
 2014-08-12 07:03:00,839 ERROR 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: 
 Can't handle this event at current state
 org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: STATUS_UPDATE at LAUNCHED
   at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
   at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
   at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
   at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:697)
   at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:105)
   at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:779)
   at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:760)
   at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173)
   at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106)
   at java.lang.Thread.run(Thread.java:662)
 2014-08-12 07:03:00,839 ERROR org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: Can't handle this event at current state
 org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: CONTAINER_ALLOCATED at LAUNCHED
   at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
   at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
   at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
   at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:697)
   at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:105)
   at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:779)
   at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:760)
   at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173)
   at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106)
   at java.lang.Thread.run(Thread.java:662)
 2014-08-12 07:03:00,839 ERROR org.apache.hadoop.ya
 {code}
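
 The traces above show the attempt state machine rejecting events
 (STATUS_UPDATE, CONTAINER_ALLOCATED) for which no transition is registered
 from the LAUNCHED state. A minimal sketch of that kind of rejection follows.
 It is plain Java, not Hadoop's StateMachineFactory: the class name
 TinyStateMachine, the reduced set of states and events, and the transition
 table are all assumptions made for illustration.

```java
import java.util.EnumMap;
import java.util.Map;

/** Hypothetical, simplified model; not Hadoop's StateMachineFactory. */
class TinyStateMachine {
    enum State { ALLOCATED, LAUNCHED, RUNNING }
    enum Event { LAUNCHED, REGISTERED, STATUS_UPDATE, CONTAINER_ALLOCATED }

    // Transition table: current state -> (event -> next state).
    private final Map<State, Map<Event, State>> table = new EnumMap<>(State.class);
    private State current;

    TinyStateMachine(State initial) {
        current = initial;
        // Assumed transitions, chosen only for illustration.
        add(State.ALLOCATED, Event.LAUNCHED, State.LAUNCHED);
        add(State.LAUNCHED, Event.REGISTERED, State.RUNNING);
        add(State.RUNNING, Event.STATUS_UPDATE, State.RUNNING);
        add(State.RUNNING, Event.CONTAINER_ALLOCATED, State.RUNNING);
    }

    private void add(State from, Event on, State to) {
        table.computeIfAbsent(from, k -> new EnumMap<>(Event.class)).put(on, to);
    }

    State getState() {
        return current;
    }

    /** Applies the event, or throws if the current state has no transition for it. */
    State handle(Event event) {
        Map<Event, State> row = table.get(current);
        State next = (row == null) ? null : row.get(event);
        if (next == null) {
            throw new IllegalStateException("Invalid event: " + event + " at " + current);
        }
        current = next;
        return current;
    }
}
```

 In this sketch, a STATUS_UPDATE delivered while the machine is still in
 LAUNCHED is rejected, whereas the same event is accepted once REGISTERED has
 moved it to RUNNING.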



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Commented] (YARN-2399) FairScheduler: Merge AppSchedulable and FSSchedulerApp into FSAppAttempt


[ 
https://issues.apache.org/jira/browse/YARN-2399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14095507#comment-14095507
 ] 

Hudson commented on YARN-2399:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk #1836 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1836/])
YARN-2399. Delete old versions of files. FairScheduler: Merge AppSchedulable 
and FSSchedulerApp into FSAppAttempt. (kasha) (kasha: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1617619)
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/AppSchedulable.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSSchedulerApp.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFSSchedulerApp.java
YARN-2399. FairScheduler: Merge AppSchedulable and FSSchedulerApp into 
FSAppAttempt. (kasha) (kasha: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1617600)
* 
/hadoop/common/trunk/hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/scheduler/FairSchedulerMetrics.java
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSAppAttempt.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSLeafQueue.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSQueue.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSSchedulerNode.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FifoAppComparator.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/MaxRunningAppsEnforcer.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/NewAppWeightBooster.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/Schedulable.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/WeightAdjuster.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/dao/FairSchedulerInfo.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/dao/FairSchedulerLeafQueueInfo.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FakeSchedulable.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFSAppAttempt.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFSLeafQueue.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFairScheduler.java
*

[jira] [Commented] (YARN-2317) Update documentation about how to write YARN applications


[ 
https://issues.apache.org/jira/browse/YARN-2317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14095509#comment-14095509
 ] 

Hudson commented on YARN-2317:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk #1836 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1836/])
YARN-2317. Updated the document about how to write YARN applications. 
Contributed by Li Lu. (zjshen: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1617594)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/apt/WritingYarnApplications.apt.vm


 Update documentation about how to write YARN applications
 -

 Key: YARN-2317
 URL: https://issues.apache.org/jira/browse/YARN-2317
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: documentation
Reporter: Li Lu
Assignee: Li Lu
 Fix For: 2.6.0

 Attachments: YARN-2317-071714.patch, YARN-2317-073014-1.patch, 
 YARN-2317-073014.patch, YARN-2317-081114.patch, YARN-2317-081214.patch


 Some information on the WritingYarnApplications web page is outdated. The 
 document needs a refresh to reflect the most recent changes in the YARN 
 APIs. 




[jira] [Commented] (YARN-2373) WebAppUtils Should Use configuration.getPassword for Accessing SSL Passwords


[ 
https://issues.apache.org/jira/browse/YARN-2373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14095506#comment-14095506
 ] 

Hudson commented on YARN-2373:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk #1836 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1836/])
YARN-2373. Changed WebAppUtils to use Configuration#getPassword for accessing 
SSL passwords. Contributed by Larry McCay (jianhe: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1617555)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/webapp/util/WebAppUtils.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/webapp/util
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/webapp/util/TestWebAppUtils.java


 WebAppUtils Should Use configuration.getPassword for Accessing SSL Passwords
 

 Key: YARN-2373
 URL: https://issues.apache.org/jira/browse/YARN-2373
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Larry McCay
Assignee: Larry McCay
 Fix For: 2.6.0

 Attachments: YARN-2373.patch, YARN-2373.patch, YARN-2373.patch, 
 YARN-2373.patch, YARN-2373.patch


 As part of HADOOP-10904, this JIRA changes WebAppUtils to use the 
 credential provider API through the new Configuration method getPassword.
 This provides an alternative to storing passwords in clear text in the 
 ssl-server.xml file while remaining backward compatible with that 
 behavior.
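
 The lookup order described above (credential provider first, clear-text
 configuration as the backward-compatible fallback) can be sketched as follows.
 This is a stand-alone illustration, not Hadoop's Configuration class: the
 class PasswordLookupSketch, its two maps, and the helper methods are
 hypothetical, and the property name is used only as an example key.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for a configuration object; not Hadoop's Configuration. */
class PasswordLookupSketch {
    // Stands in for a credential provider (assumption for illustration).
    private final Map<String, char[]> credentialStore = new HashMap<>();
    // Stands in for clear-text entries such as those in ssl-server.xml.
    private final Map<String, String> clearTextConfig = new HashMap<>();

    void putCredential(String alias, char[] secret) {
        credentialStore.put(alias, secret.clone());
    }

    void putClearText(String name, String value) {
        clearTextConfig.put(name, value);
    }

    /** Returns the password, preferring the credential store; null if neither source has it. */
    char[] getPassword(String name) {
        char[] fromProvider = credentialStore.get(name);
        if (fromProvider != null) {
            return fromProvider.clone();
        }
        // Backward-compatible fallback: the clear-text configuration value.
        String fallback = clearTextConfig.get(name);
        return (fallback == null) ? null : fallback.toCharArray();
    }
}
```

 With this shape, a deployment that still keeps the password in the
 configuration file continues to work, and one that moves it into a credential
 store gets the stored value without any change to the caller.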




[jira] [Commented] (YARN-1370) Fair scheduler to re-populate container allocation state


[ 
https://issues.apache.org/jira/browse/YARN-1370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14095510#comment-14095510
 ] 

Hudson commented on YARN-1370:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk #1836 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1836/])
YARN-1370. Fair scheduler to re-populate container allocation state. (Anubhav 
Dhoot via kasha) (kasha: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1617645)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestWorkPreservingRMRestart.java


 Fair scheduler to re-populate container allocation state
 

 Key: YARN-1370
 URL: https://issues.apache.org/jira/browse/YARN-1370
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Bikas Saha
Assignee: Anubhav Dhoot
 Fix For: 2.6.0

 Attachments: YARN-1370.001.patch


 YARN-1367 and YARN-1368 enable the NM to tell the RM about currently running 
 containers, and the RM will pass this information to the schedulers along with 
 the node information. The schedulers are already informed about 
 previously running apps when the app data is recovered from the store. The 
 scheduler is expected to be able to repopulate its allocation state from these 
 two sources of information.
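
 As a rough illustration of repopulating allocation state from those two
 sources, the sketch below rebuilds per-app and per-node memory usage from
 container reports, for apps that were already recovered. It is not the
 FairScheduler code: RecoverySketch, ContainerReport, and the choice to skip
 reports for unknown apps are assumptions made for the example.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical model of scheduler state recovery; not the FairScheduler implementation. */
class RecoverySketch {
    /** One running container as reported by a node (hypothetical shape). */
    static final class ContainerReport {
        final String containerId;
        final String appId;
        final String nodeId;
        final int memoryMb;

        ContainerReport(String containerId, String appId, String nodeId, int memoryMb) {
            this.containerId = containerId;
            this.appId = appId;
            this.nodeId = nodeId;
            this.memoryMb = memoryMb;
        }
    }

    private final Set<String> recoveredApps = new HashSet<>();
    private final Map<String, Integer> memByApp = new HashMap<>();
    private final Map<String, Integer> memByNode = new HashMap<>();

    /** Source 1: an app recovered from the state store. */
    void recoverApp(String appId) {
        recoveredApps.add(appId);
    }

    /** Source 2: containers reported by nodes. Returns how many were accepted. */
    int recoverContainers(List<ContainerReport> reports) {
        int accepted = 0;
        for (ContainerReport r : reports) {
            if (!recoveredApps.contains(r.appId)) {
                continue; // assumption: containers of unknown apps are ignored here
            }
            memByApp.merge(r.appId, r.memoryMb, Integer::sum);
            memByNode.merge(r.nodeId, r.memoryMb, Integer::sum);
            accepted++;
        }
        return accepted;
    }

    int usedByApp(String appId) {
        return memByApp.getOrDefault(appId, 0);
    }

    int usedByNode(String nodeId) {
        return memByNode.getOrDefault(nodeId, 0);
    }
}
```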




[jira] [Commented] (YARN-868) YarnClient should set the service address in tokens returned by getRMDelegationToken()


[ 
https://issues.apache.org/jira/browse/YARN-868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14095543#comment-14095543
 ] 

Hadoop QA commented on YARN-868:


{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12661447/YARN-868.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 2 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/4611//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4611//console

This message is automatically generated.

 YarnClient should set the service address in tokens returned by 
 getRMDelegationToken()
 --

 Key: YARN-868
 URL: https://issues.apache.org/jira/browse/YARN-868
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Hitesh Shah
Assignee: Varun Saxena
 Attachments: YARN-868.patch


 Either the client should set this information into the token or the client 
 layer should expose an api that returns the service address.




[jira] [Commented] (YARN-2409) InvalidStateTransitonException in ResourceManager after job recovery


[ 
https://issues.apache.org/jira/browse/YARN-2409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14095577#comment-14095577
 ] 

Hadoop QA commented on YARN-2409:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12661455/YARN-2409.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

  
org.apache.hadoop.yarn.server.resourcemanager.TestRMEmbeddedElector
  
org.apache.hadoop.yarn.server.resourcemanager.recovery.TestZKRMStateStoreZKClientConnections

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/4612//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4612//console

This message is automatically generated.

 InvalidStateTransitonException in ResourceManager after job recovery
 

 Key: YARN-2409
 URL: https://issues.apache.org/jira/browse/YARN-2409
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 3.0.0
Reporter: Nishan Shetty
Assignee: Rohith
 Attachments: YARN-2409.patch


 {code}
   at java.lang.Thread.run(Thread.java:662)
 2014-08-12 07:03:00,839 ERROR org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: Can't handle this event at current state
 org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: STATUS_UPDATE at LAUNCHED
   at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
   at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
   at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
   at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:697)
   at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:105)
   at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:779)
   at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:760)
   at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173)
   at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106)
   at java.lang.Thread.run(Thread.java:662)
 2014-08-12 07:03:00,839 ERROR org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: Can't handle this event at current state
 org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: CONTAINER_ALLOCATED at LAUNCHED
   at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
   at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
   at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
   at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:697)
   at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:105)
   at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:779)
   at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:760)
   at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173)
   at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106)
   at

[jira] [Assigned] (YARN-2410) Nodemanager ShuffleHandler can easily exhaust file descriptors

2014-08-13 Thread Chen He (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-2410?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chen He reassigned YARN-2410:
-

Assignee: Chen He

 Nodemanager ShuffleHandler can easily exhaust file descriptors
 --

 Key: YARN-2410
 URL: https://issues.apache.org/jira/browse/YARN-2410
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 2.5.0
Reporter: Nathan Roberts
Assignee: Chen He
Priority: Critical

 The async nature of the ShuffleHandler can cause it to open a huge number of
 file descriptors; when it runs out, it crashes.
 Scenario:
 Job with 6K reduces, slow start set to 0.95, about 40 map outputs per node.
 Let's say all 6K reduces hit a node at about the same time asking for their
 outputs. Each reducer will ask for all 40 map outputs over a single socket in
 a single request (not necessarily all 40 at once, but with coalescing it is
 likely to be a large number).
 sendMapOutput() will open the file for random reading and then perform an
 async transfer of the particular portion of this file. This will
 theoretically happen 6000*40=240,000 times, which will run the NM out of file
 descriptors and cause it to crash.
 The algorithm should be refactored a little to not open the fds until they're
 actually needed.
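
 To make the scenario concrete, the sketch below computes the worst-case
 descriptor count for eager opening and models one possible way of deferring
 the open until a transfer actually starts. It is not the ShuffleHandler code:
 LazyOpenSketch and its methods are hypothetical, and the cap on concurrently
 open files (maxOpen) is an added assumption that the description does not
 prescribe.

```java
import java.util.ArrayDeque;
import java.util.Queue;

/** Hypothetical model of deferred file opening; not the ShuffleHandler implementation. */
class LazyOpenSketch {
    /** Descriptors needed if every requested map output is opened up front. */
    static long eagerDescriptors(int reducers, int mapOutputsPerNode) {
        return (long) reducers * mapOutputsPerNode;
    }

    private final int maxOpen;                               // assumed cap on open files
    private final Queue<String> pending = new ArrayDeque<>(); // paths only, no descriptors
    private int open = 0;
    private int peakOpen = 0;

    LazyOpenSketch(int maxOpen) {
        this.maxOpen = maxOpen;
    }

    /** Queues a request; nothing is opened unless a transfer slot is free. */
    void request(String path) {
        pending.add(path);
        startMore();
    }

    /** Called when an async transfer completes and its file has been closed. */
    void transferDone() {
        if (open == 0) {
            throw new IllegalStateException("no transfer in progress");
        }
        open--;
        startMore();
    }

    private void startMore() {
        while (open < maxOpen && !pending.isEmpty()) {
            pending.remove(); // a real implementation would open the file here
            open++;
            peakOpen = Math.max(peakOpen, open);
        }
    }

    int openNow() { return open; }
    int peakOpen() { return peakOpen; }
    int pendingCount() { return pending.size(); }
}
```

 In this model the number of descriptors in use is bounded by the number of
 transfers in flight rather than by the number of requests received.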




[jira] [Commented] (YARN-2399) FairScheduler: Merge AppSchedulable and FSSchedulerApp into FSAppAttempt


[ 
https://issues.apache.org/jira/browse/YARN-2399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14095597#comment-14095597
 ] 

Hudson commented on YARN-2399:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #1862 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1862/])
YARN-2399. Delete old versions of files. FairScheduler: Merge AppSchedulable 
and FSSchedulerApp into FSAppAttempt. (kasha) (kasha: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1617619)
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/AppSchedulable.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSSchedulerApp.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFSSchedulerApp.java
YARN-2399. FairScheduler: Merge AppSchedulable and FSSchedulerApp into 
FSAppAttempt. (kasha) (kasha: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1617600)
* 
/hadoop/common/trunk/hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/scheduler/FairSchedulerMetrics.java
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSAppAttempt.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSLeafQueue.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSQueue.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSSchedulerNode.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FifoAppComparator.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/MaxRunningAppsEnforcer.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/NewAppWeightBooster.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/Schedulable.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/WeightAdjuster.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/dao/FairSchedulerInfo.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/dao/FairSchedulerLeafQueueInfo.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FakeSchedulable.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFSAppAttempt.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFSLeafQueue.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFairScheduler.java
*

[jira] [Commented] (YARN-2317) Update documentation about how to write YARN applications


[ 
https://issues.apache.org/jira/browse/YARN-2317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14095599#comment-14095599
 ] 

Hudson commented on YARN-2317:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #1862 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1862/])
YARN-2317. Updated the document about how to write YARN applications. 
Contributed by Li Lu. (zjshen: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1617594)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/apt/WritingYarnApplications.apt.vm


 Update documentation about how to write YARN applications
 -

 Key: YARN-2317
 URL: https://issues.apache.org/jira/browse/YARN-2317
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: documentation
Reporter: Li Lu
Assignee: Li Lu
 Fix For: 2.6.0

 Attachments: YARN-2317-071714.patch, YARN-2317-073014-1.patch, 
 YARN-2317-073014.patch, YARN-2317-081114.patch, YARN-2317-081214.patch


 Some information on the WritingYarnApplications web page is outdated. The 
 document needs a refresh to reflect the most recent changes in the YARN 
 APIs. 




[jira] [Commented] (YARN-1370) Fair scheduler to re-populate container allocation state


[ 
https://issues.apache.org/jira/browse/YARN-1370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14095600#comment-14095600
 ] 

Hudson commented on YARN-1370:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #1862 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1862/])
YARN-1370. Fair scheduler to re-populate container allocation state. (Anubhav 
Dhoot via kasha) (kasha: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1617645)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestWorkPreservingRMRestart.java


 Fair scheduler to re-populate container allocation state
 

 Key: YARN-1370
 URL: https://issues.apache.org/jira/browse/YARN-1370
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Bikas Saha
Assignee: Anubhav Dhoot
 Fix For: 2.6.0

 Attachments: YARN-1370.001.patch


 YARN-1367 and YARN-1368 enable the NM to tell the RM about currently running 
 containers, and the RM will pass this information to the schedulers along with 
 the node information. The schedulers are already informed about 
 previously running apps when the app data is recovered from the store. The 
 scheduler is expected to be able to repopulate its allocation state from these 
 two sources of information.




[jira] [Commented] (YARN-2373) WebAppUtils Should Use configuration.getPassword for Accessing SSL Passwords


[ 
https://issues.apache.org/jira/browse/YARN-2373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14095596#comment-14095596
 ] 

Hudson commented on YARN-2373:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #1862 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1862/])
YARN-2373. Changed WebAppUtils to use Configuration#getPassword for accessing 
SSL passwords. Contributed by Larry McCay (jianhe: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1617555)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/webapp/util/WebAppUtils.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/webapp/util
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/webapp/util/TestWebAppUtils.java


 WebAppUtils Should Use configuration.getPassword for Accessing SSL Passwords
 

 Key: YARN-2373
 URL: https://issues.apache.org/jira/browse/YARN-2373
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Larry McCay
Assignee: Larry McCay
 Fix For: 2.6.0

 Attachments: YARN-2373.patch, YARN-2373.patch, YARN-2373.patch, 
 YARN-2373.patch, YARN-2373.patch


 As part of HADOOP-10904, this JIRA changes WebAppUtils to use the 
 credential provider API through the new Configuration method getPassword.
 This provides an alternative to storing passwords in clear text in the 
 ssl-server.xml file while remaining backward compatible with that 
 behavior.




[jira] [Commented] (YARN-868) YarnClient should set the service address in tokens returned by getRMDelegationToken()


[ 
https://issues.apache.org/jira/browse/YARN-868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14095627#comment-14095627
 ] 

Varun Saxena commented on YARN-868:
---

The changes are as follows:
1. Provided an additional interface method, Token#getServiceAddress(), to get 
the service address from the token.
2. Stored the server (RM) address in ApplicationClientProtocolPBClientImpl. This 
address is set in TokenPBImpl while the GetDelegationResponse is being prepared 
from the ResponseProto. This lets us store the address of the RM that served 
the get-delegation-token request.
3. Because the service address field exists in TokenPBImpl but not in 
TokenProto, made suitable changes in the tests.
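
The approach in the list above, where the client side records which RM address
served the request and stamps it onto the token it hands back, might look
roughly like the sketch below. It is not the patch itself: the classes
ServiceAddressSketch, Token, and ClientStub are hypothetical stand-ins, and
only the accessor name getServiceAddress() is taken from the comment.

```java
/** Hypothetical model of client-side service address stamping; not the YARN-868 patch. */
class ServiceAddressSketch {
    /** Simplified token: the address is a client-side field, not part of the wire format. */
    static final class Token {
        private final byte[] identifier;
        private String serviceAddress; // null until the client stub sets it

        Token(byte[] identifier) {
            this.identifier = identifier.clone();
        }

        byte[] getIdentifier() {
            return identifier.clone();
        }

        String getServiceAddress() {
            return serviceAddress;
        }

        void setServiceAddress(String serviceAddress) {
            this.serviceAddress = serviceAddress;
        }
    }

    /** Stands in for the client-side protocol implementation that knows the RM address. */
    static final class ClientStub {
        private final String rmAddress;

        ClientStub(String rmAddress) {
            this.rmAddress = rmAddress;
        }

        Token getDelegationToken() {
            // Stands in for decoding the response received from the RM.
            Token fromWire = new Token(new byte[] {1, 2, 3});
            // Record which server answered, since the wire format does not carry it.
            fromWire.setServiceAddress(rmAddress);
            return fromWire;
        }
    }
}
```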


 YarnClient should set the service address in tokens returned by 
 getRMDelegationToken()
 --

 Key: YARN-868
 URL: https://issues.apache.org/jira/browse/YARN-868
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Hitesh Shah
Assignee: Varun Saxena
 Attachments: YARN-868.patch


 Either the client should set this information into the token or the client 
 layer should expose an api that returns the service address.




[jira] [Commented] (YARN-868) YarnClient should set the service address in tokens returned by getRMDelegationToken()


[ 
https://issues.apache.org/jira/browse/YARN-868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14095628#comment-14095628
 ] 

Varun Saxena commented on YARN-868:
---

Another approach to resolve this would have been:
1. Change TokenProto as well and return the service address from the 
server (RM). But as TokenProto is defined in hadoop-common and used across 
projects, I didn't make this change.
2. Token is also part of requests, so ideally we should have a separate class 
for the response flow, containing the Token and the service address. But this 
would break the interface YarnClient#getRMDelegationToken, hence I didn't make 
this change either.

 YarnClient should set the service address in tokens returned by 
 getRMDelegationToken()
 --

 Key: YARN-868
 URL: https://issues.apache.org/jira/browse/YARN-868
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Hitesh Shah
Assignee: Varun Saxena
 Attachments: YARN-868.patch


 Either the client should set this information into the token or the client 
 layer should expose an api that returns the service address.




[jira] [Commented] (YARN-1915) ClientToAMTokenMasterKey should be provided to AM at launch time

2014-08-13 Thread Hitesh Shah (JIRA)

[
https://issues.apache.org/jira/browse/YARN-1915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14095671#comment-14095671
]

Hitesh Shah commented on YARN-1915:
---

[~jlowe] [~daryn] Was there a reason for not sending the secret to the AM
via its env when it launches? I am assuming this is to avoid requiring all AMs
to change to handle this? Wouldn't that be a more effective solution than
using a timer, which would work in practice but still relies on
the AM receiving the secret from the RM within the time window before the
client does?

ClientToAMTokenMasterKey should be provided to AM at launch time

Key: YARN-1915
URL: https://issues.apache.org/jira/browse/YARN-1915
Project: Hadoop YARN
Issue Type: Sub-task
Affects Versions: 2.2.0
Reporter: Hitesh Shah
Assignee: Jason Lowe
Priority: Critical
Attachments: YARN-1915.patch, YARN-1915v2.patch

Currently, the AM receives the key as part of registration. This introduces a
race where a client can connect to the AM when the AM has not received the
key.
Current Flow:
1) AM needs to start the client listening service in order to get host:port
and send it to the RM as part of registration
2) RM gets the port info in register() and transitions the app to RUNNING.
Responds back with client secret to AM.
3) User asks RM for client token. Gets it and pings the AM. AM hasn't
received client secret from RM and so RPC itself rejects the request.


[jira] [Commented] (YARN-2409) InvalidStateTransitonException in ResourceManager after job recovery


[ 
https://issues.apache.org/jira/browse/YARN-2409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14095686#comment-14095686
 ] 

Hadoop QA commented on YARN-2409:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12661455/YARN-2409.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/4613//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4613//console

This message is automatically generated.

 InvalidStateTransitonException in ResourceManager after job recovery
 

 Key: YARN-2409
 URL: https://issues.apache.org/jira/browse/YARN-2409
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 3.0.0
Reporter: Nishan Shetty
Assignee: Rohith
 Attachments: YARN-2409.patch


 {code}
   at java.lang.Thread.run(Thread.java:662)
 2014-08-12 07:03:00,839 ERROR org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: Can't handle this event at current state
 org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: STATUS_UPDATE at LAUNCHED
   at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
   at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
   at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
   at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:697)
   at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:105)
   at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:779)
   at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:760)
   at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173)
   at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106)
   at java.lang.Thread.run(Thread.java:662)
 2014-08-12 07:03:00,839 ERROR org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: Can't handle this event at current state
 org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: CONTAINER_ALLOCATED at LAUNCHED
   at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
   at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
   at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
   at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:697)
   at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:105)
   at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:779)
   at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:760)
   at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173)
   at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106)
   at java.lang.Thread.run(Thread.java:662)
 2014-08-12 07:03:00,839 ERROR org.apache.hadoop.ya
 {code}




[jira] [Commented] (YARN-2385) Adding support for listing all applications in a queue

2014-08-13 Thread Sunil G (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-2385?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14095718#comment-14095718
 ] 

Sunil G commented on YARN-2385:
---

I am not sure whether we need to check for completed apps for CS and Fair.
Only when completed apps are included do we have the problem of evicting apps 
beyond a limit, as mentioned by [~wangda] in YARN-807.

Maybe two separate APIs (getRunningAppsInQueue, getPendingAppsInQueue) with 
common behavior across CS/Fair would be a better approach.
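
To make the suggestion concrete, here is a rough sketch of what such a pair of calls might look like. Only the two method names come from the comment above; the interface name QueueAppLister, the in-memory class, and the use of plain strings for application ids are assumptions made for illustration and are not part of AbstractYarnScheduler. Completed apps are deliberately never tracked, which sidesteps the eviction question.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical scheduler-neutral view of a queue; not an actual YARN interface. */
interface QueueAppLister {
    List<String> getRunningAppsInQueue(String queueName);

    List<String> getPendingAppsInQueue(String queueName);
}

/** Toy in-memory implementation, used only to illustrate the contract. */
final class InMemoryQueueAppLister implements QueueAppLister {
    private final Map<String, List<String>> running = new HashMap<>();
    private final Map<String, List<String>> pending = new HashMap<>();

    /** Records a newly submitted app as pending in the given queue. */
    void addPending(String queue, String appId) {
        pending.computeIfAbsent(queue, q -> new ArrayList<>()).add(appId);
    }

    /** Moves an app from pending to running within the same queue. */
    void activate(String queue, String appId) {
        List<String> waiting = pending.get(queue);
        if (waiting != null && waiting.remove(appId)) {
            running.computeIfAbsent(queue, q -> new ArrayList<>()).add(appId);
        }
    }

    @Override
    public List<String> getRunningAppsInQueue(String queueName) {
        List<String> apps = running.getOrDefault(queueName, Collections.emptyList());
        return Collections.unmodifiableList(apps);
    }

    @Override
    public List<String> getPendingAppsInQueue(String queueName) {
        List<String> apps = pending.getOrDefault(queueName, Collections.emptyList());
        return Collections.unmodifiableList(apps);
    }
}
```

Both schedulers could, in principle, answer these two calls identically, since neither call depends on how long finished apps are retained.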

 Adding support for listing all applications in a queue
 --

 Key: YARN-2385
 URL: https://issues.apache.org/jira/browse/YARN-2385
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: capacityscheduler, fairscheduler
Reporter: Subramaniam Venkatraman Krishnan
Assignee: Karthik Kambatla
  Labels: abstractyarnscheduler

 This JIRA proposes adding a method in AbstractYarnScheduler to get all the 
 pending/active applications. Fair scheduler already supports moving a single 
 application from one queue to another. Support for the same is being added to 
 Capacity Scheduler as part of YARN-2378 and YARN-2248. So with the addition 
 of this method, we can transparently add support for moving all applications 
 from source queue to target queue and draining a queue, i.e. killing all 
 applications in a queue as proposed by YARN-2389




[jira] [Commented] (YARN-2390) Investigating whehther generic history service needs to support application-acls

2014-08-13 Thread Sunil G (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-2390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14095725#comment-14095725
 ] 

Sunil G commented on YARN-2390:
---

For getting application report, container report etc, currently in 
ClientRMService Queue ACL for ADMINISTER_QUEUE is also checked.

I think the same will apply to these reports in HistoryService as well. 

 Investigating whehther generic history service needs to support 
 application-acls
 

 Key: YARN-2390
 URL: https://issues.apache.org/jira/browse/YARN-2390
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Zhijie Shen

 According to YARN-1250, it's arguable whether queue-acls should be applied to 
 the generic history service as well, because the queue admin may not need 
 access to a completed application that has been removed from the queue. This 
 ticket was created to host that discussion.




[jira] [Commented] (YARN-2398) TestResourceTrackerOnHA crashes


[ 
https://issues.apache.org/jira/browse/YARN-2398?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14095728#comment-14095728
 ] 

Xuan Gong commented on YARN-2398:
-

That is strange. For all the Test*OnHA tests, we are using MiniYarnCluster, which 
automatically registers all the handlers for us when the RM becomes active.

 TestResourceTrackerOnHA crashes
 ---

 Key: YARN-2398
 URL: https://issues.apache.org/jira/browse/YARN-2398
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Jason Lowe

 TestResourceTrackerOnHA is currently crashing and failing trunk builds.




[jira] [Updated] (YARN-2390) Investigating whehther generic history service needs to support queue-acls


 [ 
https://issues.apache.org/jira/browse/YARN-2390?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen updated YARN-2390:
--

Summary: Investigating whehther generic history service needs to support 
queue-acls  (was: Investigating whehther generic history service needs to 
support application-acls)

 Investigating whehther generic history service needs to support queue-acls
 --

 Key: YARN-2390
 URL: https://issues.apache.org/jira/browse/YARN-2390
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Zhijie Shen

 According to YARN-1250, it's arguable whether queue-acls should be applied to 
 the generic history service as well, because the queue admin may not need 
 access to a completed application that has been removed from the queue. This 
 ticket was created to host that discussion.




[jira] [Commented] (YARN-2136) RMStateStore can explicitly handle store/update events when fenced

2014-08-13 Thread Sunil G (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-2136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14095734#comment-14095734
 ] 

Sunil G commented on YARN-2136:
---

bq. we will wait till all these events have been processed

Wouldn't it be a little long to wait at this stage before becoming standby?

 RMStateStore can explicitly handle store/update events when fenced
 --

 Key: YARN-2136
 URL: https://issues.apache.org/jira/browse/YARN-2136
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Jian He

 RMStateStore can choose to handle/ignore store/update events upfront instead 
 of invoking more ZK operations if state store is at fenced state. 




[jira] [Commented] (YARN-1915) ClientToAMTokenMasterKey should be provided to AM at launch time

2014-08-13 Thread Jason Lowe (JIRA)

[
https://issues.apache.org/jira/browse/YARN-1915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14095735#comment-14095735
]

Jason Lowe commented on YARN-1915:
--

Good question, Hitesh. I don't know exactly why that was changed, as it was
sending it either via the env or via the credentials for the container in 0.23.
I assumed there was a reason that wasn't OK given that it was explicitly
changed to not do that, but that may be a bad assumption.

Digging a bit, found this was changed in YARN-610. Apparently on Windows the
env isn't secure and secrets can be gleaned from it. The JIRA also claims the
key can't go in the container credentials, but it doesn't elaborate. If a
container's credentials also aren't secure then it seems to me we have bigger
problems than just this key.

ClientToAMTokenMasterKey should be provided to AM at launch time


[jira] [Commented] (YARN-2390) Investigating whehther generic history service needs to support queue-acls


[ 
https://issues.apache.org/jira/browse/YARN-2390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14095748#comment-14095748
 ] 

Zhijie Shen commented on YARN-2390:
---

bq. For getting application report, container report etc, currently in 
ClientRMService Queue ACL for ADMINISTER_QUEUE is also checked.

That's correct. However, after the app is finished, it has been removed from 
the queue. The question is whether we still want to give the queue admin access to an app 
that used to run on the queue but is now finished and removed from it.

Personally, I prefer not to grant the view access of the finished app to the 
queue admin, because IMHO, the permissions of the queue admin should be within 
the scope of his assigned queue. Thoughts?

 Investigating whehther generic history service needs to support queue-acls
 --

 Key: YARN-2390
 URL: https://issues.apache.org/jira/browse/YARN-2390
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Zhijie Shen

 According to YARN-1250, it's arguable whether queue-acls should be applied to 
 the generic history service as well, because the queue admin may not need 
 access to a completed application that has been removed from the queue. This 
 ticket was created to host that discussion.




[jira] [Commented] (YARN-2356) yarn status command for non-existent application/application attempt/container is too verbose


[ 
https://issues.apache.org/jira/browse/YARN-2356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14095808#comment-14095808
 ] 

Hadoop QA commented on YARN-2356:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12657814/Yarn-2356.1.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/4614//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4614//console

This message is automatically generated.

 yarn status command for non-existent application/application 
 attempt/container is too verbose 
 --

 Key: YARN-2356
 URL: https://issues.apache.org/jira/browse/YARN-2356
 Project: Hadoop YARN
  Issue Type: Bug
  Components: client
Reporter: Sunil G
Assignee: Sunil G
Priority: Minor
 Attachments: Yarn-2356.1.patch


 The *yarn application -status*, *applicationattempt -status* and *container 
 -status* commands can suppress exceptions such as ApplicationNotFound, 
 ApplicationAttemptNotFound and ContainerNotFound for non-existent entries in 
 the RM or History Server. 
 For example, the exception below could be handled more gracefully:
 sunildev@host-a:~/hadoop/hadoop/bin ./yarn application -status application_1402668848165_0015
 No GC_PROFILE is given. Defaults to medium.
 14/07/25 16:21:45 INFO client.RMProxy: Connecting to ResourceManager at /10.18.40.77:45022
 Exception in thread "main" org.apache.hadoop.yarn.exceptions.ApplicationNotFoundException: Application with id 'application_1402668848165_0015' doesn't exist in RM.
 at org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getApplicationReport(ClientRMService.java:285)
 at org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getApplicationReport(ApplicationClientProtocolPBServiceImpl.java:145)
 at org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:321)
 at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:607)
 at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:932)
 at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2099)
 at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2095)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:396)
 at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1626)
 at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2093)
 at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
 at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
 at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
 at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
 at org.apache.hadoop.yarn.ipc.RPCUtil.instantiateException(RPCUtil.java:53)
 at org.apache.hadoop.yarn.ipc.RPCUtil.unwrapAndThrowException(RPCUtil.java:101)
 at org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.getApplicationReport(ApplicationClientProtocolPBClientImpl.java:166)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
 at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
 at java.lang.reflect.Method.invoke(Method.java:597)
 at

[jira] [Commented] (YARN-1915) ClientToAMTokenMasterKey should be provided to AM at launch time

2014-08-13 Thread Daryn Sharp (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-1915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14095820#comment-14095820
 ] 

Daryn Sharp commented on YARN-1915:
---

I suspect it's because it removes the burden from the AM to strip the secret 
from the credentials so it doesn't leak to other processes.

 ClientToAMTokenMasterKey should be provided to AM at launch time
 

 Key: YARN-1915
 URL: https://issues.apache.org/jira/browse/YARN-1915
 Project: Hadoop YARN
  Issue Type: Sub-task
Affects Versions: 2.2.0
Reporter: Hitesh Shah
Assignee: Jason Lowe
Priority: Critical
 Attachments: YARN-1915.patch, YARN-1915v2.patch


 Currently, the AM receives the key as part of registration. This introduces a 
 race where a client can connect to the AM when the AM has not received the 
 key. 
 Current Flow:
 1) AM needs to start the client listening service in order to get host:port 
 and send it to the RM as part of registration
 2) RM gets the port info in register() and transitions the app to RUNNING. 
 Responds back with client secret to AM.
 3) User asks RM for client token. Gets it and pings the AM. AM hasn't 
 received client secret from RM and so RPC itself rejects the request.




[jira] [Commented] (YARN-2411) [Capacity Scheduler] support simple user and group mappings to queues

2014-08-13 Thread Arun C Murthy (JIRA)

[
https://issues.apache.org/jira/browse/YARN-2411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14095825#comment-14095825
]

Arun C Murthy commented on YARN-2411:
-

Looks good to me. Hopefully this should be a simple enhancement in CS.

[Capacity Scheduler] support simple user and group mappings to queues
-

Key: YARN-2411
URL: https://issues.apache.org/jira/browse/YARN-2411
Project: Hadoop YARN
Issue Type: Improvement
Components: capacityscheduler
Reporter: Ram Venkatesh

YARN-2257 has a proposal to extend and share the queue placement rules for
the fair scheduler and the capacity scheduler. This is a good long term
solution to streamline queue placement of both schedulers but it has core
infra work that has to happen first and might require changes to current
features in all schedulers along with corresponding configuration changes, if
any.
I would like to propose a change with a smaller scope in the capacity
scheduler that addresses the core use cases for implicitly mapping jobs that
have the default queue or no queue specified to specific queues based on the
submitting user and user groups. It will be useful in a number of real-world
scenarios and can be migrated over to the unified scheme when YARN-2257
becomes available.
The proposal is to add two new configuration options:
yarn.scheduler.capacity.queue-mappings.enable
A boolean that controls if queue mappings are enabled, default is false.
and,
yarn.scheduler.capacity.queue-mappings
A string that specifies a list of mappings in the following format:
map_specifier:source_attribute:queue_name[,map_specifier:source_attribute:queue_name]*
map_specifier := user (u) | group (g)
source_attribute := user | group | %user
queue_name := the name of the mapped queue | %user | %primary_group
The mappings will be evaluated left to right, and the first valid mapping
will be used. If the mapped queue does not exist, or the current user does
not have permissions to submit jobs to the mapped queue, the submission will
fail.
Example usages:
1. user1 is mapped to queue1, group1 is mapped to queue2
u:user1:queue1,g:group1:queue2
2. To map users to queues with the same name as the user:
u:%user:%user
I am happy to volunteer to take this up.
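
As a rough illustration of the proposed mapping format, the sketch below parses a queue-mappings string and resolves a queue by evaluating the entries left to right. It is not the Capacity Scheduler's implementation: the class QueueMappingSketch, its Mapping type, and the methods parse and resolve are hypothetical names, and treating the first group in the list as the primary group is an assumption. The checks that the mapped queue exists and that the user may submit to it are left out.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper for the proposed format; not actual Capacity Scheduler code. */
final class QueueMappingSketch {

    /** One parsed entry of the form map_specifier:source_attribute:queue_name. */
    static final class Mapping {
        final String specifier; // "u" (user) or "g" (group)
        final String source;    // a user name, a group name, or "%user"
        final String queue;     // a queue name, "%user", or "%primary_group"

        Mapping(String specifier, String source, String queue) {
            this.specifier = specifier;
            this.source = source;
            this.queue = queue;
        }
    }

    /** Splits the comma-separated list into entries, rejecting malformed ones. */
    static List<Mapping> parse(String value) {
        List<Mapping> result = new ArrayList<>();
        for (String entry : value.split(",")) {
            String[] parts = entry.trim().split(":");
            boolean knownSpecifier =
                parts.length == 3 && (parts[0].equals("u") || parts[0].equals("g"));
            if (!knownSpecifier) {
                throw new IllegalArgumentException("Bad queue mapping: " + entry);
            }
            result.add(new Mapping(parts[0], parts[1], parts[2]));
        }
        return result;
    }

    /** Returns the queue of the first matching entry, or null if none matches. */
    static String resolve(List<Mapping> mappings, String user, List<String> groups) {
        for (Mapping m : mappings) {
            boolean matches = m.specifier.equals("u")
                ? (m.source.equals("%user") || m.source.equals(user))
                : groups.contains(m.source);
            if (matches) {
                return substitute(m.queue, user, groups);
            }
        }
        return null;
    }

    /** Expands %user and %primary_group; the first group is assumed to be primary. */
    private static String substitute(String queue, String user, List<String> groups) {
        if (queue.equals("%user")) {
            return user;
        }
        if (queue.equals("%primary_group")) {
            return groups.isEmpty() ? null : groups.get(0);
        }
        return queue;
    }
}
```

With the first example above, user1 resolves to queue1 and any other member of group1 resolves to queue2; with u:%user:%user, every user resolves to a queue named after that user.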


[jira] [Commented] (YARN-2410) Nodemanager ShuffleHandler can easily exhaust file descriptors

2014-08-13 Thread jay vyas (JIRA)

[
https://issues.apache.org/jira/browse/YARN-2410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14095824#comment-14095824
]

jay vyas commented on YARN-2410:

Adding this as a part of BIGTOP-1403 also. Hopefully we can have some smoke
tests which induce this bug.

If you have a code snippet or patch to contribute that reproduces this, feel
free to attach it there; we can help you refine it.

Nodemanager ShuffleHandler can easily exhaust file descriptors
--

Key: YARN-2410
URL: https://issues.apache.org/jira/browse/YARN-2410
Project: Hadoop YARN
Issue Type: Bug
Components: nodemanager
Affects Versions: 2.5.0
Reporter: Nathan Roberts
Assignee: Chen He
Priority: Critical

The async nature of the ShuffleHandler can cause it to open a huge number of
file descriptors; when it runs out, it crashes.
Scenario:
Job with 6K reduces, slow start set to 0.95, about 40 map outputs per node.
Let's say all 6K reduces hit a node at about the same time asking for their
outputs. Each reducer will ask for all 40 map outputs over a single socket in
a single request (not necessarily all 40 at once, but with coalescing it is
likely to be a large number).
sendMapOutput() will open the file for random reading and then perform an
async transfer of the particular portion of this file. This will
theoretically happen 6000*40=240,000 times, which will run the NM out of file
descriptors and cause it to crash.
The algorithm should be refactored a little to not open the fds until they're
actually needed.
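
The worst case in this scenario is simple multiplication, sketched below for 6,000 reducers and 40 map outputs per node. The class ShuffleFdEstimate and its methods are hypothetical, and the descriptor limit is a placeholder parameter rather than a value taken from any NodeManager configuration.

```java
/** Hypothetical back-of-the-envelope estimate; not part of ShuffleHandler. */
final class ShuffleFdEstimate {

    /** Worst case: every reducer has every map output on the node open at once. */
    static long worstCaseOpenFiles(long reducers, long mapOutputsPerNode) {
        return Math.multiplyExact(reducers, mapOutputsPerNode);
    }

    /** True if the worst case would exceed the given per-process descriptor limit. */
    static boolean exceedsLimit(long reducers, long mapOutputsPerNode, long fdLimit) {
        return worstCaseOpenFiles(reducers, mapOutputsPerNode) > fdLimit;
    }
}
```

For example, worstCaseOpenFiles(6000, 40) is 240000, which is well above an illustrative limit of 65536 descriptors.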


[jira] [Updated] (YARN-2411) [Capacity Scheduler] support simple user and group mappings to queues

2014-08-13 Thread Arun C Murthy (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-2411?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun C Murthy updated YARN-2411:


Assignee: Ram Venkatesh

 [Capacity Scheduler] support simple user and group mappings to queues
 -

 Key: YARN-2411
 URL: https://issues.apache.org/jira/browse/YARN-2411
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: capacityscheduler
Reporter: Ram Venkatesh
Assignee: Ram Venkatesh

 YARN-2257 has a proposal to extend and share the queue placement rules for 
 the fair scheduler and the capacity scheduler. This is a good long term 
 solution to streamline queue placement of both schedulers but it has core 
 infra work that has to happen first and might require changes to current 
 features in all schedulers along with corresponding configuration changes, if 
 any. 
 I would like to propose a change with a smaller scope in the capacity 
 scheduler that addresses the core use cases for implicitly mapping jobs that 
 have the default queue or no queue specified to specific queues based on the 
 submitting user and user groups. It will be useful in a number of real-world 
 scenarios and can be migrated over to the unified scheme when YARN-2257 
 becomes available.
 The proposal is to add two new configuration options:
 yarn.scheduler.capacity.queue-mappings.enable 
 A boolean that controls if queue mappings are enabled, default is false.
 and,
 yarn.scheduler.capacity.queue-mappings
 A string that specifies a list of mappings in the following format:
 map_specifier:source_attribute:queue_name[,map_specifier:source_attribute:queue_name]*
 map_specifier := user (u) | group (g)
 source_attribute := user | group | %user
 queue_name := the name of the mapped queue | %user | %primary_group
 The mappings will be evaluated left to right, and the first valid mapping 
 will be used. If the mapped queue does not exist, or the current user does 
 not have permissions to submit jobs to the mapped queue, the submission will 
 fail.
 Example usages:
 1. user1 is mapped to queue1, group1 is mapped to queue2
 u:user1:queue1,g:group1:queue2
 2. To map users to queues with the same name as the user:
 u:%user:%user
 I am happy to volunteer to take this up.




[jira] [Commented] (YARN-2378) Adding support for moving apps between queues in Capacity Scheduler

2014-08-13 Thread Jian He (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-2378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14095885#comment-14095885
 ] 

Jian He commented on YARN-2378:
---

Noticed that SchedulerApplication#setQueue should be invoked as well when a move 
happens. (Can you add a test for this too?)

 Adding support for moving apps between queues in Capacity Scheduler
 ---

 Key: YARN-2378
 URL: https://issues.apache.org/jira/browse/YARN-2378
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: capacityscheduler
Reporter: Subramaniam Venkatraman Krishnan
Assignee: Subramaniam Venkatraman Krishnan
  Labels: capacity-scheduler
 Attachments: YARN-2378.patch, YARN-2378.patch, YARN-2378.patch


 As discussed with [~leftnoteasy] and [~jianhe], we are breaking up YARN-1707 
 to smaller patches for manageability. This JIRA will address adding support 
 for moving apps between queues in Capacity Scheduler.




[jira] [Updated] (YARN-2102) More generalized timeline ACLs


 [ 
https://issues.apache.org/jira/browse/YARN-2102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen updated YARN-2102:
--

Attachment: GeneralizedTimelineACLs.pdf

I uploaded a proposal for supporting more generalized ACLs for the timeline data 
model.

 More generalized timeline ACLs
 --

 Key: YARN-2102
 URL: https://issues.apache.org/jira/browse/YARN-2102
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Zhijie Shen
Assignee: Zhijie Shen
 Attachments: GeneralizedTimelineACLs.pdf


 We need to differentiate the access controls of reading and writing 
 operations, and we need to think about cross-entity access control. For 
 example, if we are executing a workflow of MR jobs which writes the 
 timeline data of this workflow, we don't want other users to pollute the 
 timeline data of the workflow by putting something under it.




[jira] [Assigned] (YARN-2056) Disable preemption at Queue level

2014-08-13 Thread Eric Payne (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-2056?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Payne reassigned YARN-2056:


Assignee: Eric Payne

 Disable preemption at Queue level
 -

 Key: YARN-2056
 URL: https://issues.apache.org/jira/browse/YARN-2056
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Affects Versions: 2.4.0
Reporter: Mayank Bansal
Assignee: Eric Payne

 We need to be able to disable preemption at individual queue level




[jira] [Updated] (YARN-2277) Add Cross-Origin support to the ATS REST API


 [ 
https://issues.apache.org/jira/browse/YARN-2277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen updated YARN-2277:
--

Attachment: YARN-2277-v8.patch

[~jeagles], thanks for the latest patch. +1, LGTM.

I uploaded a new patch with some minor touch-ups:

1. Changing http.cross-origin to http-cross-origin;
2. Breaking the lines that go beyond 80 chars.

Will commit the patch once Jenkins gives +1.

 Add Cross-Origin support to the ATS REST API
 

 Key: YARN-2277
 URL: https://issues.apache.org/jira/browse/YARN-2277
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Jonathan Eagles
Assignee: Jonathan Eagles
 Attachments: YARN-2277-CORS.patch, YARN-2277-JSONP.patch, 
 YARN-2277-v2.patch, YARN-2277-v3.patch, YARN-2277-v3.patch, 
 YARN-2277-v4.patch, YARN-2277-v5.patch, YARN-2277-v6.patch, 
 YARN-2277-v7.patch, YARN-2277-v8.patch


 As the Application Timeline Server is not provided with a built-in UI, it may 
 make sense to enable JSONP or CORS REST API capabilities to allow a remote 
 UI to access the data directly via JavaScript without cross-site 
 browser blocks coming into play.
 An example client could be something like
 http://api.jquery.com/jQuery.getJSON/ 
 This can alleviate the need to create a local proxy cache.




[jira] [Commented] (YARN-2277) Add Cross-Origin support to the ATS REST API


[ 
https://issues.apache.org/jira/browse/YARN-2277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14096036#comment-14096036
 ] 

Hadoop QA commented on YARN-2277:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12661510/YARN-2277-v8.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 2 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/4615//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4615//console

This message is automatically generated.

 Add Cross-Origin support to the ATS REST API
 

 Key: YARN-2277
 URL: https://issues.apache.org/jira/browse/YARN-2277
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Jonathan Eagles
Assignee: Jonathan Eagles
 Attachments: YARN-2277-CORS.patch, YARN-2277-JSONP.patch, 
 YARN-2277-v2.patch, YARN-2277-v3.patch, YARN-2277-v3.patch, 
 YARN-2277-v4.patch, YARN-2277-v5.patch, YARN-2277-v6.patch, 
 YARN-2277-v7.patch, YARN-2277-v8.patch


 As the Application Timeline Server does not ship with a built-in UI, it may 
 make sense to enable JSONP or CORS REST API capabilities so that a remote 
 UI can access the data directly via JavaScript without the browser's 
 cross-site request restrictions coming into play.
 An example client might look like
 http://api.jquery.com/jQuery.getJSON/ 
 This can remove the need to create a local proxy cache.




[jira] [Updated] (YARN-2383) Add ability to renew ClientToAMToken


 [ 
https://issues.apache.org/jira/browse/YARN-2383?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuan Gong updated YARN-2383:


Attachment: YARN-2383.preview.1.patch

 Add ability to renew ClientToAMToken
 

 Key: YARN-2383
 URL: https://issues.apache.org/jira/browse/YARN-2383
 Project: Hadoop YARN
  Issue Type: Bug
  Components: applications, resourcemanager
Reporter: Xuan Gong
Assignee: Xuan Gong
 Attachments: YARN-2383.preview.1.patch







[jira] [Commented] (YARN-2383) Add ability to renew ClientToAMToken


[ 
https://issues.apache.org/jira/browse/YARN-2383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14096060#comment-14096060
 ] 

Xuan Gong commented on YARN-2383:
-

Make the ClientToAMToken renewable after a configurable period of time. The AM 
will send the renewDate in the allocate call, and the RM can send back a new 
renewDate through the allocate call if needed.
On the AM side, create a ClientToAMTokenCache, which is a singleton. The 
ClientToAMTokenCache saves the master key and renewDate of the 
ClientToAMToken, and the ClientToAMTokenSecretManager on the AM side uses it 
to save and retrieve the master key. By using the ClientToAMTokenCache, we do 
not need to change any public API or code in either the AM or the client.
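
A rough sketch of what such a singleton cache could look like, assuming it only
needs to hold one master key and one renewDate at a time. All names below are
hypothetical and are not taken from YARN-2383.preview.1.patch.

```java
import java.util.Arrays;

// Hypothetical sketch; field and method names are assumptions, not the patch's API.
final class ClientToAMTokenCacheSketch {
    private static final ClientToAMTokenCacheSketch INSTANCE = new ClientToAMTokenCacheSketch();

    private byte[] masterKey;   // current ClientToAMToken master key, null until first update
    private long renewDate;     // time (ms since epoch) at which the key is due for renewal

    private ClientToAMTokenCacheSketch() { }

    static ClientToAMTokenCacheSketch getInstance() {
        return INSTANCE;
    }

    /** Called when an allocate response carries a new key and renewDate. */
    synchronized void update(byte[] newMasterKey, long newRenewDate) {
        masterKey = Arrays.copyOf(newMasterKey, newMasterKey.length);
        renewDate = newRenewDate;
    }

    /** Returns a defensive copy of the key, or null if none has been stored yet. */
    synchronized byte[] getMasterKey() {
        return masterKey == null ? null : Arrays.copyOf(masterKey, masterKey.length);
    }

    synchronized long getRenewDate() {
        return renewDate;
    }

    /** True if no key is cached yet or the renewDate has been reached. */
    synchronized boolean needsRenewal(long nowMillis) {
        return masterKey == null || nowMillis >= renewDate;
    }

    public static void main(String[] args) {
        ClientToAMTokenCacheSketch cache = getInstance();
        cache.update(new byte[] {1, 2, 3}, 1000L);
        System.out.println(cache.needsRenewal(999L));
        System.out.println(cache.needsRenewal(1000L));
    }
}
```

Because the secret manager would read the key through the cache, the key can be
swapped on renewal without any caller-visible API change.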

 Add ability to renew ClientToAMToken
 

 Key: YARN-2383
 URL: https://issues.apache.org/jira/browse/YARN-2383
 Project: Hadoop YARN
  Issue Type: Bug
  Components: applications, resourcemanager
Reporter: Xuan Gong
Assignee: Xuan Gong
 Attachments: YARN-2383.preview.1.patch







[jira] [Commented] (YARN-2277) Add Cross-Origin support to the ATS REST API


[ 
https://issues.apache.org/jira/browse/YARN-2277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14096117#comment-14096117
 ] 

Hudson commented on YARN-2277:
--

FAILURE: Integrated in Hadoop-trunk-Commit #6060 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/6060/])
YARN-2277. Added cross-origin support for the timeline server web services. 
Contributed by Jonathan Eagles. (zjshen: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1617832)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/timeline/webapp/CrossOriginFilter.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/timeline/webapp/CrossOriginFilterInitializer.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/timeline/webapp/TestCrossOriginFilter.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/timeline/webapp/TestCrossOriginFilterInitializer.java


 Add Cross-Origin support to the ATS REST API
 

 Key: YARN-2277
 URL: https://issues.apache.org/jira/browse/YARN-2277
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Jonathan Eagles
Assignee: Jonathan Eagles
 Fix For: 2.6.0

 Attachments: YARN-2277-CORS.patch, YARN-2277-JSONP.patch, 
 YARN-2277-v2.patch, YARN-2277-v3.patch, YARN-2277-v3.patch, 
 YARN-2277-v4.patch, YARN-2277-v5.patch, YARN-2277-v6.patch, 
 YARN-2277-v7.patch, YARN-2277-v8.patch


 As the Application Timeline Server does not ship with a built-in UI, it may 
 make sense to enable JSONP or CORS REST API capabilities so that a remote 
 UI can access the data directly via JavaScript without the browser's 
 cross-site request restrictions coming into play.
 An example client might look like
 http://api.jquery.com/jQuery.getJSON/ 
 This can remove the need to create a local proxy cache.




[jira] [Commented] (YARN-2277) Add Cross-Origin support to the ATS REST API

2014-08-13 Thread Jonathan Eagles (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-2277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14096127#comment-14096127
 ] 

Jonathan Eagles commented on YARN-2277:
---

Thanks for the great reviews, [~zjshen]!

 Add Cross-Origin support to the ATS REST API
 

 Key: YARN-2277
 URL: https://issues.apache.org/jira/browse/YARN-2277
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Jonathan Eagles
Assignee: Jonathan Eagles
 Fix For: 2.6.0

 Attachments: YARN-2277-CORS.patch, YARN-2277-JSONP.patch, 
 YARN-2277-v2.patch, YARN-2277-v3.patch, YARN-2277-v3.patch, 
 YARN-2277-v4.patch, YARN-2277-v5.patch, YARN-2277-v6.patch, 
 YARN-2277-v7.patch, YARN-2277-v8.patch


 As the Application Timeline Server does not ship with a built-in UI, it may 
 make sense to enable JSONP or CORS REST API capabilities so that a remote 
 UI can access the data directly via JavaScript without the browser's 
 cross-site request restrictions coming into play.
 An example client might look like
 http://api.jquery.com/jQuery.getJSON/ 
 This can remove the need to create a local proxy cache.




[jira] [Commented] (YARN-2136) RMStateStore can explicitly handle store/update events when fenced


[ 
https://issues.apache.org/jira/browse/YARN-2136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14096160#comment-14096160
 ] 

Varun Saxena commented on YARN-2136:


[~sunilg], on a closer look at the code, when we close the RMStateStore we close 
the ZKClients as well. Hence draining the dispatcher queue shouldn't matter, as 
the ZKClient is already closed. 
Of course, as more than one thread is involved, AsyncDispatcher's event handling 
thread may pick up an event and process it (a store/update operation) before the 
RM can close the RMStateStore (while switching to standby). 

In my view, though, this impact isn't big enough to warrant adding a 
FENCED state.


 RMStateStore can explicitly handle store/update events when fenced
 --

 Key: YARN-2136
 URL: https://issues.apache.org/jira/browse/YARN-2136
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Jian He

 RMStateStore can choose to handle/ignore store/update events up front instead 
 of invoking more ZK operations if the state store is in the fenced state. 
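
 A minimal sketch of the idea, assuming a store with just two states: once
 fenced, store/update events are dropped before they reach the backing store.
 The class, the state names and the operation counter are hypothetical
 stand-ins, not RMStateStore code.

```java
// Hypothetical sketch; not the RMStateStore implementation.
final class FencedStoreSketch {
    enum State { ACTIVE, FENCED }

    private State state = State.ACTIVE;
    private int backendOperations = 0;   // stand-in for real ZooKeeper writes

    synchronized void fence() {
        state = State.FENCED;
    }

    /** Returns true if the event reached the backing store, false if it was ignored. */
    synchronized boolean handleStoreEvent(String appId) {
        if (state == State.FENCED) {
            return false;                // ignore up front, no further ZK operation
        }
        backendOperations++;             // the real store would write to ZooKeeper here
        return true;
    }

    synchronized int getBackendOperations() {
        return backendOperations;
    }

    public static void main(String[] args) {
        FencedStoreSketch store = new FencedStoreSketch();
        System.out.println(store.handleStoreEvent("application_1"));
        store.fence();
        System.out.println(store.handleStoreEvent("application_2"));
    }
}
```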




[jira] [Commented] (YARN-2331) Distinguish shutdown during supervision vs. shutdown for rolling upgrade

2014-08-13 Thread Jason Lowe (JIRA)

[
https://issues.apache.org/jira/browse/YARN-2331?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14096176#comment-14096176
]

Jason Lowe commented on YARN-2331:
--

Ideally the process is self-contained on the NM node so once it has shutdown
without killing containers it can be immediately restarted on the new release
to minimize the period where the NM is not responding. I suppose we could have
the shutdown/upgrade script on the NM issue the rmadmin command, then wait
for the NM to receive the RM command and exit.

I think it would be cleaner if we didn't have to involve the RM. However I
don't feel so strongly that I'd object if we can't find a nice way to do this
with just the NM node.

Distinguish shutdown during supervision vs. shutdown for rolling upgrade

Key: YARN-2331
URL: https://issues.apache.org/jira/browse/YARN-2331
Project: Hadoop YARN
Issue Type: Sub-task
Components: nodemanager
Affects Versions: 2.6.0
Reporter: Jason Lowe

When the NM is shutting down with restart support enabled there are scenarios
we'd like to distinguish and behave accordingly:
# The NM is running under supervision. In that case containers should be
preserved so the automatic restart can recover them.
# The NM is not running under supervision and a rolling upgrade is not being
performed. In that case the shutdown should kill all containers since it is
unlikely the NM will be restarted in a timely manner to recover them.
# The NM is not running under supervision and a rolling upgrade is being
performed. In that case the shutdown should not kill all containers since a
restart is imminent due to the rolling upgrade and the containers will be
recovered.
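
The three scenarios above reduce to a small decision, sketched here under the
assumption that NM restart support is enabled and that both conditions are
known at shutdown time. The class and parameter names are hypothetical. The
supervised case during a rolling upgrade is not listed above; the sketch
assumes containers are preserved there too.

```java
// Hypothetical sketch of the shutdown decision; not NodeManager code.
final class NmShutdownPolicySketch {

    /** Returns true if the NM should kill its containers when shutting down. */
    static boolean shouldKillContainers(boolean supervised, boolean rollingUpgrade) {
        if (supervised) {
            return false;        // scenario 1: the automatic restart will recover them
        }
        return !rollingUpgrade;  // scenario 2 kills, scenario 3 preserves
    }

    public static void main(String[] args) {
        System.out.println(shouldKillContainers(true, false));
        System.out.println(shouldKillContainers(false, false));
        System.out.println(shouldKillContainers(false, true));
    }
}
```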


[jira] [Commented] (YARN-2070) DistributedShell publishes unfriendly user information to the timeline server

2014-08-13 Thread Subramaniam Venkatraman Krishnan (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-2070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14096212#comment-14096212
 ] 

Hudson commented on YARN-2070:
--

FAILURE: Integrated in Hadoop-trunk-Commit #6061 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/6061/])
YARN-2070. Made DistributedShell publish the short user name to the timeline 
server. Contributed by Robert Kanter. (zjshen: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1617837)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/src/main/java/org/apache/hadoop/yarn/applications/distributedshell/ApplicationMaster.java


 DistributedShell publishes unfriendly user information to the timeline server
 -

 Key: YARN-2070
 URL: https://issues.apache.org/jira/browse/YARN-2070
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Zhijie Shen
Assignee: Robert Kanter
Priority: Minor
  Labels: newbie
 Fix For: 2.6.0

 Attachments: YARN-2070.patch


 Below is the code that uses the string form of the current user object as the 
 user value.
 {code}
 entity.addPrimaryFilter("user", UserGroupInformation.getCurrentUser()
 .toString());
 {code}
 When we use Kerberos authentication, it outputs the full name, such 
 as zjshen/localhost@LOCALHOST (auth.KERBEROS). This is not user-friendly for 
 searching by the primary filters. It's better to use shortUserName instead.
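
 To illustrate the difference, here is a simplified, self-contained sketch that
 cuts a Kerberos principal down to its first component. It is an assumption-laden
 stand-in: in Hadoop the short name comes from
 UserGroupInformation.getShortUserName(), which applies the configured
 auth_to_local rules rather than simple string trimming.

```java
// Simplified illustration only; Hadoop derives short names via auth_to_local rules.
final class ShortUserNameSketch {

    /** Returns the part of a principal before the first '/' or '@'. */
    static String shortName(String principal) {
        int end = principal.length();
        int slash = principal.indexOf('/');
        int at = principal.indexOf('@');
        if (slash >= 0) {
            end = Math.min(end, slash);
        }
        if (at >= 0) {
            end = Math.min(end, at);
        }
        return principal.substring(0, end);
    }

    public static void main(String[] args) {
        System.out.println(shortName("zjshen/localhost@LOCALHOST"));
    }
}
```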




[jira] [Updated] (YARN-2378) Adding support for moving apps between queues in Capacity Scheduler


 [ 
https://issues.apache.org/jira/browse/YARN-2378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Subramaniam Venkatraman Krishnan updated YARN-2378:
---

Attachment: YARN-2378.patch

 Adding support for moving apps between queues in Capacity Scheduler
 ---

 Key: YARN-2378
 URL: https://issues.apache.org/jira/browse/YARN-2378
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: capacityscheduler
Reporter: Subramaniam Venkatraman Krishnan
Assignee: Subramaniam Venkatraman Krishnan
  Labels: capacity-scheduler
 Attachments: YARN-2378.patch, YARN-2378.patch, YARN-2378.patch, 
 YARN-2378.patch


 As discussed with [~leftnoteasy] and [~jianhe], we are breaking up YARN-1707 
 to smaller patches for manageability. This JIRA will address adding support 
 for moving apps between queues in Capacity Scheduler.




[jira] [Commented] (YARN-2378) Adding support for moving apps between queues in Capacity Scheduler

2014-08-13 Thread Subramaniam Venkatraman Krishnan (JIRA)

[
https://issues.apache.org/jira/browse/YARN-2378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14096230#comment-14096230
]

Subramaniam Venkatraman Krishnan commented on YARN-2378:

[~jianhe], thanks for your detailed comments. I am attaching a patch that
addresses it.

Summary of changes:
* getCheckLeafQueue renamed to getAndCheckLeafQueue
* Add test case to move app to a sibling queue within the same parent
* Removed the unnecessary unreserve, as we only need to update the metrics, which
is already being done in SchedulerApplicationAttempt, as you rightly pointed out.
* Removed unwanted synchronized block
* Updated SchedulerApplication#setQueue (nice catch). Added checks in test
case. Also updated AppSchedulingInfo as it also maintains the queue name.
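
As a rough picture of the move operation these changes touch, the sketch below
validates the target with a getAndCheckLeafQueue-style helper before moving the
app and updating the recorded queue name. Everything except the helper's name is
hypothetical; the real CapacityScheduler also adjusts resource usage and
metrics, which this sketch leaves out.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch; not CapacityScheduler code.
final class MoveAppSketch {
    static final class Queue {
        final boolean leaf;
        final Set<String> apps = new HashSet<>();
        Queue(boolean leaf) { this.leaf = leaf; }
    }

    private final Map<String, Queue> queues = new HashMap<>();
    private final Map<String, String> appToQueue = new HashMap<>();

    void addQueue(String name, boolean leaf) {
        queues.put(name, new Queue(leaf));
    }

    void submit(String appId, String queueName) {
        getAndCheckLeafQueue(queueName).apps.add(appId);
        appToQueue.put(appId, queueName);
    }

    /** Looks up a queue and rejects missing or non-leaf queues. */
    Queue getAndCheckLeafQueue(String name) {
        Queue queue = queues.get(name);
        if (queue == null) {
            throw new IllegalArgumentException("Queue does not exist: " + name);
        }
        if (!queue.leaf) {
            throw new IllegalArgumentException("Not a leaf queue: " + name);
        }
        return queue;
    }

    /** Moves an app to the target leaf queue and returns the new queue name. */
    String moveApplication(String appId, String targetQueue) {
        String source = appToQueue.get(appId);
        if (source == null) {
            throw new IllegalArgumentException("Unknown application: " + appId);
        }
        Queue target = getAndCheckLeafQueue(targetQueue);  // validate before mutating state
        queues.get(source).apps.remove(appId);
        target.apps.add(appId);
        appToQueue.put(appId, targetQueue);                // keep the recorded queue name in sync
        return targetQueue;
    }

    String queueOf(String appId) {
        return appToQueue.get(appId);
    }

    public static void main(String[] args) {
        MoveAppSketch scheduler = new MoveAppSketch();
        scheduler.addQueue("root.a", false);
        scheduler.addQueue("root.a.a1", true);
        scheduler.addQueue("root.a.a2", true);
        scheduler.submit("app_1", "root.a.a1");
        System.out.println(scheduler.moveApplication("app_1", "root.a.a2"));
    }
}
```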

Adding support for moving apps between queues in Capacity Scheduler
---

Key: YARN-2378
URL: https://issues.apache.org/jira/browse/YARN-2378
Project: Hadoop YARN
Issue Type: Sub-task
Components: capacityscheduler
Reporter: Subramaniam Venkatraman Krishnan
Assignee: Subramaniam Venkatraman Krishnan
Labels: capacity-scheduler
Attachments: YARN-2378.patch, YARN-2378.patch, YARN-2378.patch,
YARN-2378.patch

As discussed with [~leftnoteasy] and [~jianhe], we are breaking up YARN-1707
to smaller patches for manageability. This JIRA will address adding support
for moving apps between queues in Capacity Scheduler.


[jira] [Commented] (YARN-1915) ClientToAMTokenMasterKey should be provided to AM at launch time

2014-08-13 Thread Hitesh Shah (JIRA)

[
https://issues.apache.org/jira/browse/YARN-1915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14096239#comment-14096239
]

Hitesh Shah commented on YARN-1915:
---

[~daryn] I believe all AMs already have to remove the AMRMToken from their
credentials before invoking other processes.

[~jlowe] I am not too familiar with this, but would it be possible to consider
an option where the AMRMToken is used to encrypt/decrypt the
ClientToAMTokenMasterKey? The encrypted key could then be base64-encoded and
sent to the AM via the env.
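
One way the suggestion could be sketched, purely as an assumption about how it
might work: derive an AES key from the token's secret bytes, encrypt the master
key with AES-GCM, and base64-encode the result so it fits in an environment
variable. The key derivation (a truncated SHA-256 digest), the class name and
the payload layout are illustrative choices, not anything proposed in the JIRA.

```java
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import java.security.SecureRandom;
import java.util.Arrays;
import java.util.Base64;
import javax.crypto.Cipher;
import javax.crypto.spec.GCMParameterSpec;
import javax.crypto.spec.SecretKeySpec;

// Hypothetical sketch; the derivation and layout (IV || ciphertext) are assumptions.
final class MasterKeyEnvSketch {
    private static final int IV_LEN = 12;     // bytes, the usual GCM nonce size
    private static final int TAG_BITS = 128;  // GCM authentication tag length

    private static SecretKeySpec deriveKey(byte[] tokenSecret) throws GeneralSecurityException {
        byte[] digest = MessageDigest.getInstance("SHA-256").digest(tokenSecret);
        return new SecretKeySpec(Arrays.copyOf(digest, 16), "AES");  // 128-bit AES key
    }

    /** Encrypts the master key and returns a base64 string suitable for an env variable. */
    static String encryptForEnv(byte[] masterKey, byte[] tokenSecret) {
        try {
            byte[] iv = new byte[IV_LEN];
            new SecureRandom().nextBytes(iv);
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.ENCRYPT_MODE, deriveKey(tokenSecret), new GCMParameterSpec(TAG_BITS, iv));
            byte[] cipherText = cipher.doFinal(masterKey);
            byte[] payload = new byte[IV_LEN + cipherText.length];
            System.arraycopy(iv, 0, payload, 0, IV_LEN);
            System.arraycopy(cipherText, 0, payload, IV_LEN, cipherText.length);
            return Base64.getEncoder().encodeToString(payload);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("encryption failed", e);
        }
    }

    /** Reverses encryptForEnv; fails if the token secret does not match. */
    static byte[] decryptFromEnv(String envValue, byte[] tokenSecret) {
        try {
            byte[] payload = Base64.getDecoder().decode(envValue);
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.DECRYPT_MODE, deriveKey(tokenSecret),
                new GCMParameterSpec(TAG_BITS, payload, 0, IV_LEN));
            return cipher.doFinal(payload, IV_LEN, payload.length - IV_LEN);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("decryption failed", e);
        }
    }

    public static void main(String[] args) {
        byte[] secret = {9, 8, 7, 6};
        byte[] masterKey = {1, 2, 3, 4, 5, 6, 7, 8};
        String env = encryptForEnv(masterKey, secret);
        System.out.println(Arrays.equals(masterKey, decryptFromEnv(env, secret)));
    }
}
```

Since the AM removes the AMRMToken from its credentials before invoking other
processes, a child process that inherits the env would see only the encrypted
value.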

ClientToAMTokenMasterKey should be provided to AM at launch time


[jira] [Created] (YARN-2416) InvalidStateTransitonException in ResourceManager if AMLauncher does not receive response for startContainers() call in time

2014-08-13 Thread Jian Fang (JIRA)

Jian Fang created YARN-2416:
---

 Summary: InvalidStateTransitonException in ResourceManager if 
AMLauncher does not receive response for startContainers() call in time
 Key: YARN-2416
 URL: https://issues.apache.org/jira/browse/YARN-2416
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.4.0
Reporter: Jian Fang


AMLauncher calls startContainers(allRequests) to launch a container for 
application master. Normally, the call comes back immediately so that the 
RMAppAttempt changes its state from ALLOCATED to LAUNCHED. 

However, we have observed that in some cases the RPC call came back very late 
even though the AM container had already started. Because the RMAppAttempt was 
stuck in the ALLOCATED state, once the resource manager received the REGISTERED 
event from the application master, it threw InvalidStateTransitonException as follows.

2014-07-05 08:59:05,021 ERROR 
org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: 
Can't handle this event at current state
org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: 
REGISTERED at ALLOCATED
at 
org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
at 
org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
at 
org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
at 
org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:652)
at 
org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:106)
at 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:752)
at 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:733)
at 
org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173)
at 
org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106)
at java.lang.Thread.run(Thread.java:744)

For subsequent STATUS_UPDATE and CONTAINER_ALLOCATED events for this job, 
resource manager kept throwing InvalidStateTransitonException.

2014-07-05 08:59:06,152 ERROR 
org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: 
Can't handle this event at current state
org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: 
STATUS_UPDATE at ALLOCATED
at 
org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
at 
org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
at 
org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
at 
org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:652)
at 
org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:106)
at 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:752)
at 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:733)
at 
org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173)
at 
org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106)
at java.lang.Thread.run(Thread.java:744)
2014-07-05 08:59:07,779 INFO 
org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: 
container_1404549222428_0001_02_02 Container Transitioned from NEW to
 ALLOCATED
2014-07-05 08:59:07,779 ERROR 
org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: 
Can't handle this event at current state
org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: 
CONTAINER_ALLOCATED at ALLOCATED
at 
org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
at 
org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
at 
org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
at 
org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:652)
at 
org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:106)
at 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:752)
at

[jira] [Commented] (YARN-2416) InvalidStateTransitonException in ResourceManager if AMLauncher does not receive response for startContainers() call in time

2014-08-13 Thread Jian Fang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-2416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14096250#comment-14096250
 ] 

Jian Fang commented on YARN-2416:
-

Here is the log for the events and state transitions.

2014-07-05 08:57:41,748 INFO 
org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: 
container_1404549222428_0001_01_01 Container Transitioned from NEW to 
ALLOCATED

2014-07-05 08:57:41,760 INFO 
org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: 
container_1404549222428_0001_01_01 Container Transitioned from ALLOCATED to 
ACQUIRED

2014-07-05 08:57:41,833 INFO 
org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher: Command to 
launch container container_1404549222428_0001_01_01 : $JAVA_HOME/bin/java 
-Dlog4j.configuration=container-log4j.properties 
-Dyarn.app.container.log.dir=<LOG_DIR> -Dyarn.app.container.log.filesize=0 
-Dhadoop.root.logger=INFO,CLA -Xmx2048m 
org.apache.hadoop.mapreduce.v2.app.MRAppMaster 1><LOG_DIR>/stdout 
2><LOG_DIR>/stderr


2014-07-05 08:57:42,737 INFO 
org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: 
container_1404549222428_0001_01_01 Container Transitioned from ACQUIRED to 
RUNNING
2014-07-05 08:58:54,290 INFO 
org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: 
container_1404549222428_0001_01_01 Container Transitioned from RUNNING to 
KILLED

2014-07-05 08:58:54,290 INFO 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp:
 Completed container: container_1404549222428_0001_01_01 in state: KILLED 
event:KILL

2014-07-05 08:58:54,394 INFO 
org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: 
appattempt_1404549222428_0001_02 State change from SCHEDULED to 
ALLOCATED_SAVING
2014-07-05 08:58:54,394 INFO 
org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: 
appattempt_1404549222428_0001_02 State change from ALLOCATED_SAVING to 
ALLOCATED
2014-07-05 08:58:54,395 INFO 
org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher: Launching 
masterappattempt_1404549222428_0001_02

2014-07-05 08:58:54,397 INFO 
org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher: Command to 
launch container container_1404549222428_0001_02_01 : $JAVA_HOME/bin/java 
-Dlog4j.configuration=container-log4j.properties 
-Dyarn.app.container.log.dir=<LOG_DIR> -Dyarn.app.container.log.filesize=0 
-Dhadoop.root.logger=INFO,CLA  -Xmx2048m 
org.apache.hadoop.mapreduce.v2.app.MRAppMaster 1><LOG_DIR>/stdout 
2><LOG_DIR>/stderr 

2014-07-05 08:58:55,396 INFO 
org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: 
container_1404549222428_0001_02_01 Container Transitioned from ACQUIRED to 
RUNNING

2014-07-05 08:59:05,020 INFO 
org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService: AM 
registration appattempt_1404549222428_0001_02
2014-07-05 08:59:05,021 INFO 
org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=hadoop   
IP=10.198.10.201   OPERATION=Register App Master   
TARGET=ApplicationMasterService   RESULT=SUCCESS   
APPID=application_1404549222428_0001   
APPATTEMPTID=appattempt_1404549222428_0001_02

2014-07-05 08:59:12,653 INFO 
org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher: Done 
launching container Container: [ContainerId: 
container_1404549222428_0001_01_01, NodeId: 
ip-10-198-22-164.us-west-1.compute.internal:9103, NodeHttpAddress: 
ip-10-198-22-164.us-west-1.compute.internal:9035, Resource: <memory:3378, 
vCores:1>, Priority: 0, Token: Token { kind: ContainerToken, service: 
10.198.22.164:9103 }, ] for AM appattempt_1404549222428_0001_01
2014-07-05 08:59:12,653 INFO 
org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher: Done 
launching container Container: [ContainerId: 
container_1404549222428_0001_02_01, NodeId: 
ip-10-198-10-201.us-west-1.compute.internal:9103, NodeHttpAddress: 
ip-10-198-10-201.us-west-1.compute.internal:9035, Resource: <memory:3378, 
vCores:1>, Priority: 0, Token: Token { kind: ContainerToken, service: 
10.198.10.201:9103 }, ] for AM appattempt_1404549222428_0001_02

 InvalidStateTransitonException in ResourceManager if AMLauncher does not 
 receive response for startContainers() call in time
 

 Key: YARN-2416
 URL: https://issues.apache.org/jira/browse/YARN-2416
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.4.0
Reporter: Jian Fang

 AMLauncher calls startContainers(allRequests) to launch a container for 
 application master. Normally, the call comes back immediately so that the 
 RMAppAttempt

[jira] [Commented] (YARN-2416) InvalidStateTransitonException in ResourceManager if AMLauncher does not receive response for startContainers() call in time

2014-08-13 Thread Jian Fang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-2416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14096256#comment-14096256
 ] 

Jian Fang commented on YARN-2416:
-

It seems that, for some reason, the RPC call for startContainers(allRequests) in 
AMLauncher was blocked. More interestingly, the second attempt to 
launch the application master failed for the same reason, and the responses to 
the two startContainers() calls for both AM launches came back at the same time, 
as shown in the above log.

Since the REGISTERED event is a good indication that the AM container was 
launched successfully, can we add a state transition from ALLOCATED to RUNNING, 
or do two state transitions, i.e., from ALLOCATED to LAUNCHED and then from 
LAUNCHED to RUNNING, in this case?
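
A toy sketch of the second option, assuming a state machine reduced to three
states and two events: REGISTERED arriving in ALLOCATED first performs the
implied ALLOCATED to LAUNCHED step and then moves to RUNNING. The class is
hypothetical and unrelated to RMAppAttemptImpl's actual transition table;
ignoring a late LAUNCHED event is also an assumption of the sketch.

```java
// Hypothetical sketch; not the RMAppAttemptImpl state machine.
final class AttemptStateSketch {
    enum State { ALLOCATED, LAUNCHED, RUNNING }
    enum Event { LAUNCHED, REGISTERED }

    private State state = State.ALLOCATED;

    State getState() {
        return state;
    }

    void handle(Event event) {
        switch (event) {
            case LAUNCHED:
                if (state == State.ALLOCATED) {
                    state = State.LAUNCHED;
                }
                // A LAUNCHED event arriving after registration is ignored (assumption).
                break;
            case REGISTERED:
                if (state == State.ALLOCATED) {
                    // The AM can only register if its container started, so treat
                    // the launch as implied even though the RPC has not returned.
                    state = State.LAUNCHED;
                }
                if (state == State.LAUNCHED) {
                    state = State.RUNNING;
                }
                break;
            default:
                break;
        }
    }

    public static void main(String[] args) {
        AttemptStateSketch attempt = new AttemptStateSketch();
        attempt.handle(Event.REGISTERED);   // startContainers() response still pending
        attempt.handle(Event.LAUNCHED);     // late response arrives
        System.out.println(attempt.getState());
    }
}
```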

 InvalidStateTransitonException in ResourceManager if AMLauncher does not 
 receive response for startContainers() call in time
 

 Key: YARN-2416
 URL: https://issues.apache.org/jira/browse/YARN-2416
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.4.0
Reporter: Jian Fang

 AMLauncher calls startContainers(allRequests) to launch a container for 
 application master. Normally, the call comes back immediately so that the 
 RMAppAttempt changes its state from ALLOCATED to LAUNCHED. 
 However, we have observed that in some cases the RPC call came back very late 
 even though the AM container had already started. Because the RMAppAttempt was 
 stuck in the ALLOCATED state, once the resource manager received the REGISTERED 
 event from the application master, it threw InvalidStateTransitonException as follows.
 2014-07-05 08:59:05,021 ERROR 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: 
 Can't handle this event at current state
 org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: 
 REGISTERED at ALLOCATED
 at 
 org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
 at 
 org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
 at 
 org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:652)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:106)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:752)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:733)
 at 
 org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173)
 at 
 org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106)
 at java.lang.Thread.run(Thread.java:744)
 For subsequent STATUS_UPDATE and CONTAINER_ALLOCATED events for this job, 
 resource manager kept throwing InvalidStateTransitonException.
 2014-07-05 08:59:06,152 ERROR 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: 
 Can't handle this event at current state
 org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: 
 STATUS_UPDATE at ALLOCATED
 at 
 org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
 at 
 org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
 at 
 org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:652)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:106)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:752)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:733)
 at 
 org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173)
 at 
 org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106)
 at java.lang.Thread.run(Thread.java:744)
 2014-07-05 08:59:07,779 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: 
 container_1404549222428_0001_02_02 Container Transitioned from NEW to
  ALLOCATED
 2014-07-05 08:59:07,779 ERROR

[jira] [Updated] (YARN-2383) Add ability to renew ClientToAMToken


 [ 
https://issues.apache.org/jira/browse/YARN-2383?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuan Gong updated YARN-2383:


Attachment: YARN-2383.preview.2.patch

fix testcase failures

 Add ability to renew ClientToAMToken
 

 Key: YARN-2383
 URL: https://issues.apache.org/jira/browse/YARN-2383
 Project: Hadoop YARN
  Issue Type: Bug
  Components: applications, resourcemanager
Reporter: Xuan Gong
Assignee: Xuan Gong
 Attachments: YARN-2383.preview.1.patch, YARN-2383.preview.2.patch







[jira] [Created] (YARN-2417) Sorting by total memory doesn't work correctly

Siqi Li created YARN-2417:
-

 Summary: Sorting by total memory doesn't work correctly
 Key: YARN-2417
 URL: https://issues.apache.org/jira/browse/YARN-2417
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Siqi Li







[jira] [Assigned] (YARN-2417) Sorting by total memory doesn't work correctly


 [ 
https://issues.apache.org/jira/browse/YARN-2417?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siqi Li reassigned YARN-2417:
-

Assignee: Siqi Li

 Sorting by total memory doesn't work correctly
 

 Key: YARN-2417
 URL: https://issues.apache.org/jira/browse/YARN-2417
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Siqi Li
Assignee: Siqi Li






[jira] [Updated] (YARN-2417) Sorting by total memory doesn't work correctly


 [ 
https://issues.apache.org/jira/browse/YARN-2417?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siqi Li updated YARN-2417:
--

Affects Version/s: 2.4.0

 Sorting by total memory doesn't work correctly
 

 Key: YARN-2417
 URL: https://issues.apache.org/jira/browse/YARN-2417
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.4.0
Reporter: Siqi Li
Assignee: Siqi Li
 Attachments: _thumb_151509.png







[jira] [Commented] (YARN-415) Capture memory utilization at the app-level for chargeback


[ 
https://issues.apache.org/jira/browse/YARN-415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14096307#comment-14096307
 ] 

Hadoop QA commented on YARN-415:


{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12661548/YARN-415.201408132109.txt
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 11 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

  org.apache.hadoop.yarn.client.cli.TestYarnCLI

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/4617//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4617//console

This message is automatically generated.

 Capture memory utilization at the app-level for chargeback
 --

 Key: YARN-415
 URL: https://issues.apache.org/jira/browse/YARN-415
 Project: Hadoop YARN
  Issue Type: New Feature
  Components: resourcemanager
Affects Versions: 0.23.6
Reporter: Kendall Thrapp
Assignee: Andrey Klochkov
 Attachments: YARN-415--n10.patch, YARN-415--n2.patch, 
 YARN-415--n3.patch, YARN-415--n4.patch, YARN-415--n5.patch, 
 YARN-415--n6.patch, YARN-415--n7.patch, YARN-415--n8.patch, 
 YARN-415--n9.patch, YARN-415.201405311749.txt, YARN-415.201406031616.txt, 
 YARN-415.201406262136.txt, YARN-415.201407042037.txt, 
 YARN-415.201407071542.txt, YARN-415.201407171553.txt, 
 YARN-415.201407172144.txt, YARN-415.201407232237.txt, 
 YARN-415.201407242148.txt, YARN-415.201407281816.txt, 
 YARN-415.201408062232.txt, YARN-415.201408080204.txt, 
 YARN-415.201408092006.txt, YARN-415.201408132109.txt, YARN-415.patch


 For the purpose of chargeback, I'd like to be able to compute the cost of an
 application in terms of cluster resource usage.  To start out, I'd like to 
 get the memory utilization of an application.  The unit should be MB-seconds 
 or something similar and, from a chargeback perspective, the memory amount 
 should be the memory reserved for the application, as even if the app didn't 
 use all that memory, no one else was able to use it.
 (reserved ram for container 1 * lifetime of container 1) + (reserved ram for
 container 2 * lifetime of container 2) + ... + (reserved ram for container n 
 * lifetime of container n)
 It'd be nice to have this at the app level instead of the job level because:
 1. We'd still be able to get memory usage for jobs that crashed (and wouldn't 
 appear on the job history server).
 2. We'd be able to get memory usage for future non-MR jobs (e.g. Storm).
 This new metric should be available both through the RM UI and RM Web 
 Services REST API.
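
The formula in the description above can be sketched in a few lines of Java. This is only an illustration of the arithmetic, assuming memory is tracked in MB and container lifetime in seconds; ContainerUsage and MemorySecondsSketch are hypothetical names, not YARN classes, and the sketch says nothing about how the attached patches implement the metric.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical holder for one container's reservation; not a YARN class. */
final class ContainerUsage {
    final long reservedMb;      // memory reserved for the container, in MB
    final long lifetimeSeconds; // how long the container held the reservation

    ContainerUsage(long reservedMb, long lifetimeSeconds) {
        this.reservedMb = reservedMb;
        this.lifetimeSeconds = lifetimeSeconds;
    }
}

final class MemorySecondsSketch {
    /** Sums reservedMb * lifetimeSeconds over all containers of one application. */
    static long memoryMbSeconds(List<ContainerUsage> containers) {
        long total = 0L;
        for (ContainerUsage c : containers) {
            total += c.reservedMb * c.lifetimeSeconds;
        }
        return total;
    }

    public static void main(String[] args) {
        // Two example containers with made-up sizes and lifetimes.
        List<ContainerUsage> app = Arrays.asList(
            new ContainerUsage(1024, 60),
            new ContainerUsage(2048, 30));
        System.out.println(memoryMbSeconds(app) + " MB-seconds");
    }
}
```

The sum uses the reserved amount rather than the memory actually touched, which matches the chargeback argument in the description: reserved memory is unavailable to everyone else.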




[jira] [Updated] (YARN-2417) Sorting by total memory doesn't work correctly


 [ 
https://issues.apache.org/jira/browse/YARN-2417?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siqi Li updated YARN-2417:
--

Attachment: YARN-2417.v1.patch

 Sorting by total memory doesn't work correctly
 

 Key: YARN-2417
 URL: https://issues.apache.org/jira/browse/YARN-2417
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.4.0
Reporter: Siqi Li
Assignee: Siqi Li
 Attachments: YARN-2417.v1.patch, _thumb_151509.png







[jira] [Updated] (YARN-2417) Adding total Memory into AppsBlocks


 [ 
https://issues.apache.org/jira/browse/YARN-2417?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siqi Li updated YARN-2417:
--

Summary: Adding total Memory into AppsBlocks  (was: Sorting by total 
memory doesn't work correctly)

 Adding total Memory into AppsBlocks
 ---

 Key: YARN-2417
 URL: https://issues.apache.org/jira/browse/YARN-2417
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.4.0
Reporter: Siqi Li
Assignee: Siqi Li
 Attachments: YARN-2417.v1.patch, _thumb_151509.png







[jira] [Commented] (YARN-1915) ClientToAMTokenMasterKey should be provided to AM at launch time

2014-08-13 Thread Daryn Sharp (JIRA)

[
https://issues.apache.org/jira/browse/YARN-1915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14096319#comment-14096319
]

Daryn Sharp commented on YARN-1915:
---

Yes, I thought the ugi mangling was gone, but the AMRMToken is indeed manually
removed. I'm assuming there was a valid reason why the secret is passed in the
registration response, perhaps for future functionality.

Rather than second guess how/why it's done this way, I'd prefer to focus on a
small immediate fix for this very tight race condition. The AM should
generally receive the registration response before a client can ask the RM
where the AM is and try to connect. Could we file another jira to contemplate
an incompatible change?

ClientToAMTokenMasterKey should be provided to AM at launch time


[jira] [Commented] (YARN-2385) Adding support for listing all applications in a queue

2014-08-13 Thread Wangda Tan (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-2385?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14096329#comment-14096329
 ] 

Wangda Tan commented on YARN-2385:
--

After thinking about it, I don't think we need to maintain completed apps in 
CS and Fair. Maintaining such fields is not a responsibility the scheduler was 
originally designed for.
For now, users can get completed containers via the REST API, which should 
cover most use cases. 


 Adding support for listing all applications in a queue
 --

 Key: YARN-2385
 URL: https://issues.apache.org/jira/browse/YARN-2385
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: capacityscheduler, fairscheduler
Reporter: Subramaniam Venkatraman Krishnan
Assignee: Karthik Kambatla
  Labels: abstractyarnscheduler

 This JIRA proposes adding a method in AbstractYarnScheduler to get all the 
 pending/active applications. Fair scheduler already supports moving a single 
 application from one queue to another. Support for the same is being added to 
 Capacity Scheduler as part of YARN-2378 and YARN-2248. So with the addition 
 of this method, we can transparently add support for moving all applications 
 from source queue to target queue and draining a queue, i.e. killing all 
 applications in a queue as proposed by YARN-2389




[jira] [Commented] (YARN-2378) Adding support for moving apps between queues in Capacity Scheduler


[ 
https://issues.apache.org/jira/browse/YARN-2378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14096351#comment-14096351
 ] 

Hadoop QA commented on YARN-2378:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12661564/YARN-2378.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/4618//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4618//console

This message is automatically generated.

 Adding support for moving apps between queues in Capacity Scheduler
 ---

 Key: YARN-2378
 URL: https://issues.apache.org/jira/browse/YARN-2378
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: capacityscheduler
Reporter: Subramaniam Venkatraman Krishnan
Assignee: Subramaniam Venkatraman Krishnan
  Labels: capacity-scheduler
 Attachments: YARN-2378.patch, YARN-2378.patch, YARN-2378.patch, 
 YARN-2378.patch


 As discussed with [~leftnoteasy] and [~jianhe], we are breaking up YARN-1707 
 to smaller patches for manageability. This JIRA will address adding support 
 for moving apps between queues in Capacity Scheduler.




[jira] [Commented] (YARN-2417) Adding total Memory into AppsBlocks

2014-08-13 Thread Jason Lowe (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-2417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14096360#comment-14096360
 ] 

Jason Lowe commented on YARN-2417:
--

Is this a dup of YARN-451?

 Adding total Memory into AppsBlocks
 ---

 Key: YARN-2417
 URL: https://issues.apache.org/jira/browse/YARN-2417
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.4.0
Reporter: Siqi Li
Assignee: Siqi Li
 Attachments: YARN-2417.v1.patch, _thumb_151509.png







[jira] [Commented] (YARN-2417) Adding total Memory into AppsBlocks


[ 
https://issues.apache.org/jira/browse/YARN-2417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14096374#comment-14096374
 ] 

Siqi Li commented on YARN-2417:
---

I think so, but that one didn't get committed yet.

 Adding total Memory into AppsBlocks
 ---

 Key: YARN-2417
 URL: https://issues.apache.org/jira/browse/YARN-2417
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.4.0
Reporter: Siqi Li
Assignee: Siqi Li
 Attachments: YARN-2417.v1.patch, _thumb_151509.png







[jira] [Commented] (YARN-2417) Adding total Memory into AppsBlocks


[ 
https://issues.apache.org/jira/browse/YARN-2417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14096376#comment-14096376
 ] 

Hadoop QA commented on YARN-2417:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12661588/YARN-2417.v1.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:red}-1 javac{color}.  The patch appears to cause the build to 
fail.

Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4619//console

This message is automatically generated.

 Adding total Memory into AppsBlocks
 ---

 Key: YARN-2417
 URL: https://issues.apache.org/jira/browse/YARN-2417
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.4.0
Reporter: Siqi Li
Assignee: Siqi Li
 Attachments: YARN-2417.v1.patch, _thumb_151509.png







[jira] [Updated] (YARN-1506) Replace set resource change on RMNode/SchedulerNode directly with event notification.

2014-08-13 Thread Junping Du (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-1506?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du updated YARN-1506:
-

Attachment: YARN-1506-v11.patch

 Replace set resource change on RMNode/SchedulerNode directly with event 
 notification.
 -

 Key: YARN-1506
 URL: https://issues.apache.org/jira/browse/YARN-1506
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: nodemanager, scheduler
Reporter: Junping Du
Assignee: Junping Du
 Attachments: YARN-1506-v1.patch, YARN-1506-v10.patch, 
 YARN-1506-v11.patch, YARN-1506-v2.patch, YARN-1506-v3.patch, 
 YARN-1506-v4.patch, YARN-1506-v5.patch, YARN-1506-v6.patch, 
 YARN-1506-v7.patch, YARN-1506-v8.patch, YARN-1506-v9.patch


 According to Vinod's comments on YARN-312 
 (https://issues.apache.org/jira/browse/YARN-312?focusedCommentId=13846087page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13846087),
  we should replace RMNode.setResourceOption() with some resource change event.




[jira] [Commented] (YARN-1506) Replace set resource change on RMNode/SchedulerNode directly with event notification.

2014-08-13 Thread Junping Du (JIRA)

[
https://issues.apache.org/jira/browse/YARN-1506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14096407#comment-14096407
]

Junping Du commented on YARN-1506:
--

Thanks [~bikassaha] and [~jianhe] for the review and comments!
bq. At this point the previously accepted 4GB request cannot be satisfied and
the application will get stuck. We may need to follow this up in a different
jira.
Agreed. Let's discuss this in a separate JIRA. I improved the unit tests per
your previous comments.

bq. What is more important is that having a shared method declares that these
pieces of code are related to each other in a logical way (if such a relation
exists).
Done. Abstracted into a common method.

bq. Since it’s changed to asynchronous, we may change the log to not say
successfully.
Removed.

bq. IMO, since UpdateNodeResourceWhenUnusableTransition and
UpdateNodeResourceWhenNonRunningTransition are the same except one extra
logging, we can do the logging for both and just keep one transition?
Agree. Merged.

bq. if possible, nodeResourceUpdate method can be moved into
AbstractYarnScheduler, a new common base class for sharing common code among
all the schedulers.
That's a good point. Moved to AbstractYarnScheduler now.

bq. SchedulerNode.setTotalResource -
SchedulerNode.updateTotalAndAvailableResource() ?
My current thinking is that it is better to keep it as setTotalResource, as the
latter name may mislead people into thinking that availableResource can be set
to a separate value in this method. Actually, we only set totalResource, and
availableResource is just refreshed to stay consistent with the totalResource
change. Maybe we can keep it as is?
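
To make the naming argument concrete, here is a minimal, memory-only sketch of the behaviour described: the caller sets only the total, and the available amount is recomputed from it. NodeResourceSketch and its fields are hypothetical and are not the SchedulerNode code in the patch.

```java
/** Hypothetical, memory-only stand-in for a scheduler node; not the YARN SchedulerNode. */
final class NodeResourceSketch {
    private long totalMb;
    private long usedMb;
    private long availableMb;

    NodeResourceSketch(long totalMb) {
        this.totalMb = totalMb;
        this.availableMb = totalMb;
    }

    /** Records an allocation against the node. */
    void allocate(long mb) {
        usedMb += mb;
        availableMb -= mb;
    }

    /** Sets the new total; available is only refreshed so that total = used + available. */
    void setTotalResource(long newTotalMb) {
        totalMb = newTotalMb;
        availableMb = newTotalMb - usedMb;
    }

    long getTotalMb() { return totalMb; }

    long getAvailableMb() { return availableMb; }

    public static void main(String[] args) {
        NodeResourceSketch node = new NodeResourceSketch(8192);
        node.allocate(4096);
        node.setTotalResource(6144); // shrink the node while 4096 MB is in use
        System.out.println("total=" + node.getTotalMb()
            + " available=" + node.getAvailableMb());
    }
}
```

In this sketch the available value can go negative when the total shrinks below what is already in use, which is related to the stuck-request case deferred to a separate JIRA above.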

bq. UpdateNodeResourceResponse should be an abstract class which implements
newInstance() method.
Fixed.

bq. AdminService.updateNodeResource should RMAuditLogger to log the operations
as well.
Fixed.

Replace set resource change on RMNode/SchedulerNode directly with event
notification.
-

Key: YARN-1506
URL: https://issues.apache.org/jira/browse/YARN-1506
Project: Hadoop YARN
Issue Type: Sub-task
Components: nodemanager, scheduler
Reporter: Junping Du
Assignee: Junping Du
Attachments: YARN-1506-v1.patch, YARN-1506-v10.patch,
YARN-1506-v11.patch, YARN-1506-v2.patch, YARN-1506-v3.patch,
YARN-1506-v4.patch, YARN-1506-v5.patch, YARN-1506-v6.patch,
YARN-1506-v7.patch, YARN-1506-v8.patch, YARN-1506-v9.patch

According to Vinod's comments on YARN-312
(https://issues.apache.org/jira/browse/YARN-312?focusedCommentId=13846087page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13846087),
we should replace RMNode.setResourceOption() with some resource change event.


[jira] [Commented] (YARN-2383) Add ability to renew ClientToAMToken


[ 
https://issues.apache.org/jira/browse/YARN-2383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14096465#comment-14096465
 ] 

Hadoop QA commented on YARN-2383:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12661575/YARN-2383.preview.2.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 12 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:red}-1 findbugs{color}.  The patch appears to introduce 8 new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

  
org.apache.hadoop.yarn.client.api.impl.TestAMRMClientContainerRequest
  
org.apache.hadoop.yarn.client.api.impl.TestAMRMClientOnRMRestart
  
org.apache.hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRestart

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/4620//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/4620//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-yarn-common.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/4620//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-yarn-client.html
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4620//console

This message is automatically generated.

 Add ability to renew ClientToAMToken
 

 Key: YARN-2383
 URL: https://issues.apache.org/jira/browse/YARN-2383
 Project: Hadoop YARN
  Issue Type: Bug
  Components: applications, resourcemanager
Reporter: Xuan Gong
Assignee: Xuan Gong
 Attachments: YARN-2383.preview.1.patch, YARN-2383.preview.2.patch







[jira] [Resolved] (YARN-1584) Support explicit failover when automatic failover is enabled

2014-08-13 Thread Karthik Kambatla (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-1584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Kambatla resolved YARN-1584.


  Resolution: Won't Fix
Target Version/s:   (was: )

Marking this as Won't Fix as we didn't see a particular need for it on RM HA 
deployments. 

We can re-open this if need be. 

 Support explicit failover when automatic failover is enabled
 

 Key: YARN-1584
 URL: https://issues.apache.org/jira/browse/YARN-1584
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Affects Versions: 2.3.0
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla

 YARN-1029 adds automatic failover support. However, users can't explicitly 
 ask for a failover from one RM to the other without stopping the other RM. 
 Stopping the RM until the other RM takes over and then restarting the first 
 RM is more involved and exposes the RM-ensemble to SPOF for a longer 
 duration. 
 It would be nice to allow explicit failover through yarn rmadmin -failover 
 command.
 PS: HDFS supports -failover option. 




[jira] [Resolved] (YARN-1602) All failed RMStateStore operations should not be RMFatalEvents

2014-08-13 Thread Karthik Kambatla (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-1602?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Kambatla resolved YARN-1602.


Resolution: Won't Fix

RM HA deployments have been working fairly well. Let us close this as Won't 
Fix for now and revisit it if need be. 

 All failed RMStateStore operations should not be RMFatalEvents
 --

 Key: YARN-1602
 URL: https://issues.apache.org/jira/browse/YARN-1602
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.3.0
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla
Priority: Critical

 Currently, if a state store operation fails, depending on the exception, 
 either a RMFatalEvent.STATE_STORE_FENCED or 
 RMFatalEvent.STATE_STORE_OP_FAILED events are created. The latter results in 
 the RM failing. Instead, we should probably kill the application 
 corresponding to the store operation. 




[jira] [Commented] (YARN-2415) Expose MiniYARNCluster for use outside of YARN

2014-08-13 Thread Karthik Kambatla (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-2415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14096476#comment-14096476
 ] 

Karthik Kambatla commented on YARN-2415:


Yes. MiniYARNCluster today is marked Public-Evolving. We need to make it 
Public-Stable. 

Before we do that, I think we should hide most of its methods and the 
constructors behind @Private annotations, and add a couple of @Public static 
methods. 

 Expose MiniYARNCluster for use outside of YARN
 --

 Key: YARN-2415
 URL: https://issues.apache.org/jira/browse/YARN-2415
 Project: Hadoop YARN
  Issue Type: New Feature
  Components: client
Affects Versions: 2.5.0
Reporter: Hari Shreedharan
Assignee: Karthik Kambatla

 The MR/HDFS equivalents are available for applications to use in tests, but 
 the YARN Mini cluster is not. It would be really useful to test applications 
 that are written to run on YARN (like Spark)




[jira] [Commented] (YARN-1506) Replace set resource change on RMNode/SchedulerNode directly with event notification.


[ 
https://issues.apache.org/jira/browse/YARN-1506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14096494#comment-14096494
 ] 

Hadoop QA commented on YARN-1506:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12661608/YARN-1506-v11.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 6 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:red}-1 findbugs{color}.  The patch appears to introduce 1 new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-tools/hadoop-sls hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

  org.apache.hadoop.yarn.sls.nodemanager.TestNMSimulator
  org.apache.hadoop.yarn.sls.appmaster.TestAMSimulator
  org.apache.hadoop.yarn.sls.TestSLSRunner
  
org.apache.hadoop.yarn.server.resourcemanager.TestRMNodeTransitions
  
org.apache.hadoop.yarn.server.resourcemanager.TestResourceTrackerService
  
org.apache.hadoop.yarn.server.resourcemanager.TestRMEmbeddedElector
  
org.apache.hadoop.yarn.server.resourcemanager.recovery.TestZKRMStateStoreZKClientConnections

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/4621//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/4621//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4621//console

This message is automatically generated.

 Replace set resource change on RMNode/SchedulerNode directly with event 
 notification.
 -

 Key: YARN-1506
 URL: https://issues.apache.org/jira/browse/YARN-1506
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: nodemanager, scheduler
Reporter: Junping Du
Assignee: Junping Du
 Attachments: YARN-1506-v1.patch, YARN-1506-v10.patch, 
 YARN-1506-v11.patch, YARN-1506-v2.patch, YARN-1506-v3.patch, 
 YARN-1506-v4.patch, YARN-1506-v5.patch, YARN-1506-v6.patch, 
 YARN-1506-v7.patch, YARN-1506-v8.patch, YARN-1506-v9.patch


 According to Vinod's comments on YARN-312 
 (https://issues.apache.org/jira/browse/YARN-312?focusedCommentId=13846087page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13846087),
  we should replace RMNode.setResourceOption() with some resource change event.




[jira] [Commented] (YARN-2413) capacity scheduler will overallocate vcores

2014-08-13 Thread Chen He (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-2413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14096533#comment-14096533
 ] 

Chen He commented on YARN-2413:
---

IMHO, we may also need to include vcores in the headroom calculation if we turn 
on vcores as a resource type.
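
As a rough illustration of what a vcore-aware headroom check could look like, the sketch below computes headroom per dimension and accepts a request only if it fits in both. HeadroomSketch, its method names, and the clamping at zero are assumptions made for illustration; this is not the Capacity Scheduler's actual calculation.

```java
/** Hypothetical two-dimensional headroom check; not Capacity Scheduler code. */
final class HeadroomSketch {
    /** Returns {memoryMb, vcores} headroom, clamped at zero per dimension. */
    static long[] headroom(long limitMb, long limitVcores, long usedMb, long usedVcores) {
        return new long[] {
            Math.max(0L, limitMb - usedMb),
            Math.max(0L, limitVcores - usedVcores)
        };
    }

    /** A request fits only if both memory and vcores fit in the headroom. */
    static boolean fits(long[] headroom, long requestMb, long requestVcores) {
        return requestMb <= headroom[0] && requestVcores <= headroom[1];
    }

    public static void main(String[] args) {
        // Memory is plentiful, but all 4 vcores are already in use.
        long[] h = headroom(8192, 4, 2048, 4);
        System.out.println("fits=" + fits(h, 1024, 1));
    }
}
```

A memory-only check would accept the request in main(), which is the kind of CPU overallocation this issue describes.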

 capacity scheduler will overallocate vcores
 ---

 Key: YARN-2413
 URL: https://issues.apache.org/jira/browse/YARN-2413
 Project: Hadoop YARN
  Issue Type: Bug
  Components: documentation, scheduler
Affects Versions: 3.0.0, 2.2.0
Reporter: Allen Wittenauer
Priority: Critical

 It doesn't appear that the capacity scheduler is properly allocating vcores 
 when making scheduling decisions, which may result in overallocation of CPU 
 resources.




[jira] [Updated] (YARN-2383) Add ability to renew ClientToAMToken


 [ 
https://issues.apache.org/jira/browse/YARN-2383?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuan Gong updated YARN-2383:


Attachment: YARN-2383.preview.3.patch

 Add ability to renew ClientToAMToken
 

 Key: YARN-2383
 URL: https://issues.apache.org/jira/browse/YARN-2383
 Project: Hadoop YARN
  Issue Type: Bug
  Components: applications, resourcemanager
Reporter: Xuan Gong
Assignee: Xuan Gong
 Attachments: YARN-2383.preview.1.patch, YARN-2383.preview.2.patch, 
 YARN-2383.preview.3.patch







[jira] [Commented] (YARN-2033) Investigate merging generic-history into the Timeline Store

[
https://issues.apache.org/jira/browse/YARN-2033?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14096592#comment-14096592
]

Zhijie Shen commented on YARN-2033:
---

[~djp], thanks for raising this question explicitly. Here are two points I'd
like to highlight for this work:

1. This patch doesn't intend to remove the existing FS based history store, but
to deprecate it by removing the default configs for loading the FS based
history store. On the other hand, the patch adds a history store that rides on
the timeline store, and uses it as the default. If an early adopter of the
generic history service wants to continue with the FS based history store, he
needs to set the old configs explicitly (actually he should have done so
already, because the generic history service is not enabled by default), and
the new generic history service will still honor the old configs for backward
compatibility.

2. Though the generic history service (previously called the application
history server) has been in Hadoop since 2.4, it is not production
ready. We have explicitly highlighted this in the
[documentation|http://hadoop.apache.org/docs/r2.4.0/hadoop-yarn/hadoop-yarn-site/TimelineServer.html#Current_Status].
I agree it seems a bit aggressive to move from the FS based history store to
the timeline store based one as the default. However, I'm afraid it's the best
choice at the current stage, because the FS based history store has several
critical limitations: no caching, no retention, not scalable, and no support
for secure mode. Unless we're able to solve all these problems (obviously we
don't have the bandwidth to do it now), it's risky to use the FS based history
store as the default, in particular when the timeline server is going to be
production ready. On the other hand, the aforementioned limitations have
already been addressed by the timeline store (scalability will be ensured by
the HBase timeline store). Hence the timeline store based history store should
be a more reasonable and reliable default for new users.

Investigate merging generic-history into the Timeline Store
---

Key: YARN-2033
URL: https://issues.apache.org/jira/browse/YARN-2033
Project: Hadoop YARN
Issue Type: Sub-task
Reporter: Vinod Kumar Vavilapalli
Assignee: Zhijie Shen
Attachments: ProposalofStoringYARNMetricsintotheTimelineStore.pdf,
YARN-2033.1.patch, YARN-2033.2.patch, YARN-2033.3.patch, YARN-2033.4.patch,
YARN-2033.5.patch, YARN-2033.Prototype.patch, YARN-2033_ALL.1.patch,
YARN-2033_ALL.2.patch, YARN-2033_ALL.3.patch, YARN-2033_ALL.4.patch

Having two different stores isn't amenable to generic insights on what's
happening with applications. This is to investigate porting generic-history
into the Timeline Store.
One goal is to try and retain most of the client side interfaces as close to
what we have today.


[jira] [Commented] (YARN-2378) Adding support for moving apps between queues in Capacity Scheduler

2014-08-13 Thread Jian He (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-2378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14096599#comment-14096599
 ] 

Jian He commented on YARN-2378:
---

The patch looks good overall. Just one thing: we can move 
metrics.submitApp(userName); to addApplication() so that we can avoid 
changing the method signature to include the isMove flag.

 Adding support for moving apps between queues in Capacity Scheduler
 ---

 Key: YARN-2378
 URL: https://issues.apache.org/jira/browse/YARN-2378
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: capacityscheduler
Reporter: Subramaniam Venkatraman Krishnan
Assignee: Subramaniam Venkatraman Krishnan
  Labels: capacity-scheduler
 Attachments: YARN-2378.patch, YARN-2378.patch, YARN-2378.patch, 
 YARN-2378.patch


 As discussed with [~leftnoteasy] and [~jianhe], we are breaking up YARN-1707 
 to smaller patches for manageability. This JIRA will address adding support 
 for moving apps between queues in Capacity Scheduler.




[jira] [Updated] (YARN-2409) Active to StandBy transition does not stop rmDispatcher that causes 1 AsyncDispatcher thread leak.


 [ 
https://issues.apache.org/jira/browse/YARN-2409?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohith updated YARN-2409:
-

Priority: Critical  (was: Major)

 Active to StandBy transition does not stop rmDispatcher that causes 1 
 AsyncDispatcher thread leak. 
 ---

 Key: YARN-2409
 URL: https://issues.apache.org/jira/browse/YARN-2409
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 3.0.0
Reporter: Nishan Shetty
Assignee: Rohith
Priority: Critical
 Attachments: YARN-2409.patch


 {code}
   at java.lang.Thread.run(Thread.java:662)
 2014-08-12 07:03:00,839 ERROR org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: Can't handle this event at current state
 org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: STATUS_UPDATE at LAUNCHED
   at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
   at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
   at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
   at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:697)
   at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:105)
   at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:779)
   at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:760)
   at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173)
   at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106)
   at java.lang.Thread.run(Thread.java:662)
 2014-08-12 07:03:00,839 ERROR org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: Can't handle this event at current state
 org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: CONTAINER_ALLOCATED at LAUNCHED
   at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
   at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
   at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
   at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:697)
   at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:105)
   at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:779)
   at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:760)
   at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173)
   at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106)
   at java.lang.Thread.run(Thread.java:662)
 2014-08-12 07:03:00,839 ERROR org.apache.hadoop.ya
 {code}




[jira] [Updated] (YARN-2409) Active to StandBy transition does not stop rmDispatcher that causes 1 AsyncDispatcher thread leak.