[jira] [Commented] (YARN-1458) In Fair Scheduler, size based weight can cause update thread to hold lock indefinitely
[ https://issues.apache.org/jira/browse/YARN-1458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14095223#comment-14095223 ] George Wong commented on YARN-1458: --- Guys, What is the status of this issue now? We met this issue in our production cluster recently. Now, we have to disable sizebasedweight. This patch looks good to me. But there is 1 core UT failure. And now, jenkins has rotated the test results. Can we trigger the test again and see what is wrong? In Fair Scheduler, size based weight can cause update thread to hold lock indefinitely -- Key: YARN-1458 URL: https://issues.apache.org/jira/browse/YARN-1458 Project: Hadoop YARN Issue Type: Bug Components: scheduler Affects Versions: 2.2.0 Environment: Centos 2.6.18-238.19.1.el5 X86_64 hadoop2.2.0 Reporter: qingwu.fu Labels: patch Fix For: 2.2.1 Attachments: YARN-1458.patch Original Estimate: 408h Remaining Estimate: 408h The ResourceManager$SchedulerEventDispatcher$EventProcessor blocked when clients submit lots jobs, it is not easy to reapear. We run the test cluster for days to reapear it. The output of jstack command on resourcemanager pid: {code} ResourceManager Event Processor prio=10 tid=0x2aaab0c5f000 nid=0x5dd3 waiting for monitor entry [0x43aa9000] java.lang.Thread.State: BLOCKED (on object monitor) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.removeApplication(FairScheduler.java:671) - waiting to lock 0x00070026b6e0 (a org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:1023) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:112) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:440) at java.lang.Thread.run(Thread.java:744) …… FairSchedulerUpdateThread daemon prio=10 tid=0x2aaab0a2c800 nid=0x5dc8 runnable [0x433a2000] java.lang.Thread.State: RUNNABLE at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.getAppWeight(FairScheduler.java:545) - locked 0x00070026b6e0 (a org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.AppSchedulable.getWeights(AppSchedulable.java:129) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.policies.ComputeFairShares.computeShare(ComputeFairShares.java:143) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.policies.ComputeFairShares.resourceUsedWithWeightToResourceRatio(ComputeFairShares.java:131) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.policies.ComputeFairShares.computeShares(ComputeFairShares.java:102) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.policies.FairSharePolicy.computeShares(FairSharePolicy.java:119) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSLeafQueue.recomputeShares(FSLeafQueue.java:100) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSParentQueue.recomputeShares(FSParentQueue.java:62) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.update(FairScheduler.java:282) - locked 0x00070026b6e0 (a org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler$UpdateThread.run(FairScheduler.java:255) at java.lang.Thread.run(Thread.java:744) {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1458) In Fair Scheduler, size based weight can cause update thread to hold lock indefinitely
[ https://issues.apache.org/jira/browse/YARN-1458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14095261#comment-14095261 ] Hadoop QA commented on YARN-1458: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12616677/YARN-1458.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairScheduler {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4610//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4610//console This message is automatically generated. In Fair Scheduler, size based weight can cause update thread to hold lock indefinitely -- Key: YARN-1458 URL: https://issues.apache.org/jira/browse/YARN-1458 Project: Hadoop YARN Issue Type: Bug Components: scheduler Affects Versions: 2.2.0 Environment: Centos 2.6.18-238.19.1.el5 X86_64 hadoop2.2.0 Reporter: qingwu.fu Labels: patch Fix For: 2.2.1 Attachments: YARN-1458.patch Original Estimate: 408h Remaining Estimate: 408h The ResourceManager$SchedulerEventDispatcher$EventProcessor blocked when clients submit lots jobs, it is not easy to reapear. We run the test cluster for days to reapear it. The output of jstack command on resourcemanager pid: {code} ResourceManager Event Processor prio=10 tid=0x2aaab0c5f000 nid=0x5dd3 waiting for monitor entry [0x43aa9000] java.lang.Thread.State: BLOCKED (on object monitor) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.removeApplication(FairScheduler.java:671) - waiting to lock 0x00070026b6e0 (a org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:1023) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:112) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:440) at java.lang.Thread.run(Thread.java:744) …… FairSchedulerUpdateThread daemon prio=10 tid=0x2aaab0a2c800 nid=0x5dc8 runnable [0x433a2000] java.lang.Thread.State: RUNNABLE at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.getAppWeight(FairScheduler.java:545) - locked 0x00070026b6e0 (a org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.AppSchedulable.getWeights(AppSchedulable.java:129) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.policies.ComputeFairShares.computeShare(ComputeFairShares.java:143) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.policies.ComputeFairShares.resourceUsedWithWeightToResourceRatio(ComputeFairShares.java:131) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.policies.ComputeFairShares.computeShares(ComputeFairShares.java:102) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.policies.FairSharePolicy.computeShares(FairSharePolicy.java:119) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSLeafQueue.recomputeShares(FSLeafQueue.java:100) at
[jira] [Commented] (YARN-2136) RMStateStore can explicitly handle store/update events when fenced
[ https://issues.apache.org/jira/browse/YARN-2136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14095293#comment-14095293 ] Varun Saxena commented on YARN-2136: [~jianhe], what I was talking about, was the fact that certain events may be queued up in dispatcher event queue while this RM is in Master. But when we close the State Store while switching to standby(on fencing), we will wait till all these events have been processed (queue is drained). Marking the state store as FENCED may avoid these events being processed. That is, let us say I have 5 events in the queue and 3rd event leads to Fencing. Then, the last 2 events will also be processed before switch to standby completes. Your views on need for handling of this case ? RMStateStore can explicitly handle store/update events when fenced -- Key: YARN-2136 URL: https://issues.apache.org/jira/browse/YARN-2136 Project: Hadoop YARN Issue Type: Bug Reporter: Jian He RMStateStore can choose to handle/ignore store/update events upfront instead of invoking more ZK operations if state store is at fenced state. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Assigned] (YARN-160) nodemanagers should obtain cpu/memory values from underlying OS
[ https://issues.apache.org/jira/browse/YARN-160?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Vasudev reassigned YARN-160: -- Assignee: Varun Vasudev nodemanagers should obtain cpu/memory values from underlying OS --- Key: YARN-160 URL: https://issues.apache.org/jira/browse/YARN-160 Project: Hadoop YARN Issue Type: Improvement Components: nodemanager Affects Versions: 2.0.3-alpha Reporter: Alejandro Abdelnur Assignee: Varun Vasudev Fix For: 2.5.0 As mentioned in YARN-2 *NM memory and CPU configs* Currently these values are coming from the config of the NM, we should be able to obtain those values from the OS (ie, in the case of Linux from /proc/meminfo /proc/cpuinfo). As this is highly OS dependent we should have an interface that obtains this information. In addition implementations of this interface should be able to specify a mem/cpu offset (amount of mem/cpu not to be avail as YARN resource), this would allow to reserve mem/cpu for the OS and other services outside of YARN containers. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Assigned] (YARN-2409) InvalidStateTransitonException in ResourceManager after job recovery
[ https://issues.apache.org/jira/browse/YARN-2409?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rohith reassigned YARN-2409: Assignee: Rohith InvalidStateTransitonException in ResourceManager after job recovery Key: YARN-2409 URL: https://issues.apache.org/jira/browse/YARN-2409 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 3.0.0 Reporter: Nishan Shetty Assignee: Rohith {code} at java.lang.Thread.run(Thread.java:662) 2014-08-12 07:03:00,839 ERROR org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: Can't handle this event at current state org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: STATUS_UPDATE at LAUNCHED at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305) at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:697) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:105) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:779) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:760) at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173) at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106) at java.lang.Thread.run(Thread.java:662) 2014-08-12 07:03:00,839 ERROR org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: Can't handle this event at current state org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: CONTAINER_ALLOCATED at LAUNCHED at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305) at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:697) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:105) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:779) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:760) at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173) at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106) at java.lang.Thread.run(Thread.java:662) 2014-08-12 07:03:00,839 ERROR org.apache.hadoop.ya {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2317) Update documentation about how to write YARN applications
[ https://issues.apache.org/jira/browse/YARN-2317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14095381#comment-14095381 ] Hudson commented on YARN-2317: -- FAILURE: Integrated in Hadoop-Yarn-trunk #644 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/644/]) YARN-2317. Updated the document about how to write YARN applications. Contributed by Li Lu. (zjshen: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1617594) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/apt/WritingYarnApplications.apt.vm Update documentation about how to write YARN applications - Key: YARN-2317 URL: https://issues.apache.org/jira/browse/YARN-2317 Project: Hadoop YARN Issue Type: Sub-task Components: documentation Reporter: Li Lu Assignee: Li Lu Fix For: 2.6.0 Attachments: YARN-2317-071714.patch, YARN-2317-073014-1.patch, YARN-2317-073014.patch, YARN-2317-081114.patch, YARN-2317-081214.patch Some information in WritingYarnApplications webpage is out-dated. Need some refresh work on this document to reflect the most recent changes in YARN APIs. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1370) Fair scheduler to re-populate container allocation state
[ https://issues.apache.org/jira/browse/YARN-1370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14095382#comment-14095382 ] Hudson commented on YARN-1370: -- FAILURE: Integrated in Hadoop-Yarn-trunk #644 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/644/]) YARN-1370. Fair scheduler to re-populate container allocation state. (Anubhav Dhoot via kasha) (kasha: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1617645) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestWorkPreservingRMRestart.java Fair scheduler to re-populate container allocation state Key: YARN-1370 URL: https://issues.apache.org/jira/browse/YARN-1370 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Bikas Saha Assignee: Anubhav Dhoot Fix For: 2.6.0 Attachments: YARN-1370.001.patch YARN-1367 and YARN-1368 enable the NM to tell the RM about currently running containers and the RM will pass this information to the schedulers along with the node information. The schedulers are currently already informed about previously running apps when the app data is recovered from the store. The scheduler is expected to be able to repopulate its allocation state from the above 2 sources of information. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2373) WebAppUtils Should Use configuration.getPassword for Accessing SSL Passwords
[ https://issues.apache.org/jira/browse/YARN-2373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14095378#comment-14095378 ] Hudson commented on YARN-2373: -- FAILURE: Integrated in Hadoop-Yarn-trunk #644 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/644/]) YARN-2373. Changed WebAppUtils to use Configuration#getPassword for accessing SSL passwords. Contributed by Larry McCay (jianhe: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1617555) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/webapp/util/WebAppUtils.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/webapp/util * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/webapp/util/TestWebAppUtils.java WebAppUtils Should Use configuration.getPassword for Accessing SSL Passwords Key: YARN-2373 URL: https://issues.apache.org/jira/browse/YARN-2373 Project: Hadoop YARN Issue Type: Bug Reporter: Larry McCay Assignee: Larry McCay Fix For: 2.6.0 Attachments: YARN-2373.patch, YARN-2373.patch, YARN-2373.patch, YARN-2373.patch, YARN-2373.patch As part of HADOOP-10904, this jira represents a change to WebAppUtils to uptake the use of the credential provider API through the new method on Configuration called getPassword. This provides an alternative to storing the passwords in clear text within the ssl-server.xml file while maintaining backward compatibility with that behavior. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-868) YarnClient should set the service address in tokens returned by getRMDelegationToken()
[ https://issues.apache.org/jira/browse/YARN-868?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Saxena updated YARN-868: -- Attachment: YARN-868.patch YarnClient should set the service address in tokens returned by getRMDelegationToken() -- Key: YARN-868 URL: https://issues.apache.org/jira/browse/YARN-868 Project: Hadoop YARN Issue Type: Bug Reporter: Hitesh Shah Assignee: Varun Saxena Attachments: YARN-868.patch Either the client should set this information into the token or the client layer should expose an api that returns the service address. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2409) InvalidStateTransitonException in ResourceManager after job recovery
[ https://issues.apache.org/jira/browse/YARN-2409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14095466#comment-14095466 ] Rohith commented on YARN-2409: -- I looked into issue (got logs from [~nishan] offline), there is problem in transitioning to standby. While resetting the dispatcher, new dispatcher has been created and assigned to existing dispatcher. But it is missed to stop current dispatcher which causes *1 Thread Leak i.e AsyncDispatcher event handler on each standby transition.* InvalidStateTransitonException are due to processing of queued events from rmDispatcher. This is very corner scenario which would occure because of ActiveStandByElector call back and internal RM transitin to standby(I reproduce it) Consider, RM1 has Active state. If RM transition to standby because of STATE_STORE_FENCED, it first *a) transition RM to standby* and *b) notify elector.* At the same time, before notifying elector if elector ask RM to change Active(bcs of zk unstable and gets rm lock first) then the following flow will occur 1. rm.transitionToStandby(true); --- From RMFatalEventDispatcher.handle() 2. AdminService.transitionToActive(); --- From Elector 3. rm.adminService.resetLeaderElection(); --- From RMFatalEventDispatcher.handle() From above flow problem occure is , say App1--AppAttempt1 is launched and running. AppAttempt1 has put its status update events to rmDispatcher queue. Say STATUS_UPDATE --- STATE_STORE_FENCED. a) Above mentioned 3 flows ocured while processing STATE_STORE_FENCED. It means, RM has transitined to ACTIVE--STANDBY--ACTIVE recovering running application AppAttempt1. b) Since rmDispather thread is not stopped while transitioning to Standby, it process STATUS_UPDATE(rmContext has already app recovered, so it will not be null) event which causes InvalidStateTransitonException. InvalidStateTransitonException in ResourceManager after job recovery Key: YARN-2409 URL: https://issues.apache.org/jira/browse/YARN-2409 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 3.0.0 Reporter: Nishan Shetty Assignee: Rohith {code} at java.lang.Thread.run(Thread.java:662) 2014-08-12 07:03:00,839 ERROR org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: Can't handle this event at current state org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: STATUS_UPDATE at LAUNCHED at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305) at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:697) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:105) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:779) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:760) at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173) at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106) at java.lang.Thread.run(Thread.java:662) 2014-08-12 07:03:00,839 ERROR org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: Can't handle this event at current state org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: CONTAINER_ALLOCATED at LAUNCHED at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305) at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:697) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:105) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:779) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:760) at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173) at
[jira] [Commented] (YARN-1915) ClientToAMTokenMasterKey should be provided to AM at launch time
[ https://issues.apache.org/jira/browse/YARN-1915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14095471#comment-14095471 ] Daryn Sharp commented on YARN-1915: --- +1 But since I had a hand in the design, we should get a 2nd vote. ClientToAMTokenMasterKey should be provided to AM at launch time Key: YARN-1915 URL: https://issues.apache.org/jira/browse/YARN-1915 Project: Hadoop YARN Issue Type: Sub-task Affects Versions: 2.2.0 Reporter: Hitesh Shah Assignee: Jason Lowe Priority: Critical Attachments: YARN-1915.patch, YARN-1915v2.patch Currently, the AM receives the key as part of registration. This introduces a race where a client can connect to the AM when the AM has not received the key. Current Flow: 1) AM needs to start the client listening service in order to get host:port and send it to the RM as part of registration 2) RM gets the port info in register() and transitions the app to RUNNING. Responds back with client secret to AM. 3) User asks RM for client token. Gets it and pings the AM. AM hasn't received client secret from RM and so RPC itself rejects the request. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2409) InvalidStateTransitonException in ResourceManager after job recovery
[ https://issues.apache.org/jira/browse/YARN-2409?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rohith updated YARN-2409: - Attachment: YARN-2409.patch Attached the patch. Please review.. I have verified patch for 1. Thread Leak : Switching manually many times using admin command. 2. Queued up events are not processed once RM moves to StandBy. InvalidStateTransitonException in ResourceManager after job recovery Key: YARN-2409 URL: https://issues.apache.org/jira/browse/YARN-2409 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 3.0.0 Reporter: Nishan Shetty Assignee: Rohith Attachments: YARN-2409.patch {code} at java.lang.Thread.run(Thread.java:662) 2014-08-12 07:03:00,839 ERROR org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: Can't handle this event at current state org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: STATUS_UPDATE at LAUNCHED at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305) at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:697) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:105) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:779) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:760) at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173) at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106) at java.lang.Thread.run(Thread.java:662) 2014-08-12 07:03:00,839 ERROR org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: Can't handle this event at current state org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: CONTAINER_ALLOCATED at LAUNCHED at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305) at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:697) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:105) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:779) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:760) at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173) at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106) at java.lang.Thread.run(Thread.java:662) 2014-08-12 07:03:00,839 ERROR org.apache.hadoop.ya {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2409) InvalidStateTransitonException in ResourceManager after job recovery
[ https://issues.apache.org/jira/browse/YARN-2409?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rohith updated YARN-2409: - Attachment: (was: YARN-2409.patch) InvalidStateTransitonException in ResourceManager after job recovery Key: YARN-2409 URL: https://issues.apache.org/jira/browse/YARN-2409 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 3.0.0 Reporter: Nishan Shetty Assignee: Rohith Attachments: YARN-2409.patch {code} at java.lang.Thread.run(Thread.java:662) 2014-08-12 07:03:00,839 ERROR org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: Can't handle this event at current state org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: STATUS_UPDATE at LAUNCHED at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305) at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:697) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:105) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:779) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:760) at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173) at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106) at java.lang.Thread.run(Thread.java:662) 2014-08-12 07:03:00,839 ERROR org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: Can't handle this event at current state org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: CONTAINER_ALLOCATED at LAUNCHED at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305) at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:697) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:105) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:779) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:760) at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173) at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106) at java.lang.Thread.run(Thread.java:662) 2014-08-12 07:03:00,839 ERROR org.apache.hadoop.ya {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2409) InvalidStateTransitonException in ResourceManager after job recovery
[ https://issues.apache.org/jira/browse/YARN-2409?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rohith updated YARN-2409: - Attachment: YARN-2409.patch InvalidStateTransitonException in ResourceManager after job recovery Key: YARN-2409 URL: https://issues.apache.org/jira/browse/YARN-2409 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 3.0.0 Reporter: Nishan Shetty Assignee: Rohith Attachments: YARN-2409.patch {code} at java.lang.Thread.run(Thread.java:662) 2014-08-12 07:03:00,839 ERROR org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: Can't handle this event at current state org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: STATUS_UPDATE at LAUNCHED at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305) at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:697) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:105) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:779) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:760) at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173) at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106) at java.lang.Thread.run(Thread.java:662) 2014-08-12 07:03:00,839 ERROR org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: Can't handle this event at current state org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: CONTAINER_ALLOCATED at LAUNCHED at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305) at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:697) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:105) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:779) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:760) at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173) at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106) at java.lang.Thread.run(Thread.java:662) 2014-08-12 07:03:00,839 ERROR org.apache.hadoop.ya {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2399) FairScheduler: Merge AppSchedulable and FSSchedulerApp into FSAppAttempt
[ https://issues.apache.org/jira/browse/YARN-2399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14095507#comment-14095507 ] Hudson commented on YARN-2399: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #1836 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1836/]) YARN-2399. Delete old versions of files. FairScheduler: Merge AppSchedulable and FSSchedulerApp into FSAppAttempt. (kasha) (kasha: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1617619) * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/AppSchedulable.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSSchedulerApp.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFSSchedulerApp.java YARN-2399. FairScheduler: Merge AppSchedulable and FSSchedulerApp into FSAppAttempt. (kasha) (kasha: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1617600) * /hadoop/common/trunk/hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/scheduler/FairSchedulerMetrics.java * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSAppAttempt.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSLeafQueue.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSQueue.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSSchedulerNode.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FifoAppComparator.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/MaxRunningAppsEnforcer.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/NewAppWeightBooster.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/Schedulable.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/WeightAdjuster.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/dao/FairSchedulerInfo.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/dao/FairSchedulerLeafQueueInfo.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FakeSchedulable.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFSAppAttempt.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFSLeafQueue.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFairScheduler.java *
[jira] [Commented] (YARN-2317) Update documentation about how to write YARN applications
[ https://issues.apache.org/jira/browse/YARN-2317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14095509#comment-14095509 ] Hudson commented on YARN-2317: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #1836 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1836/]) YARN-2317. Updated the document about how to write YARN applications. Contributed by Li Lu. (zjshen: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1617594) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/apt/WritingYarnApplications.apt.vm Update documentation about how to write YARN applications - Key: YARN-2317 URL: https://issues.apache.org/jira/browse/YARN-2317 Project: Hadoop YARN Issue Type: Sub-task Components: documentation Reporter: Li Lu Assignee: Li Lu Fix For: 2.6.0 Attachments: YARN-2317-071714.patch, YARN-2317-073014-1.patch, YARN-2317-073014.patch, YARN-2317-081114.patch, YARN-2317-081214.patch Some information in WritingYarnApplications webpage is out-dated. Need some refresh work on this document to reflect the most recent changes in YARN APIs. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2373) WebAppUtils Should Use configuration.getPassword for Accessing SSL Passwords
[ https://issues.apache.org/jira/browse/YARN-2373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14095506#comment-14095506 ] Hudson commented on YARN-2373: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #1836 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1836/]) YARN-2373. Changed WebAppUtils to use Configuration#getPassword for accessing SSL passwords. Contributed by Larry McCay (jianhe: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1617555) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/webapp/util/WebAppUtils.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/webapp/util * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/webapp/util/TestWebAppUtils.java WebAppUtils Should Use configuration.getPassword for Accessing SSL Passwords Key: YARN-2373 URL: https://issues.apache.org/jira/browse/YARN-2373 Project: Hadoop YARN Issue Type: Bug Reporter: Larry McCay Assignee: Larry McCay Fix For: 2.6.0 Attachments: YARN-2373.patch, YARN-2373.patch, YARN-2373.patch, YARN-2373.patch, YARN-2373.patch As part of HADOOP-10904, this jira represents a change to WebAppUtils to uptake the use of the credential provider API through the new method on Configuration called getPassword. This provides an alternative to storing the passwords in clear text within the ssl-server.xml file while maintaining backward compatibility with that behavior. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1370) Fair scheduler to re-populate container allocation state
[ https://issues.apache.org/jira/browse/YARN-1370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14095510#comment-14095510 ] Hudson commented on YARN-1370: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #1836 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1836/]) YARN-1370. Fair scheduler to re-populate container allocation state. (Anubhav Dhoot via kasha) (kasha: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1617645) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestWorkPreservingRMRestart.java Fair scheduler to re-populate container allocation state Key: YARN-1370 URL: https://issues.apache.org/jira/browse/YARN-1370 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Bikas Saha Assignee: Anubhav Dhoot Fix For: 2.6.0 Attachments: YARN-1370.001.patch YARN-1367 and YARN-1368 enable the NM to tell the RM about currently running containers and the RM will pass this information to the schedulers along with the node information. The schedulers are currently already informed about previously running apps when the app data is recovered from the store. The scheduler is expected to be able to repopulate its allocation state from the above 2 sources of information. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-868) YarnClient should set the service address in tokens returned by getRMDelegationToken()
[ https://issues.apache.org/jira/browse/YARN-868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14095543#comment-14095543 ] Hadoop QA commented on YARN-868: {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12661447/YARN-868.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4611//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4611//console This message is automatically generated. YarnClient should set the service address in tokens returned by getRMDelegationToken() -- Key: YARN-868 URL: https://issues.apache.org/jira/browse/YARN-868 Project: Hadoop YARN Issue Type: Bug Reporter: Hitesh Shah Assignee: Varun Saxena Attachments: YARN-868.patch Either the client should set this information into the token or the client layer should expose an api that returns the service address. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2409) InvalidStateTransitonException in ResourceManager after job recovery
[ https://issues.apache.org/jira/browse/YARN-2409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14095577#comment-14095577 ] Hadoop QA commented on YARN-2409: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12661455/YARN-2409.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.TestRMEmbeddedElector org.apache.hadoop.yarn.server.resourcemanager.recovery.TestZKRMStateStoreZKClientConnections {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4612//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4612//console This message is automatically generated. InvalidStateTransitonException in ResourceManager after job recovery Key: YARN-2409 URL: https://issues.apache.org/jira/browse/YARN-2409 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 3.0.0 Reporter: Nishan Shetty Assignee: Rohith Attachments: YARN-2409.patch {code} at java.lang.Thread.run(Thread.java:662) 2014-08-12 07:03:00,839 ERROR org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: Can't handle this event at current state org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: STATUS_UPDATE at LAUNCHED at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305) at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:697) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:105) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:779) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:760) at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173) at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106) at java.lang.Thread.run(Thread.java:662) 2014-08-12 07:03:00,839 ERROR org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: Can't handle this event at current state org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: CONTAINER_ALLOCATED at LAUNCHED at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305) at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:697) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:105) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:779) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:760) at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173) at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106) at
[jira] [Assigned] (YARN-2410) Nodemanager ShuffleHandler can easily exhaust file descriptors
[ https://issues.apache.org/jira/browse/YARN-2410?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chen He reassigned YARN-2410: - Assignee: Chen He Nodemanager ShuffleHandler can easily exhaust file descriptors -- Key: YARN-2410 URL: https://issues.apache.org/jira/browse/YARN-2410 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.5.0 Reporter: Nathan Roberts Assignee: Chen He Priority: Critical The async nature of the shufflehandler can cause it to open a huge number of file descriptors, when it runs out it crashes. Scenario: Job with 6K reduces, slow start set to 0.95, about 40 map outputs per node. Let's say all 6K reduces hit a node at about same time asking for their outputs. Each reducer will ask for all 40 map outputs over a single socket in a single request (not necessarily all 40 at once, but with coalescing it is likely to be a large number). sendMapOutput() will open the file for random reading and then perform an async transfer of the particular portion of this file(). This will theoretically happen 6000*40=24 times which will run the NM out of file descriptors and cause it to crash. The algorithm should be refactored a little to not open the fds until they're actually needed. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2399) FairScheduler: Merge AppSchedulable and FSSchedulerApp into FSAppAttempt
[ https://issues.apache.org/jira/browse/YARN-2399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14095597#comment-14095597 ] Hudson commented on YARN-2399: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #1862 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1862/]) YARN-2399. Delete old versions of files. FairScheduler: Merge AppSchedulable and FSSchedulerApp into FSAppAttempt. (kasha) (kasha: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1617619) * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/AppSchedulable.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSSchedulerApp.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFSSchedulerApp.java YARN-2399. FairScheduler: Merge AppSchedulable and FSSchedulerApp into FSAppAttempt. (kasha) (kasha: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1617600) * /hadoop/common/trunk/hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/scheduler/FairSchedulerMetrics.java * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSAppAttempt.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSLeafQueue.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSQueue.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSSchedulerNode.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FifoAppComparator.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/MaxRunningAppsEnforcer.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/NewAppWeightBooster.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/Schedulable.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/WeightAdjuster.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/dao/FairSchedulerInfo.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/dao/FairSchedulerLeafQueueInfo.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FakeSchedulable.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFSAppAttempt.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFSLeafQueue.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFairScheduler.java *
[jira] [Commented] (YARN-2317) Update documentation about how to write YARN applications
[ https://issues.apache.org/jira/browse/YARN-2317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14095599#comment-14095599 ] Hudson commented on YARN-2317: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #1862 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1862/]) YARN-2317. Updated the document about how to write YARN applications. Contributed by Li Lu. (zjshen: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1617594) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/apt/WritingYarnApplications.apt.vm Update documentation about how to write YARN applications - Key: YARN-2317 URL: https://issues.apache.org/jira/browse/YARN-2317 Project: Hadoop YARN Issue Type: Sub-task Components: documentation Reporter: Li Lu Assignee: Li Lu Fix For: 2.6.0 Attachments: YARN-2317-071714.patch, YARN-2317-073014-1.patch, YARN-2317-073014.patch, YARN-2317-081114.patch, YARN-2317-081214.patch Some information in WritingYarnApplications webpage is out-dated. Need some refresh work on this document to reflect the most recent changes in YARN APIs. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1370) Fair scheduler to re-populate container allocation state
[ https://issues.apache.org/jira/browse/YARN-1370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14095600#comment-14095600 ] Hudson commented on YARN-1370: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #1862 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1862/]) YARN-1370. Fair scheduler to re-populate container allocation state. (Anubhav Dhoot via kasha) (kasha: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1617645) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestWorkPreservingRMRestart.java Fair scheduler to re-populate container allocation state Key: YARN-1370 URL: https://issues.apache.org/jira/browse/YARN-1370 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Bikas Saha Assignee: Anubhav Dhoot Fix For: 2.6.0 Attachments: YARN-1370.001.patch YARN-1367 and YARN-1368 enable the NM to tell the RM about currently running containers and the RM will pass this information to the schedulers along with the node information. The schedulers are currently already informed about previously running apps when the app data is recovered from the store. The scheduler is expected to be able to repopulate its allocation state from the above 2 sources of information. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2373) WebAppUtils Should Use configuration.getPassword for Accessing SSL Passwords
[ https://issues.apache.org/jira/browse/YARN-2373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14095596#comment-14095596 ] Hudson commented on YARN-2373: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #1862 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1862/]) YARN-2373. Changed WebAppUtils to use Configuration#getPassword for accessing SSL passwords. Contributed by Larry McCay (jianhe: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1617555) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/webapp/util/WebAppUtils.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/webapp/util * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/webapp/util/TestWebAppUtils.java WebAppUtils Should Use configuration.getPassword for Accessing SSL Passwords Key: YARN-2373 URL: https://issues.apache.org/jira/browse/YARN-2373 Project: Hadoop YARN Issue Type: Bug Reporter: Larry McCay Assignee: Larry McCay Fix For: 2.6.0 Attachments: YARN-2373.patch, YARN-2373.patch, YARN-2373.patch, YARN-2373.patch, YARN-2373.patch As part of HADOOP-10904, this jira represents a change to WebAppUtils to uptake the use of the credential provider API through the new method on Configuration called getPassword. This provides an alternative to storing the passwords in clear text within the ssl-server.xml file while maintaining backward compatibility with that behavior. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-868) YarnClient should set the service address in tokens returned by getRMDelegationToken()
[ https://issues.apache.org/jira/browse/YARN-868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14095627#comment-14095627 ] Varun Saxena commented on YARN-868: --- Changes made are as under : 1. Provided an additional interface Token#getServiceAddress() to get Service address from token. 2. Stored server address(RM's) in ApplicationClientProtocolPBClientImpl. This address is set in TokenPBImpl while the GetDelegationResponse is being prepared from ResponseProto. This would let us store the address of RM which served this get delegation token request. 3. As service address field exists in TokenPBImpl but not in TokenProto, made suitable changes in tests. YarnClient should set the service address in tokens returned by getRMDelegationToken() -- Key: YARN-868 URL: https://issues.apache.org/jira/browse/YARN-868 Project: Hadoop YARN Issue Type: Bug Reporter: Hitesh Shah Assignee: Varun Saxena Attachments: YARN-868.patch Either the client should set this information into the token or the client layer should expose an api that returns the service address. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-868) YarnClient should set the service address in tokens returned by getRMDelegationToken()
[ https://issues.apache.org/jira/browse/YARN-868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14095628#comment-14095628 ] Varun Saxena commented on YARN-868: --- Another approach to resolve it could have been as under : 1. Make changes in TokenProto as well and return service address from Server(RM). But as TokenProto is defined in hadoop-common and used across projects, didnt make this change. 2. Token is also part of requests, ideally we should have a separate class for Response flow, containing Token and Service Address. But this would break the interface YarnClient#getRMDelegationToken, hence didnt make this change either. YarnClient should set the service address in tokens returned by getRMDelegationToken() -- Key: YARN-868 URL: https://issues.apache.org/jira/browse/YARN-868 Project: Hadoop YARN Issue Type: Bug Reporter: Hitesh Shah Assignee: Varun Saxena Attachments: YARN-868.patch Either the client should set this information into the token or the client layer should expose an api that returns the service address. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1915) ClientToAMTokenMasterKey should be provided to AM at launch time
[ https://issues.apache.org/jira/browse/YARN-1915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14095671#comment-14095671 ] Hitesh Shah commented on YARN-1915: --- [~jlowe] [~daryn] Was there a reason for not sending in the secret to the AM via its env when it launches? I am assuming this is to not have all the AMs to change to handle this? Wouldn't that be a more effective solution as compared to use of a timer ( which in practice would work ) but is still reliant upon the AM receiving the secret from the RM within the time window before the client does. ClientToAMTokenMasterKey should be provided to AM at launch time Key: YARN-1915 URL: https://issues.apache.org/jira/browse/YARN-1915 Project: Hadoop YARN Issue Type: Sub-task Affects Versions: 2.2.0 Reporter: Hitesh Shah Assignee: Jason Lowe Priority: Critical Attachments: YARN-1915.patch, YARN-1915v2.patch Currently, the AM receives the key as part of registration. This introduces a race where a client can connect to the AM when the AM has not received the key. Current Flow: 1) AM needs to start the client listening service in order to get host:port and send it to the RM as part of registration 2) RM gets the port info in register() and transitions the app to RUNNING. Responds back with client secret to AM. 3) User asks RM for client token. Gets it and pings the AM. AM hasn't received client secret from RM and so RPC itself rejects the request. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2409) InvalidStateTransitonException in ResourceManager after job recovery
[ https://issues.apache.org/jira/browse/YARN-2409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14095686#comment-14095686 ] Hadoop QA commented on YARN-2409: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12661455/YARN-2409.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4613//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4613//console This message is automatically generated. InvalidStateTransitonException in ResourceManager after job recovery Key: YARN-2409 URL: https://issues.apache.org/jira/browse/YARN-2409 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 3.0.0 Reporter: Nishan Shetty Assignee: Rohith Attachments: YARN-2409.patch {code} at java.lang.Thread.run(Thread.java:662) 2014-08-12 07:03:00,839 ERROR org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: Can't handle this event at current state org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: STATUS_UPDATE at LAUNCHED at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305) at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:697) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:105) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:779) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:760) at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173) at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106) at java.lang.Thread.run(Thread.java:662) 2014-08-12 07:03:00,839 ERROR org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: Can't handle this event at current state org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: CONTAINER_ALLOCATED at LAUNCHED at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305) at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:697) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:105) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:779) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:760) at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173) at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106) at java.lang.Thread.run(Thread.java:662) 2014-08-12 07:03:00,839 ERROR org.apache.hadoop.ya {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2385) Adding support for listing all applications in a queue
[ https://issues.apache.org/jira/browse/YARN-2385?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14095718#comment-14095718 ] Sunil G commented on YARN-2385: --- I am not sure whether we need to check about completed apps for CS and Fair? Only in this case of having completed apps, we have problem of evicting apps beyond a limit as mentioned by [~wangda] in YARN-807. May be two separate apis (getRunningAppsInQueue, getPendingAppsInQueue) with common behavior across CS/Fair could be a better approach. Adding support for listing all applications in a queue -- Key: YARN-2385 URL: https://issues.apache.org/jira/browse/YARN-2385 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler, fairscheduler Reporter: Subramaniam Venkatraman Krishnan Assignee: Karthik Kambatla Labels: abstractyarnscheduler This JIRA proposes adding a method in AbstractYarnScheduler to get all the pending/active applications. Fair scheduler already supports moving a single application from one queue to another. Support for the same is being added to Capacity Scheduler as part of YARN-2378 and YARN-2248. So with the addition of this method, we can transparently add support for moving all applications from source queue to target queue and draining a queue, i.e. killing all applications in a queue as proposed by YARN-2389 -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2390) Investigating whehther generic history service needs to support application-acls
[ https://issues.apache.org/jira/browse/YARN-2390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14095725#comment-14095725 ] Sunil G commented on YARN-2390: --- For getting application report, container report etc, currently in ClientRMService Queue ACL for ADMINISTER_QUEUE is also checked. I think for these reports, same will be applicable in HistoryService also. Investigating whehther generic history service needs to support application-acls Key: YARN-2390 URL: https://issues.apache.org/jira/browse/YARN-2390 Project: Hadoop YARN Issue Type: Sub-task Reporter: Zhijie Shen According YARN-1250, it's arguable whether queue-acls should be applied to the generic history service as well, because the queue admin may not need the access to the completed application that is removed from the queue. Create this ticket to tackle the discussion around. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2398) TestResourceTrackerOnHA crashes
[ https://issues.apache.org/jira/browse/YARN-2398?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14095728#comment-14095728 ] Xuan Gong commented on YARN-2398: - That is strange. For all the Test*OnHA, we are using MiniYarnCluster which is automatically register all the handlers for us when this RM become active. TestResourceTrackerOnHA crashes --- Key: YARN-2398 URL: https://issues.apache.org/jira/browse/YARN-2398 Project: Hadoop YARN Issue Type: Bug Reporter: Jason Lowe TestResourceTrackerOnHA is currently crashing and failing trunk builds. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2390) Investigating whehther generic history service needs to support queue-acls
[ https://issues.apache.org/jira/browse/YARN-2390?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen updated YARN-2390: -- Summary: Investigating whehther generic history service needs to support queue-acls (was: Investigating whehther generic history service needs to support application-acls) Investigating whehther generic history service needs to support queue-acls -- Key: YARN-2390 URL: https://issues.apache.org/jira/browse/YARN-2390 Project: Hadoop YARN Issue Type: Sub-task Reporter: Zhijie Shen According YARN-1250, it's arguable whether queue-acls should be applied to the generic history service as well, because the queue admin may not need the access to the completed application that is removed from the queue. Create this ticket to tackle the discussion around. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2136) RMStateStore can explicitly handle store/update events when fenced
[ https://issues.apache.org/jira/browse/YARN-2136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14095734#comment-14095734 ] Sunil G commented on YARN-2136: --- bq.we will wait till all these events have been processed Will it be little long to wait at this stage to become standby? RMStateStore can explicitly handle store/update events when fenced -- Key: YARN-2136 URL: https://issues.apache.org/jira/browse/YARN-2136 Project: Hadoop YARN Issue Type: Bug Reporter: Jian He RMStateStore can choose to handle/ignore store/update events upfront instead of invoking more ZK operations if state store is at fenced state. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1915) ClientToAMTokenMasterKey should be provided to AM at launch time
[ https://issues.apache.org/jira/browse/YARN-1915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14095735#comment-14095735 ] Jason Lowe commented on YARN-1915: -- Good question, Hitesh. I don't know exactly why that was changed, as it was sending it either via the env or via the credentials for the container in 0.23. I assumed there was a reason that wasn't OK given that it was explicitly changed to not do that, but that may be a bad assumption. Digging a bit, found this was changed in YARN-610. Apparently on Windows the env isn't secure and secrets can be gleaned from it. The JIRA also claims the key can't go in the container credentials, but it doesn't elaborate. If a container's credentials also aren't secure then it seems to me we have bigger problems than just this key. ClientToAMTokenMasterKey should be provided to AM at launch time Key: YARN-1915 URL: https://issues.apache.org/jira/browse/YARN-1915 Project: Hadoop YARN Issue Type: Sub-task Affects Versions: 2.2.0 Reporter: Hitesh Shah Assignee: Jason Lowe Priority: Critical Attachments: YARN-1915.patch, YARN-1915v2.patch Currently, the AM receives the key as part of registration. This introduces a race where a client can connect to the AM when the AM has not received the key. Current Flow: 1) AM needs to start the client listening service in order to get host:port and send it to the RM as part of registration 2) RM gets the port info in register() and transitions the app to RUNNING. Responds back with client secret to AM. 3) User asks RM for client token. Gets it and pings the AM. AM hasn't received client secret from RM and so RPC itself rejects the request. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2390) Investigating whehther generic history service needs to support queue-acls
[ https://issues.apache.org/jira/browse/YARN-2390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14095748#comment-14095748 ] Zhijie Shen commented on YARN-2390: --- bq. For getting application report, container report etc, currently in ClientRMService Queue ACL for ADMINISTER_QUEUE is also checked. That's correct. However, after the app is finished, it has been removed from the queue. The question is whether we still want to give queue admin to the app that used to run on the queue, but now is removed from it and finished. Personally, I prefer not to grant the view access of the finished app to the queue admin, because IMHO, the permissions of the queue admin should be within the scope of his assigned queue. Thoughts? Investigating whehther generic history service needs to support queue-acls -- Key: YARN-2390 URL: https://issues.apache.org/jira/browse/YARN-2390 Project: Hadoop YARN Issue Type: Sub-task Reporter: Zhijie Shen According YARN-1250, it's arguable whether queue-acls should be applied to the generic history service as well, because the queue admin may not need the access to the completed application that is removed from the queue. Create this ticket to tackle the discussion around. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2356) yarn status command for non-existent application/application attempt/container is too verbose
[ https://issues.apache.org/jira/browse/YARN-2356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14095808#comment-14095808 ] Hadoop QA commented on YARN-2356: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12657814/Yarn-2356.1.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4614//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4614//console This message is automatically generated. yarn status command for non-existent application/application attempt/container is too verbose -- Key: YARN-2356 URL: https://issues.apache.org/jira/browse/YARN-2356 Project: Hadoop YARN Issue Type: Bug Components: client Reporter: Sunil G Assignee: Sunil G Priority: Minor Attachments: Yarn-2356.1.patch *yarn application -status* or *applicationattempt -status* or *container status* commands can suppress exception such as ApplicationNotFound, ApplicationAttemptNotFound and ContainerNotFound for non-existent entries in RM or History Server. For example, below exception can be suppressed better sunildev@host-a:~/hadoop/hadoop/bin ./yarn application -status application_1402668848165_0015 No GC_PROFILE is given. Defaults to medium. 14/07/25 16:21:45 INFO client.RMProxy: Connecting to ResourceManager at /10.18.40.77:45022 Exception in thread main org.apache.hadoop.yarn.exceptions.ApplicationNotFoundException: Application with id 'application_1402668848165_0015' doesn't exist in RM. at org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getApplicationReport(ClientRMService.java:285) at org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getApplicationReport(ApplicationClientProtocolPBServiceImpl.java:145) at org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:321) at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:607) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:932) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2099) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2095) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:396) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1626) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2093) at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27) at java.lang.reflect.Constructor.newInstance(Constructor.java:513) at org.apache.hadoop.yarn.ipc.RPCUtil.instantiateException(RPCUtil.java:53) at org.apache.hadoop.yarn.ipc.RPCUtil.unwrapAndThrowException(RPCUtil.java:101) at org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.getApplicationReport(ApplicationClientProtocolPBClientImpl.java:166) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at
[jira] [Commented] (YARN-1915) ClientToAMTokenMasterKey should be provided to AM at launch time
[ https://issues.apache.org/jira/browse/YARN-1915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14095820#comment-14095820 ] Daryn Sharp commented on YARN-1915: --- I suspect it's because it removes the burden from the AM to strip the secret from the credentials so it doesn't leak to other processes. ClientToAMTokenMasterKey should be provided to AM at launch time Key: YARN-1915 URL: https://issues.apache.org/jira/browse/YARN-1915 Project: Hadoop YARN Issue Type: Sub-task Affects Versions: 2.2.0 Reporter: Hitesh Shah Assignee: Jason Lowe Priority: Critical Attachments: YARN-1915.patch, YARN-1915v2.patch Currently, the AM receives the key as part of registration. This introduces a race where a client can connect to the AM when the AM has not received the key. Current Flow: 1) AM needs to start the client listening service in order to get host:port and send it to the RM as part of registration 2) RM gets the port info in register() and transitions the app to RUNNING. Responds back with client secret to AM. 3) User asks RM for client token. Gets it and pings the AM. AM hasn't received client secret from RM and so RPC itself rejects the request. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2411) [Capacity Scheduler] support simple user and group mappings to queues
[ https://issues.apache.org/jira/browse/YARN-2411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14095825#comment-14095825 ] Arun C Murthy commented on YARN-2411: - Looks good to me. Hopefully this should be a simple enhancement in CS. [Capacity Scheduler] support simple user and group mappings to queues - Key: YARN-2411 URL: https://issues.apache.org/jira/browse/YARN-2411 Project: Hadoop YARN Issue Type: Improvement Components: capacityscheduler Reporter: Ram Venkatesh YARN-2257 has a proposal to extend and share the queue placement rules for the fair scheduler and the capacity scheduler. This is a good long term solution to streamline queue placement of both schedulers but it has core infra work that has to happen first and might require changes to current features in all schedulers along with corresponding configuration changes, if any. I would like to propose a change with a smaller scope in the capacity scheduler that addresses the core use cases for implicitly mapping jobs that have the default queue or no queue specified to specific queues based on the submitting user and user groups. It will be useful in a number of real-world scenarios and can be migrated over to the unified scheme when YARN-2257 becomes available. The proposal is to add two new configuration options: yarn.scheduler.capacity.queue-mappings.enable A boolean that controls if queue mappings are enabled, default is false. and, yarn.scheduler.capacity.queue-mappings A string that specifies a list of mappings in the following format: map_specifier:source_attribute:queue_name[,map_specifier:source_attribute:queue_name]* map_specifier := user (u) | group (g) source_attribute := user | group | %user queue_name := the name of the mapped queue | %user | %primary_group The mappings will be evaluated left to right, and the first valid mapping will be used. If the mapped queue does not exist, or the current user does not have permissions to submit jobs to the mapped queue, the submission will fail. Example usages: 1. user1 is mapped to queue1, group1 is mapped to queue2 u:user1:queue1,g:group1:queue2 2. To map users to queues with the same name as the user: u:%user:%user I am happy to volunteer to take this up. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2410) Nodemanager ShuffleHandler can easily exhaust file descriptors
[ https://issues.apache.org/jira/browse/YARN-2410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14095824#comment-14095824 ] jay vyas commented on YARN-2410: Adding this as a part of BIGTOP-1403 also. hopefully we can have some smoke tests which induce this bug. If you have a code snippet or patch to contribute that generates this , feel free to attach it there, we can help you to refine it. Nodemanager ShuffleHandler can easily exhaust file descriptors -- Key: YARN-2410 URL: https://issues.apache.org/jira/browse/YARN-2410 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.5.0 Reporter: Nathan Roberts Assignee: Chen He Priority: Critical The async nature of the shufflehandler can cause it to open a huge number of file descriptors, when it runs out it crashes. Scenario: Job with 6K reduces, slow start set to 0.95, about 40 map outputs per node. Let's say all 6K reduces hit a node at about same time asking for their outputs. Each reducer will ask for all 40 map outputs over a single socket in a single request (not necessarily all 40 at once, but with coalescing it is likely to be a large number). sendMapOutput() will open the file for random reading and then perform an async transfer of the particular portion of this file(). This will theoretically happen 6000*40=24 times which will run the NM out of file descriptors and cause it to crash. The algorithm should be refactored a little to not open the fds until they're actually needed. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2411) [Capacity Scheduler] support simple user and group mappings to queues
[ https://issues.apache.org/jira/browse/YARN-2411?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arun C Murthy updated YARN-2411: Assignee: Ram Venkatesh [Capacity Scheduler] support simple user and group mappings to queues - Key: YARN-2411 URL: https://issues.apache.org/jira/browse/YARN-2411 Project: Hadoop YARN Issue Type: Improvement Components: capacityscheduler Reporter: Ram Venkatesh Assignee: Ram Venkatesh YARN-2257 has a proposal to extend and share the queue placement rules for the fair scheduler and the capacity scheduler. This is a good long term solution to streamline queue placement of both schedulers but it has core infra work that has to happen first and might require changes to current features in all schedulers along with corresponding configuration changes, if any. I would like to propose a change with a smaller scope in the capacity scheduler that addresses the core use cases for implicitly mapping jobs that have the default queue or no queue specified to specific queues based on the submitting user and user groups. It will be useful in a number of real-world scenarios and can be migrated over to the unified scheme when YARN-2257 becomes available. The proposal is to add two new configuration options: yarn.scheduler.capacity.queue-mappings.enable A boolean that controls if queue mappings are enabled, default is false. and, yarn.scheduler.capacity.queue-mappings A string that specifies a list of mappings in the following format: map_specifier:source_attribute:queue_name[,map_specifier:source_attribute:queue_name]* map_specifier := user (u) | group (g) source_attribute := user | group | %user queue_name := the name of the mapped queue | %user | %primary_group The mappings will be evaluated left to right, and the first valid mapping will be used. If the mapped queue does not exist, or the current user does not have permissions to submit jobs to the mapped queue, the submission will fail. Example usages: 1. user1 is mapped to queue1, group1 is mapped to queue2 u:user1:queue1,g:group1:queue2 2. To map users to queues with the same name as the user: u:%user:%user I am happy to volunteer to take this up. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2378) Adding support for moving apps between queues in Capacity Scheduler
[ https://issues.apache.org/jira/browse/YARN-2378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14095885#comment-14095885 ] Jian He commented on YARN-2378: --- Noticed that SchedulerApplication#setQueue should be updated as well when move happens. (can you add test for this too) Adding support for moving apps between queues in Capacity Scheduler --- Key: YARN-2378 URL: https://issues.apache.org/jira/browse/YARN-2378 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler Reporter: Subramaniam Venkatraman Krishnan Assignee: Subramaniam Venkatraman Krishnan Labels: capacity-scheduler Attachments: YARN-2378.patch, YARN-2378.patch, YARN-2378.patch As discussed with [~leftnoteasy] and [~jianhe], we are breaking up YARN-1707 to smaller patches for manageability. This JIRA will address adding support for moving apps between queues in Capacity Scheduler. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2102) More generalized timeline ACLs
[ https://issues.apache.org/jira/browse/YARN-2102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen updated YARN-2102: -- Attachment: GeneralizedTimelineACLs.pdf I uploaded a proposal of supporting more generalized ACLs for timeline data model. More generalized timeline ACLs -- Key: YARN-2102 URL: https://issues.apache.org/jira/browse/YARN-2102 Project: Hadoop YARN Issue Type: Sub-task Reporter: Zhijie Shen Assignee: Zhijie Shen Attachments: GeneralizedTimelineACLs.pdf We need to differentiate the access controls of reading and writing operations, and we need to think about cross-entity access control. For example, if we are executing a workflow of MR jobs, which writing the timeline data of this workflow, we don't want other user to pollute the timeline data of the workflow by putting something under it. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Assigned] (YARN-2056) Disable preemption at Queue level
[ https://issues.apache.org/jira/browse/YARN-2056?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Payne reassigned YARN-2056: Assignee: Eric Payne Disable preemption at Queue level - Key: YARN-2056 URL: https://issues.apache.org/jira/browse/YARN-2056 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.4.0 Reporter: Mayank Bansal Assignee: Eric Payne We need to be able to disable preemption at individual queue level -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2277) Add Cross-Origin support to the ATS REST API
[ https://issues.apache.org/jira/browse/YARN-2277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen updated YARN-2277: -- Attachment: YARN-2277-v8.patch [~jeagles], thanks for the latest patch. +1, LGTM. I uploaded a new patch with some minor touch: 1. Changing http.cross-origin to http-cross-origin; 2. Breaking the lines that go beyond 80 chars. Will commit the patch once jenkins +1. Add Cross-Origin support to the ATS REST API Key: YARN-2277 URL: https://issues.apache.org/jira/browse/YARN-2277 Project: Hadoop YARN Issue Type: Sub-task Reporter: Jonathan Eagles Assignee: Jonathan Eagles Attachments: YARN-2277-CORS.patch, YARN-2277-JSONP.patch, YARN-2277-v2.patch, YARN-2277-v3.patch, YARN-2277-v3.patch, YARN-2277-v4.patch, YARN-2277-v5.patch, YARN-2277-v6.patch, YARN-2277-v7.patch, YARN-2277-v8.patch As the Application Timeline Server is not provided with built-in UI, it may make sense to enable JSONP or CORS Rest API capabilities to allow for remote UI to access the data directly via javascript without cross side server browser blocks coming into play. Example client may be like http://api.jquery.com/jQuery.getJSON/ This can alleviate the need to create a local proxy cache. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2277) Add Cross-Origin support to the ATS REST API
[ https://issues.apache.org/jira/browse/YARN-2277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14096036#comment-14096036 ] Hadoop QA commented on YARN-2277: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12661510/YARN-2277-v8.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4615//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4615//console This message is automatically generated. Add Cross-Origin support to the ATS REST API Key: YARN-2277 URL: https://issues.apache.org/jira/browse/YARN-2277 Project: Hadoop YARN Issue Type: Sub-task Reporter: Jonathan Eagles Assignee: Jonathan Eagles Attachments: YARN-2277-CORS.patch, YARN-2277-JSONP.patch, YARN-2277-v2.patch, YARN-2277-v3.patch, YARN-2277-v3.patch, YARN-2277-v4.patch, YARN-2277-v5.patch, YARN-2277-v6.patch, YARN-2277-v7.patch, YARN-2277-v8.patch As the Application Timeline Server is not provided with built-in UI, it may make sense to enable JSONP or CORS Rest API capabilities to allow for remote UI to access the data directly via javascript without cross side server browser blocks coming into play. Example client may be like http://api.jquery.com/jQuery.getJSON/ This can alleviate the need to create a local proxy cache. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2383) Add ability to renew ClientToAMToken
[ https://issues.apache.org/jira/browse/YARN-2383?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong updated YARN-2383: Attachment: YARN-2383.preview.1.patch Add ability to renew ClientToAMToken Key: YARN-2383 URL: https://issues.apache.org/jira/browse/YARN-2383 Project: Hadoop YARN Issue Type: Bug Components: applications, resourcemanager Reporter: Xuan Gong Assignee: Xuan Gong Attachments: YARN-2383.preview.1.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2383) Add ability to renew ClientToAMToken
[ https://issues.apache.org/jira/browse/YARN-2383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14096060#comment-14096060 ] Xuan Gong commented on YARN-2383: - Make the ClientToAMToken renewable after a configurable period of time. AM will send the renewDate in allocate call. RM can send back the new renewDate if needed thru the allocate. At AM side, create a ClientToAMTokenCache ,which is singleton. The ClientToAMTokenCache is used to save the master-key and renewDate of the ClientToAMToken, and the ClientToAMTokenCache will be used in ClientToAMTokenSecretManager in AM side to help save and retrieve the master-key. With using ClientToAMTokenCache, we do not need to change any public API and codes in both AM and client. Add ability to renew ClientToAMToken Key: YARN-2383 URL: https://issues.apache.org/jira/browse/YARN-2383 Project: Hadoop YARN Issue Type: Bug Components: applications, resourcemanager Reporter: Xuan Gong Assignee: Xuan Gong Attachments: YARN-2383.preview.1.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2277) Add Cross-Origin support to the ATS REST API
[ https://issues.apache.org/jira/browse/YARN-2277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14096117#comment-14096117 ] Hudson commented on YARN-2277: -- FAILURE: Integrated in Hadoop-trunk-Commit #6060 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/6060/]) YARN-2277. Added cross-origin support for the timeline server web services. Contributed by Jonathan Eagles. (zjshen: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1617832) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/timeline/webapp/CrossOriginFilter.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/timeline/webapp/CrossOriginFilterInitializer.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/timeline/webapp/TestCrossOriginFilter.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/timeline/webapp/TestCrossOriginFilterInitializer.java Add Cross-Origin support to the ATS REST API Key: YARN-2277 URL: https://issues.apache.org/jira/browse/YARN-2277 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Jonathan Eagles Assignee: Jonathan Eagles Fix For: 2.6.0 Attachments: YARN-2277-CORS.patch, YARN-2277-JSONP.patch, YARN-2277-v2.patch, YARN-2277-v3.patch, YARN-2277-v3.patch, YARN-2277-v4.patch, YARN-2277-v5.patch, YARN-2277-v6.patch, YARN-2277-v7.patch, YARN-2277-v8.patch As the Application Timeline Server is not provided with built-in UI, it may make sense to enable JSONP or CORS Rest API capabilities to allow for remote UI to access the data directly via javascript without cross side server browser blocks coming into play. Example client may be like http://api.jquery.com/jQuery.getJSON/ This can alleviate the need to create a local proxy cache. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2277) Add Cross-Origin support to the ATS REST API
[ https://issues.apache.org/jira/browse/YARN-2277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14096127#comment-14096127 ] Jonathan Eagles commented on YARN-2277: --- Thanks for the great reviews, [~zjshen]! Add Cross-Origin support to the ATS REST API Key: YARN-2277 URL: https://issues.apache.org/jira/browse/YARN-2277 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Jonathan Eagles Assignee: Jonathan Eagles Fix For: 2.6.0 Attachments: YARN-2277-CORS.patch, YARN-2277-JSONP.patch, YARN-2277-v2.patch, YARN-2277-v3.patch, YARN-2277-v3.patch, YARN-2277-v4.patch, YARN-2277-v5.patch, YARN-2277-v6.patch, YARN-2277-v7.patch, YARN-2277-v8.patch As the Application Timeline Server is not provided with built-in UI, it may make sense to enable JSONP or CORS Rest API capabilities to allow for remote UI to access the data directly via javascript without cross side server browser blocks coming into play. Example client may be like http://api.jquery.com/jQuery.getJSON/ This can alleviate the need to create a local proxy cache. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2136) RMStateStore can explicitly handle store/update events when fenced
[ https://issues.apache.org/jira/browse/YARN-2136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14096160#comment-14096160 ] Varun Saxena commented on YARN-2136: [~sunilg], on closer look at the code, when we close the RMStateStore, we close the ZKClients as well. Hence dispatcher queue draining shouldn't matter as ZKClient is already closed. Ofcourse as more than one thread is involved, AsyncDispatcher's event handling thread may pick up an event and process it(store/update operation) before RM can close RMStateStore(while switching to standby). But in my view, this shouldn't be too big an impact I think to warrant adding a FENCED state. RMStateStore can explicitly handle store/update events when fenced -- Key: YARN-2136 URL: https://issues.apache.org/jira/browse/YARN-2136 Project: Hadoop YARN Issue Type: Bug Reporter: Jian He RMStateStore can choose to handle/ignore store/update events upfront instead of invoking more ZK operations if state store is at fenced state. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2331) Distinguish shutdown during supervision vs. shutdown for rolling upgrade
[ https://issues.apache.org/jira/browse/YARN-2331?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14096176#comment-14096176 ] Jason Lowe commented on YARN-2331: -- Ideally the process is self-contained on the NM node so once it has shutdown without killing containers it can be immediately restarted on the new release to minimize the period where the NM is not responding. I suppose we could have the the shutdown/upgrade script on the NM issue the rmadmin command then wait for the NM to receive the RM command and exit. I think it would be cleaner if we didn't have to involve the RM. However I don't feel so strongly that I'd object if we can't find a nice way to do this with just the NM node. Distinguish shutdown during supervision vs. shutdown for rolling upgrade Key: YARN-2331 URL: https://issues.apache.org/jira/browse/YARN-2331 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager Affects Versions: 2.6.0 Reporter: Jason Lowe When the NM is shutting down with restart support enabled there are scenarios we'd like to distinguish and behave accordingly: # The NM is running under supervision. In that case containers should be preserved so the automatic restart can recover them. # The NM is not running under supervision and a rolling upgrade is not being performed. In that case the shutdown should kill all containers since it is unlikely the NM will be restarted in a timely manner to recover them. # The NM is not running under supervision and a rolling upgrade is being performed. In that case the shutdown should not kill all containers since a restart is imminent due to the rolling upgrade and the containers will be recovered. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2070) DistributedShell publishes unfriendly user information to the timeline server
[ https://issues.apache.org/jira/browse/YARN-2070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14096212#comment-14096212 ] Hudson commented on YARN-2070: -- FAILURE: Integrated in Hadoop-trunk-Commit #6061 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/6061/]) YARN-2070. Made DistributedShell publish the short user name to the timeline server. Contributed by Robert Kanter. (zjshen: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1617837) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/src/main/java/org/apache/hadoop/yarn/applications/distributedshell/ApplicationMaster.java DistributedShell publishes unfriendly user information to the timeline server - Key: YARN-2070 URL: https://issues.apache.org/jira/browse/YARN-2070 Project: Hadoop YARN Issue Type: Sub-task Reporter: Zhijie Shen Assignee: Robert Kanter Priority: Minor Labels: newbie Fix For: 2.6.0 Attachments: YARN-2070.patch Bellow is the code of using the string of current user object as the user value. {code} entity.addPrimaryFilter(user, UserGroupInformation.getCurrentUser() .toString()); {code} When we use kerberos authentication, it's going to output the full name, such as zjshen/localhost@LOCALHOST (auth.KERBEROS). It is not user friendly for searching by the primary filters. It's better to use shortUserName instead. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2378) Adding support for moving apps between queues in Capacity Scheduler
[ https://issues.apache.org/jira/browse/YARN-2378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Subramaniam Venkatraman Krishnan updated YARN-2378: --- Attachment: YARN-2378.patch Adding support for moving apps between queues in Capacity Scheduler --- Key: YARN-2378 URL: https://issues.apache.org/jira/browse/YARN-2378 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler Reporter: Subramaniam Venkatraman Krishnan Assignee: Subramaniam Venkatraman Krishnan Labels: capacity-scheduler Attachments: YARN-2378.patch, YARN-2378.patch, YARN-2378.patch, YARN-2378.patch As discussed with [~leftnoteasy] and [~jianhe], we are breaking up YARN-1707 to smaller patches for manageability. This JIRA will address adding support for moving apps between queues in Capacity Scheduler. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2378) Adding support for moving apps between queues in Capacity Scheduler
[ https://issues.apache.org/jira/browse/YARN-2378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14096230#comment-14096230 ] Subramaniam Venkatraman Krishnan commented on YARN-2378: [~jianhe], thanks for your detailed comments. I am attaching a patch that addresses it. Summary of changes: * getCheckLeafQueue renamed to getAndCheckLeafQueue * Add test case to move app to a sibling queue within the same parent * Removed unnecessary unreserve as we need to only the metrics which is already being done in SchedulerApplicationAttempt as you right pointed out. * Removed unwanted synchronized block * Updated SchedulerApplication#setQueue (nice catch). Added checks in test case. Also updated AppSchedulingInfo as it also maintains the queue name. Adding support for moving apps between queues in Capacity Scheduler --- Key: YARN-2378 URL: https://issues.apache.org/jira/browse/YARN-2378 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler Reporter: Subramaniam Venkatraman Krishnan Assignee: Subramaniam Venkatraman Krishnan Labels: capacity-scheduler Attachments: YARN-2378.patch, YARN-2378.patch, YARN-2378.patch, YARN-2378.patch As discussed with [~leftnoteasy] and [~jianhe], we are breaking up YARN-1707 to smaller patches for manageability. This JIRA will address adding support for moving apps between queues in Capacity Scheduler. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1915) ClientToAMTokenMasterKey should be provided to AM at launch time
[ https://issues.apache.org/jira/browse/YARN-1915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14096239#comment-14096239 ] Hitesh Shah commented on YARN-1915: --- [~daryn] I believe all AMs already have to remove the AMRMToken from their credentials before invoking other processes. [~jlowe] I am not too familiar with this but would it be possible to consider an option where the AMRMToken is used to encrypt/decrypt the ClienttoAMMasterKey? The encrypted key could be then base64 encoded and then sent to the AM via the env? ClientToAMTokenMasterKey should be provided to AM at launch time Key: YARN-1915 URL: https://issues.apache.org/jira/browse/YARN-1915 Project: Hadoop YARN Issue Type: Sub-task Affects Versions: 2.2.0 Reporter: Hitesh Shah Assignee: Jason Lowe Priority: Critical Attachments: YARN-1915.patch, YARN-1915v2.patch Currently, the AM receives the key as part of registration. This introduces a race where a client can connect to the AM when the AM has not received the key. Current Flow: 1) AM needs to start the client listening service in order to get host:port and send it to the RM as part of registration 2) RM gets the port info in register() and transitions the app to RUNNING. Responds back with client secret to AM. 3) User asks RM for client token. Gets it and pings the AM. AM hasn't received client secret from RM and so RPC itself rejects the request. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (YARN-2416) InvalidStateTransitonException in ResourceManager if AMLauncher does not receive response for startContainers() call in time
Jian Fang created YARN-2416: --- Summary: InvalidStateTransitonException in ResourceManager if AMLauncher does not receive response for startContainers() call in time Key: YARN-2416 URL: https://issues.apache.org/jira/browse/YARN-2416 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.4.0 Reporter: Jian Fang AMLauncher calls startContainers(allRequests) to launch a container for application master. Normally, the call comes back immediately so that the RMAppAttempt changes its state from ALLOCATED to LAUNCHED. However, we do observed that in some cases, the RPC call came back very late but the AM container was already started. Because the RMAppAttempt stuck in ALLOCATED state, once resource manager received the REGISTERED event from the application master, it threw InvalidStateTransitonException as follows. 2014-07-05 08:59:05,021 ERROR org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: Can't handle this event at current state org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: REGISTERED at ALLOCATED at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305) at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:652) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:106) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:752) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:733) at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173) at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106) at java.lang.Thread.run(Thread.java:744) For subsequent STATUS_UPDATE and CONTAINER_ALLOCATED events for this job, resource manager kept throwing InvalidStateTransitonException. 2014-07-05 08:59:06,152 ERROR org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: Can't handle this event at current state org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: STATUS_UPDATE at ALLOCATED at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305) at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:652) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:106) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:752) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:733) at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173) at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106) at java.lang.Thread.run(Thread.java:744) 2014-07-05 08:59:07,779 INFO org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: container_1404549222428_0001_02_02 Container Transitioned from NEW to ALLOCATED 2014-07-05 08:59:07,779 ERROR org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: Can't handle this event at current state org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: CONTAINER_ALLOCATED at ALLOCATED at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305) at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:652) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:106) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:752) at
[jira] [Commented] (YARN-2416) InvalidStateTransitonException in ResourceManager if AMLauncher does not receive response for startContainers() call in time
[ https://issues.apache.org/jira/browse/YARN-2416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14096250#comment-14096250 ] Jian Fang commented on YARN-2416: - Here is the log for the events and state transitions. 2014-07-05 08:57:41,748 INFO org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: container_1404549222428_0001_01_01 Container Transitioned from NEW to ALLOCATED 2014-07-05 08:57:41,760 INFO org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: container_1404549222428_0001_01_01 Container Transitioned from ALLOCATED to ACQUIRED 2014-07-05 08:57:41,833 INFO org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher: Command to launch container container_1404549222428_0001_01_01 : $JAVA_HOME/bin/java -Dlog4j.configuration=container-log4j.properties -Dyarn.app.container.log.dir=LOG_DIR -Dyarn.app.container.log.filesize=0 -Dhadoop.root.logger=INFO,CLA -Xmx2048m org.apache.hadoop.mapreduce.v2.app.MRAppMaster 1LOG_DIR/stdout 2LOG_DIR/stderr 2014-07-05 08:57:42,737 INFO org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: container_1404549222428_0001_01_01 Container Transitioned from ACQUIRED to RUNNING 2014-07-05 08:58:54,290 INFO org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: container_1404549222428_0001_01_01 Container Transitioned from RUNNING to KILLED 2014-07-05 08:58:54,290 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp: Completed container: container_1404549222428_0001_01_01 in state: KILLED event:KILL 2014-07-05 08:58:54,394 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: appattempt_1404549222428_0001_02 State change from SCHEDULED to ALLOCATED_SAVING 2014-07-05 08:58:54,394 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: appattempt_1404549222428_0001_02 State change from ALLOCATED_SAVING to ALLOCATED 2014-07-05 08:58:54,395 INFO org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher: Launching masterappattempt_1404549222428_0001_02 2014-07-05 08:58:54,397 INFO org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher: Command to launch container container_1404549222428_0001_02_01 : $JAVA_HOME/bin/java -Dlog4j.configuration=container-log4j.properties -Dyarn.app.container.log.dir=LOG_DIR -Dyarn.app.container.log.filesize=0 -Dhadoop.root.logger=INFO,CLA -Xmx2048m org.apache.hadoop.mapreduce.v2.app.MRAppMaster 1LOG_DIR/stdout 2LOG_DIR/stderr 2014-07-05 08:58:55,396 INFO org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: container_1404549222428_0001_02_01 Container Transitioned from ACQUIRED to RUNNING 2014-07-05 08:59:05,020 INFO org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService: AM registration appattempt_1404549222428_0001_02 2014-07-05 08:59:05,021 INFO org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=hadoop IP=10.198.10.201OPERATION=Register App Master TARGET=ApplicationMasterService RESULT=SUCCESS APPID=application_1404549222428_0001 APPATTEMPTID=appattempt_1404549222428_0001_02 2014-07-05 08:59:12,653 INFO org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher: Done launching container Container: [ContainerId: container_1404549222428_0001_01_01, NodeId: ip-10-198-22-164.us-west-1.compute.internal:9103, NodeHttpAddress: ip-10-198-22-164.us-west-1.compute.internal:9035, Resource: memory:3378, vCores:1, Priority: 0, Token: Token { kind: ContainerToken, service: 10.198.22.164:9103 }, ] for AM appattempt_1404549222428_0001_01 2014-07-05 08:59:12,653 INFO org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher: Done launching container Container: [ContainerId: container_1404549222428_0001_02_01, NodeId: ip-10-198-10-201.us-west-1.compute.internal:9103, NodeHttpAddress: ip-10-198-10-201.us-west-1.compute.internal:9035, Resource: memory:3378, vCores:1, Priority: 0, Token: Token { kind: ContainerToken, service: 10.198.10.201:9103 }, ] for AM appattempt_1404549222428_0001_02 InvalidStateTransitonException in ResourceManager if AMLauncher does not receive response for startContainers() call in time Key: YARN-2416 URL: https://issues.apache.org/jira/browse/YARN-2416 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.4.0 Reporter: Jian Fang AMLauncher calls startContainers(allRequests) to launch a container for application master. Normally, the call comes back immediately so that the RMAppAttempt
[jira] [Commented] (YARN-2416) InvalidStateTransitonException in ResourceManager if AMLauncher does not receive response for startContainers() call in time
[ https://issues.apache.org/jira/browse/YARN-2416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14096256#comment-14096256 ] Jian Fang commented on YARN-2416: - Seems for some reason, the RPC call for startContainers(allRequests) in AMLauncher was blocked. More interesting thing is that the second retry to launch the application master failed for the same reason and the responses of the two startContainers() calls for both AM launches came back at the same time as shown in the above log. Since the REGISTERED event is a good indication that the AM container was launched successfully, can we add state transition from ALLOCATED to RUNNING or do two state transitions, i.e., from ALLOCATED to LAUNCHED and then from LAUNCHED to RUNNING, in this case? InvalidStateTransitonException in ResourceManager if AMLauncher does not receive response for startContainers() call in time Key: YARN-2416 URL: https://issues.apache.org/jira/browse/YARN-2416 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.4.0 Reporter: Jian Fang AMLauncher calls startContainers(allRequests) to launch a container for application master. Normally, the call comes back immediately so that the RMAppAttempt changes its state from ALLOCATED to LAUNCHED. However, we do observed that in some cases, the RPC call came back very late but the AM container was already started. Because the RMAppAttempt stuck in ALLOCATED state, once resource manager received the REGISTERED event from the application master, it threw InvalidStateTransitonException as follows. 2014-07-05 08:59:05,021 ERROR org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: Can't handle this event at current state org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: REGISTERED at ALLOCATED at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305) at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:652) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:106) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:752) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:733) at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173) at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106) at java.lang.Thread.run(Thread.java:744) For subsequent STATUS_UPDATE and CONTAINER_ALLOCATED events for this job, resource manager kept throwing InvalidStateTransitonException. 2014-07-05 08:59:06,152 ERROR org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: Can't handle this event at current state org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: STATUS_UPDATE at ALLOCATED at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305) at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:652) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:106) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:752) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:733) at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173) at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106) at java.lang.Thread.run(Thread.java:744) 2014-07-05 08:59:07,779 INFO org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: container_1404549222428_0001_02_02 Container Transitioned from NEW to ALLOCATED 2014-07-05 08:59:07,779 ERROR
[jira] [Updated] (YARN-2383) Add ability to renew ClientToAMToken
[ https://issues.apache.org/jira/browse/YARN-2383?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong updated YARN-2383: Attachment: YARN-2383.preview.2.patch fix testcase failures Add ability to renew ClientToAMToken Key: YARN-2383 URL: https://issues.apache.org/jira/browse/YARN-2383 Project: Hadoop YARN Issue Type: Bug Components: applications, resourcemanager Reporter: Xuan Gong Assignee: Xuan Gong Attachments: YARN-2383.preview.1.patch, YARN-2383.preview.2.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (YARN-2417) Sorting by total memory doesn't work correctly
Siqi Li created YARN-2417: - Summary: Sorting by total memory doesn't work correctly Key: YARN-2417 URL: https://issues.apache.org/jira/browse/YARN-2417 Project: Hadoop YARN Issue Type: Bug Reporter: Siqi Li -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Assigned] (YARN-2417) Sorting by total memory doesn't work correctly
[ https://issues.apache.org/jira/browse/YARN-2417?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siqi Li reassigned YARN-2417: - Assignee: Siqi Li Sorting by total memory doesn't work correctly Key: YARN-2417 URL: https://issues.apache.org/jira/browse/YARN-2417 Project: Hadoop YARN Issue Type: Bug Reporter: Siqi Li Assignee: Siqi Li -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2417) Sorting by total memory doesn't work correctly
[ https://issues.apache.org/jira/browse/YARN-2417?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siqi Li updated YARN-2417: -- Affects Version/s: 2.4.0 Sorting by total memory doesn't work correctly Key: YARN-2417 URL: https://issues.apache.org/jira/browse/YARN-2417 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.4.0 Reporter: Siqi Li Assignee: Siqi Li Attachments: _thumb_151509.png -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-415) Capture memory utilization at the app-level for chargeback
[ https://issues.apache.org/jira/browse/YARN-415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14096307#comment-14096307 ] Hadoop QA commented on YARN-415: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12661548/YARN-415.201408132109.txt against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 11 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.client.cli.TestYarnCLI {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4617//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4617//console This message is automatically generated. Capture memory utilization at the app-level for chargeback -- Key: YARN-415 URL: https://issues.apache.org/jira/browse/YARN-415 Project: Hadoop YARN Issue Type: New Feature Components: resourcemanager Affects Versions: 0.23.6 Reporter: Kendall Thrapp Assignee: Andrey Klochkov Attachments: YARN-415--n10.patch, YARN-415--n2.patch, YARN-415--n3.patch, YARN-415--n4.patch, YARN-415--n5.patch, YARN-415--n6.patch, YARN-415--n7.patch, YARN-415--n8.patch, YARN-415--n9.patch, YARN-415.201405311749.txt, YARN-415.201406031616.txt, YARN-415.201406262136.txt, YARN-415.201407042037.txt, YARN-415.201407071542.txt, YARN-415.201407171553.txt, YARN-415.201407172144.txt, YARN-415.201407232237.txt, YARN-415.201407242148.txt, YARN-415.201407281816.txt, YARN-415.201408062232.txt, YARN-415.201408080204.txt, YARN-415.201408092006.txt, YARN-415.201408132109.txt, YARN-415.patch For the purpose of chargeback, I'd like to be able to compute the cost of an application in terms of cluster resource usage. To start out, I'd like to get the memory utilization of an application. The unit should be MB-seconds or something similar and, from a chargeback perspective, the memory amount should be the memory reserved for the application, as even if the app didn't use all that memory, no one else was able to use it. (reserved ram for container 1 * lifetime of container 1) + (reserved ram for container 2 * lifetime of container 2) + ... + (reserved ram for container n * lifetime of container n) It'd be nice to have this at the app level instead of the job level because: 1. We'd still be able to get memory usage for jobs that crashed (and wouldn't appear on the job history server). 2. We'd be able to get memory usage for future non-MR jobs (e.g. Storm). This new metric should be available both through the RM UI and RM Web Services REST API. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2417) Sorting by total memory doesn't work correctly
[ https://issues.apache.org/jira/browse/YARN-2417?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siqi Li updated YARN-2417: -- Attachment: YARN-2417.v1.patch Sorting by total memory doesn't work correctly Key: YARN-2417 URL: https://issues.apache.org/jira/browse/YARN-2417 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.4.0 Reporter: Siqi Li Assignee: Siqi Li Attachments: YARN-2417.v1.patch, _thumb_151509.png -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2417) Adding total Memory into AppsBlocks
[ https://issues.apache.org/jira/browse/YARN-2417?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siqi Li updated YARN-2417: -- Summary: Adding total Memory into AppsBlocks (was: Sorting by total memory doesn't work correctly) Adding total Memory into AppsBlocks --- Key: YARN-2417 URL: https://issues.apache.org/jira/browse/YARN-2417 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.4.0 Reporter: Siqi Li Assignee: Siqi Li Attachments: YARN-2417.v1.patch, _thumb_151509.png -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1915) ClientToAMTokenMasterKey should be provided to AM at launch time
[ https://issues.apache.org/jira/browse/YARN-1915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14096319#comment-14096319 ] Daryn Sharp commented on YARN-1915: --- Yes, I thought the ugi mangling was gone, but the AMRMToken is indeed manually removed. I'm assuming there was a valid reason why the secret is passed in the registration response, perhaps for future functionality. Rather than second guess how/why it's done this way, I'd prefer to focus on a small immediate fix for this very tight race condition. The AM should generally receive the registration response before a client can ask the RM where the AM is and try to connect. Could we file another jira to contemplate an incompatible change? ClientToAMTokenMasterKey should be provided to AM at launch time Key: YARN-1915 URL: https://issues.apache.org/jira/browse/YARN-1915 Project: Hadoop YARN Issue Type: Sub-task Affects Versions: 2.2.0 Reporter: Hitesh Shah Assignee: Jason Lowe Priority: Critical Attachments: YARN-1915.patch, YARN-1915v2.patch Currently, the AM receives the key as part of registration. This introduces a race where a client can connect to the AM when the AM has not received the key. Current Flow: 1) AM needs to start the client listening service in order to get host:port and send it to the RM as part of registration 2) RM gets the port info in register() and transitions the app to RUNNING. Responds back with client secret to AM. 3) User asks RM for client token. Gets it and pings the AM. AM hasn't received client secret from RM and so RPC itself rejects the request. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2385) Adding support for listing all applications in a queue
[ https://issues.apache.org/jira/browse/YARN-2385?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14096329#comment-14096329 ] Wangda Tan commented on YARN-2385: -- I think we might not need maintain completed apps CS and Fair after thought about it. Maintain such fields is not original responsibility designed for scheduler. And for now, user can get completed container via REST API, that should be able to cover most use cases. Adding support for listing all applications in a queue -- Key: YARN-2385 URL: https://issues.apache.org/jira/browse/YARN-2385 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler, fairscheduler Reporter: Subramaniam Venkatraman Krishnan Assignee: Karthik Kambatla Labels: abstractyarnscheduler This JIRA proposes adding a method in AbstractYarnScheduler to get all the pending/active applications. Fair scheduler already supports moving a single application from one queue to another. Support for the same is being added to Capacity Scheduler as part of YARN-2378 and YARN-2248. So with the addition of this method, we can transparently add support for moving all applications from source queue to target queue and draining a queue, i.e. killing all applications in a queue as proposed by YARN-2389 -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2378) Adding support for moving apps between queues in Capacity Scheduler
[ https://issues.apache.org/jira/browse/YARN-2378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14096351#comment-14096351 ] Hadoop QA commented on YARN-2378: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12661564/YARN-2378.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4618//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4618//console This message is automatically generated. Adding support for moving apps between queues in Capacity Scheduler --- Key: YARN-2378 URL: https://issues.apache.org/jira/browse/YARN-2378 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler Reporter: Subramaniam Venkatraman Krishnan Assignee: Subramaniam Venkatraman Krishnan Labels: capacity-scheduler Attachments: YARN-2378.patch, YARN-2378.patch, YARN-2378.patch, YARN-2378.patch As discussed with [~leftnoteasy] and [~jianhe], we are breaking up YARN-1707 to smaller patches for manageability. This JIRA will address adding support for moving apps between queues in Capacity Scheduler. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2417) Adding total Memory into AppsBlocks
[ https://issues.apache.org/jira/browse/YARN-2417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14096360#comment-14096360 ] Jason Lowe commented on YARN-2417: -- Is this a dup of YARN-451? Adding total Memory into AppsBlocks --- Key: YARN-2417 URL: https://issues.apache.org/jira/browse/YARN-2417 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.4.0 Reporter: Siqi Li Assignee: Siqi Li Attachments: YARN-2417.v1.patch, _thumb_151509.png -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2417) Adding total Memory into AppsBlocks
[ https://issues.apache.org/jira/browse/YARN-2417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14096374#comment-14096374 ] Siqi Li commented on YARN-2417: --- I think so, but that one didn't get committed yet. Adding total Memory into AppsBlocks --- Key: YARN-2417 URL: https://issues.apache.org/jira/browse/YARN-2417 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.4.0 Reporter: Siqi Li Assignee: Siqi Li Attachments: YARN-2417.v1.patch, _thumb_151509.png -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2417) Adding total Memory into AppsBlocks
[ https://issues.apache.org/jira/browse/YARN-2417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14096376#comment-14096376 ] Hadoop QA commented on YARN-2417: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12661588/YARN-2417.v1.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:red}-1 javac{color:red}. The patch appears to cause the build to fail. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4619//console This message is automatically generated. Adding total Memory into AppsBlocks --- Key: YARN-2417 URL: https://issues.apache.org/jira/browse/YARN-2417 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.4.0 Reporter: Siqi Li Assignee: Siqi Li Attachments: YARN-2417.v1.patch, _thumb_151509.png -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-1506) Replace set resource change on RMNode/SchedulerNode directly with event notification.
[ https://issues.apache.org/jira/browse/YARN-1506?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junping Du updated YARN-1506: - Attachment: YARN-1506-v11.patch Replace set resource change on RMNode/SchedulerNode directly with event notification. - Key: YARN-1506 URL: https://issues.apache.org/jira/browse/YARN-1506 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager, scheduler Reporter: Junping Du Assignee: Junping Du Attachments: YARN-1506-v1.patch, YARN-1506-v10.patch, YARN-1506-v11.patch, YARN-1506-v2.patch, YARN-1506-v3.patch, YARN-1506-v4.patch, YARN-1506-v5.patch, YARN-1506-v6.patch, YARN-1506-v7.patch, YARN-1506-v8.patch, YARN-1506-v9.patch According to Vinod's comments on YARN-312 (https://issues.apache.org/jira/browse/YARN-312?focusedCommentId=13846087page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13846087), we should replace RMNode.setResourceOption() with some resource change event. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1506) Replace set resource change on RMNode/SchedulerNode directly with event notification.
[ https://issues.apache.org/jira/browse/YARN-1506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14096407#comment-14096407 ] Junping Du commented on YARN-1506: -- Thanks [~bikassaha] and [~jianhe] for review and comments! bq. At this point the previously accepted 4GB request cannot be satisfied and the application will get stuck. We may need to follow this up in a different jira. Agree. Agree to discuss this in a separate JIRA. Improve unit tests as your previous comments. bq. What is more important is that having a shared method declares that these pieces of code are related to each other in a logical way (if such a relation exists). Done. Abstract to a common method. bq. Since it’s changed to asynchronous, we may change the log to not say successfully. Removed. bq. IMO, since UpdateNodeResourceWhenUnusableTransition and UpdateNodeResourceWhenNonRunningTransition are the same except one extra logging, we can do the logging for both and just keep one transition? Agree. Merged. bq. if possible, nodeResourceUpdate method can be moved into AbstractYarnScheduler, a new common base class for sharing common code among all the schedulers. That's good point. Move to AbstractYarnScheduler now. bq. SchedulerNode.setTotalResource - SchedulerNode.updateTotalAndAvailableResource() ? My current thinking is better to keep it as setTotalResource, as the later one may mistake people that we can set availableResource as a separate value in this method. Actually, we only set totalResource and availableResource is just get refresh to keep consistent with totalResource change. May be we can keep it here? bq. UpdateNodeResourceResponse should be an abstract class which implements newInstance() method. Fixed. bq. AdminService.updateNodeResource should RMAuditLogger to log the operations as well. Fixed. Replace set resource change on RMNode/SchedulerNode directly with event notification. - Key: YARN-1506 URL: https://issues.apache.org/jira/browse/YARN-1506 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager, scheduler Reporter: Junping Du Assignee: Junping Du Attachments: YARN-1506-v1.patch, YARN-1506-v10.patch, YARN-1506-v11.patch, YARN-1506-v2.patch, YARN-1506-v3.patch, YARN-1506-v4.patch, YARN-1506-v5.patch, YARN-1506-v6.patch, YARN-1506-v7.patch, YARN-1506-v8.patch, YARN-1506-v9.patch According to Vinod's comments on YARN-312 (https://issues.apache.org/jira/browse/YARN-312?focusedCommentId=13846087page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13846087), we should replace RMNode.setResourceOption() with some resource change event. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2383) Add ability to renew ClientToAMToken
[ https://issues.apache.org/jira/browse/YARN-2383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14096465#comment-14096465 ] Hadoop QA commented on YARN-2383: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12661575/YARN-2383.preview.2.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 12 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 8 new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.client.api.impl.TestAMRMClientContainerRequest org.apache.hadoop.yarn.client.api.impl.TestAMRMClientOnRMRestart org.apache.hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRestart {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4620//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/4620//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-yarn-common.html Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/4620//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-yarn-client.html Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4620//console This message is automatically generated. Add ability to renew ClientToAMToken Key: YARN-2383 URL: https://issues.apache.org/jira/browse/YARN-2383 Project: Hadoop YARN Issue Type: Bug Components: applications, resourcemanager Reporter: Xuan Gong Assignee: Xuan Gong Attachments: YARN-2383.preview.1.patch, YARN-2383.preview.2.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Resolved] (YARN-1584) Support explicit failover when automatic failover is enabled
[ https://issues.apache.org/jira/browse/YARN-1584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karthik Kambatla resolved YARN-1584. Resolution: Won't Fix Target Version/s: (was: ) Marking this as Won't Fix as we didn't see a particular need for it on RM HA deployments. We can re-open this if need be. Support explicit failover when automatic failover is enabled Key: YARN-1584 URL: https://issues.apache.org/jira/browse/YARN-1584 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.3.0 Reporter: Karthik Kambatla Assignee: Karthik Kambatla YARN-1029 adds automatic failover support. However, users can't explicitly ask for a failover from one RM to the other without stopping the other RM. Stopping the RM until the other RM takes over and then restarting the first RM is more involving and exposes the RM-ensemble to SPOF for a longer duration. It would be nice to allow explicit failover through yarn rmadmin -failover command. PS: HDFS supports -failover option. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Resolved] (YARN-1602) All failed RMStateStore operations should not be RMFatalEvents
[ https://issues.apache.org/jira/browse/YARN-1602?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karthik Kambatla resolved YARN-1602. Resolution: Won't Fix RM HA deployments have been working fairly well. Let us close this as Won't Fix for now and revisit it if need be. All failed RMStateStore operations should not be RMFatalEvents -- Key: YARN-1602 URL: https://issues.apache.org/jira/browse/YARN-1602 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.3.0 Reporter: Karthik Kambatla Assignee: Karthik Kambatla Priority: Critical Currently, if a state store operation fails, depending on the exception, either a RMFatalEvent.STATE_STORE_FENCED or RMFatalEvent.STATE_STORE_OP_FAILED events are created. The latter results in the RM failing. Instead, we should probably kill the application corresponding to the store operation. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2415) Expose MiniYARNCluster for use outside of YARN
[ https://issues.apache.org/jira/browse/YARN-2415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14096476#comment-14096476 ] Karthik Kambatla commented on YARN-2415: Yes. MiniYARNCluster today is marked Public-Evolving. We need to make it Public-Stable. Before we do that, I think we should hide most of its methods and the constructors behind @Private annotations, and add a couple of @Public static methods. Expose MiniYARNCluster for use outside of YARN -- Key: YARN-2415 URL: https://issues.apache.org/jira/browse/YARN-2415 Project: Hadoop YARN Issue Type: New Feature Components: client Affects Versions: 2.5.0 Reporter: Hari Shreedharan Assignee: Karthik Kambatla The MR/HDFS equivalents are available for applications to use in tests, but the YARN Mini cluster is not. It would be really useful to test applications that are written to run on YARN (like Spark) -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1506) Replace set resource change on RMNode/SchedulerNode directly with event notification.
[ https://issues.apache.org/jira/browse/YARN-1506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14096494#comment-14096494 ] Hadoop QA commented on YARN-1506: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12661608/YARN-1506-v11.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 6 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 1 new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-tools/hadoop-sls hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.sls.nodemanager.TestNMSimulator org.apache.hadoop.yarn.sls.appmaster.TestAMSimulator org.apache.hadoop.yarn.sls.TestSLSRunner org.apache.hadoop.yarn.server.resourcemanager.TestRMNodeTransitions org.apache.hadoop.yarn.server.resourcemanager.TestResourceTrackerService org.apache.hadoop.yarn.server.resourcemanager.TestRMEmbeddedElector org.apache.hadoop.yarn.server.resourcemanager.recovery.TestZKRMStateStoreZKClientConnections {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4621//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/4621//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4621//console This message is automatically generated. Replace set resource change on RMNode/SchedulerNode directly with event notification. - Key: YARN-1506 URL: https://issues.apache.org/jira/browse/YARN-1506 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager, scheduler Reporter: Junping Du Assignee: Junping Du Attachments: YARN-1506-v1.patch, YARN-1506-v10.patch, YARN-1506-v11.patch, YARN-1506-v2.patch, YARN-1506-v3.patch, YARN-1506-v4.patch, YARN-1506-v5.patch, YARN-1506-v6.patch, YARN-1506-v7.patch, YARN-1506-v8.patch, YARN-1506-v9.patch According to Vinod's comments on YARN-312 (https://issues.apache.org/jira/browse/YARN-312?focusedCommentId=13846087page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13846087), we should replace RMNode.setResourceOption() with some resource change event. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2413) capacity scheduler will overallocate vcores
[ https://issues.apache.org/jira/browse/YARN-2413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14096533#comment-14096533 ] Chen He commented on YARN-2413: --- IMHO, we may need to include vcore into headroom calculation also if we turn on the vcore as type of resource. capacity scheduler will overallocate vcores --- Key: YARN-2413 URL: https://issues.apache.org/jira/browse/YARN-2413 Project: Hadoop YARN Issue Type: Bug Components: documentation, scheduler Affects Versions: 3.0.0, 2.2.0 Reporter: Allen Wittenauer Priority: Critical It doesn't appear that the capacity scheduler is properly allocating vcores when making scheduling decisions, which may result in overallocation of CPU resources. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2383) Add ability to renew ClientToAMToken
[ https://issues.apache.org/jira/browse/YARN-2383?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong updated YARN-2383: Attachment: YARN-2383.preview.3.patch Add ability to renew ClientToAMToken Key: YARN-2383 URL: https://issues.apache.org/jira/browse/YARN-2383 Project: Hadoop YARN Issue Type: Bug Components: applications, resourcemanager Reporter: Xuan Gong Assignee: Xuan Gong Attachments: YARN-2383.preview.1.patch, YARN-2383.preview.2.patch, YARN-2383.preview.3.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2033) Investigate merging generic-history into the Timeline Store
[ https://issues.apache.org/jira/browse/YARN-2033?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14096592#comment-14096592 ] Zhijie Shen commented on YARN-2033: --- [~djp], thanks for raising this question explicitly. Here're two points I'd like to highlight for this work: 1. This patch doesn't intend to remove the existing FS based history store, but deprecate it by removing the default configs about loading FS based history store. On the other hand, the patch adds the history store that rides the timeline store, and use it as the default. Given the user who is the early adopter of the generic history service wants to continue with FS based history store, he needs to set the old configs explicitly (actually he should have done it because by default the generic history service is not enabled), and the new generic history service is still going to horner old configs for backward compatibility. 2. Though the generic history service (previously we call it application history server) is introduced to Hadoop since 2.4, but it is not production ready. We have explicitly highlighted it in the [documentation|http://hadoop.apache.org/docs/r2.4.0/hadoop-yarn/hadoop-yarn-site/TimelineServer.html#Current_Status]. I agree it seems to be a bit aggressive to move from FS based history store to timeline store based one as the default, however, I'm afraid it's the best choice at the current stage, because FS based history store has several critical limitations: no caching, no retention, not scalable and not supporting the secure mode. Unless we're able to solve all these problems (obviously we don't have the bandwidth to do it now), it's risky to use FS based history store as the default, in particular when the timeline server is going to be production ready. On the other side, the aforementioned limitations have already been addressed by the timeline store (scalability will be ensured by HBase timeline store). Hence timeline store based history store should be a more reasonable and reliable default of new users. Investigate merging generic-history into the Timeline Store --- Key: YARN-2033 URL: https://issues.apache.org/jira/browse/YARN-2033 Project: Hadoop YARN Issue Type: Sub-task Reporter: Vinod Kumar Vavilapalli Assignee: Zhijie Shen Attachments: ProposalofStoringYARNMetricsintotheTimelineStore.pdf, YARN-2033.1.patch, YARN-2033.2.patch, YARN-2033.3.patch, YARN-2033.4.patch, YARN-2033.5.patch, YARN-2033.Prototype.patch, YARN-2033_ALL.1.patch, YARN-2033_ALL.2.patch, YARN-2033_ALL.3.patch, YARN-2033_ALL.4.patch Having two different stores isn't amicable to generic insights on what's happening with applications. This is to investigate porting generic-history into the Timeline Store. One goal is to try and retain most of the client side interfaces as close to what we have today. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2378) Adding support for moving apps between queues in Capacity Scheduler
[ https://issues.apache.org/jira/browse/YARN-2378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14096599#comment-14096599 ] Jian He commented on YARN-2378: --- patch looks good overall, just one thing that we can move metrics.submitApp(userName); to addApplication() so that we can avoid changing the method signature to include the isMove flag. Adding support for moving apps between queues in Capacity Scheduler --- Key: YARN-2378 URL: https://issues.apache.org/jira/browse/YARN-2378 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler Reporter: Subramaniam Venkatraman Krishnan Assignee: Subramaniam Venkatraman Krishnan Labels: capacity-scheduler Attachments: YARN-2378.patch, YARN-2378.patch, YARN-2378.patch, YARN-2378.patch As discussed with [~leftnoteasy] and [~jianhe], we are breaking up YARN-1707 to smaller patches for manageability. This JIRA will address adding support for moving apps between queues in Capacity Scheduler. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2409) Active to StandBy transition does not stop rmDispatcher that causes 1 AsyncDispatcher thread leak.
[ https://issues.apache.org/jira/browse/YARN-2409?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rohith updated YARN-2409: - Priority: Critical (was: Major) Active to StandBy transition does not stop rmDispatcher that causes 1 AsyncDispatcher thread leak. --- Key: YARN-2409 URL: https://issues.apache.org/jira/browse/YARN-2409 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 3.0.0 Reporter: Nishan Shetty Assignee: Rohith Priority: Critical Attachments: YARN-2409.patch {code} at java.lang.Thread.run(Thread.java:662) 2014-08-12 07:03:00,839 ERROR org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: Can't handle this event at current state org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: STATUS_UPDATE at LAUNCHED at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305) at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:697) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:105) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:779) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:760) at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173) at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106) at java.lang.Thread.run(Thread.java:662) 2014-08-12 07:03:00,839 ERROR org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: Can't handle this event at current state org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: CONTAINER_ALLOCATED at LAUNCHED at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305) at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:697) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:105) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:779) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:760) at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173) at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106) at java.lang.Thread.run(Thread.java:662) 2014-08-12 07:03:00,839 ERROR org.apache.hadoop.ya {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2409) Active to StandBy transition does not stop rmDispatcher that causes 1 AsyncDispatcher thread leak.
[ https://issues.apache.org/jira/browse/YARN-2409?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rohith updated YARN-2409: - Summary: Active to StandBy transition does not stop rmDispatcher that causes 1 AsyncDispatcher thread leak. (was: InvalidStateTransitonException in ResourceManager after job recovery) Active to StandBy transition does not stop rmDispatcher that causes 1 AsyncDispatcher thread leak. --- Key: YARN-2409 URL: https://issues.apache.org/jira/browse/YARN-2409 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 3.0.0 Reporter: Nishan Shetty Assignee: Rohith Attachments: YARN-2409.patch {code} at java.lang.Thread.run(Thread.java:662) 2014-08-12 07:03:00,839 ERROR org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: Can't handle this event at current state org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: STATUS_UPDATE at LAUNCHED at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305) at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:697) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:105) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:779) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:760) at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173) at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106) at java.lang.Thread.run(Thread.java:662) 2014-08-12 07:03:00,839 ERROR org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: Can't handle this event at current state org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: CONTAINER_ALLOCATED at LAUNCHED at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305) at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:697) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:105) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:779) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:760) at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173) at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106) at java.lang.Thread.run(Thread.java:662) 2014-08-12 07:03:00,839 ERROR org.apache.hadoop.ya {code} -- This message was sent by Atlassian JIRA (v6.2#6252)