We're using the default scheduler, which is CapacityScheduler according to YarnConfiguration.java.

Arinto Murdopo
European Master in Distributed Computing (EMDC)
Universitat Politècnica de Catalunya · BarcelonaTech, Barcelona, Spain
KTH Royal Institute of Technology, Stockholm, Sweden
Phone: +46 725 548 759

On Mon, Nov 5, 2012 at 1:58 AM, Devaraj K <[email protected]> wrote:

> Hi Arinto,
>
> Could you please confirm, what is the scheduler configured here?
>
> Thanks & Regards
> Devaraj K
>
> -----Original Message-----
> From: Arinto Murdopo [mailto:[email protected]]
> Sent: Sunday, November 04, 2012 11:46 AM
> To: [email protected]
> Subject: Re: CONTAINER_FINISHED event when RMAppAttemptImpl is RECOVERING
>
> Hi Arun,
>
> Thanks for the prompt reply. We need to test it for our school project,
> which is scheduled to end in early December. So, we still need to continue.
>
> The YARN-128 discussion (https://issues.apache.org/jira/browse/YARN-128)
> mentions that Devaraj successfully tested the RM resurrection. So in this
> case, how do you test it? Do you kill and resurrect the RM at a random time?
>
> We are doing the resurrection using the following steps:
>
> 1. Run example MR jobs (such as the Pi computation).
> 2. After the mapping and reducing processes have started, we kill the RM
>    using Linux's kill command.
> 3. Then, we wait for 3 seconds before we resurrect it.
> 4. We noticed that the mapping process is able to continue, and the job
>    gets stuck when the mapping process reaches 100%. At that time the
>    reduce process is still at 0%.
>
> We also modified TestMRJobs.java to use ZKStore, and use
> ResourceManagerWrapper to start and stop the ResourceManager.
>
> regards,
>
> Arinto Murdopo
> European Master in Distributed Computing (EMDC)
> Universitat Politècnica de Catalunya · BarcelonaTech, Barcelona, Spain
> KTH Royal Institute of Technology, Stockholm, Sweden
> Phone: +46 725 548 759
>
> On Sat, Nov 3, 2012 at 7:04 PM, Arun C Murthy <[email protected]> wrote:
>
> > Arinto,
> >
> > Unfortunately, it's too early to try it yet. I'd wait a little longer
> > for it to stabilize - should be soon.
> >
> > Thanks for trying it and the feedback though! Much appreciated.
> >
> > Arun
> >
> > On Nov 3, 2012, at 6:55 AM, Arinto Murdopo wrote:
> >
> > > Hi all,
> > >
> > > We got this exception when we tried to resurrect the ResourceManager
> > > using ZKStore. We are using Hadoop version 2.0.2 Alpha RC2, with the
> > > patch from the #YARN-128 issue
> > > (https://issues.apache.org/jira/browse/YARN-128).
> > >
> > > org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: CONTAINER_FINISHED at RECOVERING
> > >     at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:301)
> > >     at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:43)
> > >     at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:443)
> > >     at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:510)
> > >     at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:83)
> > >     at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:442)
> > >     at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:423)
> > >     at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:126)
> > >     at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:75)
> > >     at java.lang.Thread.run(Thread.java:662)
> > >
> > > Inspecting RMAppAttemptImpl, we noticed that the state transition
> > > doesn't handle the CONTAINER_FINISHED event when it is in the
> > > RECOVERING state. So in this case, what is the correct transition to
> > > handle the CONTAINER_FINISHED event when we are in the RECOVERING state?
> > >
> > > regards,
> > >
> > > Arinto Murdopo
> > > European Master in Distributed Computing (EMDC)
> > > Universitat Politècnica de Catalunya · BarcelonaTech, Barcelona, Spain
> > > KTH Royal Institute of Technology, Stockholm, Sweden
> >
> > --
> > Arun C. Murthy
> > Hortonworks Inc.
> > http://hortonworks.com/
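
For readers following the original question in this thread (CONTAINER_FINISHED arriving while the attempt is RECOVERING), here is a minimal sketch of how a table-driven state machine produces that kind of error, and what registering a transition for the missing (state, event) pair looks like. Everything in it is an assumption made for illustration: the class ToyAttemptStateMachine, its states and events, and the addTransition/handle methods are hypothetical and are not Hadoop's StateMachineFactory or RMAppAttemptImpl. The self-transition added at the end is only one possible way to tolerate the event; the thread does not say what the correct transition in YARN is.

```java
import java.util.EnumMap;
import java.util.Map;

// Toy model only: NOT Hadoop's StateMachineFactory. All names are hypothetical.
final class ToyAttemptStateMachine {
    public enum State { NEW, RECOVERING, RUNNING }
    public enum Event { RECOVER, CONTAINER_FINISHED }

    // Transition table: current state -> (event -> next state).
    private final Map<State, Map<Event, State>> table = new EnumMap<>(State.class);
    private State current = State.NEW;

    // Register that 'on' moves the machine from 'from' to 'to'.
    public ToyAttemptStateMachine addTransition(State from, Event on, State to) {
        table.computeIfAbsent(from, k -> new EnumMap<Event, State>(Event.class)).put(on, to);
        return this;
    }

    // Apply an event; fail if no transition is registered for (current, event).
    public State handle(Event event) {
        Map<Event, State> row = table.get(current);
        if (row == null || !row.containsKey(event)) {
            throw new IllegalStateException("Invalid event: " + event + " at " + current);
        }
        current = row.get(event);
        return current;
    }

    public static void main(String[] args) {
        ToyAttemptStateMachine sm = new ToyAttemptStateMachine()
                .addTransition(State.NEW, Event.RECOVER, State.RECOVERING);
        sm.handle(Event.RECOVER);

        try {
            // No entry for (RECOVERING, CONTAINER_FINISHED) yet, so this throws.
            sm.handle(Event.CONTAINER_FINISHED);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }

        // Assumed handling for illustration: stay in RECOVERING when the event arrives.
        sm.addTransition(State.RECOVERING, Event.CONTAINER_FINISHED, State.RECOVERING);
        System.out.println(sm.handle(Event.CONTAINER_FINISHED));
    }
}
```

In this toy model the first CONTAINER_FINISHED is rejected because the table has no row for it in RECOVERING, which mirrors the shape of the exception reported above. Whether the real attempt should ignore the event, buffer it, or move to another state during recovery is a design question for the YARN-128 work.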
