Re: CONTAINER_FINISHED event when RMAppAttemptImpl is RECOVERING

Arinto Murdopo Thu, 08 Nov 2012 10:39:03 -0800

We're using the default scheduler, which according to YarnConfiguration.java
is the CapacityScheduler.


Arinto Murdopo
European Master in Distributed Computing (EMDC)
Universitat Politècnica de Catalunya · BarcelonaTech, Barcelona, Spain
KTH Royal Institute of Technology, Stockholm, Sweden
Phone: +46 725 548 759



On Mon, Nov 5, 2012 at 1:58 AM, Devaraj K <[email protected]> wrote:

> Hi Arinto,
>
>
>      Could you please confirm which scheduler is configured here?
>
> Thanks & Regards
>     Devaraj K
>
> -----Original Message-----
> From: Arinto Murdopo [mailto:[email protected]]
> Sent: Sunday, November 04, 2012 11:46 AM
> To: [email protected]
> Subject: Re: CONTAINER_FINISHED event when RMAppAttemptImpl is RECOVERING
>
> Hi Arun,
>
> Thanks for the prompt reply. We need to test it for our school project,
> which is scheduled to end in early December, so we still need to continue.
>
> The YARN-128 discussion (https://issues.apache.org/jira/browse/YARN-128)
> mentions that Devaraj successfully tested RM resurrection. How did you
> test it? Do you kill and resurrect the RM at random times?
>
> We perform the resurrection using the following steps:
>
> 1. Run example MR jobs (such as the Pi computation).
> 2. After the map and reduce processes have started, kill the RM using
> Linux's kill command.
> 3. Wait 3 seconds before resurrecting it.
> 4. We noticed that the map process is able to continue, but the job gets
> stuck when the map process reaches 100%. At that point the reduce process
> is still at 0%.
>
> We also modified TestMRJobs.java to use ZKStore and to use
> ResourceManagerWrapper to start and stop the ResourceManager.
>
> regards,
>
> Arinto Murdopo
>
>
>
> On Sat, Nov 3, 2012 at 7:04 PM, Arun C Murthy <[email protected]> wrote:
>
> > Arinto,
> >
> >  Unfortunately, it's too early to try it yet; I'd wait a little longer
> > for it to stabilize, which should be soon.
> >
> >  Thanks for trying it and the feedback though! Much appreciated.
> >
> > Arun
> >
> > On Nov 3, 2012, at 6:55 AM, Arinto Murdopo wrote:
> >
> > > Hi all,
> > >
> > > We get this exception when we try to resurrect the ResourceManager using
> > > ZKStore. We are using Hadoop version 2.0.2-alpha RC2 with the patch from
> > > the YARN-128 issue (https://issues.apache.org/jira/browse/YARN-128).
> > >
> > > org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: CONTAINER_FINISHED at RECOVERING
> > >     at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:301)
> > >     at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:43)
> > >     at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:443)
> > >     at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:510)
> > >     at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:83)
> > >     at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:442)
> > >     at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:423)
> > >     at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:126)
> > >     at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:75)
> > >     at java.lang.Thread.run(Thread.java:662)
> > >
> > > Inspecting RMAppAttemptImpl, we noticed that its state machine does not
> > > define a transition for the CONTAINER_FINISHED event in the RECOVERING
> > > state. What is the correct transition for handling CONTAINER_FINISHED
> > > while in the RECOVERING state?
> > >
> > > regards,
> > >
> > > Arinto Murdopo
> >
> > --
> > Arun C. Murthy
> > Hortonworks Inc.
> > http://hortonworks.com/
> >
> >
> >
>
>
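The exception in the original report comes from a table-driven state machine: each (current state, event) pair must be registered explicitly, and an unregistered pair such as (RECOVERING, CONTAINER_FINISHED) is rejected. The Java sketch below is a hypothetical, simplified model of that idea. The class, the enums, and the methods `addTransition` and `doTransition` are illustrative stand-ins, not Hadoop's StateMachineFactory API. The self-loop it registers for (RECOVERING, CONTAINER_FINISHED) is only one assumed option; whether the right behavior is to ignore, buffer, or otherwise act on the event is the open question in this thread.

```java
import java.util.EnumMap;
import java.util.Map;

/**
 * Hypothetical, simplified model of a table-driven state machine.
 * Names are illustrative only; this is NOT Hadoop's StateMachineFactory API.
 */
public class TransitionTableSketch {

    enum State { RECOVERING, RUNNING, FINISHED }

    enum Event { RECOVER, CONTAINER_FINISHED }

    // Transition table: (current state, event) -> next state.
    private final Map<State, Map<Event, State>> table = new EnumMap<>(State.class);

    /** Registers a transition; re-registering the same pair overwrites it. */
    public TransitionTableSketch addTransition(State from, Event event, State to) {
        table.computeIfAbsent(from, s -> new EnumMap<Event, State>(Event.class)).put(event, to);
        return this;
    }

    /** Looks up the pair and fails, like the reported exception, if it is missing. */
    public State doTransition(State current, Event event) {
        Map<Event, State> row = table.get(current);
        State next = (row == null) ? null : row.get(event);
        if (next == null) {
            throw new IllegalStateException("Invalid event: " + event + " at " + current);
        }
        return next;
    }

    public static void main(String[] args) {
        // No entry for (RECOVERING, CONTAINER_FINISHED): the lookup is rejected.
        TransitionTableSketch withoutEntry = new TransitionTableSketch()
                .addTransition(State.RECOVERING, Event.RECOVER, State.RUNNING);
        try {
            withoutEntry.doTransition(State.RECOVERING, Event.CONTAINER_FINISHED);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }

        // Assumed option only: accept the event and remain in RECOVERING.
        TransitionTableSketch withEntry = new TransitionTableSketch()
                .addTransition(State.RECOVERING, Event.RECOVER, State.RUNNING)
                .addTransition(State.RECOVERING, Event.CONTAINER_FINISHED, State.RECOVERING);
        System.out.println(withEntry.doTransition(State.RECOVERING, Event.CONTAINER_FINISHED));
    }
}
```

The first call prints the same message text as the reported exception ("Invalid event: CONTAINER_FINISHED at RECOVERING"); the second succeeds only because the missing pair was registered.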
