Re: CONTAINER_FINISHED event when RMAppAttemptImpl is RECOVERING

Arinto Murdopo Sat, 03 Nov 2012 23:17:01 -0700

Hi Arun,

Thanks for the prompt reply. We need to test it for our school project,
which is scheduled to end in early December, so we still need to continue.


The YARN-128 discussion (https://issues.apache.org/jira/browse/YARN-128)
mentions that Devaraj has successfully tested the RM resurrection. In that
case, how do you test it? Do you kill and resurrect the RM at random times?

We are doing the resurrection using the following steps:

1. Run an example MR job (such as the Pi computation).
2. After the map and reduce processes have started, we kill the RM using
Linux's kill command.
3. Then we wait for 3 seconds before we resurrect it.
4. We noticed that the map process is able to continue, but the job gets
stuck when the map process reaches 100%. At that point the reduce process
is still at 0%.

We also modified TestMRJobs.java to use ZKStore, and we use
ResourceManagerWrapper to start and stop the ResourceManager.
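
To make the question in our original message (quoted below) more concrete,
here is a minimal sketch of a table-driven state machine that fails in the
same way. It is only a toy: ToyAttemptStateMachine, its State and Event
enums, and InvalidTransitionException are hypothetical names, not Hadoop's
StateMachineFactory or RMAppAttemptImpl. The self-loop added at the end is
one possible way to tolerate the event, shown as an assumption, not as the
correct fix for YARN.

```java
import java.util.EnumMap;
import java.util.Map;

// Toy illustration only; NOT Hadoop's StateMachineFactory or RMAppAttemptImpl.
public class ToyAttemptStateMachine {
    enum State { RECOVERING, RUNNING }
    enum Event { RECOVERED, CONTAINER_FINISHED }

    // Hypothetical stand-in for the exception seen in the stack trace.
    static class InvalidTransitionException extends RuntimeException {
        InvalidTransitionException(Event event, State state) {
            super("Invalid event: " + event + " at " + state);
        }
    }

    // Transition table: current state -> (event -> next state).
    private final Map<State, Map<Event, State>> table =
        new EnumMap<State, Map<Event, State>>(State.class);
    private State current = State.RECOVERING;

    // Registers a transition; returns this so calls can be chained.
    ToyAttemptStateMachine addTransition(State from, Event on, State to) {
        table.computeIfAbsent(from, s -> new EnumMap<Event, State>(Event.class))
             .put(on, to);
        return this;
    }

    // Applies the event, or throws if no transition is registered for
    // the (current state, event) pair.
    void handle(Event event) {
        Map<Event, State> row = table.get(current);
        State next = (row == null) ? null : row.get(event);
        if (next == null) {
            throw new InvalidTransitionException(event, current);
        }
        current = next;
    }

    State getState() {
        return current;
    }

    public static void main(String[] args) {
        ToyAttemptStateMachine sm = new ToyAttemptStateMachine()
            .addTransition(State.RECOVERING, Event.RECOVERED, State.RUNNING)
            .addTransition(State.RUNNING, Event.CONTAINER_FINISHED, State.RUNNING);

        // No (RECOVERING, CONTAINER_FINISHED) entry exists, so this throws.
        try {
            sm.handle(Event.CONTAINER_FINISHED);
        } catch (InvalidTransitionException e) {
            System.out.println(e.getMessage());
        }

        // Assumption: one option is a self-loop that ignores the event
        // while recovering. Whether that is right for YARN is our question.
        sm.addTransition(State.RECOVERING, Event.CONTAINER_FINISHED, State.RECOVERING);
        sm.handle(Event.CONTAINER_FINISHED);
        System.out.println(sm.getState());
    }
}
```

In this sketch the first handle() call fails because the table has no entry
for CONTAINER_FINISHED in the RECOVERING state; once the self-loop is
registered, the same event is accepted and the state stays RECOVERING.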

regards,

Arinto Murdopo
European Master in Distributed Computing (EMDC)
Universitat Politècnica de Catalunya · BarcelonaTech, Barcelona, Spain
KTH Royal Institute of Technology, Stockholm, Sweden
Phone: +46 725 548 759



On Sat, Nov 3, 2012 at 7:04 PM, Arun C Murthy <[email protected]> wrote:

> Arinto,
>
>  Unfortunately, it's too early to try it yet; I'd wait a little longer
> for it to stabilize - should be soon.
>
>  Thanks for trying it and the feedback though! Much appreciated.
>
> Arun
>
> On Nov 3, 2012, at 6:55 AM, Arinto Murdopo wrote:
>
> > Hi all,
> >
> > We get this exception when we try to resurrect the ResourceManager using
> > ZKStore. We are using Hadoop version 2.0.2 Alpha RC2, with the patch from
> > the YARN-128 issue (https://issues.apache.org/jira/browse/YARN-128).
> >
> > org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: CONTAINER_FINISHED at RECOVERING
> > at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:301)
> > at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:43)
> > at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:443)
> > at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:510)
> > at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:83)
> > at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:442)
> > at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:423)
> > at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:126)
> > at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:75)
> > at java.lang.Thread.run(Thread.java:662)
> >
> > Inspecting RMAppAttemptImpl, we noticed that its state transitions don't
> > handle the CONTAINER_FINISHED event in the RECOVERING state. So in this
> > case, what is the correct transition to handle a CONTAINER_FINISHED
> > event when we are in the RECOVERING state?
> >
> > regards,
> >
> > Arinto Murdopo
> > European Master in Distributed Computing (EMDC)
> > Universitat Politècnica de Catalunya · BarcelonaTech, Barcelona, Spain
> > KTH Royal Institute of Technology, Stockholm, Sweden
>
> --
> Arun C. Murthy
> Hortonworks Inc.
> http://hortonworks.com/
>
>
>
