Re: App Master takes ~30min to re-schedule task attempts.

manoj Fri, 21 Aug 2015 10:24:44 -0700

Hey Susheel,

Thanks for the reply. Unfortunately, those settings didn't help.
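
For anyone who wants to try the same thing: the two shuffle timeout properties from the reply quoted below normally go into mapred-site.xml. Here is a minimal sketch that renders such a fragment. The 30000 ms values are hypothetical placeholders picked for illustration, not a recommendation, and the helper name render_mapred_site is made up for this example.

```python
import xml.etree.ElementTree as ET
from xml.dom import minidom

# Hypothetical placeholder values in milliseconds (assumption for illustration).
# The default for both properties is 180000.
SHUFFLE_TIMEOUTS_MS = {
    "mapreduce.reduce.shuffle.connect.timeout": 30000,
    "mapreduce.reduce.shuffle.read.timeout": 30000,
}


def render_mapred_site(props):
    """Render name/value pairs in the <configuration>/<property> layout
    used by Hadoop *-site.xml files and return it as a string."""
    root = ET.Element("configuration")
    for name, value in props.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = str(value)
    # Pretty-print so the fragment is easy to paste into an existing file.
    raw = ET.tostring(root, encoding="unicode")
    return minidom.parseString(raw).toprettyxml(indent="  ")


if __name__ == "__main__":
    print(render_mapred_site(SHUFFLE_TIMEOUTS_MS))
```

These properties only bound how long a reducer waits while fetching map output, so lowering them may not change how long the App Master keeps trying to reach a lost node.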


Anyhow, I found the following related bug:
https://issues.apache.org/jira/browse/YARN-3238

This is fixed in 2.7.0.

Thanks,
-Manoj



On Wed, Aug 19, 2015 at 11:04 PM, Susheel Kumar Gadalay <[email protected]> wrote:

> Change mapreduce.reduce.shuffle.connect.timeout and
> mapreduce.reduce.shuffle.read.timeout.
> By default they are both 180000 ms.
>
> On 8/20/15, manoj <[email protected]> wrote:
> > Hello all,
> >
> > I'm running Apache Hadoop 2.6.0.
> > I'm trying to remove a node from a Hadoop cluster and then add it back.
> > The task attempts on the node which was removed are rescheduled only after
> > 30min.
> >
> > During this 30min period, it looks like the App Master keeps trying to
> > connect (check the log below) to the same node which was removed. After
> > about 30min it reschedules those task attempts from the lost node, and
> > eventually the job succeeds.
> >
> > How can I reduce the 30min wait time?
> >
> > .....
> > ......
> > 2015-08-14 11:25:21,662 INFO [ContainerLauncher #7]
> > org.apache.hadoop.ipc.Client: Retrying connect to server:
> > host172/XX.XX.XX.XX:36158. Already tried 0 time(s); retry policy is
> > RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000
> > MILLISECONDS)
> > ......
> > ......
> >
> > Thanks
> > --Manoj Kumar M
> >
>



-- 
--Manoj Kumar M
