Hey Susheel,

Thanks for the reply. Unfortunately, those settings didn't help.
Anyhow, I found the following related bug:
https://issues.apache.org/jira/browse/YARN-3238
This is fixed in 2.7.0.

Thanks,
-Manoj

On Wed, Aug 19, 2015 at 11:04 PM, Susheel Kumar Gadalay <[email protected]> wrote:

> Change mapreduce.reduce.shuffle.connect.timeout,
> mapreduce.reduce.shuffle.read.timeout.
> By default they are 180000.
>
> On 8/20/15, manoj <[email protected]> wrote:
> > Hello all,
> >
> > I'm running Apache2.6.0.
> > I'm trying to remove a node from a Hadoop Cluster and then add it back.
> > The taskattempts on the node which was removed are rescheduled only after
> > 30min.
> >
> > During this 30min period, it looks like the App Master is trying to
> > connect (check the log below) to the same node which was removed, and
> > after about 30min it reschedules those taskAttempts from the lost node
> > and eventually the job succeeds.
> >
> > How can I reduce the 30min wait time?
> >
> > .....
> > ......
> > 2015-08-14 11:25:21,662 INFO [ContainerLauncher #7]
> > org.apache.hadoop.ipc.Client: Retrying connect to server:
> > host172/XX.XX.XX.XX:36158. Already tried 0 time(s); retry policy is
> > RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000
> > MILLISECONDS)
> > ......
> > ......
> >
> > Thanks
> > --Manoj Kumar M

--
--Manoj Kumar M
