Thanks

On Mon Feb 09 2015 at 01:43:24 Xuan Gong <[email protected]> wrote:
> That is the client connect retry at the IPC level.
>
> You can decrease the maximum number of retries by configuring
>
> ipc.client.connect.max.retries.on.timeouts
>
> in core-site.xml.
>
> Thanks
>
> Xuan Gong
>
> From: Telles Nobrega <[email protected]>
> Reply-To: "[email protected]" <[email protected]>
> Date: Saturday, February 7, 2015 at 8:37 PM
> To: "[email protected]" <[email protected]>
> Subject: Max Connect retries
>
> Hi, I changed my cluster config so that a failed NodeManager can be detected
> in about 30 seconds. When I run a wordcount, the reduce gets stuck at
> 25% for quite a while, and the logs show nodes trying to connect to the failed
> node:
>
> org.apache.hadoop.ipc.Client: Retrying connect to server: hadoop-telles-844fb3f0-dfd8-456d-89c3-1d7cfdbdcad2/10.3.2.99:49911. Already tried 28 time(s); maxRetries=45
> 2015-02-08 04:26:42,088 INFO [IPC Server handler 16 on 50037] org.apache.hadoop.mapred.TaskAttemptListenerImpl: MapCompletionEvents request from attempt_1423319128424_0025_r_000000_0. startIndex 24 maxEvents 10000
>
> Is this the expected behaviour? Should I change max retries to a lower
> value? If so, which config property is that?
>
> Thanks
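For reference, here is a minimal core-site.xml sketch of Xuan's suggestion. The value 3 is only an illustrative choice and is not a recommendation made in the thread. The maxRetries=45 in the log matches Hadoop's default for this property.

```xml
<?xml version="1.0"?>
<configuration>
  <!-- Number of retries an IPC client makes when a connection attempt fails
       with a socket timeout (default 45, matching maxRetries=45 in the log).
       The value 3 is illustrative only, not a tuned recommendation. -->
  <property>
    <name>ipc.client.connect.max.retries.on.timeouts</name>
    <value>3</value>
  </property>
</configuration>
```

Each timed-out attempt waits up to ipc.client.connect.timeout, which defaults to 20 seconds. At that setting, 45 retries can stall a client for roughly 15 minutes against a dead node, while 3 retries would cap the wait at about a minute. A separate property, ipc.client.connect.max.retries, governs retries for connection failures that are not timeouts, such as a refused connection.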
