It did finish, but it took hours, and in one case it didn't finish at all.
The same thing happened when running the Pi estimator.

On Mon Feb 09 2015 at 15:24:11 daemeon reiydelle <[email protected]> wrote:

> Are your nodes actually stuck or are you in e.g. a reduce step that is
> reading so much data across the network that the node SEEMS unreachable?
>
>
> Since you mention "gets stuck for a while at 25%", that suggests that
> eventually the node finishes up its work ...
>
>
>
> *.......*
>
>
>
>
>
>
> *“Life should not be a journey to the grave with the intention of arriving
> safely in apretty and well preserved body, but rather to skid in broadside
> in a cloud of smoke,thoroughly used up, totally worn out, and loudly
> proclaiming “Wow! What a Ride!” - Hunter ThompsonDaemeon C.M. ReiydelleUSA
> (+1) 415.501.0198London (+44) (0) 20 8144 9872*
>
> On Mon, Feb 9, 2015 at 2:49 AM, Telles Nobrega <[email protected]>
> wrote:
>
>> Thanks
>>
>> On Mon Feb 09 2015 at 01:43:24 Xuan Gong <[email protected]> wrote:
>>
>>>  That retry is the client connection retry at the IPC level.
>>>
>>> You can decrease the maximum number of retries by setting
>>>
>>> ipc.client.connect.max.retries.on.timeouts
>>>
>>> in core-site.xml
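>>>
>>> For example, a minimal core-site.xml entry might look like the
>>> following (the property name is the one discussed above; the value
>>> 10 is only an illustrative choice, not a recommendation):
>>>
>>>   <property>
>>>     <name>ipc.client.connect.max.retries.on.timeouts</name>
>>>     <value>10</value>
>>>   </property>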
>>>
>>>
>>>  Thanks
>>>
>>>  Xuan Gong
>>>
>>>   From: Telles Nobrega <[email protected]>
>>> Reply-To: "[email protected]" <[email protected]>
>>> Date: Saturday, February 7, 2015 at 8:37 PM
>>> To: "[email protected]" <[email protected]>
>>> Subject: Max Connect retries
>>>
>>>   Hi, I changed my cluster config so that a failed NodeManager can be
>>> detected in about 30 seconds. When I run a wordcount job, the reduce phase
>>> gets stuck at 25% for quite a while, and the logs show nodes trying to
>>> connect to the failed node:
>>>
>>>  org.apache.hadoop.ipc.Client: Retrying connect to server: 
>>> hadoop-telles-844fb3f0-dfd8-456d-89c3-1d7cfdbdcad2/10.3.2.99:49911. Already 
>>> tried 28 time(s); maxRetries=45
>>> 2015-02-08 04:26:42,088 INFO [IPC Server handler 16 on 50037] 
>>> org.apache.hadoop.mapred.TaskAttemptListenerImpl: MapCompletionEvents 
>>> request from attempt_1423319128424_0025_r_000000_0. startIndex 24 maxEvents 
>>> 10000
>>>
>>> Is this the expected behaviour? Should I change max retries to a lower 
>>> value? If so, which config is that?
>>>
>>> Thanks
>>>
>>>
>>>
>
