Hello,

Regarding the AttemptID:attempt_1423062241884_9970_m_000009_0 Timed out after
600 secs error, I managed to get an extended status for it. The other message
that I get is java.lang.Exception: Container is not yet running. Current state
is LOCALIZING. So the container spends 10 minutes in the LOCALIZING state and
then it fails.
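
I will also check the NodeManager log on the node that ran this attempt,
since localization is handled by the NodeManager; something like the
following should show which resource download was stuck (the log path is a
guess for our install, adjust as needed):

    grep ResourceLocalizationService /var/log/hadoop/yarn/*nodemanager*.log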

Thank you,
Alex

From: Alexandru Pacurar
Sent: Wednesday, February 11, 2015 1:35 PM
To: [email protected]
Subject: RE: Time out after 600 for YARN mapreduce application

Thank you for the quick reply.

I will modify the value to check whether this is the threshold I'm hitting,
but I was actually thinking of decreasing it, because my jobs take too long
if they hit this timeout. I would rather fail fast than keep the cluster busy
with jobs stuck in timeouts. Ideally I would like to troubleshoot the issue
and not fail at all :) .

My MR job is not a custom one; it is a job from Nutch 1.8. Actually there are
several jobs from Nutch that fail (e.g. Generator, Indexer).

Also, since this is related to Nutch 1.8, should I move the question to the
Nutch mailing list?

Thanks,
Alex

From: Rohith Sharma K S [mailto:[email protected]]
Sent: Wednesday, February 11, 2015 12:32 PM
To: [email protected]
Subject: RE: Time out after 600 for YARN mapreduce application

Looking at the attempt ID, this is a mapper task getting timed out in the
MapReduce job. The configuration that can be used to increase the value is
'mapreduce.task.timeout'.
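
For example, one way to raise it per job (a minimal sketch, not from your
setup; the value is in milliseconds, and 0 disables the timeout entirely):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.mapreduce.Job;

    public class TimeoutExample {
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // "mapreduce.task.timeout" is in milliseconds; 0 disables it.
        conf.setLong("mapreduce.task.timeout", 1200000L); // 20 min vs. default 10
        Job job = Job.getInstance(conf, "example-job");   // hypothetical job name
        // ... set mapper/reducer/paths as usual, then job.waitForCompletion(true)
      }
    }

If the job driver goes through ToolRunner, the same value can also be passed
on the command line as -Dmapreduce.task.timeout=1200000.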

The task is timed out because there is no heartbeat from the MapperTask
(YarnChild) to the MRAppMaster for 10 minutes. Is your MR job a custom job?
If so, are you doing any operations in the cleanup() of the Mapper? It is
possible that if the cleanup() of the Mapper takes longer than the configured
timeout, the task ends up timing out.
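
If cleanup() genuinely needs a long time, reporting progress from it keeps
the heartbeat alive. A minimal sketch (the loop body is a placeholder for
whatever long-running work your cleanup does):

    import java.io.IOException;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    public class SlowCleanupMapper extends Mapper<LongWritable, Text, Text, Text> {
      @Override
      protected void cleanup(Context context)
          throws IOException, InterruptedException {
        for (int i = 0; i < 100; i++) {  // placeholder for long-running teardown
          expensiveFlushStep(i);         // hypothetical unit of work
          context.progress();            // heartbeat to MRAppMaster; resets the timeout
        }
      }

      private void expensiveFlushStep(int step) { /* placeholder */ }
    }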


Thanks & Regards
Rohith Sharma K S
From: Alexandru Pacurar [mailto:[email protected]]
Sent: 11 February 2015 15:34
To: [email protected]
Subject: Time out after 600 for YARN mapreduce application

Hello,

I keep encountering an error when running Nutch on Hadoop YARN:

AttemptID:attempt_1423062241884_9970_m_000009_0 Timed out after 600 secs

Some info on my setup: I'm running a 64-node cluster with Hadoop 2.4.1. Each
node has 4 cores, 1 disk, and 24 GB of RAM, and the namenode/resourcemanager
has the same specs, only with 8 cores.

I am pretty sure one of these parameters relates to the threshold I'm hitting:

yarn.am.liveness-monitor.expiry-interval-ms
yarn.nm.liveness-monitor.expiry-interval-ms
yarn.resourcemanager.nm.liveness-monitor.interval-ms

but I would like to understand why.
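
For reference, these live in yarn-site.xml; the first one, for example,
defaults to 10 minutes (a sketch of the entry, using the stock default
value):

    <property>
      <name>yarn.am.liveness-monitor.expiry-interval-ms</name>
      <value>600000</value> <!-- 10 minutes; the default -->
    </property>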

The issue usually appears under heavier load, and most of the time the next
attempt is successful. Also, if I restart the Hadoop cluster the error goes
away for some time.

Thanks,
Alex
