Thanks guys. Unfortunately I had started the datanode by local command rather than from start-all.sh, so the related parts of the logs were lost. I was watching the CPU loads on all 8 cores via gkrellm at the time, and they were definitely quiet. After a few minutes the tasks seemed to get in sync, and the job ran under a reasonable load (i.e. all cores mostly busy, with only brief gaps between tasks) for the rest of the run.
I will attempt to re-create the problem tomorrow with proper logging, and I will look into enabling Hadoop metrics. (Sketches of the Ganglia metrics config and of the slot-count setting are appended after the quoted thread below.)

-Terry

On 8/29/12 8:14 PM, Vinod Kumar Vavilapalli wrote:
> Do you know if you have enough job-load on the system? One way to check
> this is to look for running map/reduce tasks on the JT UI at the same time
> you are looking at the node's CPU usage.
>
> Collecting Hadoop metrics via a metrics collection system, say Ganglia,
> will let you match up the timestamps of idleness on the nodes with the
> job-load at that point in time.
>
> HTH,
> +vinod
>
> On Aug 29, 2012, at 6:40 AM, Terry Healy wrote:
>
>> Running 1.0.2, in this case on Linux.
>>
>> I was watching the processes / loads on one TaskTracker instance and
>> noticed that it completed its first 8 map tasks and reported 8 free
>> slots (the max for this system). It then waited, doing nothing, for more
>> than 30 seconds before the next "batch" of work came in and started running.
>>
>> Likewise, it also has relatively long periods with all 8 cores running at
>> or near idle. There are no failing jobs or obvious errors in the
>> TaskTracker log.
>>
>> What could be causing this?
>>
>> Should I increase the number of map slots to greater than the number of
>> cores to try to keep it busier?
>>
>> -Terry

--
Terry Healy / [email protected]
Cyber Security Operations
Brookhaven National Laboratory
Building 515, Upton N.Y. 11973
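For anyone following along, here is a minimal sketch of pointing the Hadoop 1.0.x metrics contexts at Ganglia, per Vinod's suggestion. It goes in conf/hadoop-metrics.properties on every node; the collector host "gmond-host" and the 10-second period are placeholder assumptions for your own setup, and GangliaContext31 assumes Ganglia 3.1+ (for 3.0.x, GangliaContext is the matching class):

  # conf/hadoop-metrics.properties -- ship mapred and jvm metrics to Ganglia.
  # Assumptions: a gmond collector listening on gmond-host:8649, Ganglia 3.1+
  # wire format (GangliaContext31); adjust host, port, and period as needed.
  mapred.class=org.apache.hadoop.metrics.ganglia.GangliaContext31
  mapred.period=10
  mapred.servers=gmond-host:8649

  # JVM metrics are optional but useful for spotting GC pauses alongside idle gaps.
  jvm.class=org.apache.hadoop.metrics.ganglia.GangliaContext31
  jvm.period=10
  jvm.servers=gmond-host:8649

With the mapred context flowing, graphing the TaskTracker's running-task metrics (maps_running / reduces_running in the 1.x mapred context, though the exact names are worth verifying on your build) against node CPU in the Ganglia web UI should let you line up the idle windows with the job-load, as suggested above.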
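And on the slot-count question: in Hadoop 1.x the per-node slot counts are static TaskTracker settings, so oversubscribing the cores slightly can paper over scheduling gaps between task waves, at the cost of some CPU contention while slots overlap. A sketch of the relevant mapred-site.xml entries, with purely illustrative values for an 8-core box (not a tuning recommendation; the TaskTracker must be restarted for changes to take effect):

  <!-- mapred-site.xml on each TaskTracker node. Illustrative values only:
       10 map slots on 8 cores oversubscribes slightly so a freed slot is
       refilled before the scheduler's next assignment catches up. -->
  <property>
    <name>mapred.tasktracker.map.tasks.maximum</name>
    <value>10</value>
  </property>
  <property>
    <name>mapred.tasktracker.reduce.tasks.maximum</name>
    <value>4</value>
  </property>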
