Thanks guys. Unfortunately I had started the datanode by local command rather than from start-all.sh, so the related parts of the logs were lost. I was watching the CPU loads on all 8 cores via gkrellm at the time, and they were definitely quiet. After a few minutes the tasks seemed to get in sync, and the job ran under a reasonable load (i.e. all cores mostly busy, with only brief gaps between tasks) for the rest of the run.
I will attempt to re-create the problem tomorrow with proper logging, and I will look into enabling Hadoop metrics. (Sketches of the Ganglia metrics config and of the slot-count setting are appended after the quoted thread below.)

-Terry

On 8/29/12 8:14 PM, Vinod Kumar Vavilapalli wrote:
> Do you know if you have enough job-load on the system? One way to check
> this is to look for running map/reduce tasks on the JT UI at the same time
> you are looking at the node's CPU usage.
>
> Collecting Hadoop metrics via a metrics collection system, say Ganglia,
> will let you match up the timestamps of idleness on the nodes with the
> job-load at that point in time.
>
> HTH,
> +vinod
>
> On Aug 29, 2012, at 6:40 AM, Terry Healy wrote:
>
>> Running 1.0.2, in this case on Linux.
>>
>> I was watching the processes / loads on one TaskTracker instance and
>> noticed that it completed its first 8 map tasks and reported 8 free
>> slots (the max for this system). It then waited, doing nothing, for more
>> than 30 seconds before the next "batch" of work came in and started running.
>>
>> Likewise, it also has relatively long periods with all 8 cores running at
>> or near idle. There are no failing jobs or obvious errors in the
>> TaskTracker log.
>>
>> What could be causing this?
>>
>> Should I increase the number of map slots to greater than the number of
>> cores to try to keep it busier?
>>
>> -Terry

--
Terry Healy / [email protected]
Cyber Security Operations
Brookhaven National Laboratory
Building 515, Upton N.Y. 11973
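For anyone following along, here is a minimal sketch of pointing the Hadoop 1.0.x metrics contexts at Ganglia, per Vinod's suggestion. It goes in conf/hadoop-metrics.properties on every node; the collector host "gmond-host" and the 10-second period are placeholder assumptions for your own setup, and GangliaContext31 assumes Ganglia 3.1+ (for 3.0.x, GangliaContext is the matching class):

  # conf/hadoop-metrics.properties -- ship mapred and jvm metrics to Ganglia.
  # Assumptions: a gmond collector listening on gmond-host:8649, Ganglia 3.1+
  # wire format (GangliaContext31); adjust host, port, and period as needed.
  mapred.class=org.apache.hadoop.metrics.ganglia.GangliaContext31
  mapred.period=10
  mapred.servers=gmond-host:8649

  # JVM metrics are optional but useful for spotting GC pauses alongside idle gaps.
  jvm.class=org.apache.hadoop.metrics.ganglia.GangliaContext31
  jvm.period=10
  jvm.servers=gmond-host:8649

With the mapred context flowing, graphing the TaskTracker's running-task metrics (maps_running / reduces_running in the 1.x mapred context, though the exact names are worth verifying on your build) against node CPU in the Ganglia web UI should let you line up the idle windows with the job-load, as suggested above.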
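And on the slot-count question: in Hadoop 1.x the per-node slot counts are static TaskTracker settings, so oversubscribing the cores slightly can paper over scheduling gaps between task waves, at the cost of some CPU contention while slots overlap. A sketch of the relevant mapred-site.xml entries, with purely illustrative values for an 8-core box (not a tuning recommendation; the TaskTracker must be restarted for changes to take effect):

  <!-- mapred-site.xml on each TaskTracker node. Illustrative values only:
       10 map slots on 8 cores oversubscribes slightly so a freed slot is
       refilled before the scheduler's next assignment catches up. -->
  <property>
    <name>mapred.tasktracker.map.tasks.maximum</name>
    <value>10</value>
  </property>
  <property>
    <name>mapred.tasktracker.reduce.tasks.maximum</name>
    <value>4</value>
  </property>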
