Appreciate your input, Bryan. I will try to reproduce it and check the namenode log before, during, and after the pause. Wish me luck.
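For comparing the two logs, a small script along these lines might help find the quiet periods. It is only a sketch: it assumes every log entry starts with a timestamp in the same format as the quoted jobtracker log (e.g. `2013-08-08 00:23:00,509`), and the names `parse_ts`, `find_gaps`, and the 5-second default threshold are made up for illustration.

```python
import re
from datetime import datetime, timedelta

# Assumed timestamp layout, taken from the quoted jobtracker log line:
# "2013-08-08 00:23:00,509 INFO ..."
TS_RE = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),(\d{3})")


def parse_ts(line):
    """Return the leading timestamp of a log line, or None if it has none."""
    m = TS_RE.match(line)
    if m is None:
        return None
    base = datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S")
    return base + timedelta(milliseconds=int(m.group(2)))


def find_gaps(lines, min_gap_seconds=5.0):
    """Yield (gap_seconds, line_before, line_after) for each quiet period.

    min_gap_seconds is a hypothetical threshold; tune it to the pause length.
    """
    prev_ts = None
    prev_line = None
    for line in lines:
        ts = parse_ts(line)
        if ts is None:
            # Lines without a timestamp (stack traces, wrapped text) are skipped.
            continue
        if prev_ts is not None:
            gap = (ts - prev_ts).total_seconds()
            if gap >= min_gap_seconds:
                yield gap, prev_line.rstrip("\n"), line.rstrip("\n")
        prev_ts, prev_line = ts, line
```

Running `find_gaps(open(path))` once over the jobtracker log and once over the namenode log, then comparing the timestamps of the reported gaps, should show whether the two pause at the same moment and what was logged just before and after.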
On Fri, Aug 9, 2013 at 2:09 PM, Bryan Beaudreault <[email protected]> wrote:

> When I've had problems with a slow jobtracker, I've found the issue to be
> one of the following two (so far) possibilities:
>
> - long GC pause (I'm guessing this is not it based on your email)
> - hdfs is slow
>
> I haven't dived into the code yet, but circumstantially I've found that
> when you submit a job the jobtracker needs to put a bunch of files in hdfs,
> such as the job.xml, the job jar, etc. I'm not sure how this scales with
> larger and larger jobs, aside from the size of the splits serialization in
> the job.xml, but if your HDFS is slow for any reason it can cause pauses in
> your jobtracker. This affects other jobs being able to submit, as well as
> the 50030 web ui.
>
> I'd take a look at your namenode logs. When the jobtracker logs pause, do
> you see a corresponding pause in the namenode logs? What gets spewed
> before and after that pause?
>
>
> On Fri, Aug 9, 2013 at 4:41 PM, Patai Sangbutsarakum <
> [email protected]> wrote:
>
>> A while back, I was fighting with jobtracker page hangs: when I browsed
>> to http://jobtracker:50030, the browser didn't show job info as usual.
>> That turned out to be caused by keeping too much job history in the
>> jobtracker.
>>
>> Currently, I am setting up a new cluster with a 40g heap on the namenode
>> and jobtracker, on dedicated machines. The not-fun part starts here: a
>> developer tried to test out the cluster by launching a 76k map job (the
>> cluster has around 6k-ish mappers).
>> Job initialization succeeded, and the job finished.
>>
>> However, before the job actually starts running, I can't access the
>> jobtracker page anymore, the same symptom as above.
>>
>> I see a bunch of these in the jobtracker log:
>>
>> 2013-08-08 00:23:00,509 INFO org.apache.hadoop.mapred.JobInProgress:
>> tip:task_201307291733_0619_m_076796 has split on node: /rack/node
>> ..
>> ..
>> ..
>>
>> Until I see this:
>>
>> INFO org.apache.hadoop.mapred.JobInProgress: job_201307291733_0619
>> LOCALITY_WAIT_FACTOR=1.0
>> 2013-08-08 00:23:00,509 INFO org.apache.hadoop.mapred.JobInProgress: Job
>> job_201307291733_0619 initialized successfully with 76797 map tasks and 10
>> reduce tasks.
>>
>> That's when I can access the jobtracker page again.
>>
>>
>> CPU load on the jobtracker is very low, the jobtracker's heap is far from
>> full (about 1 GB used out of 40), and network bandwidth is far from
>> saturated.
>>
>> I'm running the 0.20.2 branch on CentOS 6.4 with Java(TM) SE Runtime
>> Environment (build 1.6.0_32-b05).
>>
>>
>> What root cause should I be looking at, or at least where should I
>> start?
>>
>> Thank you in advance.
