Hi Markus,

Did you check the JobTracker at the time the job was launched to see whether map slots were available?
Looks like the input dataset size is ~464 GB. Since you mentioned that 10 GB jobs run fine, there is no reason a larger dataset should be stuck, at least not on the Pig side. I can't think of a good reason why the job does not take off, other than the cluster being busy running some other job. I do see that the number of files being processed is large: 50353. That could be a reason for the slowness, but the ~8 minutes shown in the logs seems high even for that. Maybe also post your script here; in the meantime, there is one thing worth trying, sketched below the quoted message.

On Thu, May 31, 2012 at 2:38 AM, Markus Resch <[email protected]> wrote:
> Hi all,
>
> When we run a Pig job that aggregates some amount of lightly compressed
> Avro data (~160 GByte), it takes ages until the first actual MapReduce
> job starts:
>
> 15:27:21,052 [main] INFO org.apache.pig.Main - Logging error messages
> to:
> [...]
> 15:57:27,816 [main] INFO org.apache.pig.tools.pigstats.ScriptState -
> Pig features used in the script:
> [...]
> 16:07:00,969 [main] INFO
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
> - 0% complete
> [...]
> 16:07:30,886 [main] INFO
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler
> - BytesPerReducer=1000000000 maxReducers=999 totalInputFileSize=463325937621
> [...]
> 16:15:38,022 [Thread-16] INFO
> org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input
> paths to process : 50353
>
> These log messages are from our test cluster, which has a dedicated
> jobtracker and namenode and five data nodes with a map task capacity
> of 15 and a reduce task capacity of 10. 6899 map tasks and 464 reduce
> tasks were set up.
>
> During the initialisation phase we observed the workload and memory
> usage of the jobtracker, the namenode and some data nodes using top.
> They were nearly idle almost the whole time (e.g. 30% CPU load on the
> namenode, totally idle on the data nodes). Once the jobs were running,
> most of the tasks were in "waiting for IO" most of the time. Some swap
> space seemed to be reserved but was rarely used during that period.
>
> To us this looks like a Hadoop configuration issue, but we have no idea
> what exactly it could be. Jobs with about 10 GByte of input data run
> quite well.
>
> Any hint on where to tweak would be appreciated.
>
> Thanks
> Markus
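One knob worth trying when the input is scattered across tens of thousands of small files is Pig's split combination, which merges small input splits so fewer map tasks are launched. Here is a minimal sketch, assuming Pig 0.8 or later, a loader that supports combinable splits, and that the piggybank and Avro jars are already registered; the load path and the 1 GB target size are placeholders, not values from your job:

  -- Combine small input files into larger splits before mapping.
  -- pig.splitCombination defaults to true in recent Pig versions;
  -- setting it explicitly documents the intent.
  set pig.splitCombination true;
  -- Aim for roughly 1 GB of input per combined split (value in bytes).
  set pig.maxCombinedSplitSize 1073741824;

  -- Hypothetical load; substitute your actual path and Avro loader.
  records = LOAD '/data/avro/input/*'
            USING org.apache.pig.piggybank.storage.avro.AvroStorage();

Note that this only reduces the number of map tasks; the time the client spends listing all 50353 paths and computing splits against the namenode is unchanged. If that listing turns out to be the real bottleneck, consolidating the Avro data into fewer, larger files on HDFS is the more direct fix.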
