I have a MapReduce job that is taking too long (over an hour), and I'm trying to see what I can tune to bring it down. One thing I noticed: the job kicks off 500+ map tasks. 490 of them do not process any records, whereas 10 of them process all the records (200K each). Any idea why that would be? The map phase takes a couple of minutes; the reduce phase takes the rest.

I'll try increasing the number of reduce tasks. I'm open to other suggestions for tunables.

Thanks for your input,
Venkatesh
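
P.S. To reason about whether more reduce tasks would actually help, here is a minimal sketch of how records get spread across reducers. It assumes the job uses Hadoop's default HashPartitioner, which I have not confirmed. The class name, method names, and sample keys are made up for illustration, and it runs without Hadoop on the classpath.

```java
import java.util.Arrays;

public class PartitionSketch {
    // Same formula as Hadoop's default HashPartitioner.getPartition:
    // clear the sign bit of the hash, then take it modulo the reducer count.
    static int partitionFor(String key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    // Count how many records would land on each reducer for the given keys.
    static long[] recordsPerReducer(String[] keys, int numReduceTasks) {
        long[] counts = new long[numReduceTasks];
        for (String key : keys) {
            counts[partitionFor(key, numReduceTasks)]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        // Hypothetical data: 1000 records but only 3 distinct keys.
        String[] keys = new String[1000];
        for (int i = 0; i < keys.length; i++) {
            keys[i] = "key-" + (i % 3);
        }
        // Print the per-reducer record counts for a few reducer counts.
        for (int reducers : new int[] {1, 4, 16}) {
            System.out.println(reducers + " reducers -> "
                    + Arrays.toString(recordsPerReducer(keys, reducers)));
        }
    }
}
```

If the map output has only a handful of distinct keys, at most that many reducers receive any records, so raising the reducer count alone may not shorten the reduce phase.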
