Reduce part of a Fetch task

Julien Nioche Tue, 28 Oct 2008 03:13:33 -0700

Hi guys,

I am running a Fetch task on an EC2 cluster. The Map part is reasonably fast,
but the Reduce is taking forever. I see no explicit Reducer specified for
the Job, so I assume that the output of the Reduce is simply copied to HDFS.
Since all the DataNodes are on EC2, I imagine that the cost of duplicating
the data is not too high.

I had a look at the EC2 instance doing the reduction: the CPU is at 40-something
percent and there is no free RAM (most of it is being used by the TaskTracker
and the DataNode).

Any idea why it is so slow? Are there any parameters that could
influence the performance?

Thanks for your help

Julien

-- 
DigitalPebble Ltd
http://www.digitalpebble.com
