On Tue, Oct 28, 2008 at 12:12 PM, Julien Nioche
<[EMAIL PROTECTED]> wrote:
> Hi guys,
>
> I am running a Fetch task on an EC2 cluster. The Map part is reasonably fast
> but the Reduce is taking forever. I see no explicit Reducer specified for
> the Job so I assume that the output of the reduce is simply copied to HDFS.
> Since all the DataNodes are on EC2 I imagine that the cost of duplicating
> the data is not too high.
>

Are you parsing during fetching? If you are, ParseOutputFormat runs during
the reduce phase, and that may be the slow part (without parsing, the
fetch reduce is just an identity reduce).
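
One way to check this is to turn off parsing in the fetcher and parse the
segment as a separate job afterwards. A minimal sketch, assuming your Nutch
version reads the fetcher.parse property (please verify the name and default
against your own nutch-default.xml before relying on it):

```xml
<!-- Goes inside <configuration> in conf/nutch-site.xml.
     Assumption: this Nutch version honours fetcher.parse. -->
<property>
  <name>fetcher.parse</name>
  <value>false</value>
</property>
```

With that set, the fetch reduce should only write out the fetched content,
and parsing can be run on its own afterwards with something like
bin/nutch parse <segment>, where <segment> is a placeholder for your segment
path. Comparing the two run times should show whether parsing is the
bottleneck.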

> I had a look at the EC2 instance doing the reduction: the CPU is at 40
> something percent and there is no RAM available (most of it being used by
> the TaskTracker and DataNode).
>
> Any idea why it is so slow? Are there any parameters which could
> influence the performance?
>
> Thanks for your help
>
> Julien
>
> --
> DigitalPebble Ltd
> http://www.digitalpebble.com
>



-- 
Doğacan Güney
