Re: Slow reduce>copy

Mathijs Homminga Thu, 16 Aug 2007 10:30:26 -0700

I have the same problem.

In my case it was caused by the 5-second delay in Hadoop's ReduceTaskRunner:


private static final long MIN_POLL_INTERVAL = 5000;

This is the time the ReduceTaskRunner sleeps between successive polls for new map outputs. (Another question: why does the prepare() method keep polling for new map outputs, even when all outputs are known?) When your map output files are relatively small, this 5-second delay becomes significant, because the reducer spends most of each cycle sleeping rather than copying.
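To make that concrete, here is a rough back-of-the-envelope estimate. It is only a sketch under a simplifying assumption of mine: each poll cycle copies one map output and then waits the full interval. The class name, the method name, the 10 MB/s link speed and the file sizes are all made up for illustration; only the 5000 ms value comes from Hadoop.

```java
public class PollDelayEstimate {
    // The interval quoted above; everything else in this class is illustrative.
    public static final long MIN_POLL_INTERVAL_MS = 5000;

    /**
     * Effective copy rate in MB/s, assuming (hypothetically) that each poll
     * cycle transfers one map output and then sleeps for pollMs.
     */
    public static double effectiveRateMBps(double outputMB, double linkMBps, long pollMs) {
        double transferSec = outputMB / linkMBps;        // time spent actually copying
        double cycleSec = transferSec + pollMs / 1000.0; // copy time plus sleep time
        return outputMB / cycleSec;
    }

    public static void main(String[] args) {
        double linkMBps = 10.0; // assumed link speed, not a measured value
        for (double sizeMB : new double[] {0.05, 5.0, 500.0}) {
            double rate = effectiveRateMBps(sizeMB, linkMBps, MIN_POLL_INTERVAL_MS);
            System.out.printf("%.2f MB output -> %.3f MB/s%n", sizeMB, rate);
        }
    }
}
```

Under these assumptions a 50 KB map output gives roughly 0.01 MB/s, which is in the same range as the rate reported in the original question, while a 500 MB output gets close to the full link speed.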


Proposed solution: make your map output files larger.

Or you can modify Hadoop and decrease this delay. But be careful: if you set it too low, the polling overhead might become too large.
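A quick way to get a feel for that trade-off is to estimate what fraction of the time goes into polling for a given interval. This is a sketch, not a measurement: the 20 ms cost per poll is a number I picked for illustration, and the class and method names are hypothetical.

```java
public class PollOverheadEstimate {
    /**
     * Fraction of wall-clock time spent polling, assuming each poll costs
     * pollCostMs and is followed by a sleep of intervalMs.
     */
    public static double pollingFraction(double pollCostMs, double intervalMs) {
        return pollCostMs / (pollCostMs + intervalMs);
    }

    public static void main(String[] args) {
        double pollCostMs = 20.0; // assumed cost of one poll; measure your own cluster
        for (double intervalMs : new double[] {5000, 1000, 100, 20}) {
            double percent = 100.0 * pollingFraction(pollCostMs, intervalMs);
            System.out.printf("interval %.0f ms -> %.1f%% of time polling%n", intervalMs, percent);
        }
    }
}
```

With these made-up numbers the overhead stays well under 1% at 5000 ms but reaches half of the time once the interval drops to the cost of a single poll, so an interval somewhere in between is probably the sensible range to try.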


Good luck,
Mathijs

Nguyen Manh Tien wrote:

I set up Nutch with Hadoop running on several PCs.
When it runs, I find that the reduce task runs very slowly, at a speed of
0.01 MB/s ("reduce > copy (9 of 10 at 0.01 MB/s)").
Can anyone help me?
