I have the same problem.
In my case it was caused by the 5-second delay in Hadoop's ReduceTaskRunner:

private static final long MIN_POLL_INTERVAL = 5000;

This is the time the ReduceTaskRunner sleeps between successive polls for new map outputs. (Another question: why does the prepare() method keep polling for new map outputs even when all outputs are known?) When your map output files are relatively small, this 5-second delay becomes significant compared to the time spent actually copying data.

Proposed solution: make your map output files larger.
Alternatively, you can modify Hadoop and decrease this delay. But be careful: if you set it too low, the polling overhead might become too large.
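
To show why small map outputs suffer so much, here is a rough back-of-the-envelope sketch. It is not Hadoop code: the class PollDelayEstimate, the method effectiveMBps, and the sizes and link speed used below are all assumptions for illustration, and the model simply charges one full poll interval per copied map output.

```java
// Rough model, not Hadoop code: estimates the copy rate a reducer would
// see if every map output costs one full poll interval plus transfer time.
final class PollDelayEstimate {
    // The MIN_POLL_INTERVAL value quoted above, in milliseconds.
    static final long MIN_POLL_INTERVAL = 5000;

    // outputMB: size of one map output in MB (assumed value)
    // linkMBps: raw network throughput in MB/s (assumed value)
    // pollMs:   sleep between polls in milliseconds
    static double effectiveMBps(double outputMB, double linkMBps, long pollMs) {
        double waitSeconds = pollMs / 1000.0;
        double transferSeconds = outputMB / linkMBps;
        return outputMB / (waitSeconds + transferSeconds);
    }
}
```

Under these assumptions, a 0.05 MB map output on a 10 MB/s link gives effectiveMBps(0.05, 10.0, 5000) of roughly 0.01 MB/s, while a 50 MB output on the same link gives 5 MB/s. The sleep dominates in the first case and is amortized in the second, which is why larger map outputs help.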

Good luck,
Mathijs

Nguyen Manh Tien wrote:
I set up Nutch with Hadoop running on several PCs.
When it runs, I find that the reduce task runs very slowly, at a speed of
0.01 MB/s ("reduce > copy (9 of 10 at 0.01 MB/s)").
Can anyone help me?
