On Mon, 2005-10-03 at 13:12 -0700, Doug Cutting wrote:
> Rod Taylor wrote:
> > I have high load, but it seems that the percentage progress
> > during the reduce > sort phase of fetch (parse?) is not increasing which
> > makes it appear as if nothing is happening (stuck at 0.5, or 50%).
> 
> That's correct.  There are currently no progress reports during sorting. 
>   Reduce progress sticks at 50% during sorting, and jumps to 75% on 
> completion of the sort phase.

I see. Is there any way to speed up this phase? It seems to be taking as
long to run the sort phase as it did to download the data.

It would appear that nearly 30% of the time for the nutch fetch segment
is spent doing the sorts, so I'm well off the 20% overhead number you
seem to be able to achieve for a full cycle.

There are 5 machines (4 CPUs each) running Red Hat, each with 8 tasks
and a load average of about 5. Context switches are low (under
1500/second). There is virtually no IO (the boxes have plenty of RAM),
but the kernel is doing a bunch of work: 50% of CPU time is in system
(unsure on what; I'm not familiar with the Linux DTrace-type tools).
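
For anyone wanting to reproduce the user/system split without special
tooling, here is a rough sketch that compares two snapshots of the
aggregate "cpu" line from /proc/stat. The function names and the sample
snapshot strings are made up for illustration; only the /proc/stat
field order (user, nice, system, idle, ...) is standard.

```python
def cpu_ticks(stat_text):
    """Return up to the first 8 counters of the aggregate 'cpu' line.

    Field order in /proc/stat: user nice system idle iowait irq softirq steal.
    Older kernels expose fewer fields, which is fine for this sketch.
    """
    for line in stat_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "cpu":
            return [int(v) for v in fields[1:9]]
    raise ValueError("no aggregate 'cpu' line found")


def cpu_shares(before, after):
    """Return user/system/idle shares of CPU time between two snapshots."""
    deltas = [b - a for a, b in zip(cpu_ticks(before), cpu_ticks(after))]
    total = sum(deltas)
    if total <= 0:
        raise ValueError("snapshots are identical or out of order")
    user, nice, system, idle = deltas[:4]
    return {
        "user": (user + nice) / total,
        "system": system / total,
        "idle": idle / total,
    }


# Hypothetical snapshots, NOT measurements from the cluster described here.
sample_before = "cpu 100 0 100 800 0 0 0 0"
sample_after = "cpu 200 0 350 950 0 0 0 0"
shares = cpu_shares(sample_before, sample_after)
print(shares)
```

On a live box the two snapshots would come from reading /proc/stat
twice, a few seconds apart. This only says how much time is spent in
the kernel, not where; that still needs a profiler.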

I generated the segment for the top 10 million pages, with 10 pages per
host.  map.tasks=383, reduce.tasks=43
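
A quick sketch of how those task counts spread over the cluster,
assuming the 8 tasks per machine are concurrent slots and that all
slots are available to one phase at a time (both are assumptions, and
the helper name is made up):

```python
import math


def waves(num_tasks, machines, slots_per_machine):
    """Return (total_slots, number_of_waves, tasks_in_last_wave)."""
    total_slots = machines * slots_per_machine
    n_waves = math.ceil(num_tasks / total_slots)
    last_wave = num_tasks - (n_waves - 1) * total_slots
    return total_slots, n_waves, last_wave


map_plan = waves(383, machines=5, slots_per_machine=8)
reduce_plan = waves(43, machines=5, slots_per_machine=8)
print(map_plan)     # (40, 10, 23)
print(reduce_plan)  # (40, 2, 3)
```

Under those assumptions the 43 reduce tasks need a second wave that
holds only 3 tasks, so most of the cluster would sit idle while those
last sorts finish.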

-- 
Rod Taylor <[EMAIL PROTECTED]>
