For comparison, here are some IO numbers while reduce > reduce is
running:
avg-cpu:  %user   %nice    %sys %iowait   %idle
          72.22    0.00    5.67   14.85    7.25

Device:            tps   kB_read/s   kB_wrtn/s    kB_read    kB_wrtn
sda             473.77    12363.16    13399.40     123508     133860

avg-cpu:  %user   %nice    %sys %iowait   %idle
          73.72    0.00   10.73    9.60    5.95

Device:            tps   kB_read/s   kB_wrtn/s    kB_read    kB_wrtn
sda             445.11     3380.44    23845.91      33872     238936
On Mon, 2005-10-03 at 14:11 -0700, Doug Cutting wrote:
> Rod Taylor wrote:
> > I see. Is there any way to speed up this phase? The sort phase seems to
> > be taking as long to run as it did to download the data.
> >
> > It would appear that nearly 30% of the time for the nutch fetch segment
> > is spent doing the sorts, so I'm well off the 20% overhead number you
> > seem to be able to achieve for a full cycle.
> >
> > 5 machines (4 CPUs each), each running 8 tasks with a load average of
> > about 5; they run Red Hat. Context switches are low (under 1500/second).
> > There is virtually no IO (the boxes have plenty of RAM), but the kernel is
> > doing a lot of work, as 50% of CPU time is in system (unsure on what; I'm
> > not familiar with the Linux DTrace-type tools).
>
> Sorting is usually i/o bound on mapred.local.dir. When eight tasks are
> using the same device this could become a bottleneck. Use iostat or sar
> to view disk i/o statistics.
>
> My plan is to permit one to specify a list of directories for
> mapred.local.dir and have the sorting (and everything else) select
> randomly among these for temporary local files. That way all devices
> can be used in parallel.
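
A minimal sketch of that selection idea in plain Java (the class name,
directory paths, and helper method are hypothetical, not actual Nutch code;
the real change would live wherever temporary local files are created):

    import java.io.File;
    import java.io.IOException;
    import java.util.Random;

    // Sketch only: pick one of several configured local directories at
    // random for each temporary spill file, so sort i/o is spread across
    // devices. Directory paths below are placeholders.
    public class LocalDirPicker {

        private final File[] localDirs;
        private final Random random = new Random();

        public LocalDirPicker(String[] dirs) {
            localDirs = new File[dirs.length];
            for (int i = 0; i < dirs.length; i++) {
                localDirs[i] = new File(dirs[i]);
                localDirs[i].mkdirs();          // make sure each dir exists
            }
        }

        /** Create a temp file in a randomly chosen local directory. */
        public File createTempFile(String prefix) throws IOException {
            File dir = localDirs[random.nextInt(localDirs.length)];
            return File.createTempFile(prefix, ".tmp", dir);
        }

        public static void main(String[] args) throws IOException {
            // e.g. mapred.local.dir = /disk0/tmp,/disk1/tmp,/disk2/tmp
            LocalDirPicker picker = new LocalDirPicker(
                new String[] { "/disk0/tmp", "/disk1/tmp", "/disk2/tmp" });
            System.out.println("spill goes to "
                + picker.createTempFile("sort-spill-"));
        }
    }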
>
> As a workaround you could try starting eight tasktrackers, each
> configured with a different device for mapred.local.dir. Yes, that's a
> pain, but it would give us an idea of whether my analysis is correct.
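
For concreteness, a hypothetical per-tracker launcher along those lines
(only the mapred.local.dir property name is real; the class, paths, and the
way the tasktracker itself is started are placeholders):

    // Sketch of the workaround: one tasktracker JVM per device, each
    // seeing a different mapred.local.dir. Illustrative only.
    public class OneTrackerPerDisk {

        private static final String[] DEVICE_DIRS = {
            "/disk0/mapred", "/disk1/mapred", "/disk2/mapred", "/disk3/mapred",
            "/disk4/mapred", "/disk5/mapred", "/disk6/mapred", "/disk7/mapred"
        };

        public static void main(String[] args) {
            int instance = Integer.parseInt(args[0]);  // 0..7, one per JVM
            String dir = DEVICE_DIRS[instance];
            System.setProperty("mapred.local.dir", dir);
            System.out.println("tasktracker #" + instance + " using " + dir);
            // ... start the tasktracker with this configuration ...
        }
    }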
>
> Doug
>
--
Rod Taylor <[EMAIL PROTECTED]>