Stefan Groschupf wrote:
http://wiki.apache.org/nutch/Presentations
Can you explain what this means? Page 20:
- scheduling is bottleneck, not disk, network or CPU?
I mean that none of the CPUs, disks, or network is at 100% of capacity.
Disks are running around 50% busy, CPUs a bit higher, and the network
switch has lots of bandwidth left. (Although, if we used multiple racks
connected with gigabit links, these inter-rack links would already be
near capacity.) So sometimes the CPU is busy generating random data and
stuffing it in a buffer, and sometimes the disk is busy writing data,
but we're not keeping both busy at the same time all the time. Perhaps
more threads/processes and/or bigger buffers would increase the
utilization; I have not tried to tune things for this benchmark. But I
am not disappointed with this performance. Rather, I think that it is
fast enough that with real applications, with non-trivial map and
reduce functions, NDFS will not be a bottleneck.
Doug
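
The idea of keeping the CPU and the disk busy at the same time can be
illustrated with a small producer/consumer sketch. This is not the
benchmark code and not NDFS code; it is a minimal Python illustration,
and the function name write_random_data and its parameters (block_size,
buffer_blocks) are hypothetical. One thread generates random blocks and
puts them in a bounded buffer while the main thread drains the buffer
to disk, so neither stage has to wait for the other to finish
completely. Whether this raises utilization on a given machine is
something to measure, not something this sketch demonstrates.

```python
import os
import queue
import tempfile
import threading


def write_random_data(path, total_blocks, block_size=64 * 1024, buffer_blocks=8):
    """Generate random blocks in one thread while this thread writes them.

    All names and defaults here are illustrative assumptions.
    Returns the number of bytes written.
    """
    # Bounded buffer between the two stages; a larger buffer_blocks lets
    # the producer run further ahead of the writer.
    buf = queue.Queue(maxsize=buffer_blocks)

    def producer():
        for _ in range(total_blocks):
            buf.put(os.urandom(block_size))  # "generate random data" stage
        buf.put(None)  # sentinel: no more data

    worker = threading.Thread(target=producer)
    worker.start()

    written = 0
    with open(path, "wb") as out:
        while True:
            block = buf.get()  # blocks until the producer has data ready
            if block is None:
                break
            out.write(block)  # "write data to disk" stage
            written += len(block)

    worker.join()
    return written


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        target = os.path.join(tmp, "bench.dat")
        print(write_random_data(target, total_blocks=16))
```

The two knobs mentioned above map onto this sketch directly: more
producer threads would correspond to "more threads/processes", and a
larger buffer_blocks to "bigger buffers".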