Stefan Groschupf wrote:
http://wiki.apache.org/nutch/Presentations

Can you explain what this means: Page 20:
- Scheduling is bottleneck, not disk, network or CPU?

I mean that neither the CPUs, the disks, nor the network are at 100% of capacity. Disks are running around 50% busy, CPUs a bit higher, and the network switch has lots of bandwidth left. (Although, if we used multiple racks connected with gigabit links, these inter-rack links would already be near capacity.) So sometimes the CPU is busy generating random data and stuffing it in a buffer, and sometimes the disk is busy writing data, but we're not keeping both busy at the same time all the time. Perhaps more threads/processes and/or bigger buffers would increase the utilization; I have not tried to tune things for this benchmark. But I am not disappointed with this performance. Rather, I think that it is fast enough that with real applications, with non-trivial map and reduce functions, NDFS will not be a bottleneck.
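
To illustrate the kind of overlap described above, here is a minimal sketch, in Python rather than anything taken from the benchmark or from NDFS. One thread generates random data into buffers while a second thread writes them out, with a bounded queue between the two. The function name `write_random_file` and the parameters `buffer_size` and `max_buffers` are hypothetical, chosen only for this example; they stand in for the "more threads and/or bigger buffers" knobs and are not real Nutch settings.

```python
import os
import queue
import threading


def write_random_file(path, total_bytes, buffer_size=1 << 20, max_buffers=4):
    """Write total_bytes of random data to path.

    Generation (CPU) runs in the calling thread and writing (disk) runs in a
    helper thread, so the two stages can proceed at the same time.
    """
    # Bounded queue: the generator blocks once max_buffers chunks are pending,
    # which caps memory use when the disk is the slower stage.
    buffers = queue.Queue(maxsize=max_buffers)
    errors = []

    with open(path, "wb") as out:

        def drain_to_disk():
            # Consume chunks until the None sentinel arrives. After a write
            # error, keep draining so the generator never blocks forever.
            for chunk in iter(buffers.get, None):
                if not errors:
                    try:
                        out.write(chunk)
                    except OSError as exc:
                        errors.append(exc)

        writer = threading.Thread(target=drain_to_disk)
        writer.start()
        try:
            remaining = total_bytes
            while remaining > 0:
                size = min(buffer_size, remaining)
                buffers.put(os.urandom(size))  # generate the next chunk
                remaining -= size
        finally:
            buffers.put(None)  # sentinel: no more data
            writer.join()

    if errors:
        raise errors[0]
```

Raising `max_buffers` or `buffer_size` lets the generator run further ahead of the writer. Whether that actually improves utilization on a given machine would have to be measured; the sketch makes no claim about it.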

Doug

