Beyond the threshold of 20M pages per node, what other rules of thumb
indicate when distributed search becomes necessary?

Thanks,
DaveG

-----Original Message-----
From: Doug Cutting [mailto:[EMAIL PROTECTED] 
Sent: Monday, October 17, 2005 1:38 PM
To: [email protected]
Subject: Re: Nutch Search Speed Concern

Murray Hunter wrote:
> We tested search for a 20 Million page index on a dual core 64 bit machine
> with 8 GB of ram using storage of the nutch data on another server through
> linux nfs, and its performance was terrible. It looks like the bottleneck
> was nfs, so I was wondering how you had your storage set up.  Are you using
> NDFS, or is it split up over multiple servers?

For good search performance, indexes and segments should always reside 
on local volumes, not in NDFS and not in NFS.  Ideally these can be 
spread across the available local volumes, to permit more parallel disk 
i/o.  As a rule of thumb, searching starts to get slow with more than 
around 20M pages per node.  Systems larger than that should benefit from
distributed search.

Doug
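
As a rough illustration of the rule of thumb above (not part of the
original message), the sketch below estimates how many search nodes an
index of a given size would need at about 20M pages per node, and spreads
segments round-robin across a node's local volumes.  The function names,
segment names, and volume paths are hypothetical and are not Nutch APIs;
the 20M figure is only the approximate threshold quoted above.

```python
# Rule-of-thumb figure quoted in the message; approximate, not a hard limit.
PAGES_PER_NODE = 20_000_000


def search_nodes_needed(total_pages, pages_per_node=PAGES_PER_NODE):
    """Return the smallest node count that keeps each node at or below
    pages_per_node pages (hypothetical helper, not part of Nutch)."""
    if total_pages < 0 or pages_per_node <= 0:
        raise ValueError("total_pages must be >= 0 and pages_per_node > 0")
    # Integer ceiling division; always use at least one node.
    return max(1, -(-total_pages // pages_per_node))


def spread_over_volumes(segments, volumes):
    """Assign segment names to local volumes round-robin, so that reads
    can proceed on several disks in parallel."""
    if not volumes:
        raise ValueError("at least one local volume is required")
    layout = {volume: [] for volume in volumes}
    for i, segment in enumerate(segments):
        layout[volumes[i % len(volumes)]].append(segment)
    return layout


if __name__ == "__main__":
    # Hypothetical example: a 100M page index and two local disks.
    print(search_nodes_needed(100_000_000))
    print(spread_over_volumes(["seg0", "seg1", "seg2"], ["/disk1", "/disk2"]))
```

At exactly 20M pages this gives one node; anything larger rounds up to
the next whole node.  Real sizing would also depend on RAM, query load,
and disk speed, none of which this sketch models.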

