There are a lot of parameters, so there's no simple answer. In general, fetching and database updates require lots of disk, and searching is faster with more RAM. But the particulars depend on how big an index you're trying to build and how much query traffic you expect.
As a general rule, each page fetched requires around 10 KB of disk overall (for the page cache, its text, the index, db entries, etc.). So a terabyte of storage is required for every 100M pages.
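To make the arithmetic concrete, here is a minimal sketch of that rule of thumb. The 10 KB per page figure is the estimate above; the function names and the use of decimal units (1 TB = 10^12 bytes) are assumptions made for illustration, not anything from Nutch itself.

```python
# Rough capacity planning from the ~10 KB/page rule of thumb.
# Names and decimal units (1 TB = 10**12 bytes) are illustrative assumptions.

BYTES_PER_PAGE = 10_000  # ~10 KB per fetched page, all storage included
TB = 10**12


def disk_needed_bytes(pages: int, bytes_per_page: int = BYTES_PER_PAGE) -> int:
    """Estimate total disk needed for a collection of `pages` pages."""
    return pages * bytes_per_page


def pages_that_fit(disk_bytes: int, bytes_per_page: int = BYTES_PER_PAGE) -> int:
    """Estimate how many pages a given amount of disk can hold."""
    return disk_bytes // bytes_per_page


if __name__ == "__main__":
    print(disk_needed_bytes(100_000_000) / TB)  # 1.0 (TB for 100M pages)
    print(pages_that_fit(2 * TB))               # 200000000 pages in 2 TB
```

By this estimate, roughly 2 terabytes of total disk is enough for something on the order of 200M pages, before leaving any headroom for temporary files or growth.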
With your current configuration, you could use the Xeon for fetching and database work, then copy sets of segments, with merged indexes, to your P4 servers for searching. With this hardware you should be able to easily maintain a collection of 100M+ pages that can serve several searches per second.
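One simple way to decide which segments go to which search server is round-robin assignment. The sketch below only illustrates that idea; the segment and host names are hypothetical, it is not part of Nutch's tooling, and the actual copying (for example over the network) is left out.

```python
# Round-robin assignment of segments to search servers.
# Segment and server names are hypothetical placeholders.

def assign_segments(segments, servers):
    """Return a dict mapping each server to the list of segments it should hold."""
    if not servers:
        raise ValueError("at least one search server is required")
    plan = {server: [] for server in servers}
    for i, segment in enumerate(segments):
        # Cycle through the servers so each gets a near-equal share.
        plan[servers[i % len(servers)]].append(segment)
    return plan


segments = [f"segment-{n:02d}" for n in range(10)]
servers = ["search1", "search2", "search3", "search4"]
plan = assign_segments(segments, servers)

if __name__ == "__main__":
    for server, assigned in plan.items():
        print(server, assigned)
```

Round-robin keeps the page count per server roughly even when segments are of similar size; if segment sizes vary a lot, assigning by size would balance better.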
Doug
Byron Miller wrote:
I've been reading the threads and trying to get a handle on the current distributed platform and wanted to get some feedback and help.
We are building out our new rack and we have 5 servers and about 2 terabytes of disk space right now. We have a core Xeon with 1.4 terabytes that we are using right now, and the rest are P4s with 200 GB apiece and 1 GB of memory (will be expanded).
Is there an optimal way to configure this? How are the current Nutch systems set up?
Are you doing distributed DB or are you allocating indexes on each query server and have a core db server for now?
What would your recommendations be, and are there any more white papers on this?
Thanks in advance!
------------------------------------------------------- This SF.Net email is sponsored by: IBM Linux Tutorials Free Linux tutorial presented by Daniel Robbins, President and CEO of GenToo technologies. Learn everything from fundamentals to system administration.http://ads.osdn.com/?ad_id=1470&alloc_id=3638&op=click _______________________________________________ Nutch-developers mailing list [EMAIL PROTECTED] https://lists.sourceforge.net/lists/listinfo/nutch-developers
