Doug,

With that said, are there any papers or other information on federating searches over a cluster of servers? I'm assuming you would put unique segments on each server, so that each one can load as much of its index into memory as possible for best performance.
Is there more info on this on the Lucene message boards that I should look into?

Thanks again for all your help. I'll have to buy you guys a beer one of these days! :)

-byron

--- Doug Cutting <[EMAIL PROTECTED]> wrote:
> Byron,
>
> There are a lot of parameters, so there's not a simple answer. In
> general, fetching and database updates require lots of disk, and
> searching is faster with more RAM. But the particulars depend on how
> big of an index you're trying to build and how much query traffic you
> expect.
>
> As a general rule, each page fetched requires around 10k of disk overall
> (for the page cache, its text, the index, db entries, etc.). So a
> terabyte of storage is required for every 100M pages.
>
> With your current configuration, you could use the xeon for fetching and
> database work. Then copy sets of segments, with merged indexes, to your
> five xeon servers. With this hardware you should be able to easily
> maintain a 100+M page collection that can serve several searches per
> second.
>
> Doug
>
> Byron Miller wrote:
> > I've been reading the threads and trying to get a handle on the
> > current distributed platform and wanted to get some feedback and help.
> >
> > We are building out our new rack and we have 5 servers and about
> > 2 terabytes of disk space right now. We have a core xeon with
> > 1.4 terabytes that we are using right now, and the rest are p4's with
> > 200 gigs apiece and 1 gig of memory (will be expanded).
> >
> > Is there an optimal way to configure this? How are the current nutch
> > systems set up?
> >
> > Are you doing a distributed DB, or are you allocating indexes on each
> > query server and have a core db server for now?
> >
> > What would be your recommendations, and are there any more
> > whitepapers on this?
> >
> > Thanks in advance!
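As a rough check on the rule of thumb quoted above (around 10k of disk per fetched page), the arithmetic can be sketched in Python. This is only a back-of-the-envelope sketch: the function names are made up for illustration, and reading "10k" as 10,000 bytes and a terabyte as 10^12 bytes are assumptions, not anything taken from Nutch itself.

```python
# Back-of-the-envelope disk sizing from the "10k of disk per page" rule of thumb.
# Assumptions (not from Nutch): "10k" means 10,000 bytes, 1 TB means 10**12 bytes.

BYTES_PER_PAGE = 10_000   # page cache + text + index + db entries, per the quoted estimate
TB = 10**12


def disk_needed_bytes(pages: int, bytes_per_page: int = BYTES_PER_PAGE) -> int:
    """Total disk needed to hold `pages` fetched pages."""
    return pages * bytes_per_page


def pages_that_fit(disk_bytes: int, bytes_per_page: int = BYTES_PER_PAGE) -> int:
    """Number of pages that fit in `disk_bytes` of storage."""
    return disk_bytes // bytes_per_page


if __name__ == "__main__":
    # 100M pages should come out at about one terabyte.
    print(disk_needed_bytes(100_000_000) / TB, "TB for 100M pages")
    # The roughly 2 terabytes mentioned in the original message.
    print(pages_that_fit(2 * TB), "pages fit in 2 TB")
```

Under these assumptions, 100M pages works out to one terabyte, which matches the figure in the quoted reply, and about 2 terabytes of total disk would hold on the order of 200M pages.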
-------------------------------------------------------
This SF.Net email is sponsored by: IBM Linux Tutorials
Free Linux tutorial presented by Daniel Robbins, President and CEO of
GenToo technologies. Learn everything from fundamentals to system
administration. http://ads.osdn.com/?ad_id=1470&alloc_id=3638&op=click
_______________________________________________
Nutch-developers mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/nutch-developers
