How do you figure that it takes 1.5 GB of RAM for 30M pages? I believe that when a Lucene index is opened, it reads all the numbered *.f* files and the *.tii file into memory. The numbered *.f* files contain the length normalization values for each indexed field (1 byte per doc), and the .tii file contains every kth term (k=128 by default, I think).
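If you want to sanity-check those numbers against your own index, here's a rough back-of-envelope sketch (assuming the Lucene 1.x API; the field count and the index path argument are placeholders you'd substitute for your setup):

    import org.apache.lucene.index.IndexReader;
    import org.apache.lucene.index.TermEnum;

    // Rough estimate of the per-searcher heap used by norms (*.f*) and the
    // in-memory term index (.tii), assuming the default index interval of 128.
    public class MemoryEstimate {
        public static void main(String[] args) throws Exception {
            IndexReader reader = IndexReader.open(args[0]);   // path to the index

            // Norms: one byte per document for every indexed field.
            int indexedFields = 8;                            // placeholder: match your schema
            long normBytes = (long) reader.maxDoc() * indexedFields;

            // Term index: roughly every 128th term is held in memory.
            long termCount = 0;
            TermEnum terms = reader.terms();
            while (terms.next()) {
                termCount++;
            }
            terms.close();

            System.out.println("norms:      ~" + (normBytes >> 20) + " MB");
            System.out.println(".tii terms: ~" + (termCount / 128));
            reader.close();
        }
    }

(Walking the whole TermEnum on a 30M-page index is slow, so treat this as a one-off check rather than something to run in production.)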
For 30M documents, each *.f* file is about 30 MB, and your .tii file should be under 100 MB. With 8 indexed fields that's roughly 8 x 30 MB = 240 MB of norms plus the term index, so a memory footprint of about 340 MB. Any extra memory on the server can be used by the OS for buffer caching, which will speed up searches. If you'd like, you can set up search servers to spread the load across separate machines (a rough sketch is below the quoted message). The servlet container you use shouldn't make much of a difference in memory usage.

Andy

On 8/2/05, Jay Pound <[EMAIL PROTECTED]> wrote:
> I'm testing an index of 30 million pages; it requires 1.5 GB of RAM to search
> using Tomcat 5. I plan on having an index with multiple billion pages, but
> if this is to scale, then even with 16 GB of RAM I won't be able to have an
> index larger than 320 million pages. How can I distribute the memory
> requirements across multiple machines, or is there another servlet container
> (like Resin) that will require less memory to operate? Has anyone else run
> into this?
> Thanks,
> -Jay Pound
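For the multi-machine setup, here's a minimal sketch using Lucene's RMI-based RemoteSearchable plus MultiSearcher. The host names (box1, box2), the binding name "nutch-searcher", and the field/term in the query are made up for illustration; Nutch's own distributed search server is another route if you'd rather stay inside Nutch.

One of these runs on each search box, serving its own slice of the index:

    import java.rmi.Naming;
    import java.rmi.registry.LocateRegistry;
    import org.apache.lucene.search.IndexSearcher;
    import org.apache.lucene.search.RemoteSearchable;

    public class SearchServer {
        public static void main(String[] args) throws Exception {
            LocateRegistry.createRegistry(1099);               // RMI registry on this box
            IndexSearcher local = new IndexSearcher(args[0]);  // path to this box's index slice
            Naming.rebind("//localhost/nutch-searcher", new RemoteSearchable(local));
            System.out.println("search server ready");
        }
    }

The front end (e.g. the webapp running in Tomcat) then federates the remote slices, so only the small front-end JVM lives in the servlet container:

    import java.rmi.Naming;
    import org.apache.lucene.index.Term;
    import org.apache.lucene.search.Hits;
    import org.apache.lucene.search.MultiSearcher;
    import org.apache.lucene.search.Searchable;
    import org.apache.lucene.search.TermQuery;

    public class SearchClient {
        public static void main(String[] args) throws Exception {
            Searchable a = (Searchable) Naming.lookup("//box1/nutch-searcher");
            Searchable b = (Searchable) Naming.lookup("//box2/nutch-searcher");
            MultiSearcher searcher = new MultiSearcher(new Searchable[] { a, b });
            Hits hits = searcher.search(new TermQuery(new Term("content", "nutch")));
            System.out.println(hits.length() + " hits");
            searcher.close();
        }
    }

Each box only holds the norms and term index for its own slice, which is how the memory requirement gets spread out.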
