How do you figure that it takes 1.5 GB of RAM for 30M pages? I believe that when a Lucene index is opened, it reads all the numbered *.f* files and the *.tii file into memory. The numbered *.f* files contain the length normalization values for each indexed field (1 byte per doc), and the .tii file contains every kth term (k=128 by default, I think).
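If you want to sanity-check those numbers against your own index, here's a rough back-of-envelope sketch (assuming the Lucene 1.x API; the field count and the index path argument are placeholders you'd substitute for your setup):

    import org.apache.lucene.index.IndexReader;
    import org.apache.lucene.index.TermEnum;

    // Rough estimate of the per-searcher heap used by norms (*.f*) and the
    // in-memory term index (.tii), assuming the default index interval of 128.
    public class MemoryEstimate {
        public static void main(String[] args) throws Exception {
            IndexReader reader = IndexReader.open(args[0]);   // path to the index

            // Norms: one byte per document for every indexed field.
            int indexedFields = 8;                            // placeholder: match your schema
            long normBytes = (long) reader.maxDoc() * indexedFields;

            // Term index: roughly every 128th term is held in memory.
            long termCount = 0;
            TermEnum terms = reader.terms();
            while (terms.next()) {
                termCount++;
            }
            terms.close();

            System.out.println("norms:      ~" + (normBytes >> 20) + " MB");
            System.out.println(".tii terms: ~" + (termCount / 128));
            reader.close();
        }
    }

(Walking the whole TermEnum on a 30M-page index is slow, so treat this as a one-off check rather than something to run in production.)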
For 30M documents, each *.f* file is about 30 MB, and your .tii file should be under 100 MB. With 8 indexed fields that's roughly 8 x 30 MB = 240 MB of norms plus the term index, so a memory footprint of about 340 MB. Any extra memory on the server can be used by the OS for buffer caching, which will speed up searches. If you'd like, you can set up search servers to spread the load across separate machines (a rough sketch is below the quoted message). The servlet container you use shouldn't make much of a difference in memory usage.

Andy

On 8/2/05, Jay Pound <[EMAIL PROTECTED]> wrote:
> I'm testing an index of 30 million pages; it requires 1.5 GB of RAM to search
> using Tomcat 5. I plan on having an index with multiple billion pages, but
> if this is to scale, then even with 16 GB of RAM I won't be able to have an
> index larger than 320 million pages. How can I distribute the memory
> requirements across multiple machines, or is there another servlet container
> (like Resin) that will require less memory to operate? Has anyone else run
> into this?
> Thanks,
> -Jay Pound
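For the multi-machine setup, here's a minimal sketch using Lucene's RMI-based RemoteSearchable plus MultiSearcher. The host names (box1, box2), the binding name "nutch-searcher", and the field/term in the query are made up for illustration; Nutch's own distributed search server is another route if you'd rather stay inside Nutch.

One of these runs on each search box, serving its own slice of the index:

    import java.rmi.Naming;
    import java.rmi.registry.LocateRegistry;
    import org.apache.lucene.search.IndexSearcher;
    import org.apache.lucene.search.RemoteSearchable;

    public class SearchServer {
        public static void main(String[] args) throws Exception {
            LocateRegistry.createRegistry(1099);               // RMI registry on this box
            IndexSearcher local = new IndexSearcher(args[0]);  // path to this box's index slice
            Naming.rebind("//localhost/nutch-searcher", new RemoteSearchable(local));
            System.out.println("search server ready");
        }
    }

The front end (e.g. the webapp running in Tomcat) then federates the remote slices, so only the small front-end JVM lives in the servlet container:

    import java.rmi.Naming;
    import org.apache.lucene.index.Term;
    import org.apache.lucene.search.Hits;
    import org.apache.lucene.search.MultiSearcher;
    import org.apache.lucene.search.Searchable;
    import org.apache.lucene.search.TermQuery;

    public class SearchClient {
        public static void main(String[] args) throws Exception {
            Searchable a = (Searchable) Naming.lookup("//box1/nutch-searcher");
            Searchable b = (Searchable) Naming.lookup("//box2/nutch-searcher");
            MultiSearcher searcher = new MultiSearcher(new Searchable[] { a, b });
            Hits hits = searcher.search(new TermQuery(new Term("content", "nutch")));
            System.out.println(hits.length() + " hits");
            searcher.close();
        }
    }

Each box only holds the norms and term index for its own slice, which is how the memory requirement gets spread out.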
