I am in the process of deciding specs for a crawling machine and a
searching machine (two machines), which will support merging/indexing
and searching operations on a single index that may scale to several
million pages (at which point it would be about 2-10 GB, assuming
linear growth with pages).
There are some rules of thumb in the wiki that are still up to date for the index itself, since they are more related to Lucene. In general, my suggestion is to start small and grow with your needs; Hadoop is perfect for that.
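
For example (a rough sketch only: the hostnames, ports, and values are placeholders, and the property names assume an older Hadoop 0.x-style conf/hadoop-site.xml as typically used with Nutch), a single machine can start in pseudo-distributed mode and later grow into a cluster by pointing more nodes at the same namenode/jobtracker and raising dfs.replication:

    <configuration>
      <property>
        <name>fs.default.name</name>
        <value>hdfs://localhost:9000</value>
      </property>
      <property>
        <name>mapred.job.tracker</name>
        <value>localhost:9001</value>
      </property>
      <property>
        <name>dfs.replication</name>
        <value>1</value>
      </property>
    </configuration>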

What is the range of hardware that I should be looking at? Could
anyone share their deployment/hardware specs for a large index size?
I'm looking for RAM and CPU considerations.

Also, what is the preferred platform? Java has a max memory allocation
of 4 GB on Solaris and 2 GB on Linux, so does it make sense to get more
RAM than this?

As far as I know, you should be able to use more memory with a 64-bit JVM.
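
For example (the numbers below are only illustrations, and assume a 64-bit OS with a 64-bit JVM installed), the heap can be raised well past the 32-bit limits by passing -Xmx to the JVM through the usual Hadoop knobs:

    # conf/hadoop-env.sh -- heap for the Hadoop daemons, value in MB
    export HADOOP_HEAPSIZE=4000

    # conf/hadoop-site.xml -- heap for each map/reduce child task
    <property>
      <name>mapred.child.java.opts</name>
      <value>-Xmx2000m</value>
    </property>

So RAM beyond 2-4 GB is not wasted: whatever the JVMs don't claim, the OS will use for its page cache, which also helps Lucene searches.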

