On Fri, 2014-06-06 at 12:32 +0200, Vineet Mishra wrote:
> Does that mean for querying smoothly we need to have memory at least
> equal to or greater than the size of the index?
If you absolutely, positively have to reduce latency as much as possible, then yes. With an estimated index size of 2TB, I would guess that 10-20 machines with powerful CPUs (1 per shard per expected concurrent request) would also be advisable. While you're at it, do make sure that you're using high-speed memory.

That was not a serious suggestion, should you be in doubt. Very few people need the best latency possible. Most just need the individual searches to be "fast enough" and want to scale throughput instead.

> As in my case the index size will be very heavy (~2TB) and practically
> speaking that amount of memory is not possible. Even if it is split
> across multiple shards, say around 10 shards, then 200GB of RAM per
> shard will still not be a feasible option.

We're building a projected 24TB index collection and are currently at 2.7TB+, growing by about 1TB every 10 days. Our current plan is to use a single machine with 256GB of RAM, but we will of course adjust along the way if that proves too small. Requirements differ with the corpus and the needs, but for us, SSDs as storage seem to provide quite enough of a punch.

I did a little testing yesterday:
https://plus.google.com/u/0/+TokeEskildsen/posts/4yPvzrQo8A7

tl;dr: for small result sets (< 1M hits) on unwarmed searches with simple queries, response time is below 100ms. If we enable faceting with plain Solr, this jumps to about 1 second. (Rough sketches of how to repeat the measurements are appended after my signature.)

I did a top on the machine and it says that 50GB is currently used for caching, so an 80GB (and probably less) machine would work fine for our 2.7TB index.

- Toke Eskildsen, State and University Library, Denmark
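PS: for anyone who wants to repeat the unwarmed-search timing, here is a minimal Python sketch. The Solr URL, the collection name "net_archive", the facet field "domain" and the query are placeholders for illustration, not our actual setup.

import json
import time
import urllib.parse
import urllib.request

# Placeholder endpoint: adjust host, port and collection to your installation.
SOLR = "http://localhost:8983/solr/net_archive/select"

def timed_search(query, facet_field=None):
    """Issue one search against Solr and print hit count and wall-clock time."""
    params = {"q": query, "rows": 10, "wt": "json"}
    if facet_field:  # plain Solr field faceting
        params["facet"] = "true"
        params["facet.field"] = facet_field
    url = SOLR + "?" + urllib.parse.urlencode(params)
    start = time.time()
    with urllib.request.urlopen(url) as response:
        body = json.loads(response.read().decode("utf-8"))
    elapsed_ms = (time.time() - start) * 1000
    print("q=%-20s facet=%-5s hits=%-10d time=%7.1f ms"
          % (query, bool(facet_field), body["response"]["numFound"], elapsed_ms))

# Unwarmed simple query first, then the same query with faceting enabled.
timed_search("content:denmark")
timed_search("content:denmark", facet_field="domain")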
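The "50GB used for caching" number came from top. On Linux the same figure can be read directly from /proc/meminfo; a small sketch (Linux-only, and note that newer versions of top may fold SReclaimable into their cache number):

# Report how much RAM the kernel currently uses for the disk cache.
def cached_gb():
    with open("/proc/meminfo") as meminfo:
        for line in meminfo:
            if line.startswith("Cached:"):
                return int(line.split()[1]) / (1024 ** 2)  # value is in kB
    raise RuntimeError("Cached: not found in /proc/meminfo")

print("Page cache: %.1f GB" % cached_gb())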