On Fri, 2014-06-06 at 12:32 +0200, Vineet Mishra wrote:
> Does that mean for querying smoothly we need to have memory at least
> equal to or greater than the size of the index?
If you absolutely, positively have to reduce latency as much as possible, then yes. With an estimated index size of 2TB, I would guess that 10-20 machines with powerful CPUs (1 per shard per expected concurrent request) would also be advisable. While you're at it, do make sure that you're using high-speed memory.

That was not a serious suggestion, should you be in doubt. Very few people need the best latency possible. Most just need the individual searches to be "fast enough" and want to scale throughput instead.

> As in my case the index size will be very heavy (~2TB) and practically
> speaking that amount of memory is not possible. Even if it is split
> across multiple shards, say around 10 shards, then 200GB of RAM per
> shard will still not be a feasible option.

We're building a projected 24TB index collection and are currently at 2.7TB+, growing by about 1TB every 10 days. Our current plan is to use a single machine with 256GB of RAM, but we will of course adjust along the way if that proves too small. Requirements differ with the corpus and the needs, but for us, SSDs as storage seem to provide quite enough of a punch.

I did a little testing yesterday:
https://plus.google.com/u/0/+TokeEskildsen/posts/4yPvzrQo8A7

tl;dr: for small result sets (< 1M hits) on unwarmed searches with simple queries, response time is below 100ms. If we enable faceting with plain Solr, this jumps to about 1 second. (Rough sketches of how to repeat the measurements are appended after my signature.)

I did a top on the machine and it says that 50GB is currently used for caching, so an 80GB (and probably less) machine would work fine for our 2.7TB index.

- Toke Eskildsen, State and University Library, Denmark
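PS: for anyone who wants to repeat the unwarmed-search timing, here is a minimal Python sketch. The Solr URL, the collection name "net_archive", the facet field "domain" and the query are placeholders for illustration, not our actual setup.

import json
import time
import urllib.parse
import urllib.request

# Placeholder endpoint: adjust host, port and collection to your installation.
SOLR = "http://localhost:8983/solr/net_archive/select"

def timed_search(query, facet_field=None):
    """Issue one search against Solr and print hit count and wall-clock time."""
    params = {"q": query, "rows": 10, "wt": "json"}
    if facet_field:  # plain Solr field faceting
        params["facet"] = "true"
        params["facet.field"] = facet_field
    url = SOLR + "?" + urllib.parse.urlencode(params)
    start = time.time()
    with urllib.request.urlopen(url) as response:
        body = json.loads(response.read().decode("utf-8"))
    elapsed_ms = (time.time() - start) * 1000
    print("q=%-20s facet=%-5s hits=%-10d time=%7.1f ms"
          % (query, bool(facet_field), body["response"]["numFound"], elapsed_ms))

# Unwarmed simple query first, then the same query with faceting enabled.
timed_search("content:denmark")
timed_search("content:denmark", facet_field="domain")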
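The "50GB used for caching" number came from top. On Linux the same figure can be read directly from /proc/meminfo; a small sketch (Linux-only, and note that newer versions of top may fold SReclaimable into their cache number):

# Report how much RAM the kernel currently uses for the disk cache.
def cached_gb():
    with open("/proc/meminfo") as meminfo:
        for line in meminfo:
            if line.startswith("Cached:"):
                return int(line.split()[1]) / (1024 ** 2)  # value is in kB
    raise RuntimeError("Cached: not found in /proc/meminfo")

print("Page cache: %.1f GB" % cached_gb())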