On 4/25/2014 1:48 PM, Ed Smiley wrote:
> Anyone with experience, suggestions or lessons learned in the 10 -100 TB 
> scale they'd like to share?
> Researching optimum design for a Solr Cloud with, say, about 20TB index.

You've gotten some good information already in the replies that have
come your way.  The following blog post is even more relevant (in the
"we don't know" department) for large indexes than it is for small indexes:

http://searchhub.org/2012/07/23/sizing-hardware-in-the-abstract-why-we-dont-have-a-definitive-answer/

My own index is nowhere near that size.  It has 95 million records and
seven shards.  A single copy is about 108GB and lives on two servers
that each have 64GB of RAM.  I'm not running in SolrCloud mode.

The most important resource for Solr scalability is RAM.  This includes
the Java heap on each server, as well as unallocated memory so the
operating system can cache the index data that lives on that server.

http://wiki.apache.org/solr/SolrPerformanceProblems
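
As a rough way to see where you stand on that page's advice, you can
compare the index's on-disk size on a node against the RAM available
for the OS disk cache.  A small Python sketch (the index path in the
usage comment is hypothetical -- adjust it for your installation):

```python
import os

def dir_size(path):
    """Total bytes of all files under path (the on-disk index size)."""
    total = 0
    for root, _, files in os.walk(path):
        for name in files:
            total += os.path.getsize(os.path.join(root, name))
    return total

# Example usage on one core's index directory (hypothetical path):
# gb = dir_size("/var/solr/data/mycore/data/index") / 1024**3
# print(f"index: {gb:.1f} GB")
# Compare that figure against RAM left over after the Java heap,
# e.g. the buff/cache column of `free -g` on Linux.
```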

As the wiki page says, ideally you'd want as much RAM for the OS disk
cache as the index takes up on disk, but 40TB of RAM across all servers
just for the OS disk cache (in addition to whatever you need for the
Java heap) is too expensive to contemplate.  A 1:1 ratio is not an
absolute requirement, although it does produce the best results.

For that 40TB ideal figure, I am assuming that you mean a single replica
of your index would be 20TB, and that you'd have two.
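
To make "too expensive to contemplate" concrete, here's the
back-of-the-envelope arithmetic, assuming the 20TB-per-replica figure
above, two replicas, and (an assumed number, not a recommendation)
64GB per server set aside for the OS disk cache:

```python
# Sizing sketch with assumed numbers; only the 20 TB / 2 replica
# figures come from the discussion above.
index_tb = 20            # one replica of the index
replicas = 2
cache_per_server_gb = 64  # RAM left for the OS cache per server (assumed)

total_cache_tb = index_tb * replicas            # 40 TB to fully cache
servers_needed = total_cache_tb * 1024 / cache_per_server_gb
print(servers_needed)    # 640 servers for a strict 1:1 ratio
```

Even at a fraction of the 1:1 ratio, the server count stays large,
which is why shrinking the index itself pays off so well.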

Doing everything you can to reduce the index size will go a long way
towards improving Solr performance.  Having SSD in each server for the
index data would also help.  If the query volume is high, a large number
of very fast CPU cores is also required.

Thanks,
Shawn
