On 6/8/2014 4:17 PM, shushuai zhu wrote:
> I would like to get some advice to setup a Solr Cloud on a set of powerful 
> machines. The average size of the documents handled by the Solr Cloud is 
> about 0.5 KB, and the number of documents stored in Solr Cloud could reach 
> billions. When indexing, the incoming document rate could be as high as 
> 20k/second; and the major query operations performed on the Cloud are 
> searching, faceting, and some other aggregations. There will NOT be many 
> concurrent queries (replication factor of 2 may be good enough), but some 
> queries could cover big range of documents.
>
> As an example, I have 8 powerful machines (nodes), and each machine (node) 
> has:
>
> 16 CPU cores
> 256GB RAM
> 48TB physical disk space
>
> The Solr Cloud may be setup in following different ways (assuming replication 
> factor is 2):
>
> 1) 8 shards on 8 Solr servers, total 16 cores (including replicas)
> Each machine (node) holds one Solr server (JVM), and each Solr server has one 
> shard. 
>
> 2) 32 shards on 8 Solr servers, total 64 cores (including replicas)
> Each machine (node) holds one Solr server (JVM), and each Solr server has 4 
> shards. 
>
> 3) 32 shards on 16 Solr servers, total 64 cores (including replicas)
> Each machine (node) holds 2 Solr servers (JVMs), and each Solr server has 2 
> shards.
>
> 4) 64 shards on 16 Solr servers, total 128 cores (including replicas)
> Each machine (node) holds 2 Solr servers (JVMs), and each Solr server has 4 
> shards.
>
> 5) 128 shards on 32 Solr servers, total 256 cores (including replicas)
> Each machine (node) holds 4 Solr servers (JVMs), and each Solr server has 4 
> shards.

Erick's note is very important.  From the information given, we can't
even guess about the size of your index.  Even if we had that
information, there are too many variables to give you any real
recommendations.

Also mentioned by Erick:  RAM is the single greatest factor affecting
Solr performance.  If you have enough OS disk cache to fit your index
entirely in RAM, performance is likely to be excellent.  With 256GB of
RAM on each of eight servers, you'll have about 2TB of RAM in total,
some of which will be used by Solr itself.  If both copies of your index
together take up 2TB or less on disk, you're probably going to be OK
there.  You'd probably be OK up to about 3TB of total index.
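
A rough sketch of that RAM arithmetic, in Python.  Only the node count
and RAM size come from the original question; the 16GB heap per Solr
JVM and the function names are assumptions picked for illustration, not
a sizing recommendation.

```python
NODES = 8
RAM_GB_PER_NODE = 256


def page_cache_gb(heap_gb_per_jvm=16, jvms_per_node=1):
    """RAM left over for the OS disk cache across the whole cluster, in GB.

    heap_gb_per_jvm is a made-up placeholder; the real heap depends on the
    index and the queries.
    """
    per_node = RAM_GB_PER_NODE - heap_gb_per_jvm * jvms_per_node
    return NODES * per_node


def cached_fraction(total_index_gb, **kwargs):
    """Fraction of the on-disk index (all replicas) that fits in the cache."""
    return min(1.0, page_cache_gb(**kwargs) / total_index_gb)


print(page_cache_gb())         # 1920
print(cached_fraction(2048))   # 0.9375  (2TB of total index)
print(cached_fraction(3072))   # 0.625   (3TB of total index)
```

With more JVMs per machine, pass jvms_per_node accordingly; every extra
heap comes out of the memory available for caching the index.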

The 48TB of disk space is probably serious overkill.  I would assume
this is twelve 4TB drives.  It would be better for performance (without
losing redundancy) to use RAID10 with a stripe size of at least 1MB for
the storage instead of any other RAID level.  It eats up half your raw
space for redundancy, but the performance is *excellent*.
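
For reference, the space cost of RAID10 under my assumption of twelve
4TB drives works out as below.  The drive count is a guess on my part,
and the helper name is made up for this example.

```python
def raid10_usable_tb(drives, drive_tb):
    """Usable capacity of a RAID10 array: half the raw space is mirrors."""
    if drives < 4 or drives % 2:
        raise ValueError("RAID10 needs an even number of drives, at least 4")
    return drives * drive_tb / 2


print(raid10_usable_tb(12, 4))   # 24.0
```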

The fact that your query volume will be low does give me the ability to
tell you one thing: With 16 CPU cores per machine and a low query
volume, you'll be able to handle a lot more Solr cores per machine.  The
extra CPU cores can spend their time reading from Solr cores and
speeding up each individual query without worrying about being crushed
under hundreds of queries per second.

For a perfect match of CPU cores to Solr cores, you'd do option number
4, so each machine would get 16 Solr cores ... but I think option number
3 might be better: each machine gets 8 Solr cores, so you have more CPU
cores than indexes per machine.  With 32 shards, this gives you a safe
capacity of about 32 billion documents (roughly a billion per shard),
with a maximum total capacity of well over 64 billion documents, since
a single Lucene index tops out at a little over 2 billion documents.
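
The numbers for each of the five options can be worked out with a
short Python sketch.  The shard and core counts are taken from the
original question.  The one-billion-per-shard "safe" figure is an
assumption that matches the 32 billion estimate for 32 shards, and
2**31 - 1 is used as an upper bound for Lucene's per-index document
limit (the exact limit is slightly lower).

```python
MACHINES = 8
CPU_CORES_PER_MACHINE = 16
SAFE_DOCS_PER_SHARD = 1_000_000_000   # assumption, see above
MAX_DOCS_PER_SHARD = 2**31 - 1        # upper bound on docs in one Lucene index

# option number -> (shards, total Solr cores including replicas)
OPTIONS = {1: (8, 16), 2: (32, 64), 3: (32, 64), 4: (64, 128), 5: (128, 256)}


def summarize(option):
    """Return (Solr cores per machine, safe doc capacity, max doc capacity)."""
    shards, solr_cores = OPTIONS[option]
    per_machine = solr_cores // MACHINES
    return (per_machine,
            shards * SAFE_DOCS_PER_SHARD,
            shards * MAX_DOCS_PER_SHARD)


for option in sorted(OPTIONS):
    per_machine, safe, maximum = summarize(option)
    print(f"option {option}: {per_machine} Solr cores vs "
          f"{CPU_CORES_PER_MACHINE} CPU cores per machine, "
          f"safe ~{safe // 10**9}B docs, max ~{maximum // 10**9}B docs")
```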

Thanks,
Shawn
