Hi Todd,

It depends what kind of hardware you run this on.

4 instances of 15GB shard 1 @ machines 1-4 behind VIP1
4 instances of 15GB shard 2 @ machines 5-8 behind VIP2
4 instances of 15GB shard 3 @ machines 9-12 behind VIP3
4 instances of 15GB shard 4 @ machines 13-16 behind VIP4

This is what you are building, right? That sounds fine, though you may be able 
to get away with just 2 boxes per shard (e.g. machines 1-2 for shard 1 
instead of 4 machines), depending on your query rate and actual latency.  
In that case you could break the index into even smaller shards, thus lowering 
your per-machine RAM requirements:

2 instances of 7.5GB shard 1 @ machines 1-2 behind VIP1
2 instances of 7.5GB shard 2 @ machines 3-4 behind VIP2
2 instances of 7.5GB shard 3 @ machines 5-6 behind VIP3
2 instances of 7.5GB shard 4 @ machines 7-8 behind VIP4
2 instances of 7.5GB shard 5 @ machines 9-10 behind VIP5
2 instances of 7.5GB shard 6 @ machines 11-12 behind VIP6
2 instances of 7.5GB shard 7 @ machines 13-14 behind VIP7
2 instances of 7.5GB shard 8 @ machines 15-16 behind VIP8
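With either layout, the frontend queries one Solr instance and lists every
shard's VIP in the shards parameter. A minimal sketch of building such a
request (the VIP host names and port are placeholders, not your real hosts):

```python
# Sketch: assembling a distributed-search request for the 8-shard layout.
# Host names below are made up for illustration.
vips = ["vip%d.example.com:8983" % i for i in range(1, 9)]

# Solr's distributed search takes a comma-separated shards parameter,
# each entry pointing at one shard's /solr core behind its VIP.
shards_param = ",".join("%s/solr" % vip for vip in vips)

query_url = ("http://%s/solr/select?q=*:*&shards=%s"
             % (vips[0], shards_param))
print(query_url)
```

Any one instance can act as the aggregator, so you can spread that role
across the VIPs too.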

You should be okay with 8GB of RAM per machine, or perhaps even 4GB.  It 
depends on how much heap you give the JVM, how big your Solr caches are, etc.  
And don't forget you can always put a caching HTTP proxy in front of Solr!
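The per-machine arithmetic behind those numbers, as a quick sketch (the 60GB
total comes from your figures below; one shard per machine assumed):

```python
# Back-of-the-envelope sizing for a 60 GB index split into equal shards.
total_index_gb = 60.0

def shard_size_gb(num_shards):
    """Index data each machine holds when every machine serves one shard."""
    return total_index_gb / num_shards

print(shard_size_gb(4))   # 4-way split: 15 GB per machine
print(shard_size_gb(8))   # 8-way split: 7.5 GB per machine
```

The more shards, the less RAM each box needs for the OS to cache the index,
at the cost of more machines and more per-query fan-out.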

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



----- Original Message ----
> From: Todd Benge <todd.be...@gmail.com>
> To: solr-user@lucene.apache.org
> Sent: Monday, May 4, 2009 6:35:47 PM
> Subject: Distributed Sizing Question
> 
> Hi,
> 
> We're in the process of converting a Lucene deployment of 500 M documents /
> 60 GB into a Solr distributed search deployment.
> 
> The primary reason for the change is instability in the Lucene deployment
> due to memory constraints.  The existing infrastructure is deployed on 48
> machines with all indices on each machine using the MultiSearcher.
> 
> The new deployment will leverage Solr's distributed search model to
> deploy smaller index shards in various clusters.  Under average load, the
> system should be able to easily handle 8-10 requests per second.
> 
> We're looking for some guidance on best practices for sizing the clusters
> correctly.  Our current thought is to divide the indices into 4 equal parts
> and build several 4-machine clusters, so each machine will host ~15 GB.
> 
> Has anyone had experience with a similar size deployment?  Any suggestions
> on the architectural strategy?
> 
> Thanks for the help.
> 
> Todd
