Hi Todd,

It depends on what kind of hardware you run this on.
4 instances of 15GB shard 1 @ machines 1-4 behind VIP1
4 instances of 15GB shard 2 @ machines 5-8 behind VIP2
4 instances of 15GB shard 3 @ machines 9-12 behind VIP3
4 instances of 15GB shard 4 @ machines 13-16 behind VIP4

This is what you are building, right? That sounds fine, though you may be able to get away with just 2 boxes per shard (e.g. machines 1-2 for shard 1 instead of 4 machines), depending on the query rate and the actual latency. In that case you could break the index into even smaller shards, thus lowering your per-machine RAM requirements:

2 instances of 7.5GB shard 1 @ machines 1-2 behind VIP1
2 instances of 7.5GB shard 2 @ machines 3-4 behind VIP2
2 instances of 7.5GB shard 3 @ machines 5-6 behind VIP3
2 instances of 7.5GB shard 4 @ machines 7-8 behind VIP4
2 instances of 7.5GB shard 5 @ machines 9-10 behind VIP5
2 instances of 7.5GB shard 6 @ machines 11-12 behind VIP6
2 instances of 7.5GB shard 7 @ machines 13-14 behind VIP7
2 instances of 7.5GB shard 8 @ machines 15-16 behind VIP8

You should be okay with 8GB of RAM per machine, or perhaps even 4GB. It depends on how much heap you give the JVM, how big your Solr caches are, etc. And don't forget you can always put a caching HTTP proxy in front of Solr!

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

----- Original Message ----
> From: Todd Benge <todd.be...@gmail.com>
> To: solr-user@lucene.apache.org
> Sent: Monday, May 4, 2009 6:35:47 PM
> Subject: Distributed Sizing Question
>
> Hi,
>
> We're in the process of converting a Lucene deployment of 500M documents /
> a 60GB index into a Solr distributed search deployment.
>
> The primary reason for the change is instability in the Lucene deployment
> due to memory constraints. The existing infrastructure is deployed on 48
> machines, with all indices on each machine, using the MultiSearcher.
>
> The new deployment will leverage Solr's distributed search model to deploy
> smaller index shards in various clusters.
> Under average load, the system should be able to easily handle 8-10
> requests per second.
>
> We're looking for some guidance on best practices for sizing the clusters
> correctly. Our current thought is to divide the indices into 4 equal parts
> and build several 4-machine clusters, so each machine will host ~15 GB.
>
> Has anyone had experience with a similar-size deployment? Any suggestions
> on the architectural strategy?
>
> Thanks for the help.
>
> Todd
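[For anyone reading this thread in the archive: the layout described above is queried through Solr's standard `shards` request parameter, with each entry pointing at one shard's VIP so the load balancer picks a replica. Below is a minimal sketch of how a client might build such a request URL. The hostnames (vip1-vip4), port, and core name are illustrative assumptions, not values from the thread.]

```python
# Sketch: building a distributed Solr query URL for the 4-shard/4-VIP layout.
# Hostnames, port (8983), and core name ("core1") are assumed for illustration.
from urllib.parse import urlencode

# One entry per shard; each VIP load-balances across that shard's replicas.
SHARD_VIPS = [
    "vip1:8983/solr/core1",
    "vip2:8983/solr/core1",
    "vip3:8983/solr/core1",
    "vip4:8983/solr/core1",
]

def distributed_query_url(q, rows=10):
    """Build a /select URL that fans the query out across all shards."""
    params = {
        "q": q,
        "rows": rows,
        # Solr queries every shard in this comma-separated list and
        # merges the results before responding.
        "shards": ",".join(SHARD_VIPS),
    }
    # Any node (or a dedicated aggregator) can receive the top-level request.
    return "http://" + SHARD_VIPS[0] + "/select?" + urlencode(params)
```

A caching HTTP proxy, as suggested above, would simply sit in front of whichever host receives these top-level requests.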