We're in the process of moving from 12 single-core collections
(non-cloud Solr) on 3 VMs to a SolrCloud setup. Our collections aren't
huge, ranging in size from 50K to 150K documents with one at 1.2M docs.
Our max query frequency is rather low, probably no more than
10-20/min. We do update frequently, maybe 10-100 documents every 10 mins.
Our prototype setup is using 3 VMs (4 core, 16GB RAM each), and we've
got each collection split into 2 shards with 3 replicas (one per VM).
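To put rough numbers on that layout, here's a quick sketch in Python. The function name is just something made up for illustration, and it assumes the replicas spread evenly across the VMs:

```python
def replica_layout(collections, shards, replicas, nodes):
    """Return (total cores, cores per node) for an evenly spread layout.

    Hypothetical helper: each shard replica is one Solr core, so the
    total is collections * shards * replicas.
    """
    total_cores = collections * shards * replicas
    # Ceiling division, in case the cores don't divide evenly across nodes.
    cores_per_node = -(-total_cores // nodes)
    return total_cores, cores_per_node


# 12 collections, 2 shards each, 3 replicas per shard, 3 VMs
total, per_vm = replica_layout(12, 2, 3, 3)
print(total, per_vm)  # 72 24
```

So the prototype ends up with 72 cores overall, 24 on each 4-core, 16GB VM.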
Also, ZooKeeper is running on each VM. I understand that it's best to
have each ZK server on a separate machine, but I'm hoping this will work for now.
This all seemed like a good place to start, but after reading lots of
articles and posts, I'm thinking that maybe our smaller collections
(under 100K docs) should just be one shard each, and maybe the 1.2M
collection should be more like 6 shards. How do you decide how many
shards is right?
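For comparison, here's a small Python sketch of how many documents each shard would hold for the 1.2M collection under the shard counts I'm considering. The function name is made up, and it assumes documents are distributed evenly across shards:

```python
def docs_per_shard(total_docs, num_shards):
    """Approximate documents per shard, assuming an even distribution."""
    # Ceiling division so a remainder isn't silently dropped.
    return -(-total_docs // num_shards)


for shards in (1, 2, 6):
    print(shards, docs_per_shard(1_200_000, shards))
# 1 1200000
# 2 600000
# 6 200000
```

This only shows the per-shard document counts; it doesn't say which count is actually right for our query and update load.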
Also, our current live system is separated into dev/stage/prod tiers.
Now, all of these tiers are together on each of the cloud VMs. This
bothers some people, thinking that it may make our production
environment less stable. I know that in an ideal world, we'd have them
all on separate systems, but with the replication, it seems like we're
going to make the overall system more stable. Is this a correct assumption?
I'm just wondering if anyone has opinions on whether we're going in a
reasonable direction or not. Are there any articles that discuss these
initial sizing/scoping issues?
- Scoping SolrCloud setup Scott Prentice