I attended the webinar on March 4th. Many thanks to Yonik for putting
that on. It has led to some questions about the best way to bring
fault tolerance to our distributed search. The high-level question: should
I go with SolrCloud, or stick with 1.4 and use load balancing? I hope
the rest of this email isn't too disjointed to follow.
We are using virtual machines on 8-core servers with 32GB of RAM to
house all this. For initial deployment, there are two of these, but we
will have a total of four once we migrate off our current indexing
solution. We won't be able to bring fault tolerance into the mix until
we have all four hosts, but I need to know what direction we are going
before initial deployment.
One choice is to stick with version 1.4 for stability and use load
balancing on the shards. I had already planned to have a pair of load
balancer VMs to handle redundancy on what I'm calling the broker
(explained further down), so it would not be a major step to have it do
the shards as well.
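
To make the load-balancing option concrete, here is a rough sketch of the failover decision a balancer would make for one shard: try each redundant copy in order and use the first one that answers a health check. The function names and the list of replica URLs are hypothetical, and the check assumes the stock /admin/ping handler is enabled on each core. A real load balancer would do this for you; this only illustrates the logic.

```python
import urllib.error
import urllib.request


def is_alive(base_url, timeout=2.0):
    """Return True if the core's ping handler answers with HTTP 200.

    Assumes the /admin/ping handler is enabled for this core.
    """
    try:
        with urllib.request.urlopen(base_url + "/admin/ping", timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or HTTP error: treat the copy as down.
        return False


def pick_replica(replicas, check=is_alive):
    """Return the first replica URL that passes the health check, else None."""
    for url in replicas:
        if check(url):
            return url
    return None
```

Passing the health check in as a parameter keeps the selection logic testable without any running Solr instance.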
I have been looking into SolrCloud. I tried to just swap out the .war
file with one compiled from the cloud branch, but that didn't work. A
little digging showed that the cloud branch uses a core for the
collection. I already have cores defined so I can build indexes and
swap them into place quickly. A big question: can I continue to use
this multi-core approach with SolrCloud, or does it supplant cores with
its collection logic?
Due to the observed high CPU requirements involved in sorting results
from multiple shards into a final result, I have so far opted to go with
an architecture that puts an empty index into a broker core, which lives
on its own VM host separate from the large static shards. This core's
solrconfig.xml has a list of all the shards that get queried. My
application has no idea that it's talking to anything other than a
single Solr instance. Once we get the caches warmed, performance is
quite good.
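
For anyone unfamiliar with the broker idea, the shard list that lives in the broker's solrconfig.xml is the same thing as the standard "shards" request parameter. Here is a small sketch of the equivalent request built by hand. The host names, core names, and the helper function are made up for illustration; in my setup the application never sends this parameter because the broker supplies it as a default.

```python
from urllib.parse import urlencode

# Hypothetical hosts and core names, for illustration only.
BROKER = "http://broker.example.com:8983/solr/broker"
SHARDS = [
    "shard1.example.com:8983/solr/live",
    "shard2.example.com:8983/solr/live",
    "inc.example.com:8983/solr/live",  # the incremental shard
]


def distributed_query_url(query, rows=10):
    """Build a select URL that asks the broker to fan out to every shard."""
    params = {
        "q": query,
        "rows": rows,
        # Solr expects a comma-separated list of host:port/path entries.
        "shards": ",".join(SHARDS),
    }
    return BROKER + "/select?" + urlencode(params)
```

The broker core merges and sorts the per-shard results, which is the CPU-heavy step mentioned above.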
The VM host with the broker will also have another VM with the shard
where all new data goes, a concept we call the incremental. On a
nightly basis, some of the documents in the incremental will be
redistributed to the static shards and everything will get reoptimized.
How would you recommend I pursue fault tolerance? I had already planned
to set up a load balancer VM to handle redundancy for the broker, so it
would not be a HUGE step to have it load balance the shards too.
- Distributed search fault tolerance Shawn Heisey