Distributed search fault tolerance

Shawn Heisey Tue, 09 Mar 2010 13:29:16 -0800

I attended the Webinar on March 4th. Many thanks to Yonik for putting that on. That has led to some questions about the best way to bring fault tolerance to our distributed search. High-level question: Should I go with SolrCloud, or stick with 1.4 and use load balancing? I hope the rest of this email isn't too disjointed for understanding.

We are using virtual machines on 8-core servers with 32GB of RAM to house all this. For initial deployment, there are two of these, but we will have a total of four once we migrate off our current indexing solution. We won't be able to bring fault tolerance into the mix until we have all four hosts, but I need to know what direction we are going before initial deployment.

One choice is to stick with version 1.4 for stability and use load balancing on the shards. I had already planned to have a pair of load balancer VMs to handle redundancy on what I'm calling the broker (explained further down), so it would not be a major step to have it do the shards as well.

I have been looking into SolrCloud. I tried to just swap out the .war file with one compiled from the cloud branch, but that didn't work. A little digging showed that the cloud branch uses a core for the collection. I already have cores defined so I can build indexes and swap them into place quickly. A big question - can I continue to use this multi-core approach with SolrCloud, or does it supplant cores with its collection logic?

Due to the observed high CPU requirements involved in sorting results from multiple shards into a final result, I have so far opted to go with an architecture that puts an empty index into a broker core, which lives on its own VM host separate from the large static shards. This core's solrconfig.xml has a list of all the shards that get queried. My application has no idea that it's talking to anything other than a single Solr instance. Once we get the caches warmed, performance is quite good.
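
To make the broker arrangement concrete, here is a minimal sketch of the kind of request the broker ends up serving. It assumes the shard list is passed explicitly through Solr's shards request parameter rather than being set in solrconfig.xml as in my setup. The host names, the port, and the build_distributed_query helper are all hypothetical and only for illustration.

```python
from urllib.parse import urlencode

# Hypothetical shard addresses; the real hosts are not named in this message.
SHARDS = [
    "static1.example.com:8983/solr",
    "static2.example.com:8983/solr",
    "incremental.example.com:8983/solr",
]


def build_distributed_query(broker_base, query, shards, rows=10):
    """Build a /select URL that asks the broker to fan out to every shard.

    The broker's own index is empty; it only merges and sorts shard results.
    """
    params = {
        "q": query,
        "rows": rows,
        # Solr's distributed search takes a comma-separated shard list.
        "shards": ",".join(shards),
    }
    return broker_base.rstrip("/") + "/select?" + urlencode(params)


if __name__ == "__main__":
    print(build_distributed_query("http://broker.example.com:8983/solr", "title:solr", SHARDS))
```

With the shard list kept in the broker's solrconfig.xml instead, the application would send only q and rows, which is why it cannot tell the broker from a single instance.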

The VM host with the broker will also have another VM with the shard where all new data goes, a concept we call the incremental. On a nightly basis, some of the documents in the incremental will be redistributed to the static shards and everything will get reoptimized.

How would you recommend I pursue fault tolerance? I had already planned to set up a load balancer VM to handle redundancy for the broker, so it would not be a HUGE step to have it load balance the shards too.
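
For illustration, the load-balancing option amounts to something like the following sketch: each logical shard has more than one replica, and the shard list handed to the broker uses the first replica that passes a health check. This is not how Solr 1.4 or SolrCloud does it internally; the pick_shards helper, the replica names, and the is_up callback are all assumptions standing in for what the load balancer VMs would do.

```python
def pick_shards(replicas_by_shard, is_up):
    """Return a comma-separated shard list using one healthy replica per shard.

    replicas_by_shard maps a logical shard name to its replica addresses in
    order of preference; is_up is a caller-supplied health check (hypothetical).
    """
    chosen = []
    for shard, replicas in replicas_by_shard.items():
        healthy = [r for r in replicas if is_up(r)]
        if not healthy:
            # Without any live replica the result set would be incomplete.
            raise RuntimeError(f"no healthy replica for shard {shard!r}")
        chosen.append(healthy[0])
    return ",".join(chosen)


if __name__ == "__main__":
    # Hypothetical layout: two static shards, each with a primary and a backup.
    layout = {
        "static1": ["host1:8983/solr", "host3:8983/solr"],
        "static2": ["host2:8983/solr", "host4:8983/solr"],
    }
    down = {"host1:8983/solr"}
    print(pick_shards(layout, lambda addr: addr not in down))
```

A real load balancer would do this per request behind a virtual address, so the broker's shard list would never need to change.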
