I love legacy replication. It is simple and bulletproof. Loose coupling for the 
win! We only run Solr Cloud when we need sharding or NRT search. Loose coupling 
is a very, very good thing in distributed systems.

Adding a replica (new slave) is trivial. Clone an existing one. This makes 
horizontal scaling so easy. We still haven’t written the procedure and scripts 
for scaling our Solr Cloud cluster. Last time, it was 100% manual through the 
admin UI.
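For anyone curious, the clone procedure is essentially this (a sketch, not our exact scripts; host names and paths are made up, and you'd want replication paused or the slave otherwise quiesced before copying):

```shell
# Hypothetical hosts and paths -- adjust for your install.
# 1. Copy the index and core configs from an existing slave.
rsync -a old-slave:/var/solr/data/ /var/solr/data/
# 2. Start the new slave; it polls the master named in its
#    solrconfig.xml and catches up on any segments written
#    since the copy.
bin/solr start -p 8983
```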

Setting up a Zookeeper ensemble isn’t as easy as it should be. We tried to set 
up a five-node ensemble with ZK 3.4.6 and finally gave up after two weeks 
because it was blocking the release. We are using the three-node 3.4.5 ensemble 
that had been set up for something else a couple of years earlier. I’ve had 
root on Unix since 1981 and have been running TCP/IP since 1983, so I should 
have been able to figure this out.

We’ve had some serious prod problems with the Solr Cloud cluster, like cores 
stuck in a permanent recovery loop. I finally manually deleted that core and 
created a new one. Ugly.

Even starting Solr Cloud processes is confusing. It took a while to figure out 
they were all joining as the same host (no, I don’t know why), so now we start 
them with:

   solr start -cloud -h `hostname`

Keeping configs under source control and deploying them isn’t easy. I’m not 
going to install Solr on the Jenkins executor just so it can deploy; that is 
weird and kind of a chicken-and-egg thing. I ended up writing a Python program 
to get the ZK address from the cluster, use kazoo to load the configs directly 
into ZK, then tell the cluster to reload. With both that and the provided ZK 
tools, I ran into so much undocumented stuff. What is linking? How do the file 
config directories map to the ZK config directories? And so on.
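For what it's worth, the core of that deploy script looks roughly like this (a sketch, not our exact code; the `/configs/<name>` layout matches what Solr uses in ZK, but the hosts, paths, and collection name are made up):

```python
# Sketch of the deploy-to-ZooKeeper approach described above.
import os
import posixpath
from urllib.request import urlopen


def config_files(confdir):
    """Map local files under confdir to ZK paths relative to the config root.

    Returns sorted (zk_relative_path, local_path) pairs, so
    conf/lang/stopwords_en.txt becomes lang/stopwords_en.txt in ZK.
    """
    pairs = []
    for root, _dirs, files in os.walk(confdir):
        for name in files:
            local = os.path.join(root, name)
            rel = os.path.relpath(local, confdir)
            # ZK paths always use forward slashes
            pairs.append((rel.replace(os.sep, "/"), local))
    return sorted(pairs)


def deploy(confdir, confname, zkhosts, solr_url, collection):
    """Upload a config set to ZK with kazoo, then reload the collection."""
    from kazoo.client import KazooClient  # third-party: pip install kazoo
    zk = KazooClient(hosts=zkhosts)
    zk.start()
    try:
        for rel, local in config_files(confdir):
            zkpath = posixpath.join("/configs", confname, rel)
            with open(local, "rb") as f:
                data = f.read()
            if zk.exists(zkpath):
                zk.set(zkpath, data)
            else:
                zk.create(zkpath, data, makepath=True)
    finally:
        zk.stop()
    # Collections API RELOAD makes the cluster pick up the new configs.
    urlopen("%s/admin/collections?action=RELOAD&name=%s" % (solr_url, collection))
```

The irritating part was discovering the /configs/&lt;name&gt; layout and that a RELOAD is needed at all; none of that was obvious from the docs.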

The lack of a bounded thread pool for requests is a very serious problem. If 
our 6.5.1 cluster gets overloaded, it creates 4000 threads, runs out of 
memory, and fails. That is just wrong. With earlier versions of Solr, it would 
get slower and slower, but recover gracefully.
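If I'm reading our install right, the cap lives in server/etc/jetty.xml, which ships with a very high default along these lines (the exact defaults may differ by version, and note that setting maxThreads too low in SolrCloud can deadlock distributed requests, so this is not a recommendation, just where the knob is):

```xml
<New id="threadPool" class="org.eclipse.jetty.util.thread.QueuedThreadPool">
  <Set name="minThreads"><Property name="solr.jetty.threads.min" default="10"/></Set>
  <Set name="maxThreads"><Property name="solr.jetty.threads.max" default="10000"/></Set>
</New>
```

Which suggests something like -Dsolr.jetty.threads.max=500 at startup, for whatever cap fits your hardware.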

Converting a slave into a master is easy. We use this in the config file:

   <lst name="master">
      <str name="enable">${enable.master:false}</str>
   …
   <lst name="slave">
      <str name="enable">${enable.slave:false}</str>

And this at startup (slave settings shown):

   -Denable.master=false -Denable.slave=true

Change the properties and restart.

Our 6.5.1 cluster is faster than the non-sharded 4.10.4 master/slave cluster, 
but I’m not happy with the stability in prod. We’ve had more search outages in 
the past six months than we had in the previous four years. I’ve had Solr in 
prod since version 1.2, and this is the first time it has really embarrassed me.

There are good things: search is faster, and we’re handling double the query 
volume with 3X the docs.

Sorry for the rant, but it has not been a good fall semester for our students 
(customers).

Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)


> On Dec 15, 2017, at 9:46 AM, Erick Erickson <erickerick...@gmail.com> wrote:
> 
> There's pretty much zero chance that it'll go away, too much current
> and ongoing functionality that depends on it.
> 
> 1> old-style replication has always been used for "full sync" in
> SolrCloud when peer sync can't be done.
> 
> 2> The new TLOG and PULL replica types are a marriage of old-style
> master/slave and SolrCloud. In particular a PULL replica is
> essentially an old-style slave. A TLOG replica is an old-style slave
> that also maintains a transaction log so it can take over leadership
> if necessary.
> 
> Best,
> Erick
> 
> On Fri, Dec 15, 2017 at 8:56 AM, David Hastings
> <hastings.recurs...@gmail.com> wrote:
>> So i dont step on the other thread, I want to be assured whether or not
>> legacy master/slave/repeater replication will continue to be supported in
>> future solr versions.  our infrastructure is set up for this and all the HA
>> redundancies that solrcloud provides we have already spend a lot of time
>> and resources with very expensive servers to handle solr in standalone
>> mode.
>> 
>> thanks.
>> -David
