I'm at the point now where I may end up writing a script to compare master/slave nightly and trigger an optimize or Solr restart if there are any differences. Of course I'd have to check 150+ cores, but it could be done. I'm just hoping I don't need to go that route.
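A nightly comparison like the one described above could be sketched roughly as follows. This is only a minimal sketch under stated assumptions: the slave base URL (`server2`) and all function names are hypothetical, and it assumes the ReplicationHandler `details` response exposes `indexVersion` and `generation` on both master and slave, which is worth confirming against your Solr version before relying on it.

```python
import json
import urllib.request

# Assumption: only server1:8080 appears in the thread; the slave host is made up.
MASTER = "http://server1:8080/solr"
SLAVE = "http://server2:8080/solr"


def get_json(url, timeout=30):
    """Fetch a URL and decode its JSON body."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def list_cores(base):
    """Return the core names reported by the CoreAdmin STATUS action."""
    data = get_json(f"{base}/admin/cores?action=STATUS&wt=json")
    return sorted(data["status"])


def index_state(base, core):
    """Return (indexVersion, generation) from the replication 'details' command."""
    data = get_json(f"{base}/{core}/replication?command=details&wt=json")
    details = data["details"]
    return int(details["indexVersion"]), int(details["generation"])


def find_stale(master_states, slave_states):
    """Return cores whose slave state is missing or differs from the master's."""
    return sorted(
        core
        for core, state in master_states.items()
        if slave_states.get(core) != state
    )


def check_all(master=MASTER, slave=SLAVE):
    """Compare every master core against the slave and return the stale ones."""
    cores = list_cores(master)
    master_states = {c: index_state(master, c) for c in cores}
    slave_states = {}
    for c in cores:
        try:
            slave_states[c] = index_state(slave, c)
        except (OSError, KeyError, ValueError):
            # Unreachable core or unexpected response: leave it out so it
            # is reported as stale.
            pass
    return find_stale(master_states, slave_states)
```

Calling `check_all()` from a cron job would return the list of cores to look at. Note that a core can show up as stale simply because the master committed between the two reads, so re-checking after a poll interval before restarting anything is probably wise.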
-----Original Message-----
From: David Hastings [mailto:hastings.recurs...@gmail.com]
Sent: Friday, January 19, 2018 10:35 AM
To: solr-user@lucene.apache.org
Subject: Re: Solr Replication being flaky (6.2.0)

This happens to me quite often as well. Generally the replication admin screen will say it's downloading a file, but at 0 or a VERY small KB/sec. Then after a restart of the slave it's back to downloading at 30 to 100 MB/sec. I'd be curious whether there actually is a solution to this, aside from checking every day whether the core replicated. I'm on Solr 5.x, by the way.

-Dave

On Fri, Jan 19, 2018 at 9:50 AM, Pouliot, Scott <scott.poul...@peoplefluent.com> wrote:

> So we're running Solr in a Master/Slave configuration (1 of each) and
> it seems that the replication stalls or stops functioning every now
> and again. If we restart the Solr service or optimize the core, it
> seems to kick back in again.
>
> Anyone have any idea what might be causing this? We do have a good
> number of cores on each server (around 150), but I have heard reports
> of a LOT more than that in use.
>
> Here is our master config:
> <requestHandler name="/replication" class="solr.ReplicationHandler" >
>   <lst name="master">
>     <!--Replicate on 'startup' and 'commit'. 'optimize' is also a
>         valid value for replicateAfter. -->
>     <str name="replicateAfter">startup</str>
>     <str name="replicateAfter">commit</str>
>
>     <!--The default value of reservation is 10 secs. See the
>         documentation below. Normally, you should not need to specify this -->
>     <str name="commitReserveDuration">00:00:10</str>
>   </lst>
>   <!-- keep only 1 backup. Using this parameter precludes using the
>        "numberToKeep" request parameter. (Solr3.6 / Solr4.0)-->
>   <!-- (For this to work in conjunction with "backupAfter" with Solr
>        3.6.0, see bug fix
>        https://issues.apache.org/jira/browse/SOLR-3361 )-->
>   <str name="maxNumberOfBackups">1</str>
>   <!--<str name="confFiles">solrconfig_slave.xml:solrconfig.xml,x.xml,y.xml</str>-->
> </requestHandler>
>
> And our slave config:
> <requestHandler name="/replication" class="solr.ReplicationHandler" >
>   <lst name="slave">
>
>     <!--fully qualified url to the master core. It is possible to
>         pass on this as a request param for the fetchindex command-->
>     <str name="masterUrl">http://server1:8080/solr/${solr.core.name}</str>
>
>     <!--Interval in which the slave should poll master. Format is
>         HH:mm:ss. If this is absent slave does not poll automatically.
>         But a fetchindex can be triggered from the admin or the http
>         API
>     -->
>     <str name="pollInterval">00:00:45</str>
>   </lst>
> </requestHandler>
>
> <requestHandler name="/dataimport" class="solr.DataImportHandler">
>   <lst name="defaults">
>     <str name="config">solr-data-config.xml</str>
>   </lst>
> </requestHandler>
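
The slave config comments above mention that a fetchindex can be triggered through the HTTP API, optionally passing `masterUrl` as a request parameter. As a gentler alternative to a restart, a stalled core could perhaps be nudged that way. The following is a minimal sketch, not a tested fix for the stall; the slave host name `server2` and both function names are hypothetical.

```python
import urllib.parse
import urllib.request


def fetchindex_url(slave_base, core, master_url=None):
    """Build the replication URL that asks one slave core to fetch the index."""
    params = {"command": "fetchindex", "wt": "json"}
    if master_url:
        # Optional override of the masterUrl configured in solrconfig.xml.
        params["masterUrl"] = master_url
    query = urllib.parse.urlencode(params)
    return f"{slave_base}/{urllib.parse.quote(core)}/replication?{query}"


def trigger_fetchindex(slave_base, core, master_url=None, timeout=30):
    """Send the fetchindex request and return the HTTP status code."""
    url = fetchindex_url(slave_base, core, master_url)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.status
```

For example, `trigger_fetchindex("http://server2:8080/solr", "core1")` would request a fetch for a core named `core1`. A successful HTTP status only means the request was accepted, so the replication details would still need checking afterwards.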