We're running Solr in a master/slave configuration (one of each), and
replication stalls or stops working every now and again. Restarting the Solr
service or optimizing the core seems to kick it back in. Does anyone have an
idea what might be causing this? We do have a fair number of cores on each
server (around 150), but I have heard reports of a lot more than that in use.
Here is our master config:
<requestHandler name="/replication" class="solr.ReplicationHandler" >
<lst name="master">
<!--Replicate on 'startup' and 'commit'. 'optimize' is also a valid value
for replicateAfter. -->
<str name="replicateAfter">startup</str>
<str name="replicateAfter">commit</str>
<!--The default value of reservation is 10 secs. See the documentation
below. Normally, you should not need to specify this -->
<str name="commitReserveDuration">00:00:10</str>
</lst>
<!-- keep only 1 backup. Using this parameter precludes using the
"numberToKeep" request parameter. (Solr3.6 / Solr4.0)-->
<!-- (For this to work in conjunction with "backupAfter" with Solr 3.6.0,
see bug fix https://issues.apache.org/jira/browse/SOLR-3361 )-->
<str name="maxNumberOfBackups">1</str>
<!--<str
name="confFiles">solrconfig_slave.xml:solrconfig.xml,x.xml,y.xml</str>-->
</requestHandler>
And our slave config:
<requestHandler name="/replication" class="solr.ReplicationHandler" >
<lst name="slave">
<!--fully qualified url to the master core. It is possible to pass
this as a request param for the fetchindex command-->
<str name="masterUrl">http://server1:8080/solr/${solr.core.name}</str>
<!--Interval in which the slave should poll master. Format is HH:mm:ss.
If this is absent slave does not poll automatically.
But a fetchindex can be triggered from the admin or the http API -->
<str name="pollInterval">00:00:45</str>
</lst>
</requestHandler>
<requestHandler name="/dataimport" class="solr.DataImportHandler">
<lst name="defaults">
<str name="config">solr-data-config.xml</str>
</lst>
</requestHandler>
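To narrow down which cores have stalled, the index version on each side can be compared through the replication handler's HTTP API (`command=details`). Below is a minimal Python sketch of that check, not a tested monitoring tool. The slave host `server2` and the core name `core0` are placeholders that do not come from our setup, and the JSON field names (`details.indexVersion`, `details.generation`) are what I would expect the handler to return, so they should be verified against the Solr version in use.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def replication_url(base_url, core, command):
    """Build a ReplicationHandler URL such as .../solr/<core>/replication?command=details."""
    query = urlencode({"command": command, "wt": "json"})
    return f"{base_url.rstrip('/')}/{core}/replication?{query}"


def parse_details(body):
    """Extract (indexVersion, generation) from a command=details JSON response.

    Assumption: the response has the shape {"details": {"indexVersion": ..., "generation": ...}}.
    """
    details = json.loads(body)["details"]
    return details["indexVersion"], details["generation"]


def fetch_details(base_url, core, timeout=10):
    """Query one node for the version and generation of the index it currently holds."""
    with urlopen(replication_url(base_url, core, "details"), timeout=timeout) as resp:
        return parse_details(resp.read().decode("utf-8"))


def is_in_sync(master_base, slave_base, core):
    """Return True when master and slave report the same (indexVersion, generation)."""
    return fetch_details(master_base, core) == fetch_details(slave_base, core)


# Example call (hypothetical slave host and core name):
# is_in_sync("http://server1:8080/solr", "http://server2:8080/solr", "core0")
```

Running `is_in_sync` over all core names shortly after a commit would show whether every core stalls at once or only some of them. For a core that is behind, `replication_url(slave_base, core, "fetchindex")` gives the URL that forces a pull by hand, which is the manual trigger the slave config comment refers to.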