ReplicationHandler reports incorrect replication failures
---------------------------------------------------------

                 Key: SOLR-1853
                 URL: https://issues.apache.org/jira/browse/SOLR-1853
             Project: Solr
          Issue Type: Bug
          Components: replication (java)
    Affects Versions: 1.4
         Environment: Linux
            Reporter: Shawn Smith


The ReplicationHandler "details" command reports that replication failed when it did not. This occurs after a slave is restarted while it is already in sync with the master. It makes it difficult to write production monitors that check the health of master-slave replication (for example, that there are no network issues or unexpected slowdowns).

From the code, it looks like "SnapPuller.successfulInstall" starts out false on restart. If the slave starts out in sync with the master, then each no-op replication poll leaves "successfulInstall" set to false, which makes SnapPuller.logReplicationTimeAndConfFiles log the poll as a failure. SnapPuller.successfulInstall stays false until the first time replication actually has to do something, at which point it gets set to true, and then everything is OK.

h4. Steps to reproduce

# Set up Solr master and slave servers using Solr 1.4 Java replication.
# Index some content on the master. Wait for it to replicate through to the slave so that the master and slave are in sync.
# Stop the slave server.
# Restart the slave server.
# Wait for the first slave replication poll.
# Query the replication status using "http://localhost:8983/solr/replication?command=details"
# Until the master index changes and there is something to replicate, all slave replication polls after the restart are shown as failed in the XML response.
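
The behavior described in this report can be illustrated with a minimal Java sketch. This is a simplified model written for illustration only, not the Solr source: the class {{ReplicationPollModel}}, its methods, and the version numbers are hypothetical, and only the flag name "successfulInstall" is taken from the report. It assumes, as the report suggests, that each poll is logged as a success or failure based solely on that flag.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model of the behavior described in this report.
// It is NOT the actual SnapPuller code; only the flag name comes from the report.
class ReplicationPollModel {
    // Starts out false after every (re)start, as the report describes.
    private boolean successfulInstall = false;
    private long slaveVersion;
    private final List<String> pollLog = new ArrayList<>();

    ReplicationPollModel(long slaveVersionAtStartup) {
        this.slaveVersion = slaveVersionAtStartup;
    }

    // Simulates one replication poll against a master at the given index version.
    void poll(long masterVersion) {
        if (masterVersion != slaveVersion) {
            // Something to replicate: pretend the fetch and install succeeded.
            slaveVersion = masterVersion;
            successfulInstall = true;
        }
        // Assumed logging rule: the outcome is derived from the flag alone,
        // so a no-op poll right after a restart is recorded as a failure.
        pollLog.add(successfulInstall ? "success" : "failed");
    }

    List<String> pollLog() {
        return pollLog;
    }

    public static void main(String[] args) {
        // Slave restarts already in sync with the master (both at version 5).
        ReplicationPollModel slave = new ReplicationPollModel(5L);
        slave.poll(5L); // no-op poll
        slave.poll(5L); // no-op poll
        slave.poll(6L); // master index changed, real replication happens
        slave.poll(6L); // no-op poll, but the flag is now true
        System.out.println(slave.pollLog()); // prints [failed, failed, success, success]
    }
}
```

In this model the two no-op polls after the restart are logged as failures even though nothing went wrong, and polls are only logged as successes once the master index changes, which matches the sequence in the steps to reproduce.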