We are in the midst of upgrading from Solr 3.6 to Solr 4.0 and have
encountered an issue with the method the SnapPuller now uses to determine
if a new directory is needed when fetching files to a slave from master.

With Solr 3.6, our reindexing process was:

On master:
1. Reindex in a separate process into a new directory:
<solr.data.dir>/<core>/index-<timestamp_of_reindex_start>
2. Update <solr.data.dir>/<core>/index.properties to
'index=<solr.data.dir>/<core>/index-<timestamp_of_reindex_start>'
3. Reload <core> on master so that the new index referenced in
<solr.data.dir>/<core>/index.properties would be loaded.
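For reference, step 2 can be scripted. The Java below is a minimal sketch, not Solr code. The class and method names (IndexPropsSwitcher, pointAt, currentIndex) are hypothetical, and it uses only java.util.Properties to write and read back the `index` entry. The value is written exactly as passed, so use whatever form your Solr version resolves (step 2 above uses the new directory's path). Step 3 is left out. With the CoreAdmin handler it would typically be a RELOAD request, e.g. /admin/cores?action=RELOAD&core=<core>.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Properties;

/** Hypothetical helper for step 2: repoint index.properties at a freshly built index. */
public class IndexPropsSwitcher {

    /** Writes "index=<newIndexValue>" to <coreDataDir>/index.properties, replacing any old file. */
    public static Path pointAt(Path coreDataDir, String newIndexValue) throws IOException {
        Properties props = new Properties();
        props.setProperty("index", newIndexValue);

        Path target = coreDataDir.resolve("index.properties");
        // ISO-8859-1 matches the traditional .properties file encoding.
        try (Writer w = Files.newBufferedWriter(target, StandardCharsets.ISO_8859_1)) {
            props.store(w, "switched by reindex job");
        }
        return target;
    }

    /** Reads back the "index" property, or null if it is absent. */
    public static String currentIndex(Path coreDataDir) throws IOException {
        Properties props = new Properties();
        try (Reader r = Files.newBufferedReader(
                coreDataDir.resolve("index.properties"), StandardCharsets.ISO_8859_1)) {
            props.load(r);
        }
        return props.getProperty("index");
    }
}
```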

Slaves would then fetch this new index into a new directory without any
manual intervention because the slave would determine that a full new index
copy was needed. SnapPuller in 3.6 used:

    boolean isFullCopyNeeded = commit.getGeneration() >= latestGeneration;

Since master's generation is near zero after a reindex and much larger on
a slave, the new index would be placed in a new directory on the slave.

In Solr 4.0, beginning with svn 1235888 [1], the check is now:

    boolean isFullCopyNeeded =
        IndexDeletionPolicyWrapper.getCommitTimestamp(commit) >= latestVersion
        || forceReplication;

As far as I can tell, forceReplication is only used in SolrCloud recovery
scenarios. Our new index on master has a newer commitTimeMSec than the
slave index, so the timestamp comparison is false, and forceReplication is
false as well. With isFullCopyNeeded=false, our new index files get pulled
into the existing directory on the slave, where they are then deleted. I
think this is because their generation is older than the current slave
generation.
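To make the difference concrete, here is a toy model of the two checks, using made-up numbers for a freshly reindexed master (low generation, newer commit timestamp) and a long-running slave (high generation, older commit timestamp). It is a sketch, not SnapPuller code: the class, method, and variable names are hypothetical, and plain longs stand in for the slave's IndexCommit and the values master reports.

```java
/** Toy model of the 3.6 and 4.0 isFullCopyNeeded checks (hypothetical names, not Solr code). */
public class FullCopyCheck {

    /** Solr 3.6: full copy when the slave's generation is not behind master's. */
    public static boolean fullCopyNeeded36(long slaveGeneration, long masterGeneration) {
        return slaveGeneration >= masterGeneration;
    }

    /** Solr 4.0 (from r1235888): full copy when the slave's commit timestamp is
     *  not older than master's version, or when replication is forced. */
    public static boolean fullCopyNeeded40(long slaveCommitTimestamp, long masterVersion,
                                           boolean forceReplication) {
        return slaveCommitTimestamp >= masterVersion || forceReplication;
    }

    public static void main(String[] args) {
        // Reindexed master: generation reset, but committed after the slave's last commit.
        long slaveGeneration = 5000, masterGeneration = 3;
        long slaveTimestamp = 1_350_000_000_000L, masterTimestamp = 1_350_000_600_000L;

        System.out.println("3.6 full copy: " + fullCopyNeeded36(slaveGeneration, masterGeneration));
        System.out.println("4.0 full copy: " + fullCopyNeeded40(slaveTimestamp, masterTimestamp, false));
    }
}
```

Running main prints "3.6 full copy: true" followed by "4.0 full copy: false", which matches the behavior described above.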

Would it still make sense to create a new directory on the slave if
master's generation is less than the slave's generation? I can't see a
scenario where you'd want a slave to fetch files from master with a smaller
generation into the current index directory of a slave.
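A rough sketch of that idea follows. Like the previous model, it is standalone with hypothetical names and is not a patch against SnapPuller. It keeps the 4.0 timestamp check and forceReplication, and also asks for a full copy into a new directory whenever master's generation is below the slave's.

```java
/** Hypothetical combined isFullCopyNeeded check (a model, not Solr code). */
public class ProposedFullCopyCheck {

    public static boolean isFullCopyNeeded(long slaveCommitTimestamp, long slaveGeneration,
                                           long masterVersion, long masterGeneration,
                                           boolean forceReplication) {
        // Existing 4.0 condition: the slave's commit is not older than master's.
        boolean slaveNotOlder = slaveCommitTimestamp >= masterVersion;
        // Proposed addition: master's generation is behind the slave's, as happens
        // after reindexing into a fresh directory on master.
        boolean masterGenerationBehind = masterGeneration < slaveGeneration;
        return slaveNotOlder || masterGenerationBehind || forceReplication;
    }

    public static void main(String[] args) {
        long slaveTimestamp = 1_350_000_000_000L;
        long masterTimestamp = 1_350_000_600_000L; // master committed later

        // Master was just reindexed: its generation dropped to 3.
        System.out.println("after reindex: "
                + isFullCopyNeeded(slaveTimestamp, 5000, masterTimestamp, 3, false));
        // Ordinary incremental update: master is one generation ahead.
        System.out.println("incremental:   "
                + isFullCopyNeeded(slaveTimestamp, 5000, masterTimestamp, 5001, false));
    }
}
```

In this model the incremental case still returns false, so ordinary updates would keep being fetched into the existing directory. Only the generation reset caused by a reindex would trigger a new one.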

If the commitTimeMSec based check in Solr 4.0 is needed for SolrCloud, what
other methods of swapping in an entirely new index would you recommend for
those not using SolrCloud?

[1] http://svn.apache.org/viewvc/lucene/dev/trunk/solr/core/src/java/org/apache/solr/handler/SnapPuller.java?r1=1144761&r2=1235888&pathrev=1235888&diff_format=h

Thanks!

--Gregg


Gregg Donovan
Senior Software Engineer, Etsy.com
gr...@etsy.com
