I'm able to do cross-SolrCloud-cluster index copy using nothing more than
careful use of the "fetchindex" replication handler command.

I'm using this as a build/deployment tool, so I manually create a
collection in two clusters, index into one, test, and then ask the other
cluster to fetchindex from it on each shard/replica.
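
For illustration, here's a minimal sketch of what that fetchindex call
looks like for a single shard/replica pair. The host names and core names
below are made up, so substitute your own:

import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Hypothetical source and target cores for the same shard in two clusters.
SOURCE_CORE = "http://source-host:8983/solr/mycoll_shard1_replica1"
TARGET_CORE = "http://target-host:8983/solr/mycoll_shard1_replica1"

params = urlencode({
    "command": "fetchindex",
    # Point the target core at the source core's replication handler.
    "masterUrl": SOURCE_CORE + "/replication",
    "wt": "json",
})

# fetchindex returns almost immediately; the actual copy runs in the background.
with urlopen(TARGET_CORE + "/replication?" + params) as resp:
    print(json.load(resp))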

Some caveats:
  1. It seems like fetchindex may silently decline if it thinks the index
it has is newer (see the sketch after this list for one way to detect that).
  2. I'm not doing this on an index that's currently receiving updates.
  3. SolrCloud replication doesn't come into this flow, even if you
fetchindex on a leader. (although once you're done, updates should get
replicated normally)
  4. Both collections must be created with the same number of shards and
sharding mechanism. (although replication factor can vary)
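
Regarding caveat 1, one rough way to tell whether the fetch actually
happened is to compare the target core's index generation before and
after, using the replication handler's indexversion command. Again just a
sketch with made-up names:

import json
from urllib.request import urlopen

TARGET_CORE = "http://target-host:8983/solr/mycoll_shard1_replica1"  # hypothetical

def index_generation(core_url):
    # The replication handler's indexversion command reports the core's
    # current indexversion and generation.
    with urlopen(core_url + "/replication?command=indexversion&wt=json") as resp:
        return json.load(resp)["generation"]

before = index_generation(TARGET_CORE)
# ... issue fetchindex here (see the earlier sketch) and give it time to finish ...
after = index_generation(TARGET_CORE)
if after == before:
    print("generation unchanged - the fetch may have been silently declined")
else:
    print("generation moved from %s to %s" % (before, after))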
 

I've got a tool for automating this that I'd like to push to GitHub at
some point; let me know if you're interested.


On 8/16/14, 3:03 AM, "Greg Solovyev" <g...@zimbra.com> wrote:

>Thanks Shawn, this is a pretty cool idea. Adding the handler seems pretty
>straightforward, but the main concern I have is the internal data format
>that ReplicationHandler and SnapPuller use. This new handler as well as
>the code that I've already written to download the index files from Solr
>will depend on that format. Unfortunately, this format is not documented
>and is not abstracted by SolrJ, so I wonder what I can do to make sure it
>does not change on us without notice.
>
>Thanks,
>Greg
>
>----- Original Message -----
>From: "Shawn Heisey" <s...@elyograg.org>
>To: solr-user@lucene.apache.org
>Sent: Friday, August 15, 2014 7:31:19 PM
>Subject: Re: How to restore an index from a backup over HTTP
>
>On 8/15/2014 5:51 AM, Greg Solovyev wrote:
>> What I want to achieve is being able to send the backed up index to
>>Solr (either standalone or with ZooKeeper) in a way similar to creating
>>a new Collection. I.e. create a new collection and upload an existing
>>index directly into that Collection. I've looked through Solr code and
>>so far I have not found a handler that would allow this scenario. So,
>>the last idea is to implement a special handler for this case, perhaps
>>extending CoreAdminHandler. ReplicationHandler together with SnapPuller
>>do pretty much what I need to do, except that the action has to be
>>initiated by the receiving Solr server and I need to initiate the action
>>externally. I.e., instead of having Solr slave download an index from
>>Solr master, I need to feed the index to Solr master and ideally this
>>would work the same way in standalone and SolrCloud modes.
>
>I have not made any attempt to verify what I'm stating below.  It may
>not work.
>
>What I think I would *try* is setting up a standalone Solr (no cloud) on
>the backup server.  Use scripted index/config copies and Solr start/stop
>actions to get the index up and running on a known core in the
>standalone Solr.  Then use the replication handler's HTTP API to
>replicate the index from that standalone server to each of the replicas
>in your cluster.
>
>https://wiki.apache.org/solr/SolrReplication#HTTP_API
>https://cwiki.apache.org/confluence/display/solr/Index+Replication#IndexReplication-HTTPAPICommandsfortheReplicationHandler
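
For what it's worth, a rough sketch of that last replication step, driving
the same fetchindex call against each replica from the client side. All
hosts and core names are placeholders, and whether SolrCloud itself
tolerates this is exactly the open question discussed below:

import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Hypothetical standalone core holding the restored index, plus the
# replicas of the SolrCloud collection that should pull it.
BACKUP_CORE = "http://backup-host:8983/solr/restored_core"
REPLICA_CORES = [
    "http://node1:8983/solr/mycoll_shard1_replica1",
    "http://node2:8983/solr/mycoll_shard1_replica2",
]

params = urlencode({
    "command": "fetchindex",
    "masterUrl": BACKUP_CORE + "/replication",
    "wt": "json",
})

for replica in REPLICA_CORES:
    with urlopen(replica + "/replication?" + params) as resp:
        print(replica, json.load(resp).get("status"))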
>
>One thing that I do not know is whether SolrCloud itself might interfere
>with these actions, or whether it might automatically take care of
>additional replicas if you replicate to the shard leader.  If SolrCloud
>*would* interfere, then this idea might need special support in
>SolrCloud, perhaps as an extension to the Collections API.  If it won't
>interfere, then the use-case would need to be documented (on the user
>wiki at a minimum) so that committers will be aware of it and preserve
>the capability in future versions.  An extension to the Collections API
>might be a good idea either way -- I've seen a number of questions about
>capability that falls under this basic heading.
>
>Thanks,
>Shawn
