Hi Erick,

thanks a lot, that solved our problem nicely.

(It took us a try or two to notice that this does not copy the entire
collection but only the shard served by the source core, so we have to
issue it once per instance. But hey, we had to do the same with the old
approach of scp'ing the data directories.)
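
In case it helps the next person digging through the archives, here is a
rough, untested sketch of that per-shard loop. Host names, ports and core
names below are placeholders, not our real layout:

import urllib.parse
import urllib.request

# One (target core URL, source core URL) pair per shard. The core names are
# whatever the admin UI "cores" dropdown shows on each end; they need not
# match between the two clusters. All values here are placeholders.
PAIRS = [
    ("http://tgt-node1:8983/solr/main_index_shard1_replica_n1",
     "http://src-node1:8983/solr/main_index_shard1_replica_n1"),
    # ... 8 pairs in total, one per shard/instance ...
]

for target_core, source_core in PAIRS:
    params = urllib.parse.urlencode({"command": "fetchindex",
                                     "masterUrl": source_core})
    with urllib.request.urlopen(target_core + "/replication?" + params) as resp:
        print(target_core, resp.status)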

Ciao, Patrick

On Tue, Mar 06, 2018 at 07:18:15AM -0800, Erick Erickson wrote:
> this is part of the "different replica types" capability: besides NRT
> (the only type available prior to 7x) there are now PULL and TLOG
> replicas, which get differently-named directories (the _n here means
> NRT). I don't know of any way to switch it off.
> 
> As far as moving the data, here's a little-known trick: use the
> replication API to issue a fetchindex, see:
> https://lucene.apache.org/solr/guide/6_6/index-replication.html As
> long as the target cluster can "see" the source cluster via http, this
> should work. This is entirely outside SolrCloud and ZooKeeper is not
> involved. This would even work with, say, one side being stand-alone
> and the other being SolrCloud (not that you would want to do that, just
> illustrating that it's not tied to SolrCloud)...
> 
> So you'd specify something like:
> http://target_node:port/solr/core_name/replication?command=fetchindex&masterUrl=http://source_node:port/solr/core_name
> 
> "core_name" in these cases is what appears in the "cores" dropdown on
> the admin UI page. You do not have to shut Solr down at all on either
> end to use this, although last I knew the target node would not serve
> queries while this was happening.
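
For completeness, one way to tell when the fetch is done is to poll the
replication handler's indexversion command on both ends until they agree.
A hedged sketch; the URLs are placeholders and the JSON key name should be
double-checked against your Solr version:

import json
import time
import urllib.request

SOURCE = "http://source_node:8983/solr/core_name/replication"  # placeholder
TARGET = "http://target_node:8983/solr/core_name/replication"  # placeholder

def indexversion(handler_url):
    # command=indexversion reports the version of the searchable index.
    with urllib.request.urlopen(handler_url + "?command=indexversion&wt=json") as resp:
        return json.load(resp).get("indexversion")

wanted = indexversion(SOURCE)
while indexversion(TARGET) != wanted:
    time.sleep(10)
print("target caught up to index version", wanted)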
> 
> An alternative is to not hard-code the names in your copy script, but
> rather look up the source and target cores in ZooKeeper; you can do
> that with the CLUSTERSTATUS Collections API call.
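
A minimal sketch of such a lookup, assuming Python, placeholder host names,
and the CLUSTERSTATUS field layout as I understand it in the 7.x line:

import json
import urllib.request

def cores_by_shard(solr_base, collection="main_index"):
    # Ask any node of the cluster for the layout of the collection.
    url = (solr_base + "/admin/collections?action=CLUSTERSTATUS"
           "&collection=" + collection + "&wt=json")
    with urllib.request.urlopen(url) as resp:
        shards = json.load(resp)["cluster"]["collections"][collection]["shards"]
    # With one replica per shard, just take the single replica of each shard.
    return {name: (r["base_url"], r["core"])
            for name, shard in shards.items()
            for r in shard["replicas"].values()}

source = cores_by_shard("http://src-node1:8983/solr")  # placeholder hosts
target = cores_by_shard("http://tgt-node1:8983/solr")
for shard, (tgt_base, tgt_core) in sorted(target.items()):
    src_base, src_core = source[shard]
    print("%s/%s  <-  %s/%s" % (tgt_base, tgt_core, src_base, src_core))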
> 
> Best,
> Erick
> 
> On Tue, Mar 6, 2018 at 6:47 AM, Patrick Schemitz <p...@solute.de> wrote:
> > Hi List,
> >
> > so I'm running a bunch of SolrCloud clusters (each cluster is: 8 shards
> > on 2 servers, with 4 instances per server, no replicas, i.e. 1 shard per
> > instance).
> >
> > Building the index afresh takes 15+ hours, so when I have to deploy a new
> > index, I build it once, on one cluster, and then copy (scp) over the
> > data/<main_index>/index directories (shutting down the Solr instances 
> > first).
> >
> > I could get Solr 6.5.1 to number the shard/replica directories nicely via
> > the createNodeSet and createNodeSet.shuffle options:
> >
> > Solr 6.5.1 /var/lib/solr:
> >
> > Server node 1:
> > instance00/data/main_index_shard1_replica1
> > instance01/data/main_index_shard2_replica1
> > instance02/data/main_index_shard3_replica1
> > instance03/data/main_index_shard4_replica1
> >
> > Server node 2:
> > instance00/data/main_index_shard5_replica1
> > instance01/data/main_index_shard6_replica1
> > instance02/data/main_index_shard7_replica1
> > instance03/data/main_index_shard8_replica1
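
For context, that layout came out of a CREATE call roughly like the sketch
below; node names, ports and the config name are placeholders, the relevant
bits are createNodeSet and createNodeSet.shuffle=false:

import urllib.parse
import urllib.request

params = urllib.parse.urlencode({
    "action": "CREATE",
    "name": "main_index",
    "numShards": 8,
    "replicationFactor": 1,
    "maxShardsPerNode": 1,
    "collection.configName": "main_index",  # placeholder config set name
    # Node names as they appear under live_nodes, in the order the shards
    # should land; shuffle=false keeps that order instead of randomizing it.
    "createNodeSet": "node1:8983_solr,node1:8984_solr,node1:8985_solr,node1:8986_solr,"
                     "node2:8983_solr,node2:8984_solr,node2:8985_solr,node2:8986_solr",
    "createNodeSet.shuffle": "false",
})
url = "http://node1:8983/solr/admin/collections?" + params
print(urllib.request.urlopen(url).read().decode("utf-8"))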
> >
> > However, while attempting to upgrade to 7.2.1, this numbering has changed:
> >
> > Solr 7.2.1 /var/lib/solr:
> >
> > Server node 1:
> > instance00/data/main_index_shard1_replica_n1
> > instance01/data/main_index_shard2_replica_n2
> > instance02/data/main_index_shard3_replica_n4
> > instance03/data/main_index_shard4_replica_n6
> >
> > Server node 2:
> > instance00/data/main_index_shard5_replica_n8
> > instance01/data/main_index_shard6_replica_n10
> > instance02/data/main_index_shard7_replica_n12
> > instance03/data/main_index_shard8_replica_n14
> >
> > This new numbering breaks my copy script, and furthermore, I'm worried
> > about what happens when the numbering differs between target clusters.
> >
> > How can I switch this back to the old numbering scheme?
> >
> > Side note: is there a recommended way of doing this? Is the
> > backup/restore mechanism suitable for this? The ref guide is kind of terse
> > here.
> >
> > Thanks in advance,
> >
> > Ciao, Patrick
