The merge can be really fast: it can basically just pull in the new segments and rewrite the segments file.
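For reference, that kind of merge can be driven through Solr's Core Admin MERGEINDEXES call. A rough sketch, assuming a node at localhost:8983 and hypothetical core and path names (requires a running Solr node):

```shell
# Merge a pre-built Lucene index directory into an existing core.
# Core name and indexDir are hypothetical; adjust for your setup.
curl "http://localhost:8983/solr/admin/cores?action=MERGEINDEXES&core=mycollection_shard1_replica1&indexDir=/data/hadoop-build/shard1/index"

# Commit afterwards so the merged segments become visible to searchers:
curl "http://localhost:8983/solr/mycollection/update?commit=true"
```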
I guess for what you want, that's perhaps not the ideal route though. You could maybe try and use collection aliases. I thought about adding shard aliases way back, but never got to it.

On Tue, Oct 23, 2018 at 7:10 PM Ken Krugler <kkrugler_li...@transpac.com> wrote:

> Hi Mark,
>
> I’ll have a completely new, rebuilt index that’s (a) large, and (b) already sharded appropriately.
>
> In that case, using the merge API isn’t great, in that it would take significant time and temporarily use double (or more) disk space.
>
> E.g. I’ve got an index with 250M+ records, and about 200GB. There are other indexes, still big but not quite as large as this one.
>
> So I’m still wondering if there’s any robust way to swap in a fresh set of shards, especially without relying on legacy cloud mode.
>
> I think I can figure out where the data is being stored for an existing (empty) collection, shut that down, swap in the new files, and reload.
>
> But I’m wondering if that’s really the best (or even sane) approach.
>
> Thanks,
>
> — Ken
>
> On May 19, 2018, at 6:24 PM, Mark Miller <markrmil...@gmail.com> wrote:
>
>> You create MiniSolrCloudCluster with a base directory, and then each Jetty instance created gets a SolrHome in a subfolder called node{i}. So if legacyCloud=true you can just preconfigure a core and index under the right node{i} subfolder. legacyCloud=true should not even exist anymore though, so the long-term way to do this would be to create a collection and then use the merge API or something to merge your index into the empty collection.
>>
>> - Mark
>>
>> On Sat, May 19, 2018 at 5:25 PM Ken Krugler <kkrugler_li...@transpac.com> wrote:
>>
>>> Hi all,
>>>
>>> Wondering if anyone has experience (this is with Solr 6.6) in setting up MiniSolrCloudCluster for unit testing, where we want to use an existing index.
>>>
>>> Note that this index wasn’t built with SolrCloud, as it’s generated by a distributed (Hadoop) workflow.
>>> So there’s no “restore from backup” option, or swapping collection aliases, etc.
>>>
>>> We can push our configset to Zookeeper and create the collection as per other unit tests in Solr, but what’s the right way to set up data dirs for the cores such that Solr is running with this existing index (or indexes, for our sharded test case)?
>>>
>>> Thanks!
>>>
>>> — Ken
>>>
>>> PS - yes, we’re aware of the routing issue with generating our own shards….
>>>
>>> --------------------------
>>> Ken Krugler
>>> +1 530-210-6378
>>> http://www.scaleunlimited.com
>>> Custom big data solutions & training
>>> Flink, Solr, Hadoop, Cascading & Cassandra
>>
>> --
>> - Mark
>> about.me/markrmiller
>
> --------------------------
> Ken Krugler
> +1 530-210-6378
> http://www.scaleunlimited.com
> Custom big data solutions & training
> Flink, Solr, Hadoop, Cascading & Cassandra

--
- Mark
http://about.me/markrmiller
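The collection-alias approach suggested above can be sketched with Solr's Collections API: build and load the fresh collection offline, then atomically repoint the alias that clients query. Host, alias, and collection names here are hypothetical, and the calls need a running SolrCloud cluster:

```shell
# Clients query the alias "products"; the new data lives in products_v2.
# CREATEALIAS atomically creates or repoints the alias.
curl "http://localhost:8983/solr/admin/collections?action=CREATEALIAS&name=products&collections=products_v2"

# Once traffic is confirmed on products_v2, drop the old collection:
curl "http://localhost:8983/solr/admin/collections?action=DELETE&name=products_v1"
```

This avoids the merge-time and double-disk-space cost of MERGEINDEXES, at the price of keeping both collections around until the swap completes.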