Hi Mark,

I’ll have a completely new, rebuilt index that’s (a) large, and (b) already 
sharded appropriately.

In that case, the merge API isn’t a great fit: it would take significant time 
and temporarily use double (or more) the disk space.

E.g. one of the indexes has 250M+ records and is about 200GB on disk. The 
other indexes are smaller, but still big.
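
For context, this is roughly what I understand the merge route to look like, 
via the CoreAdmin MERGEINDEXES call from SolrJ, just to make sure we’re 
talking about the same thing (the core name, Solr URL, and index path below 
are placeholders I made up):

    import java.util.Arrays;

    import org.apache.solr.client.solrj.impl.HttpSolrClient;
    import org.apache.solr.client.solrj.request.CoreAdminRequest;

    public class MergePrebuiltIndex {
      public static void main(String[] args) throws Exception {
        try (HttpSolrClient client =
                 new HttpSolrClient.Builder("http://localhost:8983/solr").build()) {
          CoreAdminRequest.MergeIndexes merge = new CoreAdminRequest.MergeIndexes();
          merge.setCoreName("mycoll_shard1_replica1");                      // target (empty) core
          merge.setIndexDirs(Arrays.asList("/data/prebuilt/shard1/index")); // pre-built Lucene index
          merge.process(client);
          // The merge copies the source index into the target core's data dir,
          // which is where the extra disk space (and time) would go.
        }
      }
    }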

So I’m still wondering if there’s any robust way to swap in a fresh set of 
shards, especially without relying on legacy cloud mode.

I think I can figure out where the data for an existing (empty) collection is 
stored, shut that node down, swap in the new index files, and reload.
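
Roughly, I’m picturing something like this against the 6.6 test framework. 
The node1/mycoll_shard1_replica1 layout is just my guess (based on your 
node{i} description), not anything documented, so treat it as a sketch of the 
idea rather than a working recipe:

    import java.nio.file.DirectoryStream;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.nio.file.Paths;

    import org.apache.solr.client.solrj.embedded.JettyConfig;
    import org.apache.solr.client.solrj.embedded.JettySolrRunner;
    import org.apache.solr.client.solrj.request.CollectionAdminRequest;
    import org.apache.solr.cloud.MiniSolrCloudCluster;

    public class PrebuiltIndexClusterSketch {
      public static void main(String[] args) throws Exception {
        Path baseDir = Files.createTempDirectory("mini-cluster");

        // One node / one shard to keep the sketch small.
        MiniSolrCloudCluster cluster =
            new MiniSolrCloudCluster(1, baseDir, JettyConfig.builder().build());
        cluster.uploadConfigSet(Paths.get("src/test/resources/configsets/myconf/conf"), "myconf");
        CollectionAdminRequest.createCollection("mycoll", "myconf", 1, 1)
            .process(cluster.getSolrClient());

        // Stop the node hosting the (empty) core.
        JettySolrRunner jetty = cluster.getJettySolrRunners().get(0);
        jetty.stop();

        // Assumption: the empty core's index lives under
        //   <baseDir>/node1/mycoll_shard1_replica1/data/index
        Path coreIndex = baseDir.resolve("node1/mycoll_shard1_replica1/data/index");
        Path prebuilt = Paths.get("/data/prebuilt/shard1/index");  // our Hadoop-built shard
        swapIndexFiles(prebuilt, coreIndex);

        // Restart so Solr reopens the core over the swapped-in segments.
        jetty.start();

        // ... run the tests against cluster.getSolrClient(), then:
        cluster.shutdown();
      }

      // Clear out the empty index and copy the pre-built segment files in.
      private static void swapIndexFiles(Path src, Path dest) throws Exception {
        try (DirectoryStream<Path> old = Files.newDirectoryStream(dest)) {
          for (Path p : old) Files.delete(p);
        }
        try (DirectoryStream<Path> files = Files.newDirectoryStream(src)) {
          for (Path p : files) Files.copy(p, dest.resolve(p.getFileName()));
        }
      }
    }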

But I’m wondering if that’s really the best (or even sane) approach.

Thanks,

— Ken

> On May 19, 2018, at 6:24 PM, Mark Miller <markrmil...@gmail.com> wrote:
> 
> You create MiniSolrCloudCluster with a base directory and then each Jetty
> instance created gets a SolrHome in a subfolder called node{i}. So if
> legacyCloud=true you can just preconfigure a core and index under the right
> node{i} subfolder. legacyCloud=true should not even exist anymore though,
> so the long term way to do this would be to create a collection and then
> use the merge API or something to merge your index into the empty
> collection.
> 
> - Mark
> 
> On Sat, May 19, 2018 at 5:25 PM Ken Krugler <kkrugler_li...@transpac.com>
> wrote:
> 
>> Hi all,
>> 
>> Wondering if anyone has experience (this is with Solr 6.6) in setting up
>> MiniSolrCloudCluster for unit testing, where we want to use an existing
>> index.
>> 
>> Note that this index wasn’t built with SolrCloud, as it’s generated by a
>> distributed (Hadoop) workflow.
>> 
>> So there’s no “restore from backup” option, or swapping collection
>> aliases, etc.
>> 
>> We can push our configset to Zookeeper and create the collection as per
>> other unit tests in Solr, but what’s the right way to set up data dirs for
>> the cores such that Solr is running with this existing index (or indexes,
>> for our sharded test case)?
>> 
>> Thanks!
>> 
>> — Ken
>> 
>> PS - yes, we’re aware of the routing issue with generating our own shards….
>> 
>> --------------------------
>> Ken Krugler
>> +1 530-210-6378
>> http://www.scaleunlimited.com
>> Custom big data solutions & training
>> Flink, Solr, Hadoop, Cascading & Cassandra
>> 
>> --
> - Mark
> about.me/markrmiller

--------------------------
Ken Krugler
+1 530-210-6378
http://www.scaleunlimited.com
Custom big data solutions & training
Flink, Solr, Hadoop, Cascading & Cassandra
