The merge can be really fast: it can basically just pull in the new segments and rewrite the segments file.
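For reference, that kind of merge can be driven through Solr's Core Admin MERGEINDEXES call. A rough sketch, assuming a node at localhost:8983 and hypothetical core and path names (requires a running Solr node):

```shell
# Merge a pre-built Lucene index directory into an existing core.
# Core name and indexDir are hypothetical; adjust for your setup.
curl "http://localhost:8983/solr/admin/cores?action=MERGEINDEXES&core=mycollection_shard1_replica1&indexDir=/data/hadoop-build/shard1/index"

# Commit afterwards so the merged segments become visible to searchers:
curl "http://localhost:8983/solr/mycollection/update?commit=true"
```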
I guess for what you want, that's perhaps not the ideal route though. You could maybe try and use collection aliases. I thought about adding shard aliases way back, but never got to it.

On Tue, Oct 23, 2018 at 7:10 PM Ken Krugler <kkrugler_li...@transpac.com> wrote:

> Hi Mark,
>
> I’ll have a completely new, rebuilt index that’s (a) large, and (b) already sharded appropriately.
>
> In that case, using the merge API isn’t great, in that it would take significant time and temporarily use double (or more) disk space.
>
> E.g. I’ve got an index with 250M+ records, and about 200GB. There are other indexes, still big but not quite as large as this one.
>
> So I’m still wondering if there’s any robust way to swap in a fresh set of shards, especially without relying on legacy cloud mode.
>
> I think I can figure out where the data is being stored for an existing (empty) collection, shut that down, swap in the new files, and reload.
>
> But I’m wondering if that’s really the best (or even sane) approach.
>
> Thanks,
>
> — Ken
>
> On May 19, 2018, at 6:24 PM, Mark Miller <markrmil...@gmail.com> wrote:
>
>> You create MiniSolrCloudCluster with a base directory, and then each Jetty instance created gets a SolrHome in a subfolder called node{i}. So if legacyCloud=true you can just preconfigure a core and index under the right node{i} subfolder. legacyCloud=true should not even exist anymore though, so the long-term way to do this would be to create a collection and then use the merge API or something to merge your index into the empty collection.
>>
>> - Mark
>>
>> On Sat, May 19, 2018 at 5:25 PM Ken Krugler <kkrugler_li...@transpac.com> wrote:
>>
>>> Hi all,
>>>
>>> Wondering if anyone has experience (this is with Solr 6.6) in setting up MiniSolrCloudCluster for unit testing, where we want to use an existing index.
>>>
>>> Note that this index wasn’t built with SolrCloud, as it’s generated by a distributed (Hadoop) workflow.
>>> So there’s no “restore from backup” option, or swapping collection aliases, etc.
>>>
>>> We can push our configset to Zookeeper and create the collection as per other unit tests in Solr, but what’s the right way to set up data dirs for the cores such that Solr is running with this existing index (or indexes, for our sharded test case)?
>>>
>>> Thanks!
>>>
>>> — Ken
>>>
>>> PS - yes, we’re aware of the routing issue with generating our own shards….
>>>
>>> --------------------------
>>> Ken Krugler
>>> +1 530-210-6378
>>> http://www.scaleunlimited.com
>>> Custom big data solutions & training
>>> Flink, Solr, Hadoop, Cascading & Cassandra
>>
>> --
>> - Mark
>> about.me/markrmiller
>
> --------------------------
> Ken Krugler
> +1 530-210-6378
> http://www.scaleunlimited.com
> Custom big data solutions & training
> Flink, Solr, Hadoop, Cascading & Cassandra

--
- Mark
http://about.me/markrmiller
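The collection-alias approach suggested above can be sketched with Solr's Collections API: build and load the fresh collection offline, then atomically repoint the alias that clients query. Host, alias, and collection names here are hypothetical, and the calls need a running SolrCloud cluster:

```shell
# Clients query the alias "products"; the new data lives in products_v2.
# CREATEALIAS atomically creates or repoints the alias.
curl "http://localhost:8983/solr/admin/collections?action=CREATEALIAS&name=products&collections=products_v2"

# Once traffic is confirmed on products_v2, drop the old collection:
curl "http://localhost:8983/solr/admin/collections?action=DELETE&name=products_v1"
```

This avoids the merge-time and double-disk-space cost of MERGEINDEXES, at the price of keeping both collections around until the swap completes.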