There’s no official way of doing #1, but there are some less official ways:

1. The Backup/Restore API provides some hooks for loading pre-existing data dirs into an existing collection. Lots of caveats.
2. If you don’t have many shards, there’s always rsync/reload.
3. There are some third-party tools that help with this kind of thing:
   a. https://github.com/whitepages/solrcloud_manager (primarily a command-line tool)
   b. https://github.com/bloomreach/solrcloud-haft (primarily a library)
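As a rough illustration of option 1, here’s a sketch using the core-level replication handler’s backup/restore commands to snapshot an index on one cluster and load it on another. All host names, core names, and paths below are placeholders, not details from this thread, and the commands are only printed (not executed) so you can adapt them first:

```shell
#!/bin/sh
# Sketch of option 1: snapshot a core's index via the replication handler
# on the source cluster, ship the snapshot, then restore on the target.
# Placeholder hosts/cores/paths -- substitute your own before running.

SRC="http://source-host:8983/solr/mycollection_shard1_replica1"
DST="http://target-host:8983/solr/mycollection_shard1_replica1"

# Take a named snapshot of the source core's index directory:
echo "curl '$SRC/replication?command=backup&location=/backups&name=snap1'"

# Copy the snapshot directory to the target machine (rsync, scp, ...),
# then restore it into the target core:
echo "curl '$DST/replication?command=restore&location=/backups&name=snap1'"
```

Note this is per-core, so with many shards it gets tedious quickly, which is why the third-party tools above exist.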
For #2, absolutely. Spin up some new nodes in your cluster, and then use the “createNodeSet” parameter when creating the new collection to restrict it to those new nodes:
https://cwiki.apache.org/confluence/display/solr/Collections+API#CollectionsAPI-api1

On 6/21/16, 12:33 PM, "Kelly, Frank" <frank.ke...@here.com> wrote:

>We have about 200 million documents (~70 GB) we need to keep indexed
>across 3 collections.
>
>Currently 2 of the 3 collections are already indexed (roughly 90M docs).
>
>We'd like to create the remaining collection (about 100M documents) while
>minimizing the performance impact on the existing collections on our Solr
>servers during that time.
>
>Is there some way to do this, either by
>
> 1. Creating the collection in another environment and shipping the
>    (underlying Lucene) index files
> 2. Creating the collection on (dedicated) new machines that we add to
>    the SolrCloud cluster?
>
>Thoughts, comments or suggestions appreciated,
>
>Best
>
>-Frank Kelly
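The createNodeSet suggestion above can be sketched as a Collections API CREATE call restricted to the freshly added nodes. The node names, collection name, and shard/replica counts here are made-up placeholders, and the command is only printed so you can adjust it for your cluster:

```shell
#!/bin/sh
# Sketch of option 2: create the new collection only on newly added nodes
# so the indexing load stays off the nodes serving the existing collections.
# Placeholder values -- createNodeSet takes node names as host:port_solr.

SOLR="http://any-node:8983/solr"
NEW_NODES="newnode1:8983_solr,newnode2:8983_solr,newnode3:8983_solr"

# Collections API CREATE restricted to the new nodes:
echo "curl '$SOLR/admin/collections?action=CREATE&name=collection3&numShards=3&replicationFactor=1&createNodeSet=$NEW_NODES'"
```

Once indexing is done, you can rebalance replicas onto the rest of the cluster later if you want.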