There’s no official way of doing #1, but there are some less official ways:

1. The Backup/Restore API provides some hooks for loading pre-existing data dirs into an existing collection. Lots of caveats.
2. If you don’t have many shards, there’s always rsync/reload.
3. There are some third-party tools that help with this kind of thing:
   a. https://github.com/whitepages/solrcloud_manager (primarily a command-line tool)
   b. https://github.com/bloomreach/solrcloud-haft (primarily a library)
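As a rough illustration of option 1, here’s a sketch using the core-level replication handler’s backup/restore commands to snapshot an index on one cluster and load it on another. All host names, core names, and paths below are placeholders, not details from this thread, and the commands are only printed (not executed) so you can adapt them first:

```shell
#!/bin/sh
# Sketch of option 1: snapshot a core's index via the replication handler
# on the source cluster, ship the snapshot, then restore on the target.
# Placeholder hosts/cores/paths -- substitute your own before running.

SRC="http://source-host:8983/solr/mycollection_shard1_replica1"
DST="http://target-host:8983/solr/mycollection_shard1_replica1"

# Take a named snapshot of the source core's index directory:
echo "curl '$SRC/replication?command=backup&location=/backups&name=snap1'"

# Copy the snapshot directory to the target machine (rsync, scp, ...),
# then restore it into the target core:
echo "curl '$DST/replication?command=restore&location=/backups&name=snap1'"
```

Note this is per-core, so with many shards it gets tedious quickly, which is why the third-party tools above exist.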
For #2, absolutely. Spin up some new nodes in your cluster, and then use the “createNodeSet” parameter when creating the new collection to restrict it to those new nodes:
https://cwiki.apache.org/confluence/display/solr/Collections+API#CollectionsAPI-api1

On 6/21/16, 12:33 PM, "Kelly, Frank" <frank.ke...@here.com> wrote:

>We have about 200 million documents (~70 GB) we need to keep indexed
>across 3 collections.
>
>Currently 2 of the 3 collections are already indexed (roughly 90M docs).
>
>We'd like to create the remaining collection (about 100M documents) while
>minimizing the performance impact on the existing collections on our Solr
>servers during that time.
>
>Is there some way to do this, either by
>
> 1. Creating the collection in another environment and shipping the
>    (underlying Lucene) index files
> 2. Creating the collection on (dedicated) new machines that we add to
>    the SolrCloud cluster?
>
>Thoughts, comments or suggestions appreciated,
>
>Best
>
>-Frank Kelly
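The createNodeSet suggestion above can be sketched as a Collections API CREATE call restricted to the freshly added nodes. The node names, collection name, and shard/replica counts here are made-up placeholders, and the command is only printed so you can adjust it for your cluster:

```shell
#!/bin/sh
# Sketch of option 2: create the new collection only on newly added nodes
# so the indexing load stays off the nodes serving the existing collections.
# Placeholder values -- createNodeSet takes node names as host:port_solr.

SOLR="http://any-node:8983/solr"
NEW_NODES="newnode1:8983_solr,newnode2:8983_solr,newnode3:8983_solr"

# Collections API CREATE restricted to the new nodes:
echo "curl '$SOLR/admin/collections?action=CREATE&name=collection3&numShards=3&replicationFactor=1&createNodeSet=$NEW_NODES'"
```

Once indexing is done, you can rebalance replicas onto the rest of the cluster later if you want.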