Re: indexing - offline
bq: So, a node is part of the cluster but no collections? How can we add a node to cloud without active participation?

See the Collections API CREATE command, in particular the createNodeSet parameter. You can specify exactly which Solr instances the collection is created on, so you can have two collections using the same ZooKeeper running on totally different nodes. If you use the "EMPTY" value here, you can then use ADDREPLICA to place replicas at the precise locations you want.

A slight variant of Tom's process: instead of deleting the collection every time, just delete all documents from the "old" collection once you've made the switch (delete by query on *:*). Either way works fine; use whichever is more comfortable.

The CREATEALIAS command is the one that switches your aliases back and forth. You use it both to create a new alias and to change an existing one.

Best,
Erick

On Thu, Oct 20, 2016 at 2:29 PM, Rallavagu wrote:
> Thanks Evan for quick response.
>
> On 10/20/16 10:19 AM, Tom Evans wrote:
>>
>> On Thu, Oct 20, 2016 at 5:38 PM, Rallavagu wrote:
>>>
>>> Solr 5.4.1 cloud with embedded jetty
>>>
>>> Looking for some ideas around offline indexing where an independent node
>>> will be indexed offline (not in the cloud) and added to the cloud to become
>>> leader so other cloud nodes will get replicated. Wonder if this is possible
>>> without interrupting the live service. Thanks.
>>
>>
>> How we do this, to reindex collection "foo":
>>
>> 1) First, collection "foo" should be an alias to the real collection,
>> eg "foo_1" aliased to "foo"
>> 2) Have a node "node_i" in the cluster that is used for indexing. It
>> doesn't hold any shards of any collections
>
> So, a node is part of the cluster but no collections? How can we add a node
> to cloud without active participation?
>
>> 3) Use collections API to create collection "foo_2", with however many
>> shards required, but all placed on "node_i"
>> 4) Index "foo_2" with new data with DIH or direct indexing to "node_i".
>> 5) Use collections API to expand "foo_2" to all the nodes/replicas
>> that it should be on
>
> Could you please point me to documentation on how to do this? I am referring
> to this doc
> https://cwiki.apache.org/confluence/display/solr/Collections+API. But, it
> has many options and honestly not sure which one would be useful in this
> case.
>
> Thanks
>
>
>> 6) Remove "foo_2" from "node_i"
>> 7) Verify contents of "foo_2" are correct
>> 8) Use collections API to change alias for "foo" to "foo_2"
>> 9) Remove "foo_1" collection once happy
>>
>> This avoids indexing overwhelming the performance of the cluster (or
>> any nodes in the cluster that receive queries), and can be performed
>> with zero downtime or config changes on the clients.
>>
>> Cheers
>>
>> Tom
>>
>
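For readers following along, here is a minimal sketch of the requests Erick's reply refers to: CREATE with createNodeSet=EMPTY, ADDREPLICA, CREATEALIAS, and the delete-by-query variant. It only builds the URLs and the JSON body; nothing is sent. The base URL, the node name "10.0.0.5:8983_solr", the shard name "shard1" and the single-shard layout are assumptions made for illustration and are not taken from the thread.

```python
import json
from urllib.parse import urlencode

# Hypothetical base URL of any live node in the cluster.
SOLR = "http://localhost:8983/solr"


def collections_url(action, params):
    """Build a Collections API URL. Nothing is sent over the network."""
    query = urlencode({"action": action, **params})
    return f"{SOLR}/admin/collections?{query}"


# Create "foo_2" without placing any replicas yet (createNodeSet=EMPTY).
create = collections_url(
    "CREATE", {"name": "foo_2", "numShards": 1, "createNodeSet": "EMPTY"}
)

# Place a replica of shard1 on one specific node (hypothetical node name).
add_replica = collections_url(
    "ADDREPLICA",
    {"collection": "foo_2", "shard": "shard1", "node": "10.0.0.5:8983_solr"},
)

# Point the alias "foo" at the new collection. The same call is used
# to create the alias the first time and to re-point it later.
switch = collections_url("CREATEALIAS", {"name": "foo", "collections": "foo_2"})

# Variant from the reply: empty the old collection instead of deleting it.
# The body would be POSTed with Content-Type: application/json.
purge_url = f"{SOLR}/foo_1/update?commit=true"
purge_body = json.dumps({"delete": {"query": "*:*"}})

for url in (create, add_replica, switch, purge_url):
    print(url)
```

Depending on how many configsets are stored in ZooKeeper, the CREATE call may also need a collection.configName parameter.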
Re: indexing - offline
Thanks Evan for quick response.

On 10/20/16 10:19 AM, Tom Evans wrote:
>
> On Thu, Oct 20, 2016 at 5:38 PM, Rallavagu wrote:
>>
>> Solr 5.4.1 cloud with embedded jetty
>>
>> Looking for some ideas around offline indexing where an independent node
>> will be indexed offline (not in the cloud) and added to the cloud to become
>> leader so other cloud nodes will get replicated. Wonder if this is possible
>> without interrupting the live service. Thanks.
>
>
> How we do this, to reindex collection "foo":
>
> 1) First, collection "foo" should be an alias to the real collection,
> eg "foo_1" aliased to "foo"
> 2) Have a node "node_i" in the cluster that is used for indexing. It
> doesn't hold any shards of any collections

So, a node is part of the cluster but no collections? How can we add a node to cloud without active participation?

> 3) Use collections API to create collection "foo_2", with however many
> shards required, but all placed on "node_i"
> 4) Index "foo_2" with new data with DIH or direct indexing to "node_i".
> 5) Use collections API to expand "foo_2" to all the nodes/replicas
> that it should be on

Could you please point me to documentation on how to do this? I am referring to this doc https://cwiki.apache.org/confluence/display/solr/Collections+API. But it has many options, and honestly I am not sure which one would be useful in this case.

Thanks

> 6) Remove "foo_2" from "node_i"
> 7) Verify contents of "foo_2" are correct
> 8) Use collections API to change alias for "foo" to "foo_2"
> 9) Remove "foo_1" collection once happy
>
> This avoids indexing overwhelming the performance of the cluster (or
> any nodes in the cluster that receive queries), and can be performed
> with zero downtime or config changes on the clients.
>
> Cheers
>
> Tom
Re: indexing - offline
On Thu, Oct 20, 2016 at 5:38 PM, Rallavagu wrote:
> Solr 5.4.1 cloud with embedded jetty
>
> Looking for some ideas around offline indexing where an independent node
> will be indexed offline (not in the cloud) and added to the cloud to become
> leader so other cloud nodes will get replicated. Wonder if this is possible
> without interrupting the live service. Thanks.

How we do this, to reindex collection "foo":

1) First, collection "foo" should be an alias to the real collection, eg "foo_1" aliased to "foo"
2) Have a node "node_i" in the cluster that is used for indexing. It doesn't hold any shards of any collections
3) Use collections API to create collection "foo_2", with however many shards required, but all placed on "node_i"
4) Index "foo_2" with new data with DIH or direct indexing to "node_i".
5) Use collections API to expand "foo_2" to all the nodes/replicas that it should be on
6) Remove "foo_2" from "node_i"
7) Verify contents of "foo_2" are correct
8) Use collections API to change alias for "foo" to "foo_2"
9) Remove "foo_1" collection once happy

This avoids indexing overwhelming the performance of the cluster (or any nodes in the cluster that receive queries), and can be performed with zero downtime or config changes on the clients.

Cheers

Tom
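The Collections API side of the steps above could be sketched roughly as follows. This is not Tom's actual tooling, only an illustration that builds the request URLs in order without sending them. The function names, the base URL, the node names and the replica name "core_node1" are hypothetical. The shard names assume Solr's default shard1..shardN naming, and maxShardsPerNode is set on the assumption that all shards must initially fit on the single indexing node. Steps 4 (indexing) and 7 (verification) happen outside the Collections API and appear only as comments.

```python
from urllib.parse import urlencode


def api(base, action, params):
    """Build a Collections API URL. Nothing is sent over the network."""
    query = urlencode({"action": action, **params})
    return f"{base}/admin/collections?{query}"


def delete_replica_url(base, collection, shard, replica):
    """Step 6: the replica name must be read from CLUSTERSTATUS output."""
    return api(
        base,
        "DELETEREPLICA",
        {"collection": collection, "shard": shard, "replica": replica},
    )


def reindex_plan(base, alias, old, new, index_node, serving_nodes, num_shards):
    """Return the ordered Collections API URLs for one reindex cycle."""
    shards = [f"shard{i}" for i in range(1, num_shards + 1)]
    plan = []
    # Step 3: create the new collection with every shard on the indexing node.
    plan.append(api(base, "CREATE", {
        "name": new,
        "numShards": num_shards,
        "maxShardsPerNode": num_shards,
        "createNodeSet": index_node,
    }))
    # Step 4: index into the new collection (DIH or direct), not shown here.
    # Step 5: add a replica of each shard on each node that serves queries.
    for shard in shards:
        for node in serving_nodes:
            plan.append(api(base, "ADDREPLICA", {
                "collection": new, "shard": shard, "node": node,
            }))
    # Step 6: look up the replica names on the indexing node, then call
    # delete_replica_url() once per shard.
    plan.append(api(base, "CLUSTERSTATUS", {"collection": new}))
    # Step 7: verify the contents of the new collection, not shown here.
    # Step 8: re-point the alias at the new collection.
    plan.append(api(base, "CREATEALIAS", {"name": alias, "collections": new}))
    # Step 9: drop the old collection once happy.
    plan.append(api(base, "DELETE", {"name": old}))
    return plan


# Hypothetical cluster layout, for illustration only.
plan = reindex_plan(
    base="http://localhost:8983/solr",
    alias="foo", old="foo_1", new="foo_2",
    index_node="node_i:8983_solr",
    serving_nodes=["node_a:8983_solr", "node_b:8983_solr"],
    num_shards=2,
)
removal = delete_replica_url("http://localhost:8983/solr", "foo_2", "shard1", "core_node1")
for url in plan:
    print(url)
```

Because clients only ever query the alias "foo", the CREATEALIAS call in step 8 is the only moment at which they see the new data.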
indexing - offline
Solr 5.4.1 cloud with embedded jetty

Looking for some ideas around offline indexing where an independent node will be indexed offline (not in the cloud) and added to the cloud to become leader so other cloud nodes will get replicated. Wonder if this is possible without interrupting the live service. Thanks.