Re: SOLR cloud disaster recovery
On Fri, Feb 28, 2014 at 7:50 PM, Per Steffensen st...@designware.dk wrote:
> I might be able to find something for you. Which version are you using - I have some scripts that work on 4.0 and some other scripts that work for 4.4 (and maybe later).

This sounds useful. I am using 4.6.1.

Kind regards
Jan
SOLR cloud disaster recovery
Hi,

I am a bit confused about how SolrCloud disaster recovery is supposed to work in the case of losing a single node completely.

Say I have a SolrCloud cluster with 3 nodes. My collection is created with numShards=3, replicationFactor=3 and maxShardsPerNode=3, so there is no data loss when I lose a node. However, how do I configure a new node to take the place of the dead node?

I bring up a new node (same hostname and IP as the dead node) which is completely empty (empty data dir, empty solr.xml), install Solr, and connect it to ZooKeeper. Is it supposed to work automatically from there? In my tests, the server has no cores and the SolrCloud graph overview simply shows all the shards/replicas on this node as down.

Do I need to recreate the cores first? Note that these cores were initially created indirectly, by creating the collection.

Thanks,
Jan
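For reference, the layout described above corresponds to a Collections API CREATE request with those three parameters. The sketch below only builds the URL (the host `solr1` and the collection name `mycollection` are hypothetical) and checks the arithmetic: 3 shards × 3 replicas = 9 cores, which fits on 3 nodes only if each node may hold 3. Whether every node really ends up with one copy of each shard, which is what makes losing a single node harmless, depends on Solr's placement and is worth confirming in the cloud view.

```python
from urllib.parse import urlencode


def create_collection_url(base_url, name, num_shards, replication_factor,
                          max_shards_per_node, live_nodes):
    """Build a Collections API CREATE URL, refusing layouts that cannot fit the cluster."""
    total_cores = num_shards * replication_factor   # one core per replica of each shard
    capacity = max_shards_per_node * live_nodes     # most cores the cluster will accept
    if total_cores > capacity:
        raise ValueError(f"{total_cores} cores do not fit on {live_nodes} nodes "
                         f"with maxShardsPerNode={max_shards_per_node}")
    params = {
        "action": "CREATE",
        "name": name,
        "numShards": num_shards,
        "replicationFactor": replication_factor,
        "maxShardsPerNode": max_shards_per_node,
    }
    return f"{base_url}/admin/collections?{urlencode(params)}"


if __name__ == "__main__":
    # 3 nodes, 3 shards, 3 replicas each: 9 cores, 3 per node.
    print(create_collection_url("http://solr1:8983/solr", "mycollection", 3, 3, 3, 3))
```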
Re: SOLR cloud disaster recovery
Hi Jan,

There are a few ways to do that, but no, nothing is automatic.

1) If your node is alive, you can create new replicas on the new node, let them replicate, verify they are OK, then delete the replicas on the old node and shut it down.
2) If your node is dead, create new replicas on the new node and let them replicate. You'll have to hand-edit clusterstate.json, however, to fix the entries for the shards.
3) If you have a fully up-to-date backup of your dead node, just use the same hostname for your new node and restore the backups there. It should be fine. Just verify that the replicas for that node, as listed in clusterstate.json, are present and accounted for.

HTH,
Lajos

On 28/02/2014 16:17, Jan Van Besien wrote:
> Hi, I am a bit confused about how solr cloud disaster recovery is supposed to work exactly in the case of losing a single node completely. [...]
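Options 2 and 3 both come down to reading clusterstate.json. A minimal sketch, assuming the Solr 4.x layout (collection, then "shards", then "replicas" keyed by core_nodeN names, each entry carrying "core" and "node_name"); the sample state, host names and core names are hypothetical. It lists what ZooKeeper expects on a node (the check in option 3), produces a pruned copy without the dead node's replicas (the hand edit in option 2), and builds a CoreAdmin CREATE URL with `collection` and `shard` parameters for adding a replica on the new node (options 1 and 2). Fetching the file from ZooKeeper and uploading the edited version is left out and depends on your tooling.

```python
import json
from urllib.parse import urlencode

# Hypothetical, trimmed clusterstate.json in the Solr 4.x shape:
# collection -> "shards" -> shard -> "replicas" -> core_nodeN -> {core, node_name, state}
EXAMPLE_STATE = """
{"mycollection": {"shards": {
  "shard1": {"replicas": {
    "core_node1": {"core": "mycollection_shard1_replica1", "node_name": "solr1:8983_solr", "state": "active"},
    "core_node2": {"core": "mycollection_shard1_replica2", "node_name": "solr2:8983_solr", "state": "down"}}},
  "shard2": {"replicas": {
    "core_node3": {"core": "mycollection_shard2_replica1", "node_name": "solr2:8983_solr", "state": "down"},
    "core_node4": {"core": "mycollection_shard2_replica2", "node_name": "solr3:8983_solr", "state": "active"}}}}}}
"""


def replicas_on_node(state, node_name):
    """Return (collection, shard, replica, core) for every replica registered on node_name."""
    found = []
    for coll, cdata in state.items():
        for shard, sdata in cdata.get("shards", {}).items():
            for rname, rdata in sdata.get("replicas", {}).items():
                if rdata.get("node_name") == node_name:
                    found.append((coll, shard, rname, rdata.get("core")))
    return found


def drop_node(state, node_name):
    """Return a copy of state with every replica on node_name removed."""
    pruned = json.loads(json.dumps(state))  # deep copy via a JSON round trip
    for coll, shard, rname, _core in replicas_on_node(state, node_name):
        del pruned[coll]["shards"][shard]["replicas"][rname]
    return pruned


def core_create_url(base_url, core, collection, shard):
    """CoreAdmin CREATE URL that adds a new replica of `shard` on the node at base_url."""
    params = {"action": "CREATE", "name": core, "collection": collection, "shard": shard}
    return f"{base_url}/admin/cores?{urlencode(params)}"


if __name__ == "__main__":
    state = json.loads(EXAMPLE_STATE)
    dead_node = "solr2:8983_solr"  # hypothetical node_name of the crashed machine
    for coll, shard, rname, core in replicas_on_node(state, dead_node):
        print(f"{dead_node} held {core} ({coll}/{shard}/{rname})")
    # Hypothetical replacement node solr4 gets a fresh replica of shard1.
    print(core_create_url("http://solr4:8983/solr", "mycollection_shard1_replica3",
                          "mycollection", "shard1"))
    print(json.dumps(drop_node(state, dead_node), indent=2))
```

The pruned copy is only a starting point for the hand edit; uploading it while the overseer is actively rewriting clusterstate.json could lose updates, so doing it with the cluster quiet seems prudent.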
Re: SOLR cloud disaster recovery
We have created some scripts that can do this for you: basically they reconstruct (by looking at the information in ZK) solr.xml, core.properties etc. on the new machine as they were on the machine that crashed. Our procedure when a machine crashes is:

* Remove it from the rack and replace it with a similar machine with the same hostname/IP
* Run the scripts, pointing out the IP of the machine that needs to have solr.xml and core.properties written
* Start Solr on this machine. It now runs the same set of replicas that the crashed machine did.

I guess they will sync automatically with their sister replicas, but I do not know, because we do not use replication.

I might be able to find something for you. Which version are you using? I have some scripts that work on 4.0 and some other scripts that work for 4.4 (and maybe later).

Regards,
Per Steffensen

On 28/02/14 16:17, Jan Van Besien wrote:
> Hi, I am a bit confused about how solr cloud disaster recovery is supposed to work exactly in the case of losing a single node completely. [...]
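Per's scripts are not shown in the thread, so the following is not them; it is a minimal sketch of the core.properties half of the idea, assuming Solr 4.4+ core discovery (new-style solr.xml, one directory per core under the Solr home) and the 4.x clusterstate.json layout. The sample state, host names and core names are hypothetical, and solr.xml itself is not generated here. Including `coreNodeName` is intended to make each recreated core register as its existing replica entry rather than as a new one; that is worth verifying on your version.

```python
import json

# Hypothetical, trimmed clusterstate.json (Solr 4.x shape); hosts and cores are made up.
EXAMPLE_STATE = """
{"mycollection": {"shards": {
  "shard1": {"replicas": {
    "core_node1": {"core": "mycollection_shard1_replica1", "node_name": "solr1:8983_solr"},
    "core_node2": {"core": "mycollection_shard1_replica2", "node_name": "solr2:8983_solr"}}},
  "shard2": {"replicas": {
    "core_node3": {"core": "mycollection_shard2_replica1", "node_name": "solr2:8983_solr"},
    "core_node4": {"core": "mycollection_shard2_replica2", "node_name": "solr3:8983_solr"}}}}}}
"""


def core_properties_for_node(state, node_name):
    """Map core name -> core.properties text for each replica ZooKeeper lists on node_name."""
    files = {}
    for coll, cdata in state.items():
        for shard, sdata in cdata.get("shards", {}).items():
            for rname, rdata in sdata.get("replicas", {}).items():
                if rdata.get("node_name") != node_name:
                    continue
                core = rdata["core"]
                # Keys read by core discovery; coreNodeName points back at the
                # existing replica entry in clusterstate.json.
                files[core] = "\n".join([
                    f"name={core}",
                    f"collection={coll}",
                    f"shard={shard}",
                    f"coreNodeName={rname}",
                ]) + "\n"
    return files


if __name__ == "__main__":
    state = json.loads(EXAMPLE_STATE)
    for core, text in core_properties_for_node(state, "solr2:8983_solr").items():
        print(f"# {core}/core.properties")
        print(text)
```

Each generated text would go into `<core>/core.properties` under the rebuilt machine's Solr home before Solr is started, which matches the order of Per's procedure.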