Re: SOLR cloud disaster recovery

2014-03-03 Thread Jan Van Besien
On Fri, Feb 28, 2014 at 7:50 PM, Per Steffensen st...@designware.dk wrote:
 I might be able to find something for you. Which version are you using - I
 have some scripts that work on 4.0 and some other scripts that work for 4.4
 (and maybe later).

This sounds useful. I am using 4.6.1.

Kind regards
Jan


SOLR cloud disaster recovery

2014-02-28 Thread Jan Van Besien
Hi,

I am a bit confused about how solr cloud disaster recovery is supposed
to work exactly in the case of losing a single node completely.

Say I have a solr cloud cluster with 3 nodes. My collection is created
with numShards=3&replicationFactor=3&maxShardsPerNode=3, so there is
no data loss when I lose a node.
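
For reference, a collection with these parameters would typically be created through the Collections API. The sketch below only builds the request URL and does not send it. The host, port, and collection name are placeholders, not values from this thread.

```python
from urllib.parse import urlencode


def create_collection_url(base_url, name, num_shards,
                          replication_factor, max_shards_per_node):
    """Build a Collections API CREATE request URL (nothing is sent)."""
    params = [
        ("action", "CREATE"),
        ("name", name),
        ("numShards", num_shards),
        ("replicationFactor", replication_factor),
        ("maxShardsPerNode", max_shards_per_node),
    ]
    return f"{base_url}/admin/collections?{urlencode(params)}"


# Hypothetical host and collection name, for illustration only.
url = create_collection_url("http://solr1:8983/solr", "mycollection", 3, 3, 3)
print(url)
```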

However, how do I configure a new node to take the place of the dead
node? I bring up a new node (same hostname and IP as the dead node)
which is completely empty (empty data dir, empty solr.xml), install
solr, and connect it to zookeeper.

Is it supposed to work automatically from there? In my tests, the
server has no cores and the solr-cloud graph overview simply shows all
the shards/replicas on this node as down. Do I need to recreate the
cores first? Note that these cores were initially created indirectly
by creating the collection.

Thanks,
Jan


Re: SOLR cloud disaster recovery

2014-02-28 Thread Lajos

Hi Jan,

There are a few ways to do that, but no, nothing is automatic.

1) If your node is alive, you can create new replicas on the new node, 
let them replicate, verify they are ok, then delete the replicas on the 
old node and shut it down.


2) If your node is dead, create new replicas on the new node and let them
replicate. You'll have to hand-edit clusterstate.json, however, to fix
the entries for the shards.


3) If you have a fully up-to-date backup of your dead node, just use the 
same hostname for your new node and restore the backups there. It should 
be fine. Just verify that the replicas for that node, as listed in 
clusterstate.json, are present and accounted for.
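
As a rough illustration of option 2, the sketch below scans a parsed clusterstate.json for replicas registered on the dead node and builds CoreAdmin CREATE requests that would add replacement cores to the same shards on a new node. It assumes the Solr 4.x clusterstate.json layout (collection, then shards, then replicas, each replica carrying node_name and core) and that your Solr version accepts the collection and shard parameters on CoreAdmin CREATE. Host names, the collection name, and core names are made up.

```python
from urllib.parse import urlencode


def replicas_on_node(cluster_state, node_name):
    """Yield (collection, shard, replica) for every replica hosted on node_name."""
    for coll_name, coll in cluster_state.items():
        for shard_name, shard in coll.get("shards", {}).items():
            for replica in shard.get("replicas", {}).values():
                if replica.get("node_name") == node_name:
                    yield coll_name, shard_name, replica


def core_create_urls(cluster_state, dead_node, new_base_url):
    """Build one CoreAdmin CREATE URL per replica that lived on dead_node."""
    urls = []
    for coll, shard, replica in replicas_on_node(cluster_state, dead_node):
        params = [
            ("action", "CREATE"),
            ("name", replica["core"]),
            ("collection", coll),
            ("shard", shard),
        ]
        urls.append(f"{new_base_url}/admin/cores?{urlencode(params)}")
    return urls


# Invented sample data in the assumed clusterstate.json shape.
sample_state = {
    "mycollection": {
        "shards": {
            "shard1": {"replicas": {
                "core_node1": {"node_name": "solr1:8983_solr",
                               "core": "mycollection_shard1_replica1"},
                "core_node2": {"node_name": "solr2:8983_solr",
                               "core": "mycollection_shard1_replica2"},
            }},
            "shard2": {"replicas": {
                "core_node3": {"node_name": "solr1:8983_solr",
                               "core": "mycollection_shard2_replica1"},
            }},
        }
    }
}

urls = core_create_urls(sample_state, "solr1:8983_solr", "http://solr4:8983/solr")
for u in urls:
    print(u)
```

The stale entries for the dead node would still need to be cleaned out of clusterstate.json by hand, as noted above; this sketch does not touch ZooKeeper.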


HTH,

Lajos


Re: SOLR cloud disaster recovery

2014-02-28 Thread Per Steffensen
We have created some scripts that can do this for you. Basically, they
reconstruct solr.xml, core.properties, etc. on the new machine (by looking
at information in ZK) as they were on the machine that crashed. Our
procedure when a machine crashes is:

* Remove it from the rack and replace it with a similar machine with the
same hostname/IP.
* Run the scripts, pointing them at the IP of the machine that needs to
have solr.xml and core.properties written.
* Start solr on this machine. It now runs the same set of replicas that
the crashed machine did. I guess they will sync automatically with their
sister replicas, but I do not know, because we do not use replication.
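
The scripts themselves are not shown in this thread. Purely as a loose sketch of the idea, the following derives core.properties content for each replica that clusterstate.json assigns to a given node. It assumes Solr 4.4+ core discovery, the 4.x clusterstate.json layout, and the property names name, collection, shard, and coreNodeName. The sample data is invented, and reading from ZooKeeper and writing the files are left out.

```python
def core_properties_for_node(cluster_state, node_name):
    """Return {core_name: core.properties text} for replicas assigned to node_name."""
    result = {}
    for coll_name, coll in cluster_state.items():
        for shard_name, shard in coll.get("shards", {}).items():
            for core_node, replica in shard.get("replicas", {}).items():
                if replica.get("node_name") != node_name:
                    continue
                lines = [
                    f"name={replica['core']}",
                    f"collection={coll_name}",
                    f"shard={shard_name}",
                    f"coreNodeName={core_node}",
                ]
                result[replica["core"]] = "\n".join(lines) + "\n"
    return result


# Invented sample data in the assumed clusterstate.json shape.
crashed_state = {
    "mycollection": {
        "shards": {
            "shard1": {"replicas": {
                "core_node1": {"node_name": "10.0.0.5:8983_solr",
                               "core": "mycollection_shard1_replica1"},
            }},
            "shard2": {"replicas": {
                "core_node2": {"node_name": "10.0.0.6:8983_solr",
                               "core": "mycollection_shard2_replica1"},
            }},
        }
    }
}

props = core_properties_for_node(crashed_state, "10.0.0.5:8983_solr")
for core_name, text in props.items():
    print(f"# {core_name}/core.properties")
    print(text)
```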


I might be able to find something for you. Which version are you using - 
I have some scripts that work on 4.0 and some other scripts that work 
for 4.4 (and maybe later).


Regards, Per Steffensen
