Make an existing cluster multi data-center compatible.

2014-08-05 Thread Rene Kochen
Hi all,

I want to add a data-center to an existing single data-center cluster.
First I have to make the existing cluster multi data-center compatible.

The existing cluster is a 12 node cluster with:
- Replication factor = 3
- Placement strategy = SimpleStrategy
- Endpoint snitch = SimpleSnitch

If I change the following:
- Placement strategy = NetworkTopologyStrategy
- Endpoint snitch = PropertyFileSnitch - all 12 nodes in this file belong
to the same data-center and rack.

Do I have to run full repairs after this change? Because the yaml file
states: IF YOU CHANGE THE SNITCH AFTER DATA IS INSERTED INTO THE CLUSTER,
YOU MUST RUN A FULL REPAIR, SINCE THE SNITCH AFFECTS WHERE REPLICAS ARE
PLACED.
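
For illustration, the proposed change could be sketched as below. This is a minimal sketch, not a tested procedure: the IP addresses, the names DC1 and RAC1, and the keyspace name my_keyspace are hypothetical placeholders. It only renders the text of a cassandra-topology.properties file (the file read by PropertyFileSnitch) and a matching ALTER KEYSPACE statement; the data-center name used in the keyspace definition has to match the one the snitch reports.

```python
def topology_properties(node_ips, dc="DC1", rack="RAC1"):
    """Render cassandra-topology.properties lines that place every node
    in the same data-center and rack (names are hypothetical)."""
    lines = [f"{ip}={dc}:{rack}" for ip in node_ips]
    # Fallback entry for nodes that are not listed explicitly.
    lines.append(f"default={dc}:{rack}")
    return "\n".join(lines)


def alter_keyspace_cql(keyspace, dc="DC1", rf=3):
    """Render a CQL statement switching a keyspace to NetworkTopologyStrategy
    with the same replication factor in the single data-center."""
    return (f"ALTER KEYSPACE {keyspace} WITH replication = "
            f"{{'class': 'NetworkTopologyStrategy', '{dc}': {rf}}};")


if __name__ == "__main__":
    # Twelve placeholder addresses standing in for the real nodes.
    ips = [f"10.0.0.{i}" for i in range(1, 13)]
    print(topology_properties(ips))
    print(alter_keyspace_cql("my_keyspace"))
```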

Thanks!

Rene


Re: Make an existing cluster multi data-center compatible.

2014-08-05 Thread Mark Reddy
Yes, you must run a full repair for the reasons stated in the yaml file.


Mark




Re: Make an existing cluster multi data-center compatible.

2014-08-05 Thread Rene Kochen
What I understand is that SimpleStrategy determines the endpoints for
replicas by traversing the ring clockwise.

NetworkTopologyStrategy determines the replicas by traversing the ring
clockwise and taking into account the rack and DC locations.

Since the file used by PropertyFileSnitch puts all endpoints in the same
data-center and rack, isn't the result of the endpoint selection basically
the same?
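
The question can be explored with a small model. The sketch below is a simplified, single data-center approximation of the two strategies, not Cassandra's actual implementation; the function names, node names and rack names are made up for illustration. It assumes the rack-aware walk prefers nodes on racks that have no replica yet and falls back to skipped nodes once every rack is covered.

```python
def simple_strategy(ring, start, rf):
    """Take the next rf nodes clockwise from ring position start."""
    n = len(ring)
    return [ring[(start + i) % n] for i in range(min(rf, n))]


def network_topology_strategy(ring, start, rf, rack_of):
    """Simplified single-DC, rack-aware walk (an approximation).

    rack_of maps node -> rack name. Nodes on racks that already hold a
    replica are skipped until every rack has one replica.
    """
    n = len(ring)
    all_racks = set(rack_of[node] for node in ring)
    replicas, seen_racks, skipped = [], set(), []
    for i in range(n):
        if len(replicas) == rf:
            break
        node = ring[(start + i) % n]
        rack = rack_of[node]
        if len(seen_racks) == len(all_racks):
            # Every rack is covered: take nodes in plain ring order.
            replicas.append(node)
        elif rack in seen_racks:
            skipped.append(node)
        else:
            replicas.append(node)
            seen_racks.add(rack)
            if len(seen_racks) == len(all_racks):
                # Fall back to previously skipped nodes, in ring order.
                for s in skipped:
                    if len(replicas) < rf:
                        replicas.append(s)
    return replicas


if __name__ == "__main__":
    ring = [f"n{i}" for i in range(12)]
    one_rack = {node: "RAC1" for node in ring}
    same = all(
        network_topology_strategy(ring, s, 3, one_rack) == simple_strategy(ring, s, 3)
        for s in range(len(ring))
    )
    print("single rack, identical replica sets:", same)
```

In this model, with every node on one rack the rack-aware walk degenerates to the plain clockwise walk. As soon as nodes are spread over several racks the two can diverge, which is why the single-rack mapping matters.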

Thanks!

Rene




Re: Make an existing cluster multi data-center compatible.

2014-08-05 Thread Robert Coli
On Tue, Aug 5, 2014 at 3:52 AM, Rene Kochen rene.koc...@schange.com wrote:

 Do I have to run full repairs after this change? Because the yaml file
 states: IF YOU CHANGE THE SNITCH AFTER DATA IS INSERTED INTO THE CLUSTER,
 YOU MUST RUN A FULL REPAIR, SINCE THE SNITCH AFFECTS WHERE REPLICAS ARE
 PLACED.


As long as you correctly configure the new snitch so that the replica sets
do not change, no, you do not need to repair.

Barring that, if you manage to transform the replica set in such a way that
you always have one (fully repaired) replica from the old set, repair will
help. I do not recommend this very risky practice. In practice the only
transformation of snitch in a cluster with data which is likely to be safe
is one whose result is a NOOP in terms of replica placement.

In fact, the yaml file is stating something unreasonable there, because
repair cannot protect against this case:

- 6 node cluster, A B C D E F,  RF = 2

1) Start with SimpleSnitch so that A and B hold the two replicas of row key X.
2) Write row key X, value Y, to nodes A and B.
3) Change to OtherSnitch so that C and D are now responsible for row key X.
4) Repair, and notice that neither C nor D returns Y when asked for row X.
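
The scenario above can be modelled in a few lines. This is a toy sketch under strong assumptions: repair is reduced to "the current replicas of a key exchange whatever they hold", with no timestamps or Merkle trees, and the names stores and repair are invented for the example. It also shows the overlapping case, where one replica from the old set survives.

```python
def repair(stores, replicas, key):
    """Toy repair: the current replicas of key share any value one of them has.

    Nodes outside the current replica set are never consulted, which is
    the point of the example.
    """
    values = [stores[node][key] for node in replicas if key in stores[node]]
    if values:
        for node in replicas:
            stores[node][key] = values[0]


def run_scenario(new_replicas):
    """Write X=Y to the old replicas A and B, change placement, then repair."""
    stores = {node: {} for node in "ABCDEF"}
    for node in ("A", "B"):  # old replica set for row key X
        stores[node]["X"] = "Y"
    repair(stores, new_replicas, "X")
    # What each new replica answers when asked for row X.
    return [stores[node].get("X") for node in new_replicas]


if __name__ == "__main__":
    print("disjoint new set C,D:", run_scenario(["C", "D"]))
    print("overlapping new set B,C:", run_scenario(["B", "C"]))
```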

=Rob


Re: Make an existing cluster multi data-center compatible.

2014-08-05 Thread Rene Kochen
As long as you correctly configure the new snitch so that the replica sets
do not change, no, you do not need to repair.

Is the following correct:

The replica sets do not change if you switch from SimpleSnitch and
SimpleStrategy to PropertyFileSnitch and NetworkTopologyStrategy, and the
topology file puts all nodes in the same data-center and rack.

Thanks again!

Rene




Re: Make an existing cluster multi data-center compatible.

2014-08-05 Thread Robert Coli
On Tue, Aug 5, 2014 at 2:27 PM, Rene Kochen rene.koc...@schange.com wrote:

 As long as you correctly configure the new snitch so that the replica
 sets do not change, no, you do not need to repair.

 Is the following correct:

 The replica sets do not change if you switch from SimpleSnitch and
 SimpleStrategy to PropertyFileSnitch and NetworkTopologyStrategy, and the
 topology file puts all nodes in the same data-center and rack.


Yes, you can use nodetool getendpoints to verify this programmatically.

1) make a set of keys with a key from each range
2) getendpoints for this set of keys
3) change snitch
4) getendpoints again
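
The four steps above could be scripted roughly as follows. This is a sketch, assuming nodetool is on the PATH of a cluster node and that getendpoints prints one endpoint address per line; the helper names (get_endpoints, snapshot, changed_keys) and the keyspace, table and key values are hypothetical. The run parameter exists only so the command runner can be replaced in a dry run.

```python
import subprocess


def get_endpoints(keyspace, table, key, run=subprocess.check_output):
    """Return the set of endpoints nodetool reports for one key."""
    out = run(["nodetool", "getendpoints", keyspace, table, key])
    if isinstance(out, bytes):
        out = out.decode()
    # Assumption: one endpoint address per non-empty output line.
    return frozenset(line.strip() for line in out.splitlines() if line.strip())


def snapshot(keyspace, table, keys, run=subprocess.check_output):
    """Map each key to its current replica set."""
    return {key: get_endpoints(keyspace, table, key, run) for key in keys}


def changed_keys(before, after):
    """Keys whose replica set differs between two snapshots."""
    return sorted(key for key in before if before[key] != after.get(key))


# Intended use (not executed here):
#   before = snapshot("my_keyspace", "my_table", sample_keys)
#   ... change the snitch and placement strategy on all nodes ...
#   after = snapshot("my_keyspace", "my_table", sample_keys)
#   changed_keys(before, after) should be empty if placement is unchanged.
```

Comparing sets rather than ordered lists keeps the check insensitive to the order in which endpoints are printed.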

=Rob


Re: Make an existing cluster multi data-center compatible.

2014-08-05 Thread Rameez Thonnakkal
I think the rack placement of these 12 nodes will become important. Since
the 12 nodes currently use SimpleSnitch, which is not rack-aware, it would be
good to keep them in a single rack in the PropertyFileSnitch as well, at
least initially. Running a repair is a safe option. If you need to change the
rack placement, my take would be to increase the replication factor to at
least 3 and then distribute the nodes across different racks.

This is not an expert opinion but a newbie thought.

Regards,
Rameez

