RackAwareStrategy - add the third datacenter to live cluster with replication factor 3

2010-02-11 Thread Weijun Li
Hello,

I have a testing cluster with: A (dc1), B (dc1), C(dc2), D(dc2). The
replication factor is 2 so I assume each DC will have a complete copy of the
data. Also I'm using PropertyFileEndPointSnitch with rack.properties for the
dc and rack settings.

So, what are the steps to add another datacenter and increase the replication
factor to 3, to ensure that dc3 will also get a complete copy of the data?
That is, each of the 3 DCs should hold a complete copy of the data and stay
synchronized with the others as new changes arrive. What I'm guessing is:

1) Increase replication factor of A/B/C/D to 3, modify their rack.properties
to include E(dc3) and F(dc3) then restart them one by one. At this point E
and F haven't been started yet.
2) Bootstrap E and F (both from dc3) to join the cluster.

In this case, will Cassandra automatically put the 3rd replica on E or F?

Thanks,
-Weijun

P.S. Here is what the Cassandra documentation says about DC replication, but
I'm not sure what will happen when nodes from a 3rd DC join.


   - RackAwareStrategy: replica 2 is placed in the first node along the
     ring that belongs in *another* data center than the first; the
     remaining N-2 replicas, if any, are placed on the first nodes along
     the ring in the *same* rack as the first


Re: RackAwareStrategy - add the third datacenter to live cluster with replication factor 3

2010-02-11 Thread Jonathan Ellis
On Thu, Feb 11, 2010 at 1:53 PM, Weijun Li weiju...@gmail.com wrote:
> Hello,
>
> I have a testing cluster with: A (dc1), B (dc1), C (dc2), D (dc2). The
> replication factor is 2 so I assume each DC will have a complete copy of the
> data. Also I'm using PropertyFileEndPointSnitch with rack.properties for the
> dc and rack settings.
>
> So, what are the steps to add another datacenter and increase the replication
> factor to 3, to ensure that dc3 will also get a complete copy of the data?

RackAwareStrategy only cares about replicating to 2 datacenters out of Y,
no matter how many DCs you have. So:

(1) Write a ReplicationStrategy that extends RAS's algorithm to
replicate to Y of Y DCs.
   (1a) Alternatively, make RAS configurable to be X of Y, where X
could be 2 (current), Y (your case), or anything in between.
(2) Deploy the new RS, bring up the nodes in your 3rd DC, and repair them.

-Jonathan
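
A minimal sketch of what option (1a) might look like, written as standalone Java rather than against Cassandra's real ReplicationStrategy API. Everything here is hypothetical and for illustration only: the class XOfYPlacementSketch, the Node type, the integer tokens, the dcsToCover parameter (the "X" above), and the example ring. The fallback passes follow the documentation excerpt quoted in the original question (remaining replicas go to the primary's rack) and may differ from what RackAwareStrategy actually does.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class XOfYPlacementSketch {
    /** Hypothetical ring member; not a Cassandra class. */
    static final class Node {
        final String name; final int token; final String dc; final String rack;
        Node(String name, int token, String dc, String rack) {
            this.name = name; this.token = token; this.dc = dc; this.rack = rack;
        }
    }

    /** Picks rf replicas for keyToken, covering up to dcsToCover distinct datacenters. */
    static List<String> pickReplicas(List<Node> nodes, int keyToken, int rf, int dcsToCover) {
        List<String> names = new ArrayList<>();
        if (nodes.isEmpty() || rf < 1) return names;

        List<Node> ring = new ArrayList<>(nodes);
        ring.sort(Comparator.comparingInt(n -> n.token));

        // Primary replica: first node at or after the key's token, wrapping to index 0.
        int start = 0;
        for (int i = 0; i < ring.size(); i++) {
            if (ring.get(i).token >= keyToken) { start = i; break; }
        }
        // Ring order as seen from the primary.
        List<Node> walk = new ArrayList<>();
        for (int i = 0; i < ring.size(); i++) walk.add(ring.get((start + i) % ring.size()));

        Node primary = walk.get(0);
        List<Node> chosen = new ArrayList<>();
        chosen.add(primary);
        Set<String> coveredDcs = new HashSet<>();
        coveredDcs.add(primary.dc);

        // Pass 1: take the first node of each not-yet-covered DC until X DCs are covered.
        for (Node n : walk) {
            if (chosen.size() >= rf || coveredDcs.size() >= dcsToCover) break;
            if (coveredDcs.add(n.dc)) chosen.add(n);
        }
        // Pass 2: remaining replicas go to nodes in the primary's rack (per the quoted docs).
        for (Node n : walk) {
            if (chosen.size() >= rf) break;
            if (!chosen.contains(n) && n.dc.equals(primary.dc) && n.rack.equals(primary.rack)) {
                chosen.add(n);
            }
        }
        // Pass 3: if still short, fill with any unused node in ring order.
        for (Node n : walk) {
            if (chosen.size() >= rf) break;
            if (!chosen.contains(n)) chosen.add(n);
        }
        for (Node n : chosen) names.add(n.name);
        return names;
    }

    /** Example ring mirroring the thread: A, B in dc1; C, D in dc2; E, F in dc3. */
    static List<Node> exampleRing() {
        return Arrays.asList(
            new Node("A", 10, "dc1", "r1"), new Node("B", 20, "dc1", "r1"),
            new Node("C", 30, "dc2", "r1"), new Node("D", 40, "dc2", "r1"),
            new Node("E", 50, "dc3", "r1"), new Node("F", 60, "dc3", "r1"));
    }

    public static void main(String[] args) {
        // X = 2 (current behaviour): the third replica stays in dc1, so dc3 gets nothing.
        System.out.println(pickReplicas(exampleRing(), 5, 3, 2)); // [A, C, B]
        // X = 3 (Y of Y): one replica lands in each datacenter.
        System.out.println(pickReplicas(exampleRing(), 5, 3, 3)); // [A, C, E]
    }
}
```

With X left at 2, raising the replication factor to 3 only adds a second copy next to the primary, which is why a strategy change is needed before dc3 can hold a full copy.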