So in theory, one could double a cluster by:

1) moving a snapshot of each node to a new node;
2) for each snapshot moved, deriving the new node's primary range token by taking the old node's primary range token and calculating the midpoint between it and the next primary range's start token;
3) relying on RF being preserved: each snapshot holds a replicated set of data for the old primary range, the next primary range already has its replicas, and so does the n+1 primary range, so data distribution will match the old primary range distribution.

Then nodetool cleanup and repair would get rid of the old data ranges that are no longer needed.

In practice, is this possible? I have heard Priam can double clusters, and they do not use vnodes. I assume they take a similar approach, but only have to calculate single tokens?

On Tue, Feb 20, 2018 at 11:21 AM, Carl Mueller <[email protected]> wrote:

> As I understand it: replicas of data are placed on the next primary
> range owners.
>
> Since tokens are randomly generated (at least in 2.1.x, which I am on),
> can't we have this situation?
>
> Say we have RF3, but the tokens happen to line up where:
>
> NodeA handles 0-10
> NodeB handles 11-20
> NodeA handles 21-30
> NodeB handles 31-40
> NodeC handles 41-50
>
> The key aspect of that is that the random assignment of primary range
> vnode tokens has resulted in NodeA and NodeB being the primaries for four
> adjacent primary ranges.
>
> If RF is satisfied by replicating to the next adjacent nodes in primary
> range order, and we are at, say, RF3, then B will have a replica of A's
> range, and then the THIRD REPLICA IS BACK ON A.
>
> Is replica distribution robust to this? That is, does placement ignore
> the reappearance of A and cycle through until a unique node (NodeC) is
> encountered, which then becomes the third replica?
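To make step 2 concrete, here is a minimal Python sketch of the midpoint-token calculation. It assumes the Murmur3 token range of -2^63 to 2^63-1; the function name is hypothetical, and this is a sketch of the idea rather than anything Priam or Cassandra actually ships:

```python
# Assumed Murmur3Partitioner token range: -2**63 .. 2**63 - 1.
RING_MIN = -2**63
RING_SIZE = 2**64

def midpoint_tokens(tokens):
    """Given the sorted list of existing primary range tokens, return
    one new token per existing token, bisecting each primary range
    (with wraparound between the last and first token)."""
    tokens = sorted(tokens)
    new_tokens = []
    for i, t in enumerate(tokens):
        nxt = tokens[(i + 1) % len(tokens)]
        # distance to the next token, wrapping around the ring
        span = (nxt - t) % RING_SIZE
        # midpoint, mapped back into the signed token range
        mid = (t + span // 2 - RING_MIN) % RING_SIZE + RING_MIN
        new_tokens.append(mid)
    return new_tokens

# Toy example: three tokens; each new node lands halfway to the next.
print(midpoint_tokens([-100, 0, 100]))  # [-50, 50, -2**63]
```

The wraparound arithmetic matters for the last token, whose "next token" is the first one on the far side of the ring.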
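On the quoted question: yes, SimpleStrategy-style placement walks the ring clockwise from the range and keeps taking the next *distinct* node until RF nodes are found, so a node that reappears (like NodeA in the example) is skipped rather than counted twice. A rough sketch of that walk, with a hypothetical function name and the ring from the example:

```python
def replicas_for_range(ring, start_index, rf):
    """ring: node names in token order. Walk clockwise from
    start_index, collecting the first rf *distinct* nodes."""
    if rf > len(set(ring)):
        raise ValueError("fewer distinct nodes than RF")
    chosen = []
    i = start_index
    while len(chosen) < rf:
        node = ring[i % len(ring)]
        if node not in chosen:  # skip a node that reappears
            chosen.append(node)
        i += 1
    return chosen

# The ring from the example: A, B, A, B, C
ring = ["NodeA", "NodeB", "NodeA", "NodeB", "NodeC"]
print(replicas_for_range(ring, 0, 3))  # ['NodeA', 'NodeB', 'NodeC']
```

The second appearances of A and B are passed over, and NodeC ends up holding the third replica, which is what keeps RF meaningful even with unlucky random token assignment.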
