The calculation is, in general, very simple: each node keeps a
replication_factor/number_of_nodes share of the data (the replicas are spread
over all nodes). For example, if you have 100 nodes and the replication factor
is three, each node keeps 0.03 of the table size.
But you can take an even simpler approach: each node keeps more or less
the same amount of data. If you add a new node to the cluster, it should
receive about the same data volume as is stored on any other node.
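
As a rough sketch of that arithmetic (the function names are made up for illustration, and it assumes data is spread perfectly evenly across nodes, which a real ring only approximates):

```python
def per_node_fraction(replication_factor: int, number_of_nodes: int) -> float:
    """Share of the table's unique data held by one node.

    Assumes an even spread of replicas over all nodes (an idealization).
    """
    if replication_factor <= 0 or number_of_nodes <= 0:
        raise ValueError("replication_factor and number_of_nodes must be positive")
    # A node never holds more than one full copy of the data.
    return min(replication_factor, number_of_nodes) / number_of_nodes


def expected_node_size_gb(table_size_gb: float,
                          replication_factor: int,
                          number_of_nodes: int) -> float:
    """Approximate on-disk size per node for a table of the given unique size."""
    return table_size_gb * per_node_fraction(replication_factor, number_of_nodes)


if __name__ == "__main__":
    # Numbers from this thread: 100 nodes; replication factor 3 is the example value.
    fraction = per_node_fraction(3, 100)
    print(f"each node keeps {fraction:.2f} of the table size")

    # With 10GB on every node, a rebuilt node should end up with roughly
    # the same 10GB as its peers, whatever the replication factor is.
    implied_table_gb = 10 / fraction
    print(f"implied unique table size: {implied_table_gb:.1f} GB")
    print(f"expected size of rebuilt node: "
          f"{expected_node_size_gb(implied_table_gb, 3, 100):.1f} GB")
```

This only estimates the total volume the rebuilt node ends up with; it does not say how that volume is split among the nodes that stream to it.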
Best regards, Vladimir Yudovin,
Winguzone - Hosted Cloud Cassandra on Azure and SoftLayer.
Launch your cluster in minutes.
---- On Wed, 12 Oct 2016 12:13:25 -0400 Anubhav Kale
<anubhav.k...@microsoft.com> wrote ----
Suppose I have a 100-node ring with num_tokens=32 (thus, 32 vnodes per
physical machine). Assume this cluster has just one keyspace with one table.
There are 10 SSTables on each node, and the size on disk is 10GB on each node.
For simplicity, assume each SSTable is 1GB.
Now a node has gone down, and I need to rebuild it. Can you please explain to me
the math around how many SSTable files (and what size) each node would stream to
this node? How does that math change as the number of vnodes changes?
I am looking for rough calculations to understand this process better. I am
guessing I might have missed some variables here (amount of data per token
range?), so please let me know that too!
Thanks much!