Hi,
The calculation is quite simple in general: each node keeps replication_factor/number_of_nodes of the data (the replicas are spread over all nodes). I.e., if you have 100 nodes and a replication factor of three, each node keeps 0.03 of the table size (3% of one copy of the table).

But you can take an even simpler approach: each node keeps more or less the same amount of data. If you add a new node to the cluster, it should get the same data volume that is stored on any other node.

Best regards,
Vladimir Yudovin,
Winguzone - Hosted Cloud Cassandra on Azure and SoftLayer.
Launch your cluster in minutes.

---- On Wed, 12 Oct 2016 12:13:25 -0400 Anubhav Kale <anubhav.k...@microsoft.com> wrote ----

Hello,

Suppose I have a 100 node ring, with num_tokens=32 (thus, 32 vnodes per physical machine). Assume this cluster has just one keyspace with one table. There are 10 SSTables on each node, and the size on disk is 10GB on each node. For simplicity, assume each SSTable is 1GB.

Now, a node went down, and I need to rebuild it. Can you please explain the math around how many SSTable files (and what size) each node would stream to this node? How does that math change as the number of vnodes changes?

I am looking for rough calculations to understand this process better. I am guessing I might have missed some variables here (amount of data per token range?), so please let me know that too!

Thanks much!
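To put rough numbers on the "each node keeps RF/N of the data" rule and on how num_tokens changes the picture, here is a small simulation sketch. It is not how Cassandra itself computes streaming plans. It assumes: random token assignment per vnode, SimpleStrategy-style placement (owner of a range plus the next RF-1 distinct nodes clockwise, no racks or DCs), uniform hashing so a range's data is proportional to its width, and RF=3 (the asker did not state RF). Real Cassandra picks one source per range (influenced by the snitch); the sketch instead splits each range evenly among the other replicas, which is the expected load if the source were chosen uniformly. The function names `build_ring`, `replicas_for` and `stream_estimate` are hypothetical, and the sketch estimates bytes only, not SSTable counts.

```python
import random
from collections import defaultdict


def build_ring(num_nodes, num_tokens, seed=0):
    """Hypothetical helper: give each node num_tokens random tokens in [0, 1), sorted by token."""
    rng = random.Random(seed)
    ring = [(rng.random(), node) for node in range(num_nodes) for _ in range(num_tokens)]
    ring.sort()
    return ring


def replicas_for(ring, index, rf):
    """SimpleStrategy-like placement (assumption): owner of ring[index] plus the next rf-1 distinct nodes clockwise."""
    replicas = []
    i = index
    while len(replicas) < rf:
        node = ring[i % len(ring)][1]
        if node not in replicas:
            replicas.append(node)
        i += 1
    return replicas


def stream_estimate(num_nodes, num_tokens, rf, table_size_gb, rebuilt_node=0, seed=0):
    """Expected GB each peer streams to rebuilt_node (hypothetical model, see assumptions above).

    table_size_gb is the size of ONE copy of the table; range (previous token, token]
    holds width * table_size_gb of data under the uniform-hashing assumption.
    """
    if not 2 <= rf <= num_nodes:
        raise ValueError("need 2 <= rf <= num_nodes so another replica can act as a source")
    ring = build_ring(num_nodes, num_tokens, seed)
    per_source = defaultdict(float)
    for i, (token, _) in enumerate(ring):
        prev_token = ring[i - 1][0]          # for i == 0 this is the last token (wrap-around)
        width = (token - prev_token) % 1.0   # fraction of the token space in this range
        replicas = replicas_for(ring, i, rf)
        if rebuilt_node in replicas:
            others = [n for n in replicas if n != rebuilt_node]
            for src in others:
                # Split the range evenly among the other replicas (expected-value assumption).
                per_source[src] += width * table_size_gb / len(others)
    return dict(per_source)


if __name__ == "__main__":
    N, RF, PER_NODE_GB = 100, 3, 10.0
    table_gb = PER_NODE_GB * N / RF  # each node holds RF/N of one table copy
    for vnodes in (1, 4, 32, 256):
        est = stream_estimate(N, vnodes, RF, table_gb)
        total = sum(est.values())
        print(f"num_tokens={vnodes:>3}: {len(est):>2} source nodes, "
              f"{total:5.1f} GB total, ~{total / len(est):.2f} GB per source")
```

With num_tokens=1 and RF=3 the rebuilt node can only pull from its two predecessors and two successors on the ring, so four peers carry the whole load. With more vnodes the total stays around the per-node volume (and random imbalance shrinks), but it is spread over many more peers, so each one sends much less.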