Awesome, thank you so much! I completely missed the part "the token range that it hits will be split". Now everything makes sense!
Again, thanks a lot for your help!

Luca

On Wed, Jun 15, 2022 at 1:04 AM Hannu Kröger <hkro...@gmail.com> wrote:

> Adding a token (which in essence is a vnode) means that the token range
> that it hits will be split into two. The data range that gets a new
> owner is then replicated to the new owner node. If there are a lot of
> tokens (=vnodes) in the cluster, adding some number of vnodes (e.g.
> num_tokens=16) affects only that number (e.g. 16) of existing ranges,
> and because there are a lot of tokens, each range is relatively small
> and distributed across the cluster.
>
> A very naive example:
> Cluster has 100 nodes and 100GB data with replication factor=3 => 300GB
> data altogether. Each node will have ~3GB data. num_tokens is, let’s say,
> 256. In the cluster there would be 256*100 => 25600 tokens altogether.
> You add one more node and, imagining that tokens are perfectly
> distributed, each node will then contain ~2.97GB of data.
>
> When that new node is joining, those 256 tokens are (hopefully)
> distributed evenly and each of those 100 nodes will replicate ~0.03GB of
> data to that new node so that it will eventually have that 2.97GB of data.
> The cluster would have 25856 tokens after the scaling-out operation,
> and only 256 existing token ranges would be changed when the new node
> joins, not all 25600.
>
> So you see that each node only has about 30MB to replicate to the new node.
> Not very expensive, right?
>
> In real life it’s not that precise, but the basic idea is the same.
>
> Cheers,
> Hannu
>
> On 15. Jun 2022, at 10.32, Luca Rondanini <luca.rondan...@gmail.com>
> wrote:
>
> Thanks a lot Hannu,
>
> really helpful! But isn't that crazy expensive? Adding a vnode means that
> every vnode in the cluster will have a different range of tokens, which
> means a lot of data will need to be moved around.
>
> Thanks again,
> Luca
>
> On Wed, Jun 15, 2022 at 12:25 AM Hannu Kröger <hkro...@gmail.com> wrote:
>
>> When a node joins a cluster, it gets (semi-)random tokens based on
>> the num_tokens value.
>>
>> The total number of vnodes is not fixed. I don’t remember off the top of
>> my head if num_tokens can be different on each node, but whenever you add
>> a node, new vnodes get “created”. Existing token ranges will be split,
>> some ranges will be allocated to the new node, and data is replicated to
>> the joining node. So if you have num_tokens set to a higher value like 16
>> or so, adding or removing a single node in a cluster is a standard
>> operation, and although it causes some load on the cluster, it should be
>> somewhat evenly distributed among the other nodes. If you have just a
>> single token per node, then scaling up or down has somewhat different
>> effects due to balancing issues etc. So there is a reason why the default
>> num_tokens is currently 16.
>>
>> Cheers,
>> Hannu
>>
>> On 15. Jun 2022, at 10.12, Luca Rondanini <luca.rondan...@gmail.com>
>> wrote:
>>
>> ok, that makes sense, but does the partitioner add vnodes? Is the number
>> of vnodes fixed in a cluster?
>>
>> On Wed, Jun 15, 2022 at 12:10 AM Hannu Kröger <hkro...@gmail.com> wrote:
>>
>>> Hey,
>>>
>>> num_tokens is tokens per node.
>>>
>>> So in your case you would have 15 vnodes altogether.
>>>
>>> Cheers,
>>> Hannu
>>>
>>> > On 15. Jun 2022, at 10.08, Luca Rondanini <luca.rondan...@gmail.com>
>>> > wrote:
>>> >
>>> > Hi all,
>>> >
>>> > I'm just trying to understand better how Cassandra works.
>>> >
>>> > My understanding is that, once set, the number of vnodes does not
>>> > change in a cluster. The partitioner allocates vnodes to nodes, ensuring
>>> > replicated data is not stored on the same node.
>>> >
>>> > But what happens if there are more nodes than vnodes? If I set
>>> > num_tokens to 3 and I have 5 servers? Unless the partitioner adds vnodes
>>> > and moves data around, but that seems an extremely expensive operation.
>>> > I'm sure I'm missing something, I'm not quite sure what! :)
>>> >
>>> > Thanks,
>>> > Luca
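
For anyone who wants to check the naive example from this thread, here is a rough Python sketch. It is only a toy model under stated assumptions: tokens are drawn uniformly at random from a made-up 2^32 token space, and the names RING_SIZE, per_node_data_gb and simulate_join are hypothetical helpers, not anything from Cassandra itself. Cassandra's real partitioners and token allocation work differently, so treat the numbers as illustrative only.

```python
import bisect
import random

# Assumption: a toy token space, much smaller than a real partitioner's.
RING_SIZE = 2**32


def per_node_data_gb(total_gb, replication_factor, num_nodes):
    """Data held per node, assuming a perfectly even distribution."""
    return total_gb * replication_factor / num_nodes


def simulate_join(num_nodes, num_tokens, seed=0):
    """Add one node to a ring and count how many existing ranges get split.

    Returns (tokens_before, tokens_after, ranges_split).
    """
    rng = random.Random(seed)
    # Draw unique tokens for the existing nodes plus the joining node.
    drawn = rng.sample(range(RING_SIZE), (num_nodes + 1) * num_tokens)
    old_tokens = sorted(drawn[: num_nodes * num_tokens])
    new_tokens = drawn[num_nodes * num_tokens:]

    # An existing range (previous_token, token] is identified by its end token.
    # A new token splits exactly the one existing range it falls into.
    split = set()
    for token in new_tokens:
        # First old token >= new token; wrap around the ring past the end.
        idx = bisect.bisect_left(old_tokens, token) % len(old_tokens)
        split.add(old_tokens[idx])

    return len(old_tokens), len(old_tokens) + len(new_tokens), len(split)


if __name__ == "__main__":
    before = per_node_data_gb(100, 3, 100)   # 100 nodes, 100GB, RF=3
    after = per_node_data_gb(100, 3, 101)    # one node added
    print(f"per node before: {before:.2f} GB, after: {after:.2f} GB")
    # Each of the 100 existing nodes streams roughly this much to the new node.
    print(f"streamed per existing node: {after / 100 * 1000:.0f} MB")

    tokens_before, tokens_after, ranges_split = simulate_join(100, 256)
    print(f"tokens: {tokens_before} -> {tokens_after}")
    print(f"existing ranges split: {ranges_split} of {tokens_before}")
```

In this model the number of split ranges can never exceed num_tokens, because each new token lands in exactly one existing range. It can be slightly lower when two new tokens happen to fall into the same range. Every other range keeps its boundaries and its owner.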