I'm on mobile now so I might be mistaken, but I don't think nodetool move works with multiple tokens
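As far as I can tell it only accepts a single target token, e.g.:

    # relocates this node to one new token; single-token nodes only,
    # if I remember right
    nodetool move 4611686018427387892

so it may not be usable for reassigning individual vnode tokens at all.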
On Fri, May 29, 2020, 1:48 PM Kornel Pal <kornel...@gmail.com> wrote:

> Hi Anthony,
>
> Thank you very much for looking into using the script for initial token
> generation and for providing multiple detailed methods of expanding the
> cluster.
>
> This helps a lot, indeed.
>
> Regards,
> Kornel
>
> Anthony Grasso wrote:
>
> Hi Kornel,
>
> Great use of the script for generating initial tokens! I agree that you
> can achieve an optimal token distribution in a cluster using such a
> method.
>
> One thing to think about is the process for expanding the size of the
> cluster in this case. For example, consider the scenario where you want
> to insert a single new node into the cluster. To do this you would need
> to calculate what the new token ranges should be for all nodes,
> including the new one. You would then need to reassign existing tokens
> to other nodes using 'nodetool move', likely calling the command several
> times to achieve the newly calculated token assignments. Once the "gap"
> in the token ranges has been created, you would then update the
> initial_token property for the existing nodes in the cluster. Finally,
> you could insert the new node with its assigned tokens.
>
> While the above process could be used to maintain an optimal token
> distribution in a cluster, it does increase operational overhead. This
> is where allocate_tokens_for_keyspace and
> allocate_tokens_for_local_replication_factor (4.0 only) play a critical
> role. They save the operational overhead when changing the size of the
> cluster. In addition, in my experience they do a pretty good job of
> keeping the token ranges evenly distributed when expanding the cluster,
> even when a low value of num_tokens is used. If expanding the cluster
> is required during an emergency, using an allocate_tokens_* setting
> would be the simplest and most reliable way to quickly insert a node
> while maintaining a reasonable token distribution.
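(To make that concrete: both settings live in cassandra.yaml. A minimal
sketch, where the keyspace name "my_keyspace" is just a placeholder:

    # Cassandra 3.x: allocation is optimised for one keyspace's replication
    num_tokens: 4
    allocate_tokens_for_keyspace: my_keyspace

    # Cassandra 4.0 and later: no keyspace needed, only the local RF
    num_tokens: 4
    allocate_tokens_for_local_replication_factor: 3

Only one of the two allocate_tokens_* settings is used on a given node,
and it has to be in place before the node first bootstraps.)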
> The only other way to expand the cluster and maintain even token
> distribution without using an allocate_tokens_* setting is to double
> the size of the cluster each time. Obviously this has its own drawbacks
> in terms of increased cost, in both money and time, compared to
> inserting a single node.
>
> Hope this helps.
>
> Kind regards,
> Anthony
>
> On Thu, 28 May 2020 at 04:52, Kornel Pal <kornel...@gmail.com> wrote:
>
>> As I understand, the previous discussion is about using
>> allocate_tokens_for_keyspace for allocating tokens for most of the
>> nodes. On the other hand, I am proposing to generate all the tokens
>> for all the nodes using a Python script.
>>
>> This seems to result in perfectly even token ownership distribution
>> across all the nodes for all possible replication factors, thus being
>> an improvement over using allocate_tokens_for_keyspace.
>>
>> Elliott Sims wrote:
>>
>> > There's also a slightly older mailing list discussion on this
>> > subject that goes into detail on this sort of strategy:
>> > https://www.mail-archive.com/user@cassandra.apache.org/msg60006.html
>> >
>> > I've been approximately following it, repeating steps 3-6 for the
>> > first host in each "rack" (replica, since I have 3 racks and RF=3),
>> > then 8-10 for the remaining hosts in the new datacenter. So far, so
>> > good (sample size of 1), but it's a pretty painstaking process.
>> >
>> > This should get a lot simpler with Cassandra 4+'s
>> > "allocate_tokens_for_local_replication_factor" option, which will
>> > default to 3.
>> >
>> > On Wed, May 27, 2020 at 4:34 AM Kornel Pal <kornel...@gmail.com> wrote:
>> >
>> > Hi,
>> >
>> > Generating ideal tokens for single-token datacenters is well
>> > understood and documented, but there is much less information
>> > available on generating tokens with even ownership distribution when
>> > using vnodes. The best description I could find on token generation
>> > for vnodes is
>> > https://thelastpickle.com/blog/2019/02/21/set-up-a-cluster-with-even-token-distribution.html
>> >
>> > While allocate_tokens_for_keyspace results in much more even
>> > ownership distribution than random allocation, and does a great job
>> > of balancing ownership when adding new nodes, using it for creating
>> > a new datacenter results in a less than ideal ownership distribution.
>> >
>> > After some experimentation, I found that it is possible to generate
>> > all the tokens for a new datacenter with an extended version of the
>> > Python script presented in the above blog post. Using these tokens
>> > seems to result in perfectly even ownership distribution with
>> > various token/node/rack configurations for all possible replication
>> > factors.
>> >
>> > Murmur3Partitioner:
>> >
>> > >>> datacenter_offset = 0
>> > >>> num_tokens = 4
>> > >>> num_racks = 3
>> > >>> num_nodes = 3
>> > >>> print "\n".join(['[Rack #{}, Node #{}] initial_token: {}'.format(r + 1, n + 1, ','.join([str(((2**64 / (num_tokens * num_nodes * num_racks)) * (t * num_nodes * num_racks + n * num_racks + r)) - 2**63 + datacenter_offset) for t in range(num_tokens)])) for r in range(num_racks) for n in range(num_nodes)])
>> > [Rack #1, Node #1] initial_token: -9223372036854775808,-4611686018427387908,-8,4611686018427387892
>> > [Rack #1, Node #2] initial_token: -7686143364045646508,-3074457345618258608,1537228672809129292,6148914691236517192
>> > [Rack #1, Node #3] initial_token: -6148914691236517208,-1537228672809129308,3074457345618258592,7686143364045646492
>> > [Rack #2, Node #1] initial_token: -8710962479251732708,-4099276460824344808,512409557603043092,5124095576030430992
>> > [Rack #2, Node #2] initial_token: -7173733806442603408,-2562047788015215508,2049638230412172392,6661324248839560292
>> > [Rack #2, Node #3] initial_token: -5636505133633474108,-1024819115206086208,3586866903221301692,8198552921648689592
>> > [Rack #3, Node #1] initial_token: -8198552921648689608,-3586866903221301708,1024819115206086192,5636505133633474092
>> > [Rack #3, Node #2] initial_token: -6661324248839560308,-2049638230412172408,2562047788015215492,7173733806442603392
>> > [Rack #3, Node #3] initial_token: -5124095576030431008,-512409557603043108,4099276460824344792,8710962479251732692
>> >
>> > RandomPartitioner:
>> >
>> > >>> datacenter_offset = 0
>> > >>> num_tokens = 4
>> > >>> num_racks = 3
>> > >>> num_nodes = 3
>> > >>> print "\n".join(['[Rack #{}, Node #{}] initial_token: {}'.format(r + 1, n + 1, ','.join([str(((2**127 / (num_tokens * num_nodes * num_racks)) * (t * num_nodes * num_racks + n * num_racks + r)) + datacenter_offset) for t in range(num_tokens)])) for r in range(num_racks) for n in range(num_nodes)])
>> > [Rack #1, Node #1] initial_token: 0,42535295865117307932921825928971026427,85070591730234615865843651857942052854,127605887595351923798765477786913079281
>> > [Rack #1, Node #2] initial_token: 14178431955039102644307275309657008809,56713727820156410577229101238628035236,99249023685273718510150927167599061663,141784319550391026443072753096570088090
>> > [Rack #1, Node #3] initial_token: 28356863910078205288614550619314017618,70892159775195513221536376548285044045,113427455640312821154458202477256070472,155962751505430129087380028406227096899
>> > [Rack #2, Node #1] initial_token: 4726143985013034214769091769885669603,47261439850130342147690917698856696030,89796735715247650080612743627827722457,132332031580364958013534569556798748884
>> > [Rack #2, Node #2] initial_token: 18904575940052136859076367079542678412,61439871805169444791998193008513704839,103975167670286752724920018937484731266,146510463535404060657841844866455757693
>> > [Rack #2, Node #3] initial_token: 33083007895091239503383642389199687221,75618303760208547436305468318170713648,118153599625325855369227294247141740075,160688895490443163302149120176112766502
>> > [Rack #3, Node #1] initial_token: 9452287970026068429538183539771339206,51987583835143376362460009468742365633,94522879700260684295381835397713392060,137058175565377992228303661326684418487
>> > [Rack #3, Node #2] initial_token: 23630719925065171073845458849428348015,66166015790182479006767284778399374442,108701311655299786939689110707370400869,151236607520417094872610936636341427296
>> > [Rack #3, Node #3] initial_token: 37809151880104273718152734159085356824,80344447745221581651074560088056383251,122879743610338889583996386017027409678,165415039475456197516918211945998436105
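(Side note: the one-liners above are Python 2. A Python 3 version of the
same computation, unrolled for readability; this is my own sketch of
Kornel's formula, so treat it as illustrative:

    # Tokens are evenly spaced across the whole ring, interleaved so that
    # consecutive ring positions rotate through racks first, then nodes.
    def generate_tokens(ring_size, ring_offset, num_tokens=4, num_racks=3,
                        num_nodes=3, datacenter_offset=0):
        step = ring_size // (num_tokens * num_nodes * num_racks)
        for r in range(num_racks):
            for n in range(num_nodes):
                tokens = [step * (t * num_nodes * num_racks + n * num_racks + r)
                          + ring_offset + datacenter_offset
                          for t in range(num_tokens)]
                print('[Rack #{}, Node #{}] initial_token: {}'.format(
                    r + 1, n + 1, ','.join(map(str, tokens))))

    generate_tokens(2**64, -2**63)  # Murmur3Partitioner ring
    generate_tokens(2**127, 0)      # RandomPartitioner ring

The // keeps the arithmetic in integers, matching what Python 2's / did
on ints in the originals.)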
>> > Could you please comment on whether this is a good approach for
>> > allocating tokens when using vnodes?
>> >
>> > Thank you.
>> >
>> > Regards,
>> > Kornel
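For anyone wanting to try this: each generated line maps onto one node's
cassandra.yaml, with num_tokens matching the number of generated tokens.
For example, for Rack #1, Node #1 from the Murmur3Partitioner run above:

    num_tokens: 4
    initial_token: -9223372036854775808,-4611686018427387908,-8,4611686018427387892

initial_token accepts a comma-separated list when num_tokens is greater
than 1, and with explicit tokens no allocate_tokens_* setting is needed.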