Hi Anthony,

Thank you very much for looking into using the script for initial token generation and for providing multiple detailed methods of expanding the cluster.

This helps a lot, indeed.

Regards,
Kornel

Anthony Grasso wrote:
Hi Kornel,

Great use of the script for generating initial tokens! I agree that you can achieve an optimal token distribution in a cluster using such a method.

One thing to think about is the process for expanding the size of the cluster in this case. For example, consider the scenario where you want to insert a single new node into the cluster. To do this you would need to calculate what the new token ranges should be for all nodes, including the new one. You would then need to reassign existing tokens using 'nodetool move', likely calling the command a few times to achieve the newly calculated token assignments. Once the "gap" in the token ranges has been created, you would update the initial_token property for the existing nodes in the cluster. Finally, you could insert the new node with its assigned tokens.
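
To make the recalculation step concrete, here is a minimal sketch (the ideal_tokens helper, the single-rack layout and the simple one-to-one pairing of old and new tokens are illustrative assumptions, not a tested procedure):

    # Evenly spaced Murmur3 tokens for a single-rack datacenter (illustrative helper).
    def ideal_tokens(num_nodes, num_tokens):
        total = num_nodes * num_tokens
        return [[(2**64 // total) * (t * num_nodes + n) - 2**63
                 for t in range(num_tokens)]
                for n in range(num_nodes)]

    old_layout = ideal_tokens(3, 4)   # current three-node layout
    new_layout = ideal_tokens(4, 4)   # recalculated layout including the new node

    # As described above, each token that differs between the two layouts would
    # correspond to a 'nodetool move <new_token>' step on the existing node,
    # before the new node is inserted with the remaining tokens as its
    # initial_token list.
    for node, (before, after) in enumerate(zip(old_layout, new_layout), start=1):
        moves = [(b, a) for b, a in zip(before, after) if b != a]
        print("node {} token moves: {}".format(node, moves))
    print("new node initial_token: {}".format(",".join(map(str, new_layout[-1]))))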

While the above process could be used to maintain an optimal token distribution in a cluster, it does increase operational overhead. This is where allocate_tokens_for_keyspace and allocate_tokens_for_local_replication_factor (4.0 only) play a critical role: they remove that operational overhead when changing the size of the cluster. In addition, in my experience they do a pretty good job of keeping the token ranges evenly distributed when expanding the cluster, even when a low value for num_tokens is used. If the cluster has to be expanded during an emergency, using an allocate_tokens_* setting would be the simplest and most reliable way to quickly insert a node while maintaining a reasonable token distribution.
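
For reference, these settings go in cassandra.yaml on the node being added; a sketch (the keyspace name and the num_tokens value here are only placeholders):

    num_tokens: 16
    # allocate based on the replication settings of an existing keyspace
    allocate_tokens_for_keyspace: my_keyspace
    # 4.0 only: allocate based on a target replication factor for the local DC
    # allocate_tokens_for_local_replication_factor: 3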

The only other way to expand the cluster and maintain an even token distribution without using an allocate_tokens_* setting is to double the size of the cluster each time. Obviously this has its own drawbacks in terms of increased cost in both money and time compared to inserting a single node.
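
To illustrate why doubling preserves the even distribution, here is a minimal sketch (single-rack layout, purely illustrative numbers): each new node takes an existing node's tokens shifted by half the token spacing, so no existing token has to move.

    num_nodes, num_tokens = 3, 4
    spacing = 2**64 // (num_nodes * num_tokens)
    existing = [[spacing * (t * num_nodes + n) - 2**63 for t in range(num_tokens)]
                for n in range(num_nodes)]
    # Tokens for the three new nodes: midway between the existing tokens.
    for i, node in enumerate(existing, start=1):
        doubled = [tok + spacing // 2 for tok in node]
        print("new node {} initial_token: {}".format(i, ",".join(map(str, doubled))))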

Hope this helps.

Kind regards,
Anthony

On Thu, 28 May 2020 at 04:52, Kornel Pal <kornel...@gmail.com> wrote:

    As I understand, the previous discussion is about using
    allocate_tokens_for_keyspace for allocating tokens for most of the
    nodes. On the other hand, I am proposing to generate all the tokens
    for all the nodes using a Python script.

    This seems to result in perfectly even token ownership distribution
    across all the nodes for all possible replication factors, thus being
    an improvement over using allocate_tokens_for_keyspace.
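
    As a rough sanity check of that claim, the primary-range (RF=1) share
    implied by a set of tokens can be computed with a sketch like the one
    below (the helper and its input format are illustrative only; replica
    placement for higher replication factors also depends on the rack
    assignment and keyspace settings):

        # tokens_by_node maps a node label to its list of tokens,
        # e.g. {'rack1_node1': [-9223372036854775808, ...], ...}
        def primary_ownership(tokens_by_node, ring_size=2**64):
            owner_of = {t: node for node, toks in tokens_by_node.items() for t in toks}
            ring = sorted(owner_of)
            owned = dict.fromkeys(tokens_by_node, 0)
            for i, tok in enumerate(ring):
                prev = ring[i - 1]              # wraps around the ring for i == 0
                owned[owner_of[tok]] += (tok - prev) % ring_size
            # fraction of the ring each node owns as the primary replica
            return {node: size / float(ring_size) for node, size in owned.items()}

    For the tokens generated by the script below, each node's share should
    come out essentially identical.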

    Elliott Sims wrote:
    > There's also a slightly older mailing list discussion on this subject
    > that goes into detail on this sort of strategy:
    > https://www.mail-archive.com/user@cassandra.apache.org/msg60006.html
    >
    > I've been approximately following it, repeating steps 3-6 for the
    > first host in each "rack" (replica, since I have 3 racks and RF=3),
    > then 8-10 for the remaining hosts in the new datacenter. So far, so
    > good (sample size of 1), but it's a pretty painstaking process.
    >
    > This should get a lot simpler with Cassandra 4+'s
    > "allocate_tokens_for_local_replication_factor" option, which will
    > default to 3.
    >
    > On Wed, May 27, 2020 at 4:34 AM Kornel Pal <kornel...@gmail.com> wrote:
    >
    >     Hi,
    >
    >     Generating ideal tokens for single-token datacenters is well
    >     understood and documented, but there is much less information
    >     available on generating tokens with even ownership distribution
    >     when using vnodes. The best description I could find on token
    >     generation for vnodes is
    >     https://thelastpickle.com/blog/2019/02/21/set-up-a-cluster-with-even-token-distribution.html
    >
    >     While allocate_tokens_for_keyspace results in much more even
    >     ownership distribution than random allocation, and does a great
    >     job at balancing ownership when adding new nodes, using it for
    >     creating a new datacenter results in less than ideal ownership
    >     distribution.
    >
    >     After some experimentation, I found that it is possible to
    >     generate all the tokens for a new datacenter with an extended
    >     version of the Python script presented in the above blog post.
    >     Using these tokens seems to result in perfectly even ownership
    >     distribution with various token/node/rack configurations for all
    >     possible replication factors.
    >
    >     Murmur3Partitioner:
    >     >>> datacenter_offset = 0
    >     >>> num_tokens = 4
    >     >>> num_racks = 3
    >     >>> num_nodes = 3
    >     >>> print "\n".join(['[Rack #{}, Node #{}] initial_token: {}'.format(r + 1, n + 1, ','.join([str(((2**64 / (num_tokens * num_nodes * num_racks)) * (t * num_nodes * num_racks + n * num_racks + r)) - 2**63 + datacenter_offset) for t in range(num_tokens)])) for r in range(num_racks) for n in range(num_nodes)])
    >     [Rack #1, Node #1] initial_token: -9223372036854775808,-4611686018427387908,-8,4611686018427387892
    >     [Rack #1, Node #2] initial_token: -7686143364045646508,-3074457345618258608,1537228672809129292,6148914691236517192
    >     [Rack #1, Node #3] initial_token: -6148914691236517208,-1537228672809129308,3074457345618258592,7686143364045646492
    >     [Rack #2, Node #1] initial_token: -8710962479251732708,-4099276460824344808,512409557603043092,5124095576030430992
    >     [Rack #2, Node #2] initial_token: -7173733806442603408,-2562047788015215508,2049638230412172392,6661324248839560292
    >     [Rack #2, Node #3] initial_token: -5636505133633474108,-1024819115206086208,3586866903221301692,8198552921648689592
    >     [Rack #3, Node #1] initial_token: -8198552921648689608,-3586866903221301708,1024819115206086192,5636505133633474092
    >     [Rack #3, Node #2] initial_token: -6661324248839560308,-2049638230412172408,2562047788015215492,7173733806442603392
    >     [Rack #3, Node #3] initial_token: -5124095576030431008,-512409557603043108,4099276460824344792,8710962479251732692
    >
    >     RandomPartitioner:
    >     >>> datacenter_offset = 0
    >     >>> num_tokens = 4
    >     >>> num_racks = 3
    >     >>> num_nodes = 3
    >     >>> print "\n".join(['[Rack #{}, Node #{}] initial_token: {}'.format(r + 1, n + 1, ','.join([str(((2**127 / (num_tokens * num_nodes * num_racks)) * (t * num_nodes * num_racks + n * num_racks + r)) + datacenter_offset) for t in range(num_tokens)])) for r in range(num_racks) for n in range(num_nodes)])
    >     [Rack #1, Node #1] initial_token: 0,42535295865117307932921825928971026427,85070591730234615865843651857942052854,127605887595351923798765477786913079281
    >     [Rack #1, Node #2] initial_token: 14178431955039102644307275309657008809,56713727820156410577229101238628035236,99249023685273718510150927167599061663,141784319550391026443072753096570088090
    >     [Rack #1, Node #3] initial_token: 28356863910078205288614550619314017618,70892159775195513221536376548285044045,113427455640312821154458202477256070472,155962751505430129087380028406227096899
    >     [Rack #2, Node #1] initial_token: 4726143985013034214769091769885669603,47261439850130342147690917698856696030,89796735715247650080612743627827722457,132332031580364958013534569556798748884
    >     [Rack #2, Node #2] initial_token: 18904575940052136859076367079542678412,61439871805169444791998193008513704839,103975167670286752724920018937484731266,146510463535404060657841844866455757693
    >     [Rack #2, Node #3] initial_token: 33083007895091239503383642389199687221,75618303760208547436305468318170713648,118153599625325855369227294247141740075,160688895490443163302149120176112766502
    >     [Rack #3, Node #1] initial_token: 9452287970026068429538183539771339206,51987583835143376362460009468742365633,94522879700260684295381835397713392060,137058175565377992228303661326684418487
    >     [Rack #3, Node #2] initial_token: 23630719925065171073845458849428348015,66166015790182479006767284778399374442,108701311655299786939689110707370400869,151236607520417094872610936636341427296
    >     [Rack #3, Node #3] initial_token: 37809151880104273718152734159085356824,80344447745221581651074560088056383251,122879743610338889583996386017027409678,165415039475456197516918211945998436105
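    >
    >     For readability, the Murmur3 one-liner above unrolls into the
    >     following equivalent loop (the same formula spelled out, nothing
    >     new; // is used so it also runs under Python 3, and for
    >     RandomPartitioner use 2**127 and drop the -2**63 shift):
    >
    >     datacenter_offset = 0
    >     num_tokens, num_racks, num_nodes = 4, 3, 3
    >     step = 2**64 // (num_tokens * num_nodes * num_racks)
    >     for r in range(num_racks):
    >         for n in range(num_nodes):
    >             tokens = [step * (t * num_nodes * num_racks + n * num_racks + r)
    >                       - 2**63 + datacenter_offset
    >                       for t in range(num_tokens)]
    >             print('[Rack #{}, Node #{}] initial_token: {}'.format(
    >                 r + 1, n + 1, ','.join(str(tok) for tok in tokens)))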
    >
    >     Could you please comment on whether this is a good approach for
    >     allocating tokens when using vnodes.
    >
    >     Thank you.
    >
    >     Regards,
    >     Kornel
    >
    >
    >

