Hi Anthony,

Thank you very much for looking into using the script for initial token generation and for providing multiple detailed methods of expanding the cluster.

This helps a lot, indeed.

Regards,
Kornel

Anthony Grasso wrote:
Hi Kornel,

Great use of the script for generating initial tokens! I agree that you can achieve an optimal token distribution in a cluster using such a method.

One thing to think about is the process for expanding the size of the cluster in this case. For example, consider the scenario where you want to insert a single new node into the cluster. To do this you would need to calculate what the new token ranges should be for all nodes, including the new one. You would then need to reassign existing tokens using 'nodetool move', likely calling the command a few times to achieve the newly calculated token assignments. Once the "gap" in the token ranges has been created, you would update the initial_token property for the existing nodes in the cluster. Finally, you could insert the new node with its assigned tokens.
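
To make the recalculation step concrete, here is a minimal sketch (the ideal_tokens helper, the single-rack layout and the simple one-to-one pairing of old and new tokens are illustrative assumptions, not a tested procedure):

    # Evenly spaced Murmur3 tokens for a single-rack datacenter (illustrative helper).
    def ideal_tokens(num_nodes, num_tokens):
        total = num_nodes * num_tokens
        return [[(2**64 // total) * (t * num_nodes + n) - 2**63
                 for t in range(num_tokens)]
                for n in range(num_nodes)]

    old_layout = ideal_tokens(3, 4)   # current three-node layout
    new_layout = ideal_tokens(4, 4)   # recalculated layout including the new node

    # As described above, each token that differs between the two layouts would
    # correspond to a 'nodetool move <new_token>' step on the existing node,
    # before the new node is inserted with the remaining tokens as its
    # initial_token list.
    for node, (before, after) in enumerate(zip(old_layout, new_layout), start=1):
        moves = [(b, a) for b, a in zip(before, after) if b != a]
        print("node {} token moves: {}".format(node, moves))
    print("new node initial_token: {}".format(",".join(map(str, new_layout[-1]))))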

While the above process could be used to maintain an optimal token distribution in a cluster, it does increase operational overhead. This is where allocate_tokens_for_keyspace and allocate_tokens_for_local_replication_factor (4.0 only) play a critical role: they remove that operational overhead when changing the size of the cluster. In addition, in my experience they do a pretty good job of keeping the token ranges evenly distributed when expanding the cluster, even when a low value for num_tokens is used. If the cluster has to be expanded during an emergency, using an allocate_tokens_* setting would be the simplest and most reliable way to quickly insert a node while maintaining a reasonable token distribution.
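
For reference, these settings go in cassandra.yaml on the node being added; a sketch (the keyspace name and the num_tokens value here are only placeholders):

    num_tokens: 16
    # allocate based on the replication settings of an existing keyspace
    allocate_tokens_for_keyspace: my_keyspace
    # 4.0 only: allocate based on a target replication factor for the local DC
    # allocate_tokens_for_local_replication_factor: 3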

The only other way to expand the cluster and maintain an even token distribution without using an allocate_tokens_* setting is to double the size of the cluster each time. Obviously this has its own drawbacks in terms of increased cost in both money and time compared to inserting a single node.
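
To illustrate why doubling preserves the even distribution, here is a minimal sketch (single-rack layout, purely illustrative numbers): each new node takes an existing node's tokens shifted by half the token spacing, so no existing token has to move.

    num_nodes, num_tokens = 3, 4
    spacing = 2**64 // (num_nodes * num_tokens)
    existing = [[spacing * (t * num_nodes + n) - 2**63 for t in range(num_tokens)]
                for n in range(num_nodes)]
    # Tokens for the three new nodes: midway between the existing tokens.
    for i, node in enumerate(existing, start=1):
        doubled = [tok + spacing // 2 for tok in node]
        print("new node {} initial_token: {}".format(i, ",".join(map(str, doubled))))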

Hope this helps.

Kind regards,
Anthony

On Thu, 28 May 2020 at 04:52, Kornel Pal <kornel...@gmail.com> wrote:

    As I understand, the previous discussion is about using
    allocate_tokens_for_keyspace for allocating tokens for most of the
    nodes. On the other hand, I am proposing to generate all the tokens
    for all the nodes using a Python script.

    This seems to result in perfectly even token ownership distribution
    across all the nodes for all possible replication factors, thus being
    an improvement over using allocate_tokens_for_keyspace.
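
    As a rough sanity check of that claim, the primary-range (RF=1) share
    implied by a set of tokens can be computed with a sketch like the one
    below (the helper and its input format are illustrative only; replica
    placement for higher replication factors also depends on the rack
    assignment and keyspace settings):

        # tokens_by_node maps a node label to its list of tokens,
        # e.g. {'rack1_node1': [-9223372036854775808, ...], ...}
        def primary_ownership(tokens_by_node, ring_size=2**64):
            owner_of = {t: node for node, toks in tokens_by_node.items() for t in toks}
            ring = sorted(owner_of)
            owned = dict.fromkeys(tokens_by_node, 0)
            for i, tok in enumerate(ring):
                prev = ring[i - 1]              # wraps around the ring for i == 0
                owned[owner_of[tok]] += (tok - prev) % ring_size
            # fraction of the ring each node owns as the primary replica
            return {node: size / float(ring_size) for node, size in owned.items()}

    For the tokens generated by the script below, each node's share should
    come out essentially identical.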

    Elliott Sims wrote:
    > There's also a slightly older mailing list discussion on this subject
    > that goes into detail on this sort of strategy:
    > https://www.mail-archive.com/user@cassandra.apache.org/msg60006.html
    >
    > I've been approximately following it, repeating steps 3-6 for the
    > first host in each "rack" (replica, since I have 3 racks and RF=3),
    > then 8-10 for the remaining hosts in the new datacenter. So far, so
    > good (sample size of 1), but it's a pretty painstaking process.
    >
    > This should get a lot simpler with Cassandra 4+'s
    > "allocate_tokens_for_local_replication_factor" option, which will
    > default to 3.
    >
    > On Wed, May 27, 2020 at 4:34 AM Kornel Pal <kornel...@gmail.com> wrote:
    >
    >     Hi,
    >
    >     Generating ideal tokens for single-token datacenters is well
    >     understood and documented, but there is much less information
    >     available on generating tokens with even ownership distribution
    >     when using vnodes. The best description I could find on token
    >     generation for vnodes is
    >     https://thelastpickle.com/blog/2019/02/21/set-up-a-cluster-with-even-token-distribution.html
    >
    >     While allocate_tokens_for_keyspace results in much more even
    >     ownership distribution than random allocation, and does a great
    >     job at balancing ownership when adding new nodes, using it for
    >     creating a new datacenter results in less than ideal ownership
    >     distribution.
    >
    >     After some experimentation, I found that it is possible to
    >     generate all the tokens for a new datacenter with an extended
    >     version of the Python script presented in the above blog post.
    >     Using these tokens seems to result in perfectly even ownership
    >     distribution with various token/node/rack configurations for all
    >     possible replication factors.
    >
    >     Murmur3Partitioner:
    >     >>> datacenter_offset = 0
    >     >>> num_tokens = 4
    >     >>> num_racks = 3
    >     >>> num_nodes = 3
    >     >>> print "\n".join(['[Rack #{}, Node #{}] initial_token: {}'.format(r + 1, n + 1, ','.join([str(((2**64 / (num_tokens * num_nodes * num_racks)) * (t * num_nodes * num_racks + n * num_racks + r)) - 2**63 + datacenter_offset) for t in range(num_tokens)])) for r in range(num_racks) for n in range(num_nodes)])
    >     [Rack #1, Node #1] initial_token: -9223372036854775808,-4611686018427387908,-8,4611686018427387892
    >     [Rack #1, Node #2] initial_token: -7686143364045646508,-3074457345618258608,1537228672809129292,6148914691236517192
    >     [Rack #1, Node #3] initial_token: -6148914691236517208,-1537228672809129308,3074457345618258592,7686143364045646492
    >     [Rack #2, Node #1] initial_token: -8710962479251732708,-4099276460824344808,512409557603043092,5124095576030430992
    >     [Rack #2, Node #2] initial_token: -7173733806442603408,-2562047788015215508,2049638230412172392,6661324248839560292
    >     [Rack #2, Node #3] initial_token: -5636505133633474108,-1024819115206086208,3586866903221301692,8198552921648689592
    >     [Rack #3, Node #1] initial_token: -8198552921648689608,-3586866903221301708,1024819115206086192,5636505133633474092
    >     [Rack #3, Node #2] initial_token: -6661324248839560308,-2049638230412172408,2562047788015215492,7173733806442603392
    >     [Rack #3, Node #3] initial_token: -5124095576030431008,-512409557603043108,4099276460824344792,8710962479251732692
    >
    >     RandomPartitioner:
    >     >>> datacenter_offset = 0
    >     >>> num_tokens = 4
    >     >>> num_racks = 3
    >     >>> num_nodes = 3
    >     >>> print "\n".join(['[Rack #{}, Node #{}] initial_token: {}'.format(r + 1, n + 1, ','.join([str(((2**127 / (num_tokens * num_nodes * num_racks)) * (t * num_nodes * num_racks + n * num_racks + r)) + datacenter_offset) for t in range(num_tokens)])) for r in range(num_racks) for n in range(num_nodes)])
    >     [Rack #1, Node #1] initial_token: 0,42535295865117307932921825928971026427,85070591730234615865843651857942052854,127605887595351923798765477786913079281
    >     [Rack #1, Node #2] initial_token: 14178431955039102644307275309657008809,56713727820156410577229101238628035236,99249023685273718510150927167599061663,141784319550391026443072753096570088090
    >     [Rack #1, Node #3] initial_token: 28356863910078205288614550619314017618,70892159775195513221536376548285044045,113427455640312821154458202477256070472,155962751505430129087380028406227096899
    >     [Rack #2, Node #1] initial_token: 4726143985013034214769091769885669603,47261439850130342147690917698856696030,89796735715247650080612743627827722457,132332031580364958013534569556798748884
    >     [Rack #2, Node #2] initial_token: 18904575940052136859076367079542678412,61439871805169444791998193008513704839,103975167670286752724920018937484731266,146510463535404060657841844866455757693
    >     [Rack #2, Node #3] initial_token: 33083007895091239503383642389199687221,75618303760208547436305468318170713648,118153599625325855369227294247141740075,160688895490443163302149120176112766502
    >     [Rack #3, Node #1] initial_token: 9452287970026068429538183539771339206,51987583835143376362460009468742365633,94522879700260684295381835397713392060,137058175565377992228303661326684418487
    >     [Rack #3, Node #2] initial_token: 23630719925065171073845458849428348015,66166015790182479006767284778399374442,108701311655299786939689110707370400869,151236607520417094872610936636341427296
    >     [Rack #3, Node #3] initial_token: 37809151880104273718152734159085356824,80344447745221581651074560088056383251,122879743610338889583996386017027409678,165415039475456197516918211945998436105
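    >
    >     For readability, the Murmur3 one-liner above unrolls into the
    >     following equivalent loop (the same formula spelled out, nothing
    >     new; // is used so it also runs under Python 3, and for
    >     RandomPartitioner use 2**127 and drop the -2**63 shift):
    >
    >     datacenter_offset = 0
    >     num_tokens, num_racks, num_nodes = 4, 3, 3
    >     step = 2**64 // (num_tokens * num_nodes * num_racks)
    >     for r in range(num_racks):
    >         for n in range(num_nodes):
    >             tokens = [step * (t * num_nodes * num_racks + n * num_racks + r)
    >                       - 2**63 + datacenter_offset
    >                       for t in range(num_tokens)]
    >             print('[Rack #{}, Node #{}] initial_token: {}'.format(
    >                 r + 1, n + 1, ','.join(str(tok) for tok in tokens)))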
    >
    >     Could you please comment on whether this is a good approach for
    >     allocating tokens when using vnodes.
    >
    >     Thank you.
    >
    >     Regards,
    >     Kornel
    >
    >
    >

