Hi Kornel,
Great use of the script for generating initial tokens! I agree that
you can achieve an optimal token distribution in a cluster using such
a method.
One thing to think about is the process for expanding the size of the
cluster in this case. For example, consider the scenario where you
want to insert a single new node into the cluster. To do this you
would need to calculate what the new token ranges should be for all
nodes, including the new one. You would then need to reassign existing
tokens to other nodes using 'nodetool move', likely calling the
command several times to perform the required movements and reach the
newly calculated token assignments. Once the "gap" in the token ranges
has been created, you would update the initial_token property for the
existing nodes in the cluster. Finally, you could insert the new node
with its assigned tokens.
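As a rough illustration of the recalculation step (single-token nodes assumed for simplicity, and `ideal_tokens` is my own helper, not an existing tool):

```python
# Rough sketch, single-token nodes assumed: recompute the ideal evenly
# spaced Murmur3 tokens before and after adding one node, which gives
# the target positions for the 'nodetool move' calls.
RING = 2**64  # size of the Murmur3 token space

def ideal_tokens(num_nodes):
    """Evenly spaced Murmur3 tokens for a single-token cluster."""
    return [RING // num_nodes * i - 2**63 for i in range(num_nodes)]

before = ideal_tokens(4)   # current 4-node layout
after = ideal_tokens(5)    # layout once the 5th node joins
# Each existing node is then moved to one of the new positions:
#   nodetool move <new_token>
```

With vnodes the same idea applies per token, which is exactly why the number of moves grows quickly.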
While the above process could be used to maintain an optimal token
distribution in a cluster, it does increase operational overhead. This
is where allocate_tokens_for_keyspace and
allocate_tokens_for_local_replication_factor (4.0 only) play a
critical role. They save the operational overhead when changing the
size of the cluster. In addition, from my experience they do a pretty
good job of keeping the token ranges evenly distributed when expanding
the cluster, even when a low value for num_tokens is used. If
expanding the cluster is required during an emergency, using an
allocate_tokens_* setting would be the simplest and most reliable way
to quickly insert a node while maintaining a reasonable token
distribution.
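For reference, on a joining node the relevant settings might look something like this in cassandra.yaml (the keyspace name and values here are placeholders, not a recommendation):

```yaml
# Example cassandra.yaml fragment for a new node ("my_keyspace" is a placeholder)
num_tokens: 4

# Cassandra 3.x: allocate based on the replication settings of a keyspace
allocate_tokens_for_keyspace: my_keyspace

# Cassandra 4.0+: allocate based on a replication factor directly
# allocate_tokens_for_local_replication_factor: 3
```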
The only other way to expand the cluster and maintain even token
distribution without using an allocate_tokens_* setting is to double
the size of the cluster each time. Obviously this has its own
drawbacks in terms of increased costs in both money and time compared
to inserting a single node.
Hope this helps.
Kind regards,
Anthony
On Thu, 28 May 2020 at 04:52, Kornel Pal <kornel...@gmail.com> wrote:
As I understand, the previous discussion is about using
allocate_tokens_for_keyspace for allocating tokens for most of the
nodes. On the other hand, I am proposing to generate all the tokens
for all the nodes using a Python script.
This seems to result in perfectly even token ownership distribution
across all the nodes for all possible replication factors, thus being
an improvement over using allocate_tokens_for_keyspace.
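The evenness claim can be sanity-checked with a short sketch (`primary_ownership` is my own helper, not part of the proposed script) that sums the size of each node's primary ranges:

```python
# Sketch: fraction of the ring each node primarily owns, given its tokens.
# Primary ranges only; full ownership also depends on RF and rack placement.
RING = 2**64  # Murmur3 token space

def primary_ownership(tokens_by_node):
    """Map node -> fraction of the ring it primarily owns."""
    owner = {t: node for node, ts in tokens_by_node.items() for t in ts}
    ring = sorted(owner)
    frac = {node: 0.0 for node in tokens_by_node}
    for i, token in enumerate(ring):
        # each token owns the range (previous token, token], wrapping around
        size = (token - ring[i - 1]) % RING
        frac[owner[token]] += size / RING
    return frac
```

Running this over the generated tokens should show each node owning an (almost exactly) equal share of the ring.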
Elliott Sims wrote:
> There's also a slightly older mailing list discussion on this subject
> that goes into detail on this sort of strategy:
> https://www.mail-archive.com/user@cassandra.apache.org/msg60006.html
>
> I've been approximately following it, repeating steps 3-6 for the
> first host in each "rack" (replica, since I have 3 racks and RF=3),
> then 8-10 for the remaining hosts in the new datacenter. So far, so
> good (sample size of 1), but it's a pretty painstaking process.
>
> This should get a lot simpler with Cassandra 4+'s
> "allocate_tokens_for_local_replication_factor" option, which will
> default to 3.
>
> On Wed, May 27, 2020 at 4:34 AM Kornel Pal <kornel...@gmail.com> wrote:
>
> Hi,
>
> Generating ideal tokens for single-token datacenters is well
> understood and documented, but there is much less information
> available on generating tokens with even ownership distribution when
> using vnodes. The best description I could find on token generation
> for vnodes is
> https://thelastpickle.com/blog/2019/02/21/set-up-a-cluster-with-even-token-distribution.html
>
> While allocate_tokens_for_keyspace results in much more even
> ownership distribution than random allocation, and does a great job
> at balancing ownership when adding new nodes, using it for creating a
> new datacenter results in less than ideal ownership distribution.
>
> After some experimentation, I found that it is possible to generate
> all the tokens for a new datacenter with an extended version of the
> Python script presented in the above blog post. Using these tokens
> seems to result in perfectly even ownership distribution with various
> token/node/rack configurations for all possible replication factors.
>
> Murmur3Partitioner:
> >>> datacenter_offset = 0
> >>> num_tokens = 4
> >>> num_racks = 3
> >>> num_nodes = 3
> >>> print "\n".join(['[Rack #{}, Node #{}] initial_token:
> {}'.format(r + 1, n + 1, ','.join([str(((2**64 / (num_tokens *
> num_nodes * num_racks)) * (t * num_nodes * num_racks + n * num_racks
> + r)) - 2**63 + datacenter_offset) for t in range(num_tokens)])) for
> r in range(num_racks) for n in range(num_nodes)])
> [Rack #1, Node #1] initial_token: -9223372036854775808,-4611686018427387908,-8,4611686018427387892
> [Rack #1, Node #2] initial_token: -7686143364045646508,-3074457345618258608,1537228672809129292,6148914691236517192
> [Rack #1, Node #3] initial_token: -6148914691236517208,-1537228672809129308,3074457345618258592,7686143364045646492
> [Rack #2, Node #1] initial_token: -8710962479251732708,-4099276460824344808,512409557603043092,5124095576030430992
> [Rack #2, Node #2] initial_token: -7173733806442603408,-2562047788015215508,2049638230412172392,6661324248839560292
> [Rack #2, Node #3] initial_token: -5636505133633474108,-1024819115206086208,3586866903221301692,8198552921648689592
> [Rack #3, Node #1] initial_token: -8198552921648689608,-3586866903221301708,1024819115206086192,5636505133633474092
> [Rack #3, Node #2] initial_token: -6661324248839560308,-2049638230412172408,2562047788015215492,7173733806442603392
> [Rack #3, Node #3] initial_token: -5124095576030431008,-512409557603043108,4099276460824344792,8710962479251732692
>
> RandomPartitioner:
> >>> datacenter_offset = 0
> >>> num_tokens = 4
> >>> num_racks = 3
> >>> num_nodes = 3
> >>> print "\n".join(['[Rack #{}, Node #{}] initial_token:
> {}'.format(r + 1, n + 1, ','.join([str(((2**127 / (num_tokens *
> num_nodes * num_racks)) * (t * num_nodes * num_racks + n * num_racks
> + r)) + datacenter_offset) for t in range(num_tokens)])) for r in
> range(num_racks) for n in range(num_nodes)])
> [Rack #1, Node #1] initial_token: 0,42535295865117307932921825928971026427,85070591730234615865843651857942052854,127605887595351923798765477786913079281
> [Rack #1, Node #2] initial_token: 14178431955039102644307275309657008809,56713727820156410577229101238628035236,99249023685273718510150927167599061663,141784319550391026443072753096570088090
> [Rack #1, Node #3] initial_token: 28356863910078205288614550619314017618,70892159775195513221536376548285044045,113427455640312821154458202477256070472,155962751505430129087380028406227096899
> [Rack #2, Node #1] initial_token: 4726143985013034214769091769885669603,47261439850130342147690917698856696030,89796735715247650080612743627827722457,132332031580364958013534569556798748884
> [Rack #2, Node #2] initial_token: 18904575940052136859076367079542678412,61439871805169444791998193008513704839,103975167670286752724920018937484731266,146510463535404060657841844866455757693
> [Rack #2, Node #3] initial_token: 33083007895091239503383642389199687221,75618303760208547436305468318170713648,118153599625325855369227294247141740075,160688895490443163302149120176112766502
> [Rack #3, Node #1] initial_token: 9452287970026068429538183539771339206,51987583835143376362460009468742365633,94522879700260684295381835397713392060,137058175565377992228303661326684418487
> [Rack #3, Node #2] initial_token: 23630719925065171073845458849428348015,66166015790182479006767284778399374442,108701311655299786939689110707370400869,151236607520417094872610936636341427296
> [Rack #3, Node #3] initial_token: 37809151880104273718152734159085356824,80344447745221581651074560088056383251,122879743610338889583996386017027409678,165415039475456197516918211945998436105
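For readability, the quoted Python 2 one-liner can be unpacked into an equivalent Python 3 sketch (shown for Murmur3Partitioner; note `//` for integer division, since `/` produces floats in Python 3):

```python
# Python 3 version of the quoted Murmur3Partitioner one-liner.
datacenter_offset = 0
num_tokens = 4
num_racks = 3
num_nodes = 3

# Evenly split the 2**64 Murmur3 token space across every vnode
step = 2**64 // (num_tokens * num_nodes * num_racks)

def node_tokens(r, n):
    """initial_token values for rack r, node n (0-based indices)."""
    return [step * (t * num_nodes * num_racks + n * num_racks + r)
            - 2**63 + datacenter_offset
            for t in range(num_tokens)]

for r in range(num_racks):
    for n in range(num_nodes):
        joined = ",".join(str(t) for t in node_tokens(r, n))
        print(f"[Rack #{r + 1}, Node #{n + 1}] initial_token: {joined}")
```

The interleaving (rack index varies fastest) is what keeps replicas on different racks from owning adjacent ranges.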
>
> Could you please comment on whether this is a good approach for
> allocating tokens when using vnodes?
>
> Thank you.
>
> Regards,
> Kornel
---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
For additional commands, e-mail: user-h...@cassandra.apache.org