More tokens:  better data distribution, more expensive repairs, higher
probability of a multi-host outage taking some data offline and affecting
availability.

I think with >100 nodes the repair times and availability improvements make
a strong case for 16 tokens even though it means you'll need more total raw
space.
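
The availability side of that trade-off can be illustrated with a toy model. This is a rough sketch, not Cassandra's actual placement code: it assumes purely random tokens, a single rack and datacenter, RF=3, and SimpleStrategy-like placement (the first three distinct nodes clockwise from each token). All function names are made up for illustration. It counts how many node pairs co-replicate at least one range, since losing both nodes of such a pair leaves that range unable to reach QUORUM.

```python
import random
from itertools import combinations

def build_ring(num_nodes, num_tokens, rng):
    """Give every node `num_tokens` random tokens; return sorted (token, node) pairs."""
    ring = [(rng.random(), node)
            for node in range(num_nodes)
            for _ in range(num_tokens)]
    ring.sort()
    return ring

def replica_sets(ring, rf):
    """For each token range, take the first `rf` distinct nodes walking clockwise.

    Assumes the ring contains at least `rf` distinct nodes.
    """
    sets = set()
    n = len(ring)
    for i in range(n):
        replicas = []
        j = i
        while len(replicas) < rf:
            node = ring[j % n][1]
            if node not in replicas:
                replicas.append(node)
            j += 1
        sets.add(frozenset(replicas))
    return sets

def risky_pair_fraction(num_nodes, num_tokens, rf=3, seed=1):
    """Fraction of node pairs that share a replica set for at least one range."""
    rng = random.Random(seed)
    ring = build_ring(num_nodes, num_tokens, rng)
    risky = set()
    for replicas in replica_sets(ring, rf):
        risky.update(combinations(sorted(replicas), 2))
    total_pairs = num_nodes * (num_nodes - 1) // 2
    return len(risky) / total_pairs

if __name__ == "__main__":
    for tokens in (4, 16, 256):
        print(tokens, round(risky_pair_fraction(100, tokens), 3))
```

Under these assumptions, on a 100-node ring nearly every pair of nodes shares some range at 256 tokens, while at 16 tokens a noticeable share of pairs does not, so a random two-host outage is less likely to take data below QUORUM.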

Switching from 256 to 16 vnodes definitely will make data distribution
worse.  I'm not sure "hot spot" is the right description so much as a wider
curve.  I've got one cluster that hasn't been migrated from 256 to 16, and
it has about a 6% delta between the smallest and largest nodes instead of
more like 20% on the 16-vnode clusters.  The newer
allocate_tokens_for_keyspace and (better)
allocate_tokens_for_local_replication_factor options help limit the data
distribution issues, but don't totally eliminate them.
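
The "wider curve" effect can be reproduced with a small simulation. This is a minimal sketch under loud assumptions: tokens are placed purely at random (no allocate_tokens_* algorithm), only primary range ownership is counted (replication is ignored), and the function names are invented for illustration. Absolute numbers will therefore not match a real cluster; only the direction of the effect is meaningful.

```python
import random

def ownership_spread(num_nodes, num_tokens, seed):
    """Return (largest - smallest) primary ownership, relative to the mean."""
    rng = random.Random(seed)
    ring = sorted((rng.random(), node)
                  for node in range(num_nodes)
                  for _ in range(num_tokens))
    owned = [0.0] * num_nodes
    # A token owns the range (previous token, token]; the first token's
    # range wraps around from the last token on the ring.
    prev = ring[-1][0] - 1.0
    for token, node in ring:
        owned[node] += token - prev
        prev = token
    mean = 1.0 / num_nodes
    return (max(owned) - min(owned)) / mean

def average_spread(num_nodes, num_tokens, trials=20):
    """Average the spread over several seeded trials to smooth out noise."""
    total = sum(ownership_spread(num_nodes, num_tokens, seed)
                for seed in range(trials))
    return total / trials

if __name__ == "__main__":
    for tokens in (16, 256):
        print(tokens, round(average_spread(12, tokens), 3))
```

With random placement the gap between the largest and smallest node shrinks as num_tokens grows, which is the same direction as the 6% versus 20% figures above.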

On the other hand, the 16-vnode cluster takes less than half as long to
complete repairs via Reaper.  It also spends more time on GC, though I
can't tell whether that's due to vnodes or other differences.

On Sun, Mar 13, 2022 at 5:59 PM Jai Bheemsen Rao Dhanwada <
jaibheem...@gmail.com> wrote:

> Hello Team,
>
> I am currently using num_tokens: 256 (default in 3.11.X version) for my
> clusters and trying to understand the advantages vs disadvantages of
> changing it to 16 (I believe 16 is the new recommended value).  As per the
> cassandra documentation
> <https://cassandra.apache.org/doc/latest/cassandra/getting_started/production.html#tokens>
> 16 is not recommended for the cluster over 50 nodes.
>
>> Best for heavily elastic clusters which expand and shrink regularly, but
>> may have issues availability with larger clusters. Not recommended for
>> clusters over 50 nodes.
>
>
> I have a few questions.
>
>
>    1. What are the general recommendations for a production cluster which
>    has > 100 nodes and is heavily elastic in terms of adding and removing
>    nodes?
>    2. If I am switching from 256 -> 16 tokens, does this cause any
>    hotspots by having the data concentrated to only a few nodes and not
>    distributing equally across all the nodes?
>
>
