Re: Change num_tokens in a live cluster

2024-05-16 Thread Gábor Auth
Hi,

On Thu, 16 May 2024, 17:40 Bowen Song via user, 
wrote:

> Replacing nodes one by one in the existing DC is not the same as replacing
> an entire DC.
>
> For example, suppose you change from 256 vnodes to 4 vnodes on a 100-node
> single-DC cluster. Before you start, each node owns ~1% of the cluster's
> data. But after changing 99 nodes, the last remaining node will own ~39% of
> the cluster's data. Will that node have enough storage and computing
> capacity to handle that? Unless you have significantly over-provisioned
> the node size, the answer is definitely no. The way to work around this is to
> gradually reduce the vnode count. E.g. reducing from 256 to 128 will
> require the last node to have 2x the capacity, which is much more doable
> than 39x. To do it this way, you will need to repeat the process to reduce
> the vnode count from 256 to 128, then to 64, 32, 16, 8 and finally 4.
>
> So, the most significant difference is: how many times does the data need to
> be moved?
>
Thank you for the explanation; it will help others who are searching for how
to change num_tokens... :)

I am aware of it, but in my current case there are only 4 nodes, with a
total of maybe ~25 GB of data. So, creating a new DC is more hassle for
me than replacing the nodes one by one.

My question was whether there is a simpler solution, and it looks like
there is none... :(

Bye,
Gábor AUTH


Re: Change num_tokens in a live cluster

2024-05-16 Thread Bowen Song via user
Replacing nodes one by one in the existing DC is not the same as 
replacing an entire DC.


For example, suppose you change from 256 vnodes to 4 vnodes on a 100-node
single-DC cluster. Before you start, each node owns ~1% of the cluster's
data. But after changing 99 nodes, the last remaining node will own ~39%
of the cluster's data. Will that node have enough storage and computing
capacity to handle that? Unless you have significantly over-provisioned
the node size, the answer is definitely no. The way to work around this is
to gradually reduce the vnode count. E.g. reducing from 256 to 128
will require the last node to have 2x the capacity, which is much more
doable than 39x. To do it this way, you will need to repeat the process
to reduce the vnode count from 256 to 128, then to 64, 32, 16, 8 and
finally 4.


So, the most significant difference is: how many times does the data need
to be moved?
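
The ~39% and ~2x figures above can be reproduced with a quick calculation.
A minimal sketch, assuming each node's ownership is roughly proportional to
its vnode count (which holds approximately under random token allocation);
the function name is mine, for illustration only:

```python
# Ownership of the last remaining old-vnode node, once all the
# other nodes have been converted to the new vnode count.
def last_node_ownership(total_nodes: int, old_vnodes: int, new_vnodes: int) -> float:
    converted = total_nodes - 1                         # nodes already changed
    total_vnodes = old_vnodes + converted * new_vnodes  # vnodes in the whole DC
    return old_vnodes / total_vnodes

print(f"{last_node_ownership(100, 256, 4):.1%}")    # 39.3% -> the ~39x case
print(f"{last_node_ownership(100, 256, 128):.1%}")  # 2.0%  -> the ~2x case
```

Halving the vnode count at each step keeps the worst-case ownership of the
last node near 2x the baseline, which is why the stepwise 256 -> 128 -> ... -> 4
route is workable while a direct jump is not.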



On 16/05/2024 15:54, Gábor Auth wrote:

Hi,

On Thu, 16 May 2024, 10:37 Bowen Song via user, 
 wrote:


You can also add a new DC with the desired number of nodes and
num_tokens on each node with auto bootstrap disabled, then rebuild
the new DC from the existing DC before decommissioning the existing
DC. This method copies the data only once, and can copy from/to
multiple nodes concurrently, so it is significantly faster, at the
cost of temporarily doubling the number of nodes.

For me, replacing the nodes one by one in the same DC is easier, since it 
requires no new technique... :)


Thanks,
Gábor AUTH

Re: Change num_tokens in a live cluster

2024-05-16 Thread Gábor Auth
Hi,

On Thu, 16 May 2024, 16:55 Jon Haddad,  wrote:

> Unless your cluster is very small, using the method of adding / removing
> nodes will eventually result in putting a much larger portion of your
> dataset on a very small number of nodes.  I *highly* discourage this.
>

It has ~15 GB of data on each node and only 4 nodes, so I'd call it
very small. :)

Bye,
Gábor AUTH


Re: Change num_tokens in a live cluster

2024-05-16 Thread Gábor Auth
Hi,

On Thu, 16 May 2024, 10:37 Bowen Song via user, 
wrote:

> You can also add a new DC with the desired number of nodes and num_tokens
> on each node with auto bootstrap disabled, then rebuild the new DC from the
> existing DC before decommissioning the existing DC. This method copies the
> data only once, and can copy from/to multiple nodes concurrently, so it is
> significantly faster, at the cost of temporarily doubling the number of
> nodes.
>
For me, replacing the nodes one by one in the same DC is easier, since it
requires no new technique... :)

Thanks,
Gábor AUTH


Re: Change num_tokens in a live cluster

2024-05-16 Thread Jon Haddad
Unless your cluster is very small, using the method of adding / removing
nodes will eventually result in putting a much larger portion of your
dataset on a very small number of nodes.  I *highly* discourage this.

The only correct, safe path is Bowen's suggestion of adding another DC and
decommissioning the old one.

Jon

On Thu, May 16, 2024 at 1:37 AM Bowen Song via user <
user@cassandra.apache.org> wrote:

> You can also add a new DC with the desired number of nodes and num_tokens
> on each node with auto bootstrap disabled, then rebuild the new DC from the
> existing DC before decommissioning the existing DC. This method copies the
> data only once, and can copy from/to multiple nodes concurrently, so it is
> significantly faster, at the cost of temporarily doubling the number of
> nodes.
> On 16/05/2024 09:21, Gábor Auth wrote:
>
> Hi.
>
> Is there a newer/easier workflow to change num_tokens in an existing
> cluster than adding a new node with the other num_tokens value,
> decommissioning an old one, and repeating through all the nodes?
>
> --
> Bye,
> Gábor AUTH
>
>


Re: Change num_tokens in a live cluster

2024-05-16 Thread Bowen Song via user
You can also add a new DC with the desired number of nodes and 
num_tokens on each node with auto bootstrap disabled, then rebuild the 
new DC from the existing DC before decommissioning the existing DC. This 
method copies the data only once, and can copy from/to multiple nodes 
concurrently, so it is significantly faster, at the cost of temporarily 
doubling the number of nodes.
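
At a high level, the new-DC method sketches out as below. This is a hedged
outline, not a tested runbook: the keyspace name "ks", the DC names
"dc_old"/"dc_new", and the replication factor of 3 are placeholders for
your own topology, and the replication step must be repeated for every
non-system keyspace.

```shell
# 1. On each new-DC node, before its first start, set in cassandra.yaml:
#      num_tokens: 4
#      auto_bootstrap: false

# 2. Extend replication to cover the new DC (placeholder names/RF):
cqlsh -e "ALTER KEYSPACE ks WITH replication =
  {'class': 'NetworkTopologyStrategy', 'dc_old': 3, 'dc_new': 3};"

# 3. Stream the existing data into the new DC (run on every new-DC node;
#    this is the single copy of the data):
nodetool rebuild -- dc_old

# 4. Point clients at dc_new, then drop the old DC from replication:
cqlsh -e "ALTER KEYSPACE ks WITH replication =
  {'class': 'NetworkTopologyStrategy', 'dc_new': 3};"

# 5. Decommission each old-DC node in turn:
nodetool decommission
```
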


On 16/05/2024 09:21, Gábor Auth wrote:

Hi.

Is there a newer/easier workflow to change num_tokens in an existing 
cluster than adding a new node with the other num_tokens value, 
decommissioning an old one, and repeating through all the nodes?


--
Bye,
Gábor AUTH