Re: Is cleanup is required if cluster topology changes

2023-05-09 Thread Jaydeep Chovatia
Another request to the community to see if this is feasible or not:
Could we avoid waiting for CEP-21 and instead do the necessary cleanup as
part of regular compaction itself, so that *cleanup* does not have to be
run manually? For now, this could be controlled through a flag that is
*false* by default; whoever wants cleanup to happen as part of compaction
can turn the flag on. Once CEP-21 is in place, we can remove the flag and
enable the behavior unconditionally.
Thoughts?
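For illustration only, a minimal sketch of what such an opt-in switch could
look like as a startup option (the property name below is hypothetical and
is not an existing Cassandra flag):

    # Hypothetical, illustrative only -- not an existing Cassandra option.
    # Would opt a node into discarding unowned ranges during regular compaction.
    JVM_OPTS="$JVM_OPTS -Dcassandra.cleanup_unowned_during_compaction=true"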

Jaydeep

On Tue, May 9, 2023 at 3:58 AM Bowen Song via user <
user@cassandra.apache.org> wrote:

> Because an operator will need to check and ensure the schema is consistent
> across the cluster before running "nodetool cleanup". At the moment, it's
> the operator's responsibility to ensure bad things don't happen.
> On 09/05/2023 06:20, Jaydeep Chovatia wrote:
>
> One clarification question Jeff.
> AFAIK, the *nodetool cleanup* also internally goes through the same
> compaction path as the regular compaction. Then why do we have to wait for
> CEP-21 to clean up unowned data in the regular compaction path? Wouldn't it
> be as simple as regular compaction just invoke the code of *nodetool
> cleanup*?
> In other words, without CEP-21, why is *nodetool cleanup* a safer
> operation but doing the same in the regular compaction isn't?
>
> Jaydeep
>
> On Fri, May 5, 2023 at 11:58 AM Jaydeep Chovatia <
> chovatia.jayd...@gmail.com> wrote:
>
>> Thanks, Jeff, for the detailed steps and summary.
>> We will keep the community (this thread) up to date on how it plays out
>> in our fleet.
>>
>> Jaydeep
>>
>> On Fri, May 5, 2023 at 9:10 AM Jeff Jirsa  wrote:
>>
>>> Lots of caveats on these suggestions, let me try to hit most of them.
>>>
>>> Cleanup in parallel is good and fine and common. Limit number of threads
>>> in cleanup if you're using lots of vnodes, so each node runs one at a time
>>> and not all nodes use all your cores at the same time.
>>> If a host is fully offline, you can ALSO use replace address first boot.
>>> It'll stream data right to that host with the same token assignments you
>>> had before, and no cleanup is needed then. Strictly speaking, to avoid
>>> resurrection here, you'd want to run repair on the replicas of the down
>>> host (for vnodes, probably the whole cluster), but your current process
>>> doesnt guarantee that either (decom + bootstrap may resurrect, strictly
>>> speaking).
>>> Dropping vnodes will reduce the replicas that have to be cleaned up, but
>>> also potentially increase your imbalance on each replacement.
>>>
>>> Cassandra should still do this on its own, and I think once CEP-21 is
>>> committed, this should be one of the first enhancement tickets.
>>>
>>> Until then, LeveledCompactionStrategy really does make cleanup fast and
>>> cheap, at the cost of higher IO the rest of the time. If you can tolerate
>>> that higher IO, you'll probably appreciate LCS anyway (faster reads, faster
>>> data deletion than STCS). It's a lot of IO compared to STCS though.
>>>
>>>
>>>
>>> On Fri, May 5, 2023 at 9:02 AM Jaydeep Chovatia <
>>> chovatia.jayd...@gmail.com> wrote:
>>>
 Thanks all for your valuable inputs. We will try some of the suggested
 methods in this thread, and see how it goes. We will keep you updated on
 our progress.
 Thanks a lot once again!

 Jaydeep

 On Fri, May 5, 2023 at 8:55 AM Bowen Song via user <
 user@cassandra.apache.org> wrote:

> Depending on the number of vnodes per server, the probability and
> severity (i.e. the size of the affected token ranges) of an availability
> degradation due to a server failure during node replacement may be small.
> You also have the choice of increasing the RF if that's still not
> acceptable.
>
> Also, reducing number of vnodes per server can limit the number of
> servers affected by replacing a single server, therefore reducing the
> amount of time required to run "nodetool cleanup" if it is run 
> sequentially.
>
> Finally, you may choose to run "nodetool cleanup" concurrently on
> multiple nodes to reduce the amount of time required to complete it.
>
>
> On 05/05/2023 16:26, Runtian Liu wrote:
>
> We are doing the "adding a node then decommissioning a node" to
> achieve better availability. Replacing a node need to shut down one node
> first, if another node is down during the node replacement period, we will
> get availability drop because most of our use case is local_quorum with
> replication factor 3.
>
> On Fri, May 5, 2023 at 5:59 AM Bowen Song via user <
> user@cassandra.apache.org> wrote:
>
>> Have you thought of using
>> "-Dcassandra.replace_address_first_boot=..." (or
>> "-Dcassandra.replace_address=..." if you are using an older version)? 
>> This
>> will not result in a topology change, which means "nodetool cleanup" is 
>> not
>> needed after the operation is completed.
>> On 05/05/2023 05:24, Jaydeep 

Re: Is cleanup is required if cluster topology changes

2023-05-09 Thread Bowen Song via user
Because an operator will need to check and ensure the schema is 
consistent across the cluster before running "nodetool cleanup". At the 
moment, it's the operator's responsibility to ensure bad things don't 
happen.
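For reference, a minimal sketch of that pre-check using standard nodetool
commands (the keyspace name is a placeholder; exact output wording varies
by version):

    # Every node should report the same schema version before cleanup starts
    nodetool describecluster        # check the "Schema versions" section
    # only then, on each node:
    nodetool cleanup my_keyspace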


On 09/05/2023 06:20, Jaydeep Chovatia wrote:

One clarification question Jeff.
AFAIK, the /nodetool cleanup/ also internally goes through the same 
compaction path as the regular compaction. Then why do we have to wait 
for CEP-21 to clean up unowned data in the regular compaction path? 
Wouldn't it be as simple as regular compaction just invoke the code of 
/nodetool cleanup/?
In other words, without CEP-21, why is /nodetool cleanup/ a safer 
operation but doing the same in the regular compaction isn't?


Jaydeep

On Fri, May 5, 2023 at 11:58 AM Jaydeep Chovatia 
 wrote:


Thanks, Jeff, for the detailed steps and summary.
We will keep the community (this thread) up to date on how it
plays out in our fleet.

Jaydeep

On Fri, May 5, 2023 at 9:10 AM Jeff Jirsa  wrote:

Lots of caveats on these suggestions, let me try to hit most
of them.

Cleanup in parallel is good and fine and common. Limit number
of threads in cleanup if you're using lots of vnodes, so each
node runs one at a time and not all nodes use all your cores
at the same time.
If a host is fully offline, you can ALSO use replace address
first boot. It'll stream data right to that host with the same
token assignments you had before, and no cleanup is needed
then. Strictly speaking, to avoid resurrection here, you'd
want to run repair on the replicas of the down host (for
vnodes, probably the whole cluster), but your current process
doesnt guarantee that either (decom + bootstrap may resurrect,
strictly speaking).
Dropping vnodes will reduce the replicas that have to be
cleaned up, but also potentially increase your imbalance on
each replacement.

Cassandra should still do this on its own, and I think once
CEP-21 is committed, this should be one of the first
enhancement tickets.

Until then, LeveledCompactionStrategy really does make cleanup
fast and cheap, at the cost of higher IO the rest of the time.
If you can tolerate that higher IO, you'll probably appreciate
LCS anyway (faster reads, faster data deletion than STCS).
It's a lot of IO compared to STCS though.


On Fri, May 5, 2023 at 9:02 AM Jaydeep Chovatia
 wrote:

Thanks all for your valuable inputs. We will try some of
the suggested methods in this thread, and see how it goes.
We will keep you updated on our progress.
Thanks a lot once again!

Jaydeep

On Fri, May 5, 2023 at 8:55 AM Bowen Song via user
 wrote:

Depending on the number of vnodes per server, the
probability and severity (i.e. the size of the
affected token ranges) of an availability degradation
due to a server failure during node replacement may be
small. You also have the choice of increasing the RF
if that's still not acceptable.

Also, reducing number of vnodes per server can limit
the number of servers affected by replacing a single
server, therefore reducing the amount of time required
to run "nodetool cleanup" if it is run sequentially.

Finally, you may choose to run "nodetool cleanup"
concurrently on multiple nodes to reduce the amount of
time required to complete it.


On 05/05/2023 16:26, Runtian Liu wrote:

We are doing the "adding a node then decommissioning
a node" to achieve better availability. Replacing a
node need to shut down one node first, if another
node is down during the node replacement period, we
will get availability drop because most of our use
case is local_quorum with replication factor 3.

On Fri, May 5, 2023 at 5:59 AM Bowen Song via user
 wrote:

Have you thought of using
"-Dcassandra.replace_address_first_boot=..." (or
"-Dcassandra.replace_address=..." if you are
using an older version)? This will not result in
a topology change, which means "nodetool cleanup"
is not needed after the operation is completed.

On 05/05/2023 05:24, Jaydeep Chovatia wrote:

Thanks, Jeff!
But in our environment we replace nodes quite
often for various optimization purposes, etc.
say, almost 1 node per day (node /addition/
   

Re: Is cleanup is required if cluster topology changes

2023-05-08 Thread Jaydeep Chovatia
One clarification question, Jeff.
AFAIK, *nodetool cleanup* also internally goes through the same compaction
path as regular compaction. Then why do we have to wait for CEP-21 to clean
up unowned data in the regular compaction path? Wouldn't it be as simple as
having regular compaction invoke the *nodetool cleanup* code?
In other words, without CEP-21, why is *nodetool cleanup* a safer operation
than doing the same thing as part of regular compaction?

Jaydeep

On Fri, May 5, 2023 at 11:58 AM Jaydeep Chovatia 
wrote:

> Thanks, Jeff, for the detailed steps and summary.
> We will keep the community (this thread) up to date on how it plays out in
> our fleet.
>
> Jaydeep
>
> On Fri, May 5, 2023 at 9:10 AM Jeff Jirsa  wrote:
>
>> Lots of caveats on these suggestions, let me try to hit most of them.
>>
>> Cleanup in parallel is good and fine and common. Limit number of threads
>> in cleanup if you're using lots of vnodes, so each node runs one at a time
>> and not all nodes use all your cores at the same time.
>> If a host is fully offline, you can ALSO use replace address first boot.
>> It'll stream data right to that host with the same token assignments you
>> had before, and no cleanup is needed then. Strictly speaking, to avoid
>> resurrection here, you'd want to run repair on the replicas of the down
>> host (for vnodes, probably the whole cluster), but your current process
>> doesnt guarantee that either (decom + bootstrap may resurrect, strictly
>> speaking).
>> Dropping vnodes will reduce the replicas that have to be cleaned up, but
>> also potentially increase your imbalance on each replacement.
>>
>> Cassandra should still do this on its own, and I think once CEP-21 is
>> committed, this should be one of the first enhancement tickets.
>>
>> Until then, LeveledCompactionStrategy really does make cleanup fast and
>> cheap, at the cost of higher IO the rest of the time. If you can tolerate
>> that higher IO, you'll probably appreciate LCS anyway (faster reads, faster
>> data deletion than STCS). It's a lot of IO compared to STCS though.
>>
>>
>>
>> On Fri, May 5, 2023 at 9:02 AM Jaydeep Chovatia <
>> chovatia.jayd...@gmail.com> wrote:
>>
>>> Thanks all for your valuable inputs. We will try some of the suggested
>>> methods in this thread, and see how it goes. We will keep you updated on
>>> our progress.
>>> Thanks a lot once again!
>>>
>>> Jaydeep
>>>
>>> On Fri, May 5, 2023 at 8:55 AM Bowen Song via user <
>>> user@cassandra.apache.org> wrote:
>>>
 Depending on the number of vnodes per server, the probability and
 severity (i.e. the size of the affected token ranges) of an availability
 degradation due to a server failure during node replacement may be small.
 You also have the choice of increasing the RF if that's still not
 acceptable.

 Also, reducing number of vnodes per server can limit the number of
 servers affected by replacing a single server, therefore reducing the
 amount of time required to run "nodetool cleanup" if it is run 
 sequentially.

 Finally, you may choose to run "nodetool cleanup" concurrently on
 multiple nodes to reduce the amount of time required to complete it.


 On 05/05/2023 16:26, Runtian Liu wrote:

 We are doing the "adding a node then decommissioning a node" to
 achieve better availability. Replacing a node need to shut down one node
 first, if another node is down during the node replacement period, we will
 get availability drop because most of our use case is local_quorum with
 replication factor 3.

 On Fri, May 5, 2023 at 5:59 AM Bowen Song via user <
 user@cassandra.apache.org> wrote:

> Have you thought of using "-Dcassandra.replace_address_first_boot=..."
> (or "-Dcassandra.replace_address=..." if you are using an older version)?
> This will not result in a topology change, which means "nodetool cleanup"
> is not needed after the operation is completed.
> On 05/05/2023 05:24, Jaydeep Chovatia wrote:
>
> Thanks, Jeff!
> But in our environment we replace nodes quite often for various
> optimization purposes, etc. say, almost 1 node per day (node
> *addition* followed by node *decommission*, which of course changes
> the topology), and we have a cluster of size 100 nodes with 300GB per 
> node.
> If we have to run cleanup on 100 nodes after every replacement, then it
> could take forever.
> What is the recommendation until we get this fixed in Cassandra itself
> as part of compaction (w/o externally triggering *cleanup*)?
>
> Jaydeep
>
> On Thu, May 4, 2023 at 8:14 PM Jeff Jirsa  wrote:
>
>> Cleanup is fast and cheap and basically a no-op if you haven’t
>> changed the ring
>>
>> After cassandra has transactional cluster metadata to make ring
>> changes strongly consistent, cassandra should do this in every 
>> compaction.
>> But until then it’s 

Re: Is cleanup is required if cluster topology changes

2023-05-05 Thread Jaydeep Chovatia
Thanks, Jeff, for the detailed steps and summary.
We will keep the community (this thread) up to date on how it plays out in
our fleet.

Jaydeep

On Fri, May 5, 2023 at 9:10 AM Jeff Jirsa  wrote:

> Lots of caveats on these suggestions, let me try to hit most of them.
>
> Cleanup in parallel is good and fine and common. Limit number of threads
> in cleanup if you're using lots of vnodes, so each node runs one at a time
> and not all nodes use all your cores at the same time.
> If a host is fully offline, you can ALSO use replace address first boot.
> It'll stream data right to that host with the same token assignments you
> had before, and no cleanup is needed then. Strictly speaking, to avoid
> resurrection here, you'd want to run repair on the replicas of the down
> host (for vnodes, probably the whole cluster), but your current process
> doesnt guarantee that either (decom + bootstrap may resurrect, strictly
> speaking).
> Dropping vnodes will reduce the replicas that have to be cleaned up, but
> also potentially increase your imbalance on each replacement.
>
> Cassandra should still do this on its own, and I think once CEP-21 is
> committed, this should be one of the first enhancement tickets.
>
> Until then, LeveledCompactionStrategy really does make cleanup fast and
> cheap, at the cost of higher IO the rest of the time. If you can tolerate
> that higher IO, you'll probably appreciate LCS anyway (faster reads, faster
> data deletion than STCS). It's a lot of IO compared to STCS though.
>
>
>
> On Fri, May 5, 2023 at 9:02 AM Jaydeep Chovatia <
> chovatia.jayd...@gmail.com> wrote:
>
>> Thanks all for your valuable inputs. We will try some of the suggested
>> methods in this thread, and see how it goes. We will keep you updated on
>> our progress.
>> Thanks a lot once again!
>>
>> Jaydeep
>>
>> On Fri, May 5, 2023 at 8:55 AM Bowen Song via user <
>> user@cassandra.apache.org> wrote:
>>
>>> Depending on the number of vnodes per server, the probability and
>>> severity (i.e. the size of the affected token ranges) of an availability
>>> degradation due to a server failure during node replacement may be small.
>>> You also have the choice of increasing the RF if that's still not
>>> acceptable.
>>>
>>> Also, reducing number of vnodes per server can limit the number of
>>> servers affected by replacing a single server, therefore reducing the
>>> amount of time required to run "nodetool cleanup" if it is run sequentially.
>>>
>>> Finally, you may choose to run "nodetool cleanup" concurrently on
>>> multiple nodes to reduce the amount of time required to complete it.
>>>
>>>
>>> On 05/05/2023 16:26, Runtian Liu wrote:
>>>
>>> We are doing the "adding a node then decommissioning a node" to
>>> achieve better availability. Replacing a node need to shut down one node
>>> first, if another node is down during the node replacement period, we will
>>> get availability drop because most of our use case is local_quorum with
>>> replication factor 3.
>>>
>>> On Fri, May 5, 2023 at 5:59 AM Bowen Song via user <
>>> user@cassandra.apache.org> wrote:
>>>
 Have you thought of using "-Dcassandra.replace_address_first_boot=..."
 (or "-Dcassandra.replace_address=..." if you are using an older version)?
 This will not result in a topology change, which means "nodetool cleanup"
 is not needed after the operation is completed.
 On 05/05/2023 05:24, Jaydeep Chovatia wrote:

 Thanks, Jeff!
 But in our environment we replace nodes quite often for various
 optimization purposes, etc. say, almost 1 node per day (node *addition*
 followed by node *decommission*, which of course changes the
 topology), and we have a cluster of size 100 nodes with 300GB per node. If
 we have to run cleanup on 100 nodes after every replacement, then it could
 take forever.
 What is the recommendation until we get this fixed in Cassandra itself
 as part of compaction (w/o externally triggering *cleanup*)?

 Jaydeep

 On Thu, May 4, 2023 at 8:14 PM Jeff Jirsa  wrote:

> Cleanup is fast and cheap and basically a no-op if you haven’t changed
> the ring
>
> After cassandra has transactional cluster metadata to make ring
> changes strongly consistent, cassandra should do this in every compaction.
> But until then it’s left for operators to run when they’re sure the state
> of the ring is correct .
>
>
>
> On May 4, 2023, at 7:41 PM, Jaydeep Chovatia <
> chovatia.jayd...@gmail.com> wrote:
>
> 
> Isn't this considered a kind of *bug* in Cassandra because as we know
> *cleanup* is a lengthy and unreliable operation, so relying on the
> *cleanup* means higher chances of data resurrection?
> Do you think we should discard the unowned token-ranges as part of the
> regular compaction itself? What are the pitfalls of doing this as part of
> compaction itself?
>
> Jaydeep
>
> On Thu, May 4, 

Re: Is cleanup is required if cluster topology changes

2023-05-05 Thread Jeff Jirsa
Lots of caveats on these suggestions, let me try to hit most of them.

Cleanup in parallel is good and fine and common. Limit the number of threads
in cleanup if you're using lots of vnodes, so each node runs one cleanup at
a time and not all nodes use all your cores at the same time.
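As a rough sketch of the above (hostnames and keyspace are placeholders;
the -j option, available on recent nodetool versions, caps the number of
concurrent cleanup jobs on a node):

    # Run cleanup on several nodes at once, but only one cleanup job per node
    for host in node1 node2 node3; do
        ssh "$host" "nodetool cleanup -j 1 my_keyspace" &
    done
    wait    # block until cleanup has finished on all nodes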
If a host is fully offline, you can ALSO use replace address first boot.
It'll stream data right to that host with the same token assignments you
had before, and no cleanup is needed then. Strictly speaking, to avoid
resurrection here, you'd want to run repair on the replicas of the down
host (for vnodes, probably the whole cluster), but your current process
doesn't guarantee that either (decom + bootstrap may resurrect, strictly
speaking).
Dropping vnodes will reduce the replicas that have to be cleaned up, but
also potentially increase your imbalance on each replacement.

Cassandra should still do this on its own, and I think once CEP-21 is
committed, this should be one of the first enhancement tickets.

Until then, LeveledCompactionStrategy really does make cleanup fast and
cheap, at the cost of higher IO the rest of the time. If you can tolerate
that higher IO, you'll probably appreciate LCS anyway (faster reads, faster
data deletion than STCS). It's a lot of IO compared to STCS though.
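For example, switching a table to LCS could look like the following
(keyspace/table names are placeholders; expect a one-off recompaction of
existing data after the change):

    # Move an existing table from STCS to LeveledCompactionStrategy
    cqlsh -e "ALTER TABLE my_keyspace.my_table
              WITH compaction = {'class': 'LeveledCompactionStrategy'};"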



On Fri, May 5, 2023 at 9:02 AM Jaydeep Chovatia 
wrote:

> Thanks all for your valuable inputs. We will try some of the suggested
> methods in this thread, and see how it goes. We will keep you updated on
> our progress.
> Thanks a lot once again!
>
> Jaydeep
>
> On Fri, May 5, 2023 at 8:55 AM Bowen Song via user <
> user@cassandra.apache.org> wrote:
>
>> Depending on the number of vnodes per server, the probability and
>> severity (i.e. the size of the affected token ranges) of an availability
>> degradation due to a server failure during node replacement may be small.
>> You also have the choice of increasing the RF if that's still not
>> acceptable.
>>
>> Also, reducing number of vnodes per server can limit the number of
>> servers affected by replacing a single server, therefore reducing the
>> amount of time required to run "nodetool cleanup" if it is run sequentially.
>>
>> Finally, you may choose to run "nodetool cleanup" concurrently on
>> multiple nodes to reduce the amount of time required to complete it.
>>
>>
>> On 05/05/2023 16:26, Runtian Liu wrote:
>>
>> We are doing the "adding a node then decommissioning a node" to
>> achieve better availability. Replacing a node need to shut down one node
>> first, if another node is down during the node replacement period, we will
>> get availability drop because most of our use case is local_quorum with
>> replication factor 3.
>>
>> On Fri, May 5, 2023 at 5:59 AM Bowen Song via user <
>> user@cassandra.apache.org> wrote:
>>
>>> Have you thought of using "-Dcassandra.replace_address_first_boot=..."
>>> (or "-Dcassandra.replace_address=..." if you are using an older version)?
>>> This will not result in a topology change, which means "nodetool cleanup"
>>> is not needed after the operation is completed.
>>> On 05/05/2023 05:24, Jaydeep Chovatia wrote:
>>>
>>> Thanks, Jeff!
>>> But in our environment we replace nodes quite often for various
>>> optimization purposes, etc. say, almost 1 node per day (node *addition*
>>> followed by node *decommission*, which of course changes the topology),
>>> and we have a cluster of size 100 nodes with 300GB per node. If we have to
>>> run cleanup on 100 nodes after every replacement, then it could take
>>> forever.
>>> What is the recommendation until we get this fixed in Cassandra itself
>>> as part of compaction (w/o externally triggering *cleanup*)?
>>>
>>> Jaydeep
>>>
>>> On Thu, May 4, 2023 at 8:14 PM Jeff Jirsa  wrote:
>>>
 Cleanup is fast and cheap and basically a no-op if you haven’t changed
 the ring

 After cassandra has transactional cluster metadata to make ring changes
 strongly consistent, cassandra should do this in every compaction. But
 until then it’s left for operators to run when they’re sure the state of
 the ring is correct .



 On May 4, 2023, at 7:41 PM, Jaydeep Chovatia <
 chovatia.jayd...@gmail.com> wrote:

 
 Isn't this considered a kind of *bug* in Cassandra because as we know
 *cleanup* is a lengthy and unreliable operation, so relying on the
 *cleanup* means higher chances of data resurrection?
 Do you think we should discard the unowned token-ranges as part of the
 regular compaction itself? What are the pitfalls of doing this as part of
 compaction itself?

 Jaydeep

 On Thu, May 4, 2023 at 7:25 PM guo Maxwell 
 wrote:

> compact ion will just merge duplicate data and remove delete data in
> this node .if you add or remove one node for the cluster, I think clean up
> is needed. if clean up failed, I think we should come to see the reason.
>
> Runtian Liu  于2023年5月5日周五 06:37写道:
>
>> Hi all,
>>

Re: Is cleanup is required if cluster topology changes

2023-05-05 Thread Jaydeep Chovatia
Thanks all for your valuable inputs. We will try some of the suggested
methods in this thread, and see how it goes. We will keep you updated on
our progress.
Thanks a lot once again!

Jaydeep

On Fri, May 5, 2023 at 8:55 AM Bowen Song via user <
user@cassandra.apache.org> wrote:

> Depending on the number of vnodes per server, the probability and severity
> (i.e. the size of the affected token ranges) of an availability degradation
> due to a server failure during node replacement may be small. You also have
> the choice of increasing the RF if that's still not acceptable.
>
> Also, reducing number of vnodes per server can limit the number of servers
> affected by replacing a single server, therefore reducing the amount of
> time required to run "nodetool cleanup" if it is run sequentially.
>
> Finally, you may choose to run "nodetool cleanup" concurrently on multiple
> nodes to reduce the amount of time required to complete it.
>
>
> On 05/05/2023 16:26, Runtian Liu wrote:
>
> We are doing the "adding a node then decommissioning a node" to
> achieve better availability. Replacing a node need to shut down one node
> first, if another node is down during the node replacement period, we will
> get availability drop because most of our use case is local_quorum with
> replication factor 3.
>
> On Fri, May 5, 2023 at 5:59 AM Bowen Song via user <
> user@cassandra.apache.org> wrote:
>
>> Have you thought of using "-Dcassandra.replace_address_first_boot=..."
>> (or "-Dcassandra.replace_address=..." if you are using an older version)?
>> This will not result in a topology change, which means "nodetool cleanup"
>> is not needed after the operation is completed.
>> On 05/05/2023 05:24, Jaydeep Chovatia wrote:
>>
>> Thanks, Jeff!
>> But in our environment we replace nodes quite often for various
>> optimization purposes, etc. say, almost 1 node per day (node *addition*
>> followed by node *decommission*, which of course changes the topology),
>> and we have a cluster of size 100 nodes with 300GB per node. If we have to
>> run cleanup on 100 nodes after every replacement, then it could take
>> forever.
>> What is the recommendation until we get this fixed in Cassandra itself as
>> part of compaction (w/o externally triggering *cleanup*)?
>>
>> Jaydeep
>>
>> On Thu, May 4, 2023 at 8:14 PM Jeff Jirsa  wrote:
>>
>>> Cleanup is fast and cheap and basically a no-op if you haven’t changed
>>> the ring
>>>
>>> After cassandra has transactional cluster metadata to make ring changes
>>> strongly consistent, cassandra should do this in every compaction. But
>>> until then it’s left for operators to run when they’re sure the state of
>>> the ring is correct .
>>>
>>>
>>>
>>> On May 4, 2023, at 7:41 PM, Jaydeep Chovatia 
>>> wrote:
>>>
>>> 
>>> Isn't this considered a kind of *bug* in Cassandra because as we know
>>> *cleanup* is a lengthy and unreliable operation, so relying on the
>>> *cleanup* means higher chances of data resurrection?
>>> Do you think we should discard the unowned token-ranges as part of the
>>> regular compaction itself? What are the pitfalls of doing this as part of
>>> compaction itself?
>>>
>>> Jaydeep
>>>
>>> On Thu, May 4, 2023 at 7:25 PM guo Maxwell  wrote:
>>>
 compact ion will just merge duplicate data and remove delete data in
 this node .if you add or remove one node for the cluster, I think clean up
 is needed. if clean up failed, I think we should come to see the reason.

 Runtian Liu  于2023年5月5日周五 06:37写道:

> Hi all,
>
> Is cleanup the sole method to remove data that does not belong to a
> specific node? In a cluster, where nodes are added or decommissioned from
> time to time, failure to run cleanup may lead to data resurrection issues,
> as deleted data may remain on the node that lost ownership of certain
> partitions. Or is it true that normal compactions can also handle data
> removal for nodes that no longer have ownership of certain data?
>
> Thanks,
> Runtian
>


 --
 you are the apple of my eye !

>>>


Re: Is cleanup is required if cluster topology changes

2023-05-05 Thread Bowen Song via user
Depending on the number of vnodes per server, the probability and 
severity (i.e. the size of the affected token ranges) of an availability 
degradation due to a server failure during node replacement may be 
small. You also have the choice of increasing the RF if that's still not 
acceptable.


Also, reducing the number of vnodes per server can limit the number of 
servers affected by replacing a single server, therefore reducing the 
amount of time required to run "nodetool cleanup" if it is run sequentially.


Finally, you may choose to run "nodetool cleanup" concurrently on 
multiple nodes to reduce the amount of time required to complete it.



On 05/05/2023 16:26, Runtian Liu wrote:
We are doing the "adding a node then decommissioning a node" to 
achieve better availability. Replacing a node need to shut down one 
node first, if another node is down during the node replacement 
period, we will get availability drop because most of our use case is 
local_quorum with replication factor 3.


On Fri, May 5, 2023 at 5:59 AM Bowen Song via user 
 wrote:


Have you thought of using
"-Dcassandra.replace_address_first_boot=..." (or
"-Dcassandra.replace_address=..." if you are using an older
version)? This will not result in a topology change, which means
"nodetool cleanup" is not needed after the operation is completed.

On 05/05/2023 05:24, Jaydeep Chovatia wrote:

Thanks, Jeff!
But in our environment we replace nodes quite often for various
optimization purposes, etc. say, almost 1 node per day (node
/addition/ followed by node /decommission/, which of course
changes the topology), and we have a cluster of size 100 nodes
with 300GB per node. If we have to run cleanup on 100 nodes after
every replacement, then it could take forever.
What is the recommendation until we get this fixed in Cassandra
itself as part of compaction (w/o externally triggering /cleanup/)?

Jaydeep

On Thu, May 4, 2023 at 8:14 PM Jeff Jirsa  wrote:

Cleanup is fast and cheap and basically a no-op if you
haven’t changed the ring

After cassandra has transactional cluster metadata to make
ring changes strongly consistent, cassandra should do this in
every compaction. But until then it’s left for operators to
run when they’re sure the state of the ring is correct .




On May 4, 2023, at 7:41 PM, Jaydeep Chovatia
 wrote:


Isn't this considered a kind of *bug* in Cassandra because
as we know /cleanup/ is a lengthy and unreliable operation,
so relying on the /cleanup/ means higher chances of data
resurrection?
Do you think we should discard the unowned token-ranges as
part of the regular compaction itself? What are the pitfalls
of doing this as part of compaction itself?

Jaydeep

On Thu, May 4, 2023 at 7:25 PM guo Maxwell
 wrote:

compact ion will just merge duplicate data and remove
delete data in this node .if you add or remove one node
for the cluster, I think clean up is needed. if clean up
failed, I think we should come to see the reason.

Runtian Liu  于2023年5月5日周五
06:37写道:

Hi all,

Is cleanup the sole method to remove data that does
not belong to a specific node? In a cluster, where
nodes are added or decommissioned from time to time,
failure to run cleanup may lead to data resurrection
issues, as deleted data may remain on the node that
lost ownership of certain partitions. Or is it true
that normal compactions can also handle data removal
for nodes that no longer have ownership of certain data?

Thanks,
Runtian



-- 
you are the apple of my eye !


Re: Is cleanup is required if cluster topology changes

2023-05-05 Thread Runtian Liu
We are doing the "add a node, then decommission a node" approach to
achieve better availability. Replacing a node requires shutting down one
node first; if another node goes down during the replacement period, we
will see an availability drop, because most of our use cases run at
local_quorum with a replication factor of 3.

On Fri, May 5, 2023 at 5:59 AM Bowen Song via user <
user@cassandra.apache.org> wrote:

> Have you thought of using "-Dcassandra.replace_address_first_boot=..." (or
> "-Dcassandra.replace_address=..." if you are using an older version)? This
> will not result in a topology change, which means "nodetool cleanup" is not
> needed after the operation is completed.
> On 05/05/2023 05:24, Jaydeep Chovatia wrote:
>
> Thanks, Jeff!
> But in our environment we replace nodes quite often for various
> optimization purposes, etc. say, almost 1 node per day (node *addition*
> followed by node *decommission*, which of course changes the topology),
> and we have a cluster of size 100 nodes with 300GB per node. If we have to
> run cleanup on 100 nodes after every replacement, then it could take
> forever.
> What is the recommendation until we get this fixed in Cassandra itself as
> part of compaction (w/o externally triggering *cleanup*)?
>
> Jaydeep
>
> On Thu, May 4, 2023 at 8:14 PM Jeff Jirsa  wrote:
>
>> Cleanup is fast and cheap and basically a no-op if you haven’t changed
>> the ring
>>
>> After cassandra has transactional cluster metadata to make ring changes
>> strongly consistent, cassandra should do this in every compaction. But
>> until then it’s left for operators to run when they’re sure the state of
>> the ring is correct .
>>
>>
>>
>> On May 4, 2023, at 7:41 PM, Jaydeep Chovatia 
>> wrote:
>>
>> 
>> Isn't this considered a kind of *bug* in Cassandra because as we know
>> *cleanup* is a lengthy and unreliable operation, so relying on the
>> *cleanup* means higher chances of data resurrection?
>> Do you think we should discard the unowned token-ranges as part of the
>> regular compaction itself? What are the pitfalls of doing this as part of
>> compaction itself?
>>
>> Jaydeep
>>
>> On Thu, May 4, 2023 at 7:25 PM guo Maxwell  wrote:
>>
>>> compact ion will just merge duplicate data and remove delete data in
>>> this node .if you add or remove one node for the cluster, I think clean up
>>> is needed. if clean up failed, I think we should come to see the reason.
>>>
>>> Runtian Liu  于2023年5月5日周五 06:37写道:
>>>
 Hi all,

 Is cleanup the sole method to remove data that does not belong to a
 specific node? In a cluster, where nodes are added or decommissioned from
 time to time, failure to run cleanup may lead to data resurrection issues,
 as deleted data may remain on the node that lost ownership of certain
 partitions. Or is it true that normal compactions can also handle data
 removal for nodes that no longer have ownership of certain data?

 Thanks,
 Runtian

>>>
>>>
>>> --
>>> you are the apple of my eye !
>>>
>>


Re: Is cleanup is required if cluster topology changes

2023-05-05 Thread Bowen Song via user
Have you thought of using "-Dcassandra.replace_address_first_boot=..." 
(or "-Dcassandra.replace_address=..." if you are using an older 
version)? This will not result in a topology change, which means 
"nodetool cleanup" is not needed after the operation is completed.


On 05/05/2023 05:24, Jaydeep Chovatia wrote:

Thanks, Jeff!
But in our environment we replace nodes quite often for various 
optimization purposes, etc. say, almost 1 node per day (node 
/addition/ followed by node /decommission/, which of course changes 
the topology), and we have a cluster of size 100 nodes with 300GB per 
node. If we have to run cleanup on 100 nodes after every replacement, 
then it could take forever.
What is the recommendation until we get this fixed in Cassandra itself 
as part of compaction (w/o externally triggering /cleanup/)?


Jaydeep

On Thu, May 4, 2023 at 8:14 PM Jeff Jirsa  wrote:

Cleanup is fast and cheap and basically a no-op if you haven’t
changed the ring

After cassandra has transactional cluster metadata to make ring
changes strongly consistent, cassandra should do this in every
compaction. But until then it’s left for operators to run when
they’re sure the state of the ring is correct .




On May 4, 2023, at 7:41 PM, Jaydeep Chovatia
 wrote:


Isn't this considered a kind of *bug* in Cassandra because as we
know /cleanup/ is a lengthy and unreliable operation, so relying
on the /cleanup/ means higher chances of data resurrection?
Do you think we should discard the unowned token-ranges as part
of the regular compaction itself? What are the pitfalls of doing
this as part of compaction itself?

Jaydeep

On Thu, May 4, 2023 at 7:25 PM guo Maxwell 
wrote:

compact ion will just merge duplicate data and remove delete
data in this node .if you add or remove one node for the
cluster, I think clean up is needed. if clean up failed, I
think we should come to see the reason.

Runtian Liu  于2023年5月5日周五 06:37写道:

Hi all,

Is cleanup the sole method to remove data that does not
belong to a specific node? In a cluster, where nodes are
added or decommissioned from time to time, failure to run
cleanup may lead to data resurrection issues, as deleted
data may remain on the node that lost ownership of
certain partitions. Or is it true that normal compactions
can also handle data removal for nodes that no longer
have ownership of certain data?

Thanks,
Runtian



-- 
you are the apple of my eye !


RE: Is cleanup is required if cluster topology changes

2023-05-05 Thread Durity, Sean R via user
I run clean-up in parallel, not serially, since it is a node-only kind of 
operation. And I only run in the impacted DC. With only 300 GB on a node, 
clean-up should not take very long. Check your compactionthroughput.

I ran clean-up in parallel on 53 nodes with over 3 TB of data each. It took 
like 6-8 hours. (And many nodes were done much earlier than that.) I restrict 
clean-up to one compactionthread, but I double the compactionthroughput for the 
duration of the cleanup. This protects against two large sstables being 
compacted at the same time and running out of disk space.
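A minimal sketch of that routine on one node (the throughput numbers are
placeholders; restore your normal value once the clean-up has finished):

    # Temporarily double compaction throughput for the duration of the clean-up
    nodetool getcompactionthroughput          # note the current value, e.g. 64 MB/s
    nodetool setcompactionthroughput 128      # doubled for the clean-up window
    nodetool cleanup -j 1 my_keyspace         # a single cleanup job (compaction thread)
    nodetool setcompactionthroughput 64       # put the original setting back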

Sean Durity
From: manish khandelwal 
Sent: Friday, May 5, 2023 4:52 AM
To: user@cassandra.apache.org
Subject: [EXTERNAL] Re: Is cleanup is required if cluster topology changes

You can replace the node directly why to add a node and decommission the 
another node. Just replace the node with the new node and your topology remains 
the same so no need to run the cleanup .

On Fri, May 5, 2023 at 10:26 AM Jaydeep Chovatia <chovatia.jayd...@gmail.com> wrote:

We use STCS, and our experience with cleanup is that it takes a long time to 
run in a 100-node cluster. We would like to replace one node every day for 
various purposes in our fleet.

If we run cleanup after each node replacement, then it might take, say, 15 days 
to complete, and that hinders our node replacement frequency.

Do you see any other options?

Jaydeep

On Thu, May 4, 2023 at 9:47 PM Jeff Jirsa <jji...@gmail.com> wrote:
You should 100% trigger cleanup each time or you’ll almost certainly resurrect 
data sooner or later
If you’re using leveled compaction it’s especially cheap. Stcs and twcs are 
worse, but if you’re really scaling that often, I’d be considering lcs and 
running cleanup just before or just after each scaling


On May 4, 2023, at 9:25 PM, Jaydeep Chovatia <chovatia.jayd...@gmail.com> wrote:

Thanks, Jeff!
But in our environment we replace nodes quite often for various optimization 
purposes, etc. say, almost 1 node per day (node addition followed by node 
decommission, which of course changes the topology), and we have a cluster of 
size 100 nodes with 300GB per node. If we have to run cleanup on 100 nodes 
after every replacement, then it could take forever.
What is the recommendation until we get this fixed in Cassandra itself as part 
of compaction (w/o externally triggering cleanup)?

Jaydeep

On Thu, May 4, 2023 at 8:14 PM Jeff Jirsa <jji...@gmail.com> wrote:
Cleanup is fast and cheap and basically a no-op if you haven’t changed the ring
After cassandra has transactional cluster metadata to make ring changes 
strongly consistent, cassandra should do this in every compaction. But until 
then it’s left for operators to run when they’re sure the state of the ring is 
correct .




On May 4, 2023, at 7:41 PM, Jaydeep Chovatia <chovatia.jayd...@gmail.com> wrote:

Isn't this considered a kind of bug in Cassandra because as we know cleanup is 
a lengthy and unreliable operation, so relying on the cleanup means higher 
chances of data resurrection?
Do you think we should discard the unowned token-ranges as part of the regular 
compaction itself? What are the pitfalls of doing this as part of compaction 
itself?

Jaydeep

On Thu, May 4, 2023 at 7:25 PM guo Maxwell <cclive1...@gmail.com> wrote:
compact ion will just merge duplicate data and remove delete data in this node 
.if you add or remove one node for the cluster, I think clean up is needed. if 
clean up failed, I think we should come to see the reason.

Runtian Liu <curly...@gmail.com> wrote on Friday, May 5, 2023 at 06:37:
Hi all,

Is cleanup the sole method to remove data that does not belong to a specific 
node? In a cluster, where nodes are added or decommissioned from time to time, 
failure to run cleanup may lead to data resurrection issues, as deleted data 
may remain on the node that lost ownership of certain partitions. Or is it true 
that normal compactions can also handle data removal for nodes that no longer 
have ownership of certain data?

Thanks,
Runtian


--
you are the apple of my eye !




Re: Is cleanup is required if cluster topology changes

2023-05-05 Thread manish khandelwal
You can replace the node directly; why add a node and then decommission
another node? Just replace the node with the new node, and your topology
remains the same, so there is no need to run cleanup.

On Fri, May 5, 2023 at 10:26 AM Jaydeep Chovatia 
wrote:

> We use STCS, and our experience with *cleanup* is that it takes a long
> time to run in a 100-node cluster. We would like to replace one node every
> day for various purposes in our fleet.
>
> If we run *cleanup* after each node replacement, then it might take, say,
> 15 days to complete, and that hinders our node replacement frequency.
>
> Do you see any other options?
>
> Jaydeep
>
> On Thu, May 4, 2023 at 9:47 PM Jeff Jirsa  wrote:
>
>> You should 100% trigger cleanup each time or you’ll almost certainly
>> resurrect data sooner or later
>>
>> If you’re using leveled compaction it’s especially cheap. Stcs and twcs
>> are worse, but if you’re really scaling that often, I’d be considering lcs
>> and running cleanup just before or just after each scaling
>>
>> On May 4, 2023, at 9:25 PM, Jaydeep Chovatia 
>> wrote:
>>
>> 
>> Thanks, Jeff!
>> But in our environment we replace nodes quite often for various
>> optimization purposes, etc. say, almost 1 node per day (node *addition*
>> followed by node *decommission*, which of course changes the topology),
>> and we have a cluster of size 100 nodes with 300GB per node. If we have to
>> run cleanup on 100 nodes after every replacement, then it could take
>> forever.
>> What is the recommendation until we get this fixed in Cassandra itself as
>> part of compaction (w/o externally triggering *cleanup*)?
>>
>> Jaydeep
>>
>> On Thu, May 4, 2023 at 8:14 PM Jeff Jirsa  wrote:
>>
>>> Cleanup is fast and cheap and basically a no-op if you haven’t changed
>>> the ring
>>>
>>> After cassandra has transactional cluster metadata to make ring changes
>>> strongly consistent, cassandra should do this in every compaction. But
>>> until then it’s left for operators to run when they’re sure the state of
>>> the ring is correct .
>>>
>>>
>>>
>>> On May 4, 2023, at 7:41 PM, Jaydeep Chovatia 
>>> wrote:
>>>
>>> 
>>> Isn't this considered a kind of *bug* in Cassandra because as we know
>>> *cleanup* is a lengthy and unreliable operation, so relying on the
>>> *cleanup* means higher chances of data resurrection?
>>> Do you think we should discard the unowned token-ranges as part of the
>>> regular compaction itself? What are the pitfalls of doing this as part of
>>> compaction itself?
>>>
>>> Jaydeep
>>>
>>> On Thu, May 4, 2023 at 7:25 PM guo Maxwell  wrote:
>>>
 compact ion will just merge duplicate data and remove delete data in
 this node .if you add or remove one node for the cluster, I think clean up
 is needed. if clean up failed, I think we should come to see the reason.

 Runtian Liu  于2023年5月5日周五 06:37写道:

> Hi all,
>
> Is cleanup the sole method to remove data that does not belong to a
> specific node? In a cluster, where nodes are added or decommissioned from
> time to time, failure to run cleanup may lead to data resurrection issues,
> as deleted data may remain on the node that lost ownership of certain
> partitions. Or is it true that normal compactions can also handle data
> removal for nodes that no longer have ownership of certain data?
>
> Thanks,
> Runtian
>


 --
 you are the apple of my eye !

>>>


Re: Is cleanup is required if cluster topology changes

2023-05-04 Thread Jaydeep Chovatia
We use STCS, and our experience with *cleanup* is that it takes a long time
to run in a 100-node cluster. We would like to replace one node every day
for various purposes in our fleet.

If we run *cleanup* after each node replacement, then it might take, say,
15 days to complete, and that hinders our node replacement frequency.
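(For rough scale: if cleaning up a single 300 GB node takes on the order of
3-4 hours and the 100 nodes are cleaned one at a time, that is 300-400
hours, i.e. roughly two weeks, consistent with the 15-day estimate above;
the per-node figure is an assumption and depends on hardware and compaction
throughput.)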

Do you see any other options?

Jaydeep

On Thu, May 4, 2023 at 9:47 PM Jeff Jirsa  wrote:

> You should 100% trigger cleanup each time or you’ll almost certainly
> resurrect data sooner or later
>
> If you’re using leveled compaction it’s especially cheap. Stcs and twcs
> are worse, but if you’re really scaling that often, I’d be considering lcs
> and running cleanup just before or just after each scaling
>
> On May 4, 2023, at 9:25 PM, Jaydeep Chovatia 
> wrote:
>
> 
> Thanks, Jeff!
> But in our environment we replace nodes quite often for various
> optimization purposes, etc. say, almost 1 node per day (node *addition*
> followed by node *decommission*, which of course changes the topology),
> and we have a cluster of size 100 nodes with 300GB per node. If we have to
> run cleanup on 100 nodes after every replacement, then it could take
> forever.
> What is the recommendation until we get this fixed in Cassandra itself as
> part of compaction (w/o externally triggering *cleanup*)?
>
> Jaydeep
>
> On Thu, May 4, 2023 at 8:14 PM Jeff Jirsa  wrote:
>
>> Cleanup is fast and cheap and basically a no-op if you haven’t changed
>> the ring
>>
>> After cassandra has transactional cluster metadata to make ring changes
>> strongly consistent, cassandra should do this in every compaction. But
>> until then it’s left for operators to run when they’re sure the state of
>> the ring is correct .
>>
>>
>>
>> On May 4, 2023, at 7:41 PM, Jaydeep Chovatia 
>> wrote:
>>
>> 
>> Isn't this considered a kind of *bug* in Cassandra because as we know
>> *cleanup* is a lengthy and unreliable operation, so relying on the
>> *cleanup* means higher chances of data resurrection?
>> Do you think we should discard the unowned token-ranges as part of the
>> regular compaction itself? What are the pitfalls of doing this as part of
>> compaction itself?
>>
>> Jaydeep
>>
>> On Thu, May 4, 2023 at 7:25 PM guo Maxwell  wrote:
>>
>>> compact ion will just merge duplicate data and remove delete data in
>>> this node .if you add or remove one node for the cluster, I think clean up
>>> is needed. if clean up failed, I think we should come to see the reason.
>>>
>>> Runtian Liu  于2023年5月5日周五 06:37写道:
>>>
 Hi all,

 Is cleanup the sole method to remove data that does not belong to a
 specific node? In a cluster, where nodes are added or decommissioned from
 time to time, failure to run cleanup may lead to data resurrection issues,
 as deleted data may remain on the node that lost ownership of certain
 partitions. Or is it true that normal compactions can also handle data
 removal for nodes that no longer have ownership of certain data?

 Thanks,
 Runtian

>>>
>>>
>>> --
>>> you are the apple of my eye !
>>>
>>


Re: Is cleanup is required if cluster topology changes

2023-05-04 Thread Jeff Jirsa
You should 100% trigger cleanup each time or you’ll almost certainly
resurrect data sooner or later

If you’re using leveled compaction it’s especially cheap. Stcs and twcs are
worse, but if you’re really scaling that often, I’d be considering lcs and
running cleanup just before or just after each scaling

On May 4, 2023, at 9:25 PM, Jaydeep Chovatia  wrote:

Thanks, Jeff!
But in our environment we replace nodes quite often for various
optimization purposes, etc. say, almost 1 node per day (node addition
followed by node decommission, which of course changes the topology), and
we have a cluster of size 100 nodes with 300GB per node. If we have to run
cleanup on 100 nodes after every replacement, then it could take forever.
What is the recommendation until we get this fixed in Cassandra itself as
part of compaction (w/o externally triggering cleanup)?

Jaydeep

On Thu, May 4, 2023 at 8:14 PM Jeff Jirsa  wrote:

Cleanup is fast and cheap and basically a no-op if you haven’t changed the
ring

After cassandra has transactional cluster metadata to make ring changes
strongly consistent, cassandra should do this in every compaction. But
until then it’s left for operators to run when they’re sure the state of
the ring is correct .

On May 4, 2023, at 7:41 PM, Jaydeep Chovatia  wrote:

Isn't this considered a kind of bug in Cassandra because as we know
cleanup is a lengthy and unreliable operation, so relying on the cleanup
means higher chances of data resurrection?
Do you think we should discard the unowned token-ranges as part of the
regular compaction itself? What are the pitfalls of doing this as part of
compaction itself?

Jaydeep

On Thu, May 4, 2023 at 7:25 PM guo Maxwell  wrote:

compact ion will just merge duplicate data and remove delete data in this
node .if you add or remove one node for the cluster, I think clean up is
needed. if clean up failed, I think we should come to see the reason.

Runtian Liu  于2023年5月5日周五 06:37写道:

Hi all,

Is cleanup the sole method to remove data that does not belong to a
specific node? In a cluster, where nodes are added or decommissioned from
time to time, failure to run cleanup may lead to data resurrection issues,
as deleted data may remain on the node that lost ownership of certain
partitions. Or is it true that normal compactions can also handle data
removal for nodes that no longer have ownership of certain data?

Thanks,
Runtian

--
you are the apple of my eye !




Re: Is cleanup is required if cluster topology changes

2023-05-04 Thread Jaydeep Chovatia
Thanks, Jeff!
But in our environment we replace nodes quite often for various
optimization purposes, say almost 1 node per day (node *addition* followed
by node *decommission*, which of course changes the topology), and we have
a cluster of 100 nodes with 300GB per node. If we have to run cleanup on
100 nodes after every replacement, then it could take forever.
What is the recommendation until we get this fixed in Cassandra itself as
part of compaction (i.e., without externally triggering *cleanup*)?

Jaydeep

On Thu, May 4, 2023 at 8:14 PM Jeff Jirsa  wrote:

> Cleanup is fast and cheap and basically a no-op if you haven’t changed the
> ring
>
> After cassandra has transactional cluster metadata to make ring changes
> strongly consistent, cassandra should do this in every compaction. But
> until then it’s left for operators to run when they’re sure the state of
> the ring is correct .
>
>
>
> On May 4, 2023, at 7:41 PM, Jaydeep Chovatia 
> wrote:
>
> 
> Isn't this considered a kind of *bug* in Cassandra because as we know
> *cleanup* is a lengthy and unreliable operation, so relying on the
> *cleanup* means higher chances of data resurrection?
> Do you think we should discard the unowned token-ranges as part of the
> regular compaction itself? What are the pitfalls of doing this as part of
> compaction itself?
>
> Jaydeep
>
> On Thu, May 4, 2023 at 7:25 PM guo Maxwell  wrote:
>
>> compact ion will just merge duplicate data and remove delete data in this
>> node .if you add or remove one node for the cluster, I think clean up is
>> needed. if clean up failed, I think we should come to see the reason.
>>
>> Runtian Liu  于2023年5月5日周五 06:37写道:
>>
>>> Hi all,
>>>
>>> Is cleanup the sole method to remove data that does not belong to a
>>> specific node? In a cluster, where nodes are added or decommissioned from
>>> time to time, failure to run cleanup may lead to data resurrection issues,
>>> as deleted data may remain on the node that lost ownership of certain
>>> partitions. Or is it true that normal compactions can also handle data
>>> removal for nodes that no longer have ownership of certain data?
>>>
>>> Thanks,
>>> Runtian
>>>
>>
>>
>> --
>> you are the apple of my eye !
>>
>


Re: Is cleanup is required if cluster topology changes

2023-05-04 Thread Jeff Jirsa
Cleanup is fast and cheap and basically a no-op if you haven’t changed the
ring

After cassandra has transactional cluster metadata to make ring changes
strongly consistent, cassandra should do this in every compaction. But
until then it’s left for operators to run when they’re sure the state of
the ring is correct .

On May 4, 2023, at 7:41 PM, Jaydeep Chovatia  wrote:

Isn't this considered a kind of bug in Cassandra because as we know
cleanup is a lengthy and unreliable operation, so relying on the cleanup
means higher chances of data resurrection?
Do you think we should discard the unowned token-ranges as part of the
regular compaction itself? What are the pitfalls of doing this as part of
compaction itself?

Jaydeep

On Thu, May 4, 2023 at 7:25 PM guo Maxwell  wrote:

compact ion will just merge duplicate data and remove delete data in this
node .if you add or remove one node for the cluster, I think clean up is
needed. if clean up failed, I think we should come to see the reason.

Runtian Liu  于2023年5月5日周五 06:37写道:

Hi all,

Is cleanup the sole method to remove data that does not belong to a
specific node? In a cluster, where nodes are added or decommissioned from
time to time, failure to run cleanup may lead to data resurrection issues,
as deleted data may remain on the node that lost ownership of certain
partitions. Or is it true that normal compactions can also handle data
removal for nodes that no longer have ownership of certain data?

Thanks,
Runtian

--
you are the apple of my eye !



Re: Is cleanup is required if cluster topology changes

2023-05-04 Thread Jaydeep Chovatia
Isn't this considered a kind of *bug* in Cassandra? As we know, *cleanup*
is a lengthy and unreliable operation, so relying on *cleanup* means a
higher chance of data resurrection.
Do you think we should discard the unowned token ranges as part of regular
compaction itself? What are the pitfalls of doing this as part of
compaction?

Jaydeep

On Thu, May 4, 2023 at 7:25 PM guo Maxwell  wrote:

> compact ion will just merge duplicate data and remove delete data in this
> node .if you add or remove one node for the cluster, I think clean up is
> needed. if clean up failed, I think we should come to see the reason.
>
> Runtian Liu  于2023年5月5日周五 06:37写道:
>
>> Hi all,
>>
>> Is cleanup the sole method to remove data that does not belong to a
>> specific node? In a cluster, where nodes are added or decommissioned from
>> time to time, failure to run cleanup may lead to data resurrection issues,
>> as deleted data may remain on the node that lost ownership of certain
>> partitions. Or is it true that normal compactions can also handle data
>> removal for nodes that no longer have ownership of certain data?
>>
>> Thanks,
>> Runtian
>>
>
>
> --
> you are the apple of my eye !
>


Re: Is cleanup is required if cluster topology changes

2023-05-04 Thread guo Maxwell
Compaction will just merge duplicate data and remove deleted data on this
node. If you add or remove a node in the cluster, I think cleanup is
needed. If cleanup failed, I think we should look into the reason.

Runtian Liu  wrote on Friday, May 5, 2023 at 06:37:

> Hi all,
>
> Is cleanup the sole method to remove data that does not belong to a
> specific node? In a cluster, where nodes are added or decommissioned from
> time to time, failure to run cleanup may lead to data resurrection issues,
> as deleted data may remain on the node that lost ownership of certain
> partitions. Or is it true that normal compactions can also handle data
> removal for nodes that no longer have ownership of certain data?
>
> Thanks,
> Runtian
>


-- 
you are the apple of my eye !