Depending on the number of vnodes per server, the probability and severity (i.e. the size of the affected token ranges) of an availability degradation caused by a server failure during node replacement may be small. If that risk is still not acceptable, you also have the option of increasing the RF.

Also, reducing the number of vnodes per server can limit the number of servers affected by replacing a single server, thereby reducing the amount of time required to run "nodetool cleanup" if it is run sequentially.

Finally, you may choose to run "nodetool cleanup" concurrently on multiple nodes to reduce the amount of time required to complete it.
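
For example, here is a minimal sketch (not a tested script) of running cleanup on a few nodes at a time. It assumes passwordless SSH to each node and nodetool on the PATH there; the host names and the -P value (how many nodes run cleanup concurrently) are placeholders to adjust for your cluster:

    printf '%s\n' node1 node2 node3 node4 |
        xargs -P 2 -I{} ssh {} 'nodetool cleanup'

Throttle -P to whatever extra I/O and compaction load your cluster can tolerate.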


On 05/05/2023 16:26, Runtian Liu wrote:
We are doing "adding a node, then decommissioning a node" to achieve better availability. Replacing a node requires shutting down that node first, so if another node is also down during the replacement period, we will see an availability drop, because most of our use cases are LOCAL_QUORUM with replication factor 3.

On Fri, May 5, 2023 at 5:59 AM Bowen Song via user <user@cassandra.apache.org> wrote:

    Have you thought of using
    "-Dcassandra.replace_address_first_boot=..." (or
    "-Dcassandra.replace_address=..." if you are using an older
    version)? This will not result in a topology change, which means
    "nodetool cleanup" is not needed after the operation is completed.

    On 05/05/2023 05:24, Jaydeep Chovatia wrote:
    Thanks, Jeff!
    But in our environment we replace nodes quite often for various
    optimization purposes, say almost one node per day (a node
    /addition/ followed by a node /decommission/, which of course
    changes the topology), and we have a cluster of 100 nodes with
    300GB per node. If we have to run cleanup on all 100 nodes after
    every replacement, it could take forever.
    What is the recommendation until we get this fixed in Cassandra
    itself as part of compaction (w/o externally triggering /cleanup/)?

    Jaydeep

    On Thu, May 4, 2023 at 8:14 PM Jeff Jirsa <jji...@gmail.com> wrote:

        Cleanup is fast and cheap and basically a no-op if you
        haven’t changed the ring.

        After Cassandra has transactional cluster metadata to make
        ring changes strongly consistent, Cassandra should do this in
        every compaction. But until then it’s left for operators to
        run when they’re sure the state of the ring is correct.
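
        For example, a quick (though not exhaustive) sanity check
        before running cleanup is to confirm that every node reports
        UN (Up/Normal) and that no joins, moves or decommissions are
        in flight:

            nodetool status
            nodetool ring

        Only then is it reasonably safe to assume the ring state is
        settled.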



        On May 4, 2023, at 7:41 PM, Jaydeep Chovatia
        <chovatia.jayd...@gmail.com> wrote:

        
        Isn't this considered a kind of *bug* in Cassandra? As we
        know, /cleanup/ is a lengthy and unreliable operation, so
        relying on /cleanup/ means higher chances of data
        resurrection.
        Do you think we should discard the unowned token ranges as
        part of regular compaction itself? What are the pitfalls of
        doing this as part of compaction?

        Jaydeep

        On Thu, May 4, 2023 at 7:25 PM guo Maxwell
        <cclive1...@gmail.com> wrote:

            Compaction will just merge duplicate data and remove
            deleted data on this node. If you add or remove a node
            from the cluster, I think cleanup is needed. If cleanup
            fails, I think we should look into the reason.

            On Fri, May 5, 2023 at 06:37, Runtian Liu
            <curly...@gmail.com> wrote:

                Hi all,

                Is cleanup the sole method to remove data that does
                not belong to a specific node? In a cluster where
                nodes are added or decommissioned from time to time,
                failure to run cleanup may lead to data resurrection
                issues, as deleted data may remain on the node that
                lost ownership of certain partitions. Or is it true
                that normal compactions can also handle data removal
                for nodes that no longer have ownership of certain data?

                Thanks,
                Runtian



