Questions regarding Cassandra 4 and Cassandra 4.1

2023-11-02 Thread Runtian Liu
Hi all,

Earlier this year, we upgraded our fleet from C* 3.0 to C* 4.0. Given the
exciting new features in C* 4.1, we are contemplating an upgrade from C*
4.0 to C* 4.1. Can anyone share their experience regarding the stability of
C* 4.1? Are any of you running C* 4.1 at scale?

Additionally, I have a query about repair procedures. Due to the known
instability of incremental repair in C* 3.0, we've consistently opted for
full repairs on all our clusters. With the advancements in C* 4.0 regarding
incremental repair, has its stability improved? Which repair method are you
currently using: full or incremental?
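
For context, a minimal sketch of how full vs. incremental repair is typically
invoked through nodetool (this assumes nodetool is on the PATH; the keyspace
name is illustrative):

import subprocess

def run_repair(keyspace, full=True):
    # -pr limits repair to this node's primary ranges; --full forces a full
    # (non-incremental) repair. Without --full, Cassandra 4.x defaults to
    # incremental repair.
    cmd = ["nodetool", "repair", "-pr"]
    if full:
        cmd.append("--full")
    cmd.append(keyspace)
    subprocess.run(cmd, check=True)

run_repair("my_keyspace", full=True)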

Thanks,
Runtian


4.0 upgrade

2023-07-07 Thread Runtian Liu
Hi,

We are upgrading our Cassandra clusters from 3.0.27 to 4.0.6, and we
observed some errors related to repair: j.l.IllegalArgumentException:
Unknown verb id 32

We have two datacenters for each Cassandra cluster. When we do an upgrade,
we want to upgrade one datacenter first and monitor the upgraded datacenter
for some time (1 week) to make sure there are no issues, then upgrade the
second datacenter for that cluster.

We have some automated repair jobs running. Is it expected for repairs to get
stuck if we have one datacenter on 4.0 and one datacenter on 3.0?
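
A minimal sketch of one way to skip scheduled repairs while the cluster is
mixed-version (this assumes the Python cassandra-driver package; the contact
point is illustrative):

from cassandra.cluster import Cluster

def single_version(contact_points):
    # Collect release_version from the local node and all peers; more than
    # one distinct value means the cluster is still mid-upgrade.
    cluster = Cluster(contact_points)
    session = cluster.connect()
    versions = {r.release_version
                for r in session.execute("SELECT release_version FROM system.local")
                if r.release_version}
    versions |= {r.release_version
                 for r in session.execute("SELECT release_version FROM system.peers")
                 if r.release_version}
    cluster.shutdown()
    return len(versions) == 1

if not single_version(["10.0.0.1"]):
    print("Cluster is mixed-version; skipping this repair cycle.")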

Do you have any suggestions on how we should do the upgrade? Is waiting one
week between the two datacenters too long?

Thanks,
Runtian


Re: Replacing node without shutting down the old node

2023-05-16 Thread Runtian Liu
Cool, thank you. This looks like a very good setup for us, and cleanup
should be very fast in this case.

On Tue, May 16, 2023 at 5:53 AM Jeff Jirsa  wrote:

>
> In-line
>
> On May 15, 2023, at 5:26 PM, Runtian Liu  wrote:
>
> 
> Hi Jeff,
>
> I tried the setup with 16 vnodes and the NetworkTopologyStrategy replication
> strategy with replication factor 3 and 3 racks in one cluster. When using
> the new node's tokens as the old node's tokens - 1
>
>
> I had said +1 but you’re right that it’s actually -1, sorry about that.
> You want the new node to be lower than the existing host. The lower token
> will take most of the data.
>
> I see the new node is streaming from the old node only. And the decom
> phase of the old node is extremely fast. Does this mean the new node will
> only take data ownership from the old node?
>
>
> With exactly three racks, yes. With more racks or fewer racks, no.
>
> I also did some cleanups after replacing the node with old token - 1, and the
> cleanup SSTable count was not increasing. It looks like adding a node with
> old_token - 1 and decommissioning the old node will not generate stale data on
> the rest of the cluster. Do you know if there are any edge cases in this
> replacement process that can generate stale data on other nodes of the
> cluster with the setup I mentioned?
>
>
> Should do exactly what you want. I’d still run cleanup but it should be a
> no-op.
>
>
> Thanks,
> Runtian
>
> On Mon, May 8, 2023 at 9:59 PM Runtian Liu  wrote:
>
>> I thought the joining node would not participate in quorum? How are we
>> counting things like how many replicas ACK a write when we are adding a new
>> node for expansion? The token ownership won't change until the new node is
>> fully joined right?
>>
>> On Mon, May 8, 2023 at 8:58 PM Jeff Jirsa  wrote:
>>
>>> You can't have two nodes with the same token (in the current metadata
>>> implementation) - it causes problems counting things like how many replicas
>>> ACK a write, and what happens if the one you're replacing ACKs a write but
>>> the joining host doesn't? It's harder than it seems to maintain consistency
>>> guarantees in that model, because you have 2 nodes where either may end up
>>> becoming the sole true owner of the token, and you have to handle both
>>> cases where one of them fails.
>>>
>>> An easier option is to add it with new token set to old token +1 (as an
>>> expansion), then decom the leaving node (shrink). That'll minimize
>>> streaming when you decommission that node.
>>>
>>>
>>>
>>> On Mon, May 8, 2023 at 7:19 PM Runtian Liu  wrote:
>>>
>>>> Hi all,
>>>>
>>>> Sometimes we want to replace a node for various reasons. We can replace
>>>> a node by shutting down the old node and letting the new node stream data
>>>> from other replicas, but this approach may cause availability or data
>>>> consistency issues if one more node in the same cluster goes down. Why
>>>> doesn't Cassandra support replacing a node without shutting down the old
>>>> one? Can we treat the new node as a normal node addition while it has
>>>> exactly the same token ranges as the node being replaced? After the new
>>>> node's joining process is complete, we just need to cut off the old node.
>>>> With this, we don't lose any availability and the token ranges don't move,
>>>> so no cleanup is needed. Is there any downside to doing this?
>>>>
>>>> Thanks,
>>>> Runtian
>>>>
>>>


Re: Replacing node without shutting down the old node

2023-05-15 Thread Runtian Liu
Hi Jeff,

I tried the setup with 16 vnodes and the NetworkTopologyStrategy replication
strategy with replication factor 3 and 3 racks in one cluster. When using
the new node's tokens as the old node's tokens - 1, I see the new node is
streaming from the old node only, and the decom phase of the old node is
extremely fast. Does this mean the new node will only take data ownership
from the old node? I also did some cleanups after replacing the node with
old token - 1, and the cleanup SSTable count was not increasing. It looks
like adding a node with old_token - 1 and decommissioning the old node will
not generate stale data on the rest of the cluster. Do you know if there are
any edge cases in this replacement process that can generate stale data on
other nodes of the cluster with the setup I mentioned?
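
For reference, a minimal sketch of how the replacement node's initial_token
list can be derived from the old node's tokens (this assumes
Murmur3Partitioner; the token values below are illustrative, not our real
ones):

# Given the old node's tokens (for example, as listed by "nodetool ring" for
# that node), emit an initial_token line with each token shifted down by one.
# Murmur3Partitioner tokens span [-2**63, 2**63 - 1].
MIN_TOKEN = -2**63
MAX_TOKEN = 2**63 - 1

def tokens_minus_one(old_tokens):
    shifted = []
    for t in sorted(int(t) for t in old_tokens):
        # Wrap around if the old token already sits at the bottom of the ring.
        shifted.append(t - 1 if t > MIN_TOKEN else MAX_TOKEN)
    return "initial_token: " + ", ".join(str(t) for t in shifted)

# With vnode 16 the real list has 16 entries; these values are placeholders.
print(tokens_minus_one(["-9087654321", "123456789", "4611686018427387904"]))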

Thanks,
Runtian

On Mon, May 8, 2023 at 9:59 PM Runtian Liu  wrote:

> I thought the joining node would not participate in quorum? How are we
> counting things like how many replicas ACK a write when we are adding a new
> node for expansion? The token ownership won't change until the new node is
> fully joined right?
>
> On Mon, May 8, 2023 at 8:58 PM Jeff Jirsa  wrote:
>
>> You can't have two nodes with the same token (in the current metadata
>> implementation) - it causes problems counting things like how many replicas
>> ACK a write, and what happens if the one you're replacing ACKs a write but
>> the joining host doesn't? It's harder than it seems to maintain consistency
>> guarantees in that model, because you have 2 nodes where either may end up
>> becoming the sole true owner of the token, and you have to handle both
>> cases where one of them fails.
>>
>> An easier option is to add it with new token set to old token +1 (as an
>> expansion), then decom the leaving node (shrink). That'll minimize
>> streaming when you decommission that node.
>>
>>
>>
>> On Mon, May 8, 2023 at 7:19 PM Runtian Liu  wrote:
>>
>>> Hi all,
>>>
>>> Sometimes we want to replace a node for various reasons. We can replace
>>> a node by shutting down the old node and letting the new node stream data
>>> from other replicas, but this approach may cause availability or data
>>> consistency issues if one more node in the same cluster goes down. Why
>>> doesn't Cassandra support replacing a node without shutting down the old
>>> one? Can we treat the new node as a normal node addition while it has
>>> exactly the same token ranges as the node being replaced? After the new
>>> node's joining process is complete, we just need to cut off the old node.
>>> With this, we don't lose any availability and the token ranges don't move,
>>> so no cleanup is needed. Is there any downside to doing this?
>>>
>>> Thanks,
>>> Runtian
>>>
>>


Re: Replacing node without shutting down the old node

2023-05-08 Thread Runtian Liu
I thought the joining node would not participate in quorum? How are we
counting things like how many replicas ACK a write when we are adding a new
node for expansion? The token ownership won't change until the new node is
fully joined right?

On Mon, May 8, 2023 at 8:58 PM Jeff Jirsa  wrote:

> You can't have two nodes with the same token (in the current metadata
> implementation) - it causes problems counting things like how many replicas
> ACK a write, and what happens if the one you're replacing ACKs a write but
> the joining host doesn't? It's harder than it seems to maintain consistency
> guarantees in that model, because you have 2 nodes where either may end up
> becoming the sole true owner of the token, and you have to handle both
> cases where one of them fails.
>
> An easier option is to add it with new token set to old token +1 (as an
> expansion), then decom the leaving node (shrink). That'll minimize
> streaming when you decommission that node.
>
>
>
> On Mon, May 8, 2023 at 7:19 PM Runtian Liu  wrote:
>
>> Hi all,
>>
>> Sometimes we want to replace a node for various reasons. We can replace a
>> node by shutting down the old node and letting the new node stream data
>> from other replicas, but this approach may cause availability or data
>> consistency issues if one more node in the same cluster goes down. Why
>> doesn't Cassandra support replacing a node without shutting down the old
>> one? Can we treat the new node as a normal node addition while it has
>> exactly the same token ranges as the node being replaced? After the new
>> node's joining process is complete, we just need to cut off the old node.
>> With this, we don't lose any availability and the token ranges don't move,
>> so no cleanup is needed. Is there any downside to doing this?
>>
>> Thanks,
>> Runtian
>>
>


Replacing node without shutting down the old node

2023-05-08 Thread Runtian Liu
Hi all,

Sometimes we want to replace a node for various reasons. We can replace a
node by shutting down the old node and letting the new node stream data
from other replicas, but this approach may cause availability or data
consistency issues if one more node in the same cluster goes down. Why
doesn't Cassandra support replacing a node without shutting down the old
one? Can we treat the new node as a normal node addition while it has
exactly the same token ranges as the node being replaced? After the new
node's joining process is complete, we just need to cut off the old node.
With this, we don't lose any availability and the token ranges don't move,
so no cleanup is needed. Is there any downside to doing this?

Thanks,
Runtian


Re: Is cleanup is required if cluster topology changes

2023-05-05 Thread Runtian Liu
We are doing the "adding a node then decommissioning a node" to
achieve better availability. Replacing a node need to shut down one node
first, if another node is down during the node replacement period, we will
get availability drop because most of our use case is local_quorum with
replication factor 3.

On Fri, May 5, 2023 at 5:59 AM Bowen Song via user <
user@cassandra.apache.org> wrote:

> Have you thought of using "-Dcassandra.replace_address_first_boot=..." (or
> "-Dcassandra.replace_address=..." if you are using an older version)? This
> will not result in a topology change, which means "nodetool cleanup" is not
> needed after the operation is completed.
> On 05/05/2023 05:24, Jaydeep Chovatia wrote:
>
> Thanks, Jeff!
> But in our environment we replace nodes quite often for various
> optimization purposes, say almost one node per day (node *addition*
> followed by node *decommission*, which of course changes the topology),
> and we have a cluster of 100 nodes with 300 GB per node. If we have to
> run cleanup on 100 nodes after every replacement, then it could take
> forever.
> What is the recommendation until we get this fixed in Cassandra itself as
> part of compaction (w/o externally triggering *cleanup*)?
>
> Jaydeep
>
> On Thu, May 4, 2023 at 8:14 PM Jeff Jirsa  wrote:
>
>> Cleanup is fast and cheap and basically a no-op if you haven’t changed
>> the ring
>>
>> After Cassandra has transactional cluster metadata to make ring changes
>> strongly consistent, Cassandra should do this in every compaction. But
>> until then it's left for operators to run when they're sure the state of
>> the ring is correct.
>>
>>
>>
>> On May 4, 2023, at 7:41 PM, Jaydeep Chovatia 
>> wrote:
>>
>> 
>> Isn't this considered a kind of *bug* in Cassandra? As we know, *cleanup*
>> is a lengthy and unreliable operation, so relying on *cleanup* means higher
>> chances of data resurrection.
>> Do you think we should discard the unowned token-ranges as part of the
>> regular compaction itself? What are the pitfalls of doing this as part of
>> compaction itself?
>>
>> Jaydeep
>>
>> On Thu, May 4, 2023 at 7:25 PM guo Maxwell  wrote:
>>
>>> Compaction will just merge duplicate data and remove deleted data on
>>> this node. If you add or remove a node from the cluster, I think cleanup
>>> is needed. If cleanup failed, I think we should look into the reason.
>>>
>>> Runtian Liu  wrote on Fri, May 5, 2023 at 06:37:
>>>
>>>> Hi all,
>>>>
>>>> Is cleanup the sole method to remove data that does not belong to a
>>>> specific node? In a cluster where nodes are added or decommissioned from
>>>> time to time, failure to run cleanup may lead to data resurrection issues,
>>>> as deleted data may remain on the node that lost ownership of certain
>>>> partitions. Or is it true that normal compactions can also handle data
>>>> removal for nodes that no longer have ownership of certain data?
>>>>
>>>> Thanks,
>>>> Runtian
>>>>
>>>
>>>
>>> --
>>> you are the apple of my eye !
>>>
>>


Is cleanup is required if cluster topology changes

2023-05-04 Thread Runtian Liu
Hi all,

Is cleanup the sole method to remove data that does not belong to a
specific node? In a cluster where nodes are added or decommissioned from
time to time, failure to run cleanup may lead to data resurrection issues,
as deleted data may remain on the node that lost ownership of certain
partitions. Or is it true that normal compactions can also handle data
removal for nodes that no longer have ownership of certain data?
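
For what it's worth, a minimal sketch of driving cleanup keyspace by keyspace
to keep the extra compaction load bounded (this assumes nodetool is on the
PATH; the keyspace names are illustrative):

import subprocess

# "nodetool cleanup <keyspace>" rewrites SSTables, dropping partitions in
# token ranges this node no longer owns; it only does useful work after a
# topology change.
for keyspace in ["ks1", "ks2"]:
    subprocess.run(["nodetool", "cleanup", keyspace], check=True)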

Thanks,
Runtian


Cassandra 3.0 upgrade

2022-06-13 Thread Runtian Liu
Hi,

I am running Cassandra version 3.0.14 at scale on thousands of nodes. I am
planning to do a minor version upgrade from 3.0.14 to 3.0.26 in a safe
manner. My eventual goal is to upgrade from 3.0.26 to the 4.0 major release.

As you know, there are multiple minor releases between 3.0.14 and 3.0.26,
so I am planning to upgrade in 2-3 batches, say: 1) 3.0.14 → 3.0.16,
2) 3.0.16 → 3.0.20, 3) 3.0.20 → 3.0.26.

Do you have any suggestions, or anything I need to be aware of? Is there
any minor release between 3.0.14 and 3.0.26 which is not safe, etc.?

Best regards.