Thanks for your response! Following your advice, I filed a Jira ticket here: https://issues.apache.org/jira/browse/CASSANDRA-18319
On Thu, Mar 9, 2023 at 11:16 AM Jeff Jirsa <jji...@gmail.com> wrote:

> I described something roughly similar to this a few years ago on the list.
> The specific chain you're describing isn't one I've thought about before,
> but if you open a JIRA for tracking and attribution, I'll ask some folks
> to take a peek at it.
>
> On Thu, Mar 9, 2023 at 10:57 AM Inès Potier <inesm.pot...@gmail.com> wrote:
>
>> Hi Cassandra community,
>>
>> Reaching out again in case anyone has recently faced the issue below.
>> Additional opinions on this would be super helpful for us.
>>
>> Thanks in advance,
>> Ines
>>
>> On Thu, Feb 23, 2023 at 3:40 PM Inès Potier <inesm.pot...@gmail.com> wrote:
>>
>>> Hi Cassandra community,
>>>
>>> We have recently encountered a recurring "old IP reappearance" issue
>>> while testing decommissions on some of our Kubernetes Cassandra staging
>>> clusters. We have not yet found other references to this issue online,
>>> and could really use some additional input/opinions, both on the problem
>>> itself and on the fix we are currently considering.
>>>
>>> *Issue Description*
>>>
>>> In Kubernetes, a Cassandra node can change IP at each pod bounce. We
>>> have noticed that this behavior, combined with a decommission operation,
>>> can get the cluster into an erroneous state.
>>>
>>> Consider the following situation: a Cassandra node node1, with hostId1,
>>> owning 20.5% of the token ring, bounces and switches IP (old_IP →
>>> new_IP). After a couple of gossip iterations, every other node's
>>> nodetool status output includes a new_IP UN entry owning 20.5% of the
>>> token ring, and no old_IP entry.
>>>
>>> Shortly after the bounce, node1 gets decommissioned. Our cluster does
>>> not hold much data, so the decommission completes quickly. Logs on other
>>> nodes start acknowledging that node1 has left, and soon the new_IP UL
>>> entry disappears from nodetool status. node1's pod is deleted.
>>>
>>> About a minute later, the cluster enters the erroneous state: an old_IP
>>> DN entry reappears in nodetool status, owning 20.5% of the token ring.
>>> No node owns this IP anymore, and according to the logs, old_IP is still
>>> associated with hostId1.
>>>
>>> *Issue Root Cause*
>>>
>>> By digging through the Cassandra logs and re-testing this scenario over
>>> and over again, we have reached the following conclusions:
>>>
>>> - Other nodes continue exchanging gossip about old_IP, even after it
>>>   becomes a fatClient.
>>> - The fatClient timeout and subsequent quarantine do not stop old_IP
>>>   from reappearing in a node's gossip state once its quarantine is over.
>>>   We believe this is due to a misalignment across nodes on old_IP's
>>>   expiration time.
>>> - Once new_IP has left the cluster and old_IP's next gossip state
>>>   message is received by a node, StorageService no longer faces a
>>>   collision (or faces one with an even older IP) for hostId1 and its
>>>   corresponding tokens. As a result, old_IP regains ownership of 20.5%
>>>   of the token ring.
>>>
>>> *Proposed Fix*
>>>
>>> Following the above investigation, we are thinking about implementing
>>> the following fix:
>>>
>>> When a node receives a gossip status change with STATE_LEFT for a
>>> leaving endpoint new_IP, before evicting new_IP from the token ring,
>>> purge from gossip (i.e. evictFromMembership) all endpoints that meet
>>> the following criteria:
>>>
>>> - endpointStateMap contains the endpoint
>>> - The endpoint is not currently a token owner
>>>   (!tokenMetadata.isMember(endpoint))
>>> - The endpoint's hostId matches the hostId of new_IP
>>> - The endpoint is older than the leaving endpoint new_IP
>>>   (per Gossiper.instance.compareEndpointStartup)
>>> - The endpoint's token range (from endpointStateMap) intersects with
>>>   new_IP's
>>>
>>> The intention of this modification is to force nodes to realign on
>>> old_IP's expiration, and to expunge it from gossip so that it does not
>>> reappear after new_IP leaves the ring.
>>>
>>> Additional opinions/ideas regarding the fix's viability and the issue
>>> itself would be really helpful.
>>>
>>> Thanks in advance,
>>> Ines
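
To make the purge step in the quoted proposal a bit more concrete, here is a
rough, self-contained sketch of the filtering logic. It deliberately uses
simplified stand-in types rather than the real Gossiper / TokenMetadata /
StorageService classes, so the type names, fields, and method signatures
below are illustrative assumptions, not actual Cassandra internals; only the
criteria themselves come from the proposal above.

import java.util.Map;
import java.util.Set;
import java.util.UUID;
import java.util.function.Consumer;

// Simplified stand-ins for the gossip state referenced above. A real patch
// would operate on Gossiper's endpointStateMap and TokenMetadata; these toy
// types only exist to make the filtering criteria explicit.
final class EndpointState {
    final UUID hostId;
    final long startupTime;   // stand-in for Gossiper.instance.compareEndpointStartup
    final Set<String> tokens; // tokens claimed by this endpoint in gossip

    EndpointState(UUID hostId, long startupTime, Set<String> tokens) {
        this.hostId = hostId;
        this.startupTime = startupTime;
        this.tokens = tokens;
    }
}

final class StaleEndpointPurger {
    /**
     * Invoked when a STATE_LEFT status change is received for leavingEndpoint
     * (new_IP in the scenario above), before it is evicted from the token ring.
     * Purges every endpoint that is still present in the gossip state map, is
     * not a current token owner, shares new_IP's hostId, started up earlier
     * than new_IP, and claims intersecting tokens, so that a stale old_IP
     * entry cannot resurface once new_IP is gone.
     */
    static void purgeStaleAliases(String leavingEndpoint,
                                  Map<String, EndpointState> endpointStateMap,
                                  Set<String> currentTokenOwners,
                                  Consumer<String> evictFromMembership) {
        EndpointState leaving = endpointStateMap.get(leavingEndpoint);
        if (leaving == null)
            return;

        for (Map.Entry<String, EndpointState> entry : endpointStateMap.entrySet()) {
            String candidate = entry.getKey();
            EndpointState state = entry.getValue();

            if (candidate.equals(leavingEndpoint))
                continue; // never purge the leaving endpoint itself
            if (currentTokenOwners.contains(candidate))
                continue; // still a token owner (the !tokenMetadata.isMember check)
            if (!leaving.hostId.equals(state.hostId))
                continue; // different hostId, unrelated endpoint
            if (state.startupTime >= leaving.startupTime)
                continue; // not older than the leaving endpoint
            boolean tokensIntersect = state.tokens.stream().anyMatch(leaving.tokens::contains);
            if (!tokensIntersect)
                continue; // no token overlap with new_IP

            evictFromMembership.accept(candidate); // expunge the stale old_IP entry from gossip
        }
    }
}

In an actual patch, this check would presumably sit in the STATE_LEFT
handling path and use Gossiper.instance.evictFromMembership and
Gossiper.instance.compareEndpointStartup directly, as referenced in the
criteria above.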