Thanks for your response! Following your advice, I filed a Jira ticket here: https://issues.apache.org/jira/browse/CASSANDRA-18319
On Thu, Mar 9, 2023 at 11:16 AM Jeff Jirsa <jji...@gmail.com> wrote:

> I described something roughly similar to this a few years ago on the list.
> The specific chain you're describing isn't one I've thought about before,
> but if you open a JIRA for tracking and attribution, I'll ask some folks
> to take a peek at it.
>
> On Thu, Mar 9, 2023 at 10:57 AM Inès Potier <inesm.pot...@gmail.com> wrote:
>
>> Hi Cassandra community,
>>
>> Reaching out again in case anyone has recently faced the issue below.
>> Additional opinions on this would be super helpful for us.
>>
>> Thanks in advance,
>> Ines
>>
>> On Thu, Feb 23, 2023 at 3:40 PM Inès Potier <inesm.pot...@gmail.com> wrote:
>>
>>> Hi Cassandra community,
>>>
>>> We have recently encountered a recurring "old IP reappearance" issue
>>> while testing decommissions on some of our Kubernetes Cassandra staging
>>> clusters. We have not yet found other references to this issue online,
>>> and could really use some additional input/opinions, both on the problem
>>> itself and on the fix we are currently considering.
>>>
>>> *Issue Description*
>>>
>>> In Kubernetes, a Cassandra node can change IP at each pod bounce. We
>>> have noticed that this behavior, combined with a decommission operation,
>>> can get the cluster into an erroneous state.
>>>
>>> Consider the following situation: a Cassandra node node1, with hostId1,
>>> owning 20.5% of the token ring, bounces and switches IP (old_IP →
>>> new_IP). After a couple of gossip iterations, every other node's
>>> nodetool status output includes a new_IP UN entry owning 20.5% of the
>>> token ring, and no old_IP entry.
>>>
>>> Shortly after the bounce, node1 gets decommissioned. Our cluster does
>>> not hold much data, so the decommission completes quickly. Logs on other
>>> nodes start acknowledging that node1 has left, and soon the new_IP UL
>>> entry disappears from nodetool status. node1's pod is deleted.
>>>
>>> About a minute later, the cluster enters the erroneous state: an old_IP
>>> DN entry reappears in nodetool status, owning 20.5% of the token ring.
>>> No node owns this IP anymore, and according to the logs, old_IP is still
>>> associated with hostId1.
>>>
>>> *Issue Root Cause*
>>>
>>> By digging through the Cassandra logs and re-testing this scenario over
>>> and over again, we have reached the following conclusions:
>>>
>>> - Other nodes continue exchanging gossip about old_IP, even after it
>>>   becomes a fatClient.
>>> - The fatClient timeout and subsequent quarantine do not stop old_IP
>>>   from reappearing in a node's gossip state once its quarantine is over.
>>>   We believe this is due to a misalignment across nodes on old_IP's
>>>   expiration time.
>>> - Once new_IP has left the cluster and old_IP's next gossip state
>>>   message is received by a node, StorageService no longer faces a
>>>   collision (or faces one with an even older IP) for hostId1 and its
>>>   corresponding tokens. As a result, old_IP regains ownership of 20.5%
>>>   of the token ring.
>>>
>>> *Proposed Fix*
>>>
>>> Following the above investigation, we are thinking about implementing
>>> the following fix:
>>>
>>> When a node receives a gossip status change with STATE_LEFT for a
>>> leaving endpoint new_IP, before evicting new_IP from the token ring,
>>> purge from gossip (i.e. evictFromMembership) all endpoints that meet
>>> the following criteria:
>>>
>>> - endpointStateMap contains the endpoint
>>> - The endpoint is not currently a token owner
>>>   (!tokenMetadata.isMember(endpoint))
>>> - The endpoint's hostId matches the hostId of new_IP
>>> - The endpoint is older than the leaving endpoint new_IP
>>>   (per Gossiper.instance.compareEndpointStartup)
>>> - The endpoint's token range (from endpointStateMap) intersects with
>>>   new_IP's
>>>
>>> The intention of this modification is to force nodes to realign on
>>> old_IP's expiration, and to expunge it from gossip so that it does not
>>> reappear after new_IP leaves the ring.
>>>
>>> Additional opinions/ideas regarding the fix's viability and the issue
>>> itself would be really helpful.
>>>
>>> Thanks in advance,
>>> Ines
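
To make the purge step in the quoted proposal a bit more concrete, here is a
rough, self-contained sketch of the filtering logic. It deliberately uses
simplified stand-in types rather than the real Gossiper / TokenMetadata /
StorageService classes, so the type names, fields, and method signatures
below are illustrative assumptions, not actual Cassandra internals; only the
criteria themselves come from the proposal above.

import java.util.Map;
import java.util.Set;
import java.util.UUID;
import java.util.function.Consumer;

// Simplified stand-ins for the gossip state referenced above. A real patch
// would operate on Gossiper's endpointStateMap and TokenMetadata; these toy
// types only exist to make the filtering criteria explicit.
final class EndpointState {
    final UUID hostId;
    final long startupTime;   // stand-in for Gossiper.instance.compareEndpointStartup
    final Set<String> tokens; // tokens claimed by this endpoint in gossip

    EndpointState(UUID hostId, long startupTime, Set<String> tokens) {
        this.hostId = hostId;
        this.startupTime = startupTime;
        this.tokens = tokens;
    }
}

final class StaleEndpointPurger {
    /**
     * Invoked when a STATE_LEFT status change is received for leavingEndpoint
     * (new_IP in the scenario above), before it is evicted from the token ring.
     * Purges every endpoint that is still present in the gossip state map, is
     * not a current token owner, shares new_IP's hostId, started up earlier
     * than new_IP, and claims intersecting tokens, so that a stale old_IP
     * entry cannot resurface once new_IP is gone.
     */
    static void purgeStaleAliases(String leavingEndpoint,
                                  Map<String, EndpointState> endpointStateMap,
                                  Set<String> currentTokenOwners,
                                  Consumer<String> evictFromMembership) {
        EndpointState leaving = endpointStateMap.get(leavingEndpoint);
        if (leaving == null)
            return;

        for (Map.Entry<String, EndpointState> entry : endpointStateMap.entrySet()) {
            String candidate = entry.getKey();
            EndpointState state = entry.getValue();

            if (candidate.equals(leavingEndpoint))
                continue; // never purge the leaving endpoint itself
            if (currentTokenOwners.contains(candidate))
                continue; // still a token owner (the !tokenMetadata.isMember check)
            if (!leaving.hostId.equals(state.hostId))
                continue; // different hostId, unrelated endpoint
            if (state.startupTime >= leaving.startupTime)
                continue; // not older than the leaving endpoint
            boolean tokensIntersect = state.tokens.stream().anyMatch(leaving.tokens::contains);
            if (!tokensIntersect)
                continue; // no token overlap with new_IP

            evictFromMembership.accept(candidate); // expunge the stale old_IP entry from gossip
        }
    }
}

In an actual patch, this check would presumably sit in the STATE_LEFT
handling path and use Gossiper.instance.evictFromMembership and
Gossiper.instance.compareEndpointStartup directly, as referenced in the
criteria above.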