On Thu, Mar 9, 2023 at 10:57 AM Inès Potier <inesm.pot...@gmail.com> wrote:

> Hi Cassandra community,
>
> Reaching out again in case anyone has recently faced the issue below.
> Additional opinions on this would be super helpful for us.
>
> Thanks in advance,
> Ines
>
> On Thu, Feb 23, 2023 at 3:40 PM Inès Potier <inesm.pot...@gmail.com> wrote:
>
>> Hi Cassandra community,
>>
>> We have recently encountered a recurring old-IP reappearance issue while
>> testing decommissions on some of our Kubernetes Cassandra staging clusters.
>> We have not yet found other references to this issue online, and we could
>> really use some additional input/opinions, both on the problem itself and
>> on the fix we are currently considering.
>>
>> *Issue Description*
>>
>> In Kubernetes, a Cassandra node can change IP at each pod bounce. We have
>> noticed that this behavior, combined with a decommission operation, can get
>> the cluster into an erroneous state.
>>
>> Consider the following situation: a Cassandra node, node1, with hostId1,
>> owning 20.5% of the token ring, bounces and switches IP (old_IP → new_IP).
>> After a couple of gossip iterations, every other node's nodetool status
>> output includes a new_IP UN entry owning 20.5% of the token ring, and no
>> old_IP entry.
>>
>> Shortly after the bounce, node1 gets decommissioned. Our cluster does not
>> have a lot of data, and the decommission operation completes fairly quickly.
>> Logs on other nodes start showing acknowledgment that node1 has left, and
>> soon nodetool status' new_IP UL entry disappears. node1's pod is deleted.
>>
>> After a minute-long delay, the cluster enters the erroneous state: an
>> old_IP DN entry reappears in nodetool status, owning 20.5% of the token
>> ring. No node owns this IP anymore, and according to the logs, old_IP is
>> still associated with hostId1.
>>
>> *Issue Root Cause*
>>
>> By digging through Cassandra logs and re-testing this scenario over and
>> over again, we have reached the following conclusions:
>>
>> - Other nodes continue exchanging gossip about old_IP, even after it
>>   becomes a fatClient.
>> - The fatClient timeout and subsequent quarantine do not stop old_IP from
>>   reappearing in a node's Gossip state once its quarantine is over. We
>>   believe this is due to a misalignment across nodes of old_IP's
>>   expiration time.
>> - Once new_IP has left the cluster and old_IP's next gossip state message
>>   is received by a node, StorageService will no longer face collisions (or
>>   will, but with an even older IP) for hostId1 and its corresponding
>>   tokens. As a result, old_IP will regain ownership of 20.5% of the token
>>   ring.
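To make the collision behavior described in that last bullet concrete, here is a minimal, illustrative Java model (not Cassandra source; every type and method in it is a simplified stand-in) of how a node could resolve a token claimed by two endpoints sharing a hostId by keeping the endpoint that started more recently, and why old_IP's claims get through again once new_IP is gone:

import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

// Minimal model of the token-collision behavior described above.
// Not Cassandra code: EndpointState, the owners map and
// compareEndpointStartup are simplified stand-ins for the real
// Gossiper/StorageService structures.
public class TokenCollisionModel {

    static final class EndpointState {
        final String address;   // e.g. "old_IP" or "new_IP"
        final UUID hostId;      // both endpoints carry hostId1
        final long generation;  // gossip generation, roughly the process start time

        EndpointState(String address, UUID hostId, long generation) {
            this.address = address;
            this.hostId = hostId;
            this.generation = generation;
        }
    }

    // token -> endpoint currently considered its owner
    private final Map<Long, EndpointState> tokenOwners = new HashMap<>();

    // Stand-in for Gossiper.instance.compareEndpointStartup: newer generation wins.
    private static int compareEndpointStartup(EndpointState a, EndpointState b) {
        return Long.compare(a.generation, b.generation);
    }

    // Applied whenever a gossiped "I own this token" claim is processed.
    public void onTokenClaim(long token, EndpointState claimant) {
        EndpointState current = tokenOwners.get(token);
        if (current == null) {
            // No collision. After new_IP's STATE_LEFT removes it from the ring,
            // old_IP's next gossip message lands here and regains the token.
            tokenOwners.put(token, claimant);
        } else if (compareEndpointStartup(claimant, current) > 0) {
            // Collision: the more recently started endpoint wins, which is how
            // new_IP displaced old_IP right after the bounce.
            tokenOwners.put(token, claimant);
        }
        // Otherwise the claim is ignored, as old_IP's were while new_IP was alive.
    }

    public EndpointState ownerOf(long token) {
        return tokenOwners.get(token);
    }
}

In this model, evicting new_IP without also expunging old_IP's lingering state leaves old_IP free to re-claim the tokens uncontested, which matches the DN entry the email reports.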
>>
>> *Proposed fix*
>>
>> Following the above investigation, we were thinking about implementing the
>> following fix:
>>
>> When a node receives a gossip status change with STATE_LEFT for a leaving
>> endpoint new_IP, before evicting new_IP from the token ring, purge from
>> Gossip (i.e. evictFromMembership) all endpoints that meet the following
>> criteria:
>>
>> - endpointStateMap contains this endpoint
>> - The endpoint is not currently a token owner
>>   (!tokenMetadata.isMember(endpoint))
>> - The endpoint's hostId matches the hostId of new_IP
>> - The endpoint is older than the leaving endpoint new_IP
>>   (Gossiper.instance.compareEndpointStartup)
>> - The endpoint's token range (from endpointStateMap) intersects with
>>   new_IP's
>>
>> The intention of this modification is to force nodes to realign on old_IP's
>> expiration and to expunge it from Gossip, so that it does not reappear
>> after new_IP leaves the ring.
>>
>> Additional opinions/ideas regarding the fix's viability and the issue
>> itself would be really helpful.
>>
>> Thanks in advance,
>> Ines
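For concreteness, below is a rough, self-contained Java sketch of the purge criteria listed in the proposed fix. It mirrors the criteria as described in the email rather than the actual Cassandra internals: apart from the names quoted above (endpointStateMap, tokenMetadata.isMember, evictFromMembership, compareEndpointStartup), every type, signature and helper here is an assumed stand-in.

import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.UUID;

// Sketch of the proposed STATE_LEFT hook, not actual Cassandra code.
// The interfaces below are simplified stand-ins for the real Gossiper,
// TokenMetadata and endpoint-state structures referenced in the email.
public class StaleEndpointPurgeSketch {

    interface EndpointState {
        UUID hostId();
        Set<Long> tokens();  // tokens this endpoint last advertised
    }

    interface TokenMetadata {
        boolean isMember(String endpoint);  // stand-in for tokenMetadata.isMember
    }

    interface Gossiper {
        // stand-in for Gossiper.instance.compareEndpointStartup:
        // a negative result means 'a' started before 'b'
        int compareEndpointStartup(String a, String b);
        void evictFromMembership(String endpoint);  // purge from Gossip
    }

    // Called when a STATE_LEFT status change is received for leavingEndpoint
    // (new_IP), before new_IP itself is evicted from the token ring.
    static void purgeStaleAliases(String leavingEndpoint,
                                  Map<String, EndpointState> endpointStateMap,
                                  TokenMetadata tokenMetadata,
                                  Gossiper gossiper) {
        EndpointState leavingState = endpointStateMap.get(leavingEndpoint);
        if (leavingState == null)
            return;

        List<String> toEvict = new ArrayList<>();
        for (Map.Entry<String, EndpointState> entry : endpointStateMap.entrySet()) {
            String endpoint = entry.getKey();
            EndpointState state = entry.getValue();

            if (endpoint.equals(leavingEndpoint))
                continue;  // only consider other endpoints
            if (tokenMetadata.isMember(endpoint))
                continue;  // still a token owner, keep it
            if (!leavingState.hostId().equals(state.hostId()))
                continue;  // hostId does not match new_IP's
            if (gossiper.compareEndpointStartup(endpoint, leavingEndpoint) >= 0)
                continue;  // not older than the leaving endpoint
            if (!intersects(state.tokens(), leavingState.tokens()))
                continue;  // token ranges do not overlap new_IP's

            // e.g. old_IP: matches every criterion, so expunge it from Gossip
            toEvict.add(endpoint);
        }

        for (String endpoint : toEvict)
            gossiper.evictFromMembership(endpoint);
    }

    private static boolean intersects(Set<Long> a, Set<Long> b) {
        for (Long t : a)
            if (b.contains(t))
                return true;
        return false;
    }
}

Collecting the matching endpoints first and evicting them after the scan avoids mutating the endpoint-state map while it is being iterated; the real patch would of course need to follow whatever locking and ordering the Gossiper already imposes.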