On Thu, Mar 9, 2023 at 10:57 AM Inès Potier <inesm.pot...@gmail.com> wrote:

> Hi Cassandra community,
>
> Reaching out again in case anyone has recently faced the issue below.
> Additional opinions on this would be super helpful for us.
>
> Thanks in advance,
> Ines
>
> On Thu, Feb 23, 2023 at 3:40 PM Inès Potier <inesm.pot...@gmail.com> wrote:
>
>> Hi Cassandra community,
>>
>> We have recently encountered a recurring old-IP reappearance issue while
>> testing decommissions on some of our Kubernetes Cassandra staging clusters.
>> We have not yet found other references to this issue online, and we could
>> really use some additional input/opinions, both on the problem itself and
>> on the fix we are currently considering.
>>
>> *Issue Description*
>>
>> In Kubernetes, a Cassandra node can change IP at each pod bounce. We have
>> noticed that this behavior, combined with a decommission operation, can get
>> the cluster into an erroneous state.
>>
>> Consider the following situation: a Cassandra node, node1, with hostId1,
>> owning 20.5% of the token ring, bounces and switches IP (old_IP → new_IP).
>> After a couple of gossip iterations, every other node's nodetool status
>> output includes a new_IP UN entry owning 20.5% of the token ring, and no
>> old_IP entry.
>>
>> Shortly after the bounce, node1 gets decommissioned. Our cluster does not
>> have a lot of data, and the decommission operation completes fairly quickly.
>> Logs on other nodes start showing acknowledgment that node1 has left, and
>> soon nodetool status' new_IP UL entry disappears. node1's pod is deleted.
>>
>> After a minute-long delay, the cluster enters the erroneous state: an
>> old_IP DN entry reappears in nodetool status, owning 20.5% of the token
>> ring. No node owns this IP anymore, and according to the logs, old_IP is
>> still associated with hostId1.
>>
>> *Issue Root Cause*
>>
>> By digging through Cassandra logs and re-testing this scenario over and
>> over again, we have reached the following conclusions:
>>
>> - Other nodes continue exchanging gossip about old_IP, even after it
>>   becomes a fatClient.
>> - The fatClient timeout and subsequent quarantine do not stop old_IP from
>>   reappearing in a node's Gossip state once its quarantine is over. We
>>   believe this is due to a misalignment across nodes of old_IP's
>>   expiration time.
>> - Once new_IP has left the cluster and old_IP's next gossip state message
>>   is received by a node, StorageService will no longer face collisions (or
>>   will, but with an even older IP) for hostId1 and its corresponding
>>   tokens. As a result, old_IP will regain ownership of 20.5% of the token
>>   ring.
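To make the collision behavior described in that last bullet concrete, here is a minimal, illustrative Java model (not Cassandra source; every type and method in it is a simplified stand-in) of how a node could resolve a token claimed by two endpoints sharing a hostId by keeping the endpoint that started more recently, and why old_IP's claims get through again once new_IP is gone:

import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

// Minimal model of the token-collision behavior described above.
// Not Cassandra code: EndpointState, the owners map and
// compareEndpointStartup are simplified stand-ins for the real
// Gossiper/StorageService structures.
public class TokenCollisionModel {

    static final class EndpointState {
        final String address;   // e.g. "old_IP" or "new_IP"
        final UUID hostId;      // both endpoints carry hostId1
        final long generation;  // gossip generation, roughly the process start time

        EndpointState(String address, UUID hostId, long generation) {
            this.address = address;
            this.hostId = hostId;
            this.generation = generation;
        }
    }

    // token -> endpoint currently considered its owner
    private final Map<Long, EndpointState> tokenOwners = new HashMap<>();

    // Stand-in for Gossiper.instance.compareEndpointStartup: newer generation wins.
    private static int compareEndpointStartup(EndpointState a, EndpointState b) {
        return Long.compare(a.generation, b.generation);
    }

    // Applied whenever a gossiped "I own this token" claim is processed.
    public void onTokenClaim(long token, EndpointState claimant) {
        EndpointState current = tokenOwners.get(token);
        if (current == null) {
            // No collision. After new_IP's STATE_LEFT removes it from the ring,
            // old_IP's next gossip message lands here and regains the token.
            tokenOwners.put(token, claimant);
        } else if (compareEndpointStartup(claimant, current) > 0) {
            // Collision: the more recently started endpoint wins, which is how
            // new_IP displaced old_IP right after the bounce.
            tokenOwners.put(token, claimant);
        }
        // Otherwise the claim is ignored, as old_IP's were while new_IP was alive.
    }

    public EndpointState ownerOf(long token) {
        return tokenOwners.get(token);
    }
}

In this model, evicting new_IP without also expunging old_IP's lingering state leaves old_IP free to re-claim the tokens uncontested, which matches the DN entry the email reports.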
>>
>> *Proposed fix*
>>
>> Following the above investigation, we were thinking about implementing the
>> following fix:
>>
>> When a node receives a gossip status change with STATE_LEFT for a leaving
>> endpoint new_IP, before evicting new_IP from the token ring, purge from
>> Gossip (i.e. evictFromMembership) all endpoints that meet the following
>> criteria:
>>
>> - endpointStateMap contains this endpoint
>> - The endpoint is not currently a token owner
>>   (!tokenMetadata.isMember(endpoint))
>> - The endpoint's hostId matches the hostId of new_IP
>> - The endpoint is older than the leaving endpoint new_IP
>>   (Gossiper.instance.compareEndpointStartup)
>> - The endpoint's token range (from endpointStateMap) intersects with
>>   new_IP's
>>
>> The intention of this modification is to force nodes to realign on old_IP's
>> expiration and to expunge it from Gossip, so that it does not reappear
>> after new_IP leaves the ring.
>>
>> Additional opinions/ideas regarding the fix's viability and the issue
>> itself would be really helpful.
>>
>> Thanks in advance,
>> Ines
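For concreteness, below is a rough, self-contained Java sketch of the purge criteria listed in the proposed fix. It mirrors the criteria as described in the email rather than the actual Cassandra internals: apart from the names quoted above (endpointStateMap, tokenMetadata.isMember, evictFromMembership, compareEndpointStartup), every type, signature and helper here is an assumed stand-in.

import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.UUID;

// Sketch of the proposed STATE_LEFT hook, not actual Cassandra code.
// The interfaces below are simplified stand-ins for the real Gossiper,
// TokenMetadata and endpoint-state structures referenced in the email.
public class StaleEndpointPurgeSketch {

    interface EndpointState {
        UUID hostId();
        Set<Long> tokens();  // tokens this endpoint last advertised
    }

    interface TokenMetadata {
        boolean isMember(String endpoint);  // stand-in for tokenMetadata.isMember
    }

    interface Gossiper {
        // stand-in for Gossiper.instance.compareEndpointStartup:
        // a negative result means 'a' started before 'b'
        int compareEndpointStartup(String a, String b);
        void evictFromMembership(String endpoint);  // purge from Gossip
    }

    // Called when a STATE_LEFT status change is received for leavingEndpoint
    // (new_IP), before new_IP itself is evicted from the token ring.
    static void purgeStaleAliases(String leavingEndpoint,
                                  Map<String, EndpointState> endpointStateMap,
                                  TokenMetadata tokenMetadata,
                                  Gossiper gossiper) {
        EndpointState leavingState = endpointStateMap.get(leavingEndpoint);
        if (leavingState == null)
            return;

        List<String> toEvict = new ArrayList<>();
        for (Map.Entry<String, EndpointState> entry : endpointStateMap.entrySet()) {
            String endpoint = entry.getKey();
            EndpointState state = entry.getValue();

            if (endpoint.equals(leavingEndpoint))
                continue;  // only consider other endpoints
            if (tokenMetadata.isMember(endpoint))
                continue;  // still a token owner, keep it
            if (!leavingState.hostId().equals(state.hostId()))
                continue;  // hostId does not match new_IP's
            if (gossiper.compareEndpointStartup(endpoint, leavingEndpoint) >= 0)
                continue;  // not older than the leaving endpoint
            if (!intersects(state.tokens(), leavingState.tokens()))
                continue;  // token ranges do not overlap new_IP's

            // e.g. old_IP: matches every criterion, so expunge it from Gossip
            toEvict.add(endpoint);
        }

        for (String endpoint : toEvict)
            gossiper.evictFromMembership(endpoint);
    }

    private static boolean intersects(Set<Long> a, Set<Long> b) {
        for (Long t : a)
            if (b.contains(t))
                return true;
        return false;
    }
}

Collecting the matching endpoints first and evicting them after the scan avoids mutating the endpoint-state map while it is being iterated; the real patch would of course need to follow whatever locking and ordering the Gossiper already imposes.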