Re: Cassandra in Kubernetes: IP switch decommission issue

2023-03-10 Thread Inès Potier
Thanks for your response!
Following your advice, I filed a Jira ticket here:
https://issues.apache.org/jira/browse/CASSANDRA-18319

On Thu, Mar 9, 2023 at 11:16 AM Jeff Jirsa wrote:

> I described something roughly similar to this a few years ago on the list.
> The specific chain you're describing isn't one I've thought about before,
> but if you open a JIRA for tracking and attribution, I'll ask some folks to
> take a peek at it.


Re: Cassandra in Kubernetes: IP switch decommission issue

2023-03-09 Thread Tom Nora
unsubscribe





Re: Cassandra in Kubernetes: IP switch decommission issue

2023-03-09 Thread Jeff Jirsa
I described something roughly similar to this a few years ago on the list.
The specific chain you're describing isn't one I've thought about before,
but if you open a JIRA for tracking and attribution, I'll ask some folks to
take a peek at it.





Re: Cassandra in Kubernetes: IP switch decommission issue

2023-03-09 Thread Inès Potier
Hi Cassandra community,

Reaching out again in case anyone has recently faced the below issue.
Additional opinions on this would be super helpful for us.

Thanks in advance,
Ines

On Thu, Feb 23, 2023 at 3:40 PM Inès Potier wrote:

> Hi Cassandra community,
>
> We have recently encountered a recurring old IP reappearance issue while
> testing decommissions on some of our Kubernetes Cassandra staging clusters.
> We have not yet found other references to this issue online. We could
> really use some additional inputs/opinions, both on the problem itself and
> the fix we are currently considering.
>
> *Issue Description*
>
> In Kubernetes, a Cassandra node can change IP at each pod bounce. We have
> noticed that this behavior, combined with a decommission operation, can
> leave the cluster in an erroneous state.
>
> Consider the following situation: a Cassandra node node1, with hostId1,
> owning 20.5% of the token ring, bounces and switches IP (old_IP → new_IP).
> After a couple of gossip iterations, all other nodes’ nodetool status
> output includes a new_IP UN entry owning 20.5% of the token ring and no
> old_IP entry.
>
> Shortly after the bounce, node1 gets decommissioned. Our cluster does not
> have a lot of data, and the decommission operation completes pretty
> quickly. Logs on other nodes start showing acknowledgment that node1 has
> left, and soon the new_IP UL entry disappears from nodetool status.
> node1’s pod is then deleted.
>
> After a delay of about a minute, the cluster enters the erroneous state.
> An old_IP DN entry reappears in nodetool status, owning 20.5% of the token
> ring. No node owns this IP anymore, and according to the logs, old_IP is
> still associated with hostId1.
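>
> For illustration, the erroneous end state looks roughly like the nodetool
> status excerpt below. The address, load and host ID values are made up;
> the point is the stale DN line that no live pod owns anymore:
>
>   --  Address     Load      Tokens  Owns   Host ID   Rack
>   DN  10.0.0.12   1.2 MiB   16      20.5%  hostId1   rack1
>
> where 10.0.0.12 stands in for old_IP.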
>
> *Issue Root Cause*
>
> By digging through Cassandra logs, and re-testing this scenario over and
> over again, we have reached the following conclusion:
>
>    - Other nodes will continue exchanging gossip about old_IP, even
>    after it becomes a fatClient.
>    - The fatClient timeout and the subsequent quarantine do not stop
>    old_IP from reappearing in a node’s Gossip state once its quarantine
>    is over. We believe this is due to a misalignment of old_IP’s
>    expiration time across nodes.
>    - Once new_IP has left the cluster and old_IP’s next gossip state
>    message is received by a node, StorageService will no longer face
>    collisions (or will, but with an even older IP) for hostId1 and its
>    corresponding tokens. As a result, old_IP will regain ownership of
>    20.5% of the token ring.
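>
> To make the last point concrete, here is a small self-contained toy model
> (plain Java, not Cassandra code) of how we understand the collision
> resolution for hostId1’s tokens. The rule it encodes, "among endpoints
> still present in gossip, the most recently started one wins", is our
> simplified reading of the behavior, not a verified description of
> StorageService internals:
>
>   import java.util.*;
>
>   public class OwnershipToyModel {
>       // An endpoint claiming hostId1's tokens: its IP, when it started,
>       // and whether it is still present in gossip.
>       record Claimant(String ip, long startupTime, boolean present) {}
>
>       // Simplified collision resolution: newest still-present claimant wins.
>       static Optional<Claimant> owner(List<Claimant> claimants) {
>           return claimants.stream()
>                   .filter(Claimant::present)
>                   .max(Comparator.comparingLong(Claimant::startupTime));
>       }
>
>       public static void main(String[] args) {
>           Claimant oldIp = new Claimant("old_IP", 1000L, true); // stale, still gossiped
>           Claimant newIp = new Claimant("new_IP", 2000L, true);
>
>           // Before the decommission: new_IP wins the collision.
>           System.out.println(owner(List.of(oldIp, newIp)));
>
>           // After STATE_LEFT: new_IP is gone, but old_IP never expired
>           // everywhere, so it is the only claimant left and regains
>           // ownership of hostId1's tokens.
>           Claimant newIpLeft = new Claimant("new_IP", 2000L, false);
>           System.out.println(owner(List.of(oldIp, newIpLeft)));
>       }
>   }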
>
>
> *Proposed fix*
>
> Following the above investigation, we were thinking about implementing the
> following fix:
>
> When a node receives a gossip status change with STATE_LEFT for a leaving
> endpoint new_IP, before evicting new_IP from the token ring, purge from
> Gossip (i.e. evictFromMembership) all endpoints that meet all of the
> following criteria:
>
>    - endpointStateMap contains this endpoint
>    - The endpoint is not currently a token owner
>    (!tokenMetadata.isMember(endpoint))
>    - The endpoint’s hostId matches the hostId of new_IP
>    - The endpoint is older than new_IP, the leaving endpoint
>    (Gossiper.instance.compareEndpointStartup)
>    - The endpoint’s token range (from endpointStateMap) intersects with
>    new_IP’s
>
> The intention of this modification is to force nodes to realign on
> old_IP’s expiration and expunge it from Gossip, so that it does not
> reappear after new_IP leaves the ring.
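>
> To make these criteria concrete, below is a rough Java sketch of what the
> purge could look like on the STATE_LEFT handling path in StorageService.
> It is only a sketch: hostIdOf and tokensOf are placeholder helpers (not
> verified Cassandra APIs), how endpointStateMap is accessed from here is
> glossed over, the endpoint address type and the visibility of
> evictFromMembership vary by version, and we have not checked the sign
> convention of compareEndpointStartup:
>
>   // Sketch, not a patch: purge stale endpoints sharing new_IP's host id
>   // before new_IP's STATE_LEFT is applied.
>   private void purgeStaleEndpointsFor(InetAddressAndPort leaving)
>   {
>       UUID leavingHostId = hostIdOf(leaving);              // placeholder helper
>       Collection<Token> leavingTokens = tokensOf(leaving); // placeholder helper
>
>       for (InetAddressAndPort endpoint : endpointStateMap.keySet())
>       {
>           if (endpoint.equals(leaving))
>               continue;
>           if (tokenMetadata.isMember(endpoint))            // still a token owner
>               continue;
>           if (!leavingHostId.equals(hostIdOf(endpoint)))   // different host id
>               continue;
>           // keep only endpoints that started before the leaving one
>           // (assuming > 0 means the first argument is the newer endpoint)
>           if (Gossiper.instance.compareEndpointStartup(endpoint, leaving) >= 0)
>               continue;
>           if (Collections.disjoint(tokensOf(endpoint), leavingTokens))
>               continue;                                    // token ranges don't intersect
>
>           logger.info("Purging stale endpoint {} before {} leaves", endpoint, leaving);
>           Gossiper.instance.evictFromMembership(endpoint);
>       }
>   }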
>
>
> Additional opinions/ideas regarding the fix’s viability and the issue
> itself would be really helpful.
> Thanks in advance,
> Ines
>