We had a three-DC (asia-tokyo/europe/us) Cassandra 2.2.13 cluster on AWS, IPv6.

We needed to scale out the asia datacenter, which was 5 nodes; europe and us
were 25 nodes.

We ran into bootstrapping issues where the new node failed to
bootstrap/stream; it failed with

"java.lang.RuntimeException: A node required to move the data consistently
is down"

...even though all nodes were up according to nodetool status prior to adding
the new node.

First we increased the phi_convict_threshold to 12, and that did not help.
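For reference, that change is a single line in cassandra.yaml (the default is
8), rolled out with a restart on each node; a minimal sketch:

    # cassandra.yaml -- raise the failure detector's conviction threshold so
    # transient WAN latency spikes are less likely to mark a peer as down
    phi_convict_threshold: 12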

CASSANDRA-12281 appeared similar to the problem we were hitting, but I don't
think we hit that exact issue. Somewhere in that ticket's comments someone wrote

"For us, the workaround is either deleting the data (then bootstrap again),
or increasing the ring_delay_ms. And the larger the cluster is, the longer
ring_delay_ms is needed. Based on our tests, for a 40 nodes cluster, it
requires ring_delay_ms to be >50seconds. For a 70 nodes cluster,
>100seconds. Default is 30seconds."

Given the WAN nature of our DCs, we set ring_delay_ms to 100 seconds and the
bootstrap finally worked.
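For anyone else trying this: ring_delay_ms is a startup property rather than a
cassandra.yaml setting, so (assuming a package install that uses
cassandra-env.sh) we pass it as a JVM option on the joining node before
starting it, along the lines of:

    # cassandra-env.sh on the bootstrapping node -- wait 100s instead of the
    # default 30000 ms for gossip to settle before streaming starts
    JVM_OPTS="$JVM_OPTS -Dcassandra.ring_delay_ms=100000"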

side note:

During the rolling restarts for setting phi_convict_threshold, we observed
quite a lot of variance in the status maps between nodes (we have a program
that polls every node in a datacenter or cluster for its view of gossipinfo
and statuses). AWS does appear to have networking variance, consistent with
the usual phi_convict_threshold advice; I'm not sure whether our difficulties
were typical in that regard and/or whether our IPv6 and/or globally
distributed datacenters were exacerbating factors.
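The poller is nothing fancy; a rough sketch of the idea in Python (hostnames
are placeholders, and the nodetool status parsing is approximate for our
version) looks like:

    #!/usr/bin/env python3
    # Rough sketch of our status poller: ask every node for its own view of
    # the ring via nodetool status and print the peers they disagree about.
    import subprocess

    NODES = ["node1.example.com", "node2.example.com"]  # placeholder hostnames

    def status_view(host):
        """Return {peer_address: state} as seen from `host`."""
        out = subprocess.run(
            ["nodetool", "-h", host, "status"],
            capture_output=True, text=True, check=True,
        ).stdout
        view = {}
        for line in out.splitlines():
            parts = line.split()
            # data rows look like: "UN  10.0.0.1  51.46 KB  256  ..."
            # state is Up/Down + Normal/Leaving/Joining/Moving
            if len(parts) > 1 and parts[0] in (
                "UN", "DN", "UL", "DL", "UJ", "DJ", "UM", "DM"
            ):
                view[parts[1]] = parts[0]
        return view

    views = {host: status_view(host) for host in NODES}
    peers = set().union(*views.values())
    for peer in sorted(peers):
        if len({views[h].get(peer) for h in NODES}) > 1:  # views disagree
            print(peer, {h: views[h].get(peer) for h in NODES})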

We could not reproduce this in loadtest, although loadtest covers only eu and
us (but is IPv6).
