We had a three-DC (asia-tokyo/europe/us) Cassandra 2.2.13 cluster on AWS, IPv6.

We needed to scale out the asia datacenter, which was 5 nodes; europe and us
were 25 nodes.

We ran into bootstrapping issues where the new node failed to
bootstrap/stream; it failed with

"java.lang.RuntimeException: A node required to move the data consistently
is down"

...even though all nodes were up according to nodetool status prior to adding
the new node.

First we increased the phi_convict_threshold to 12, and that did not help.
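For reference, that change is a single line in cassandra.yaml (the default is
8), rolled out with a restart on each node; a minimal sketch:

    # cassandra.yaml -- raise the failure detector's conviction threshold so
    # transient WAN latency spikes are less likely to mark a peer as down
    phi_convict_threshold: 12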

CASSANDRA-12281 appeared similar to the problem we were hitting, but I don't
think we hit that exact issue. Somewhere in that ticket's comments someone wrote

"For us, the workaround is either deleting the data (then bootstrap again),
or increasing the ring_delay_ms. And the larger the cluster is, the longer
ring_delay_ms is needed. Based on our tests, for a 40 nodes cluster, it
requires ring_delay_ms to be >50seconds. For a 70 nodes cluster,
>100seconds. Default is 30seconds."

Given the WAN nature of our DCs, we set ring_delay_ms to 100 seconds and the
bootstrap finally worked.
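For anyone else trying this: ring_delay_ms is a startup property rather than a
cassandra.yaml setting, so (assuming a package install that uses
cassandra-env.sh) we pass it as a JVM option on the joining node before
starting it, along the lines of:

    # cassandra-env.sh on the bootstrapping node -- wait 100s instead of the
    # default 30000 ms for gossip to settle before streaming starts
    JVM_OPTS="$JVM_OPTS -Dcassandra.ring_delay_ms=100000"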

side note:

During the rolling restarts for setting phi_convict_threshold, we observed
quite a lot of variance in the status maps between nodes (we have a program
that polls every node in a datacenter or cluster for its view of gossipinfo
and statuses). AWS does appear to have networking variance, consistent with
the usual phi_convict_threshold advice; I'm not sure whether our difficulties
were typical in that regard and/or whether our IPv6 and/or globally
distributed datacenters were exacerbating factors.
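The poller is nothing fancy; a rough sketch of the idea in Python (hostnames
are placeholders, and the nodetool status parsing is approximate for our
version) looks like:

    #!/usr/bin/env python3
    # Rough sketch of our status poller: ask every node for its own view of
    # the ring via nodetool status and print the peers they disagree about.
    import subprocess

    NODES = ["node1.example.com", "node2.example.com"]  # placeholder hostnames

    def status_view(host):
        """Return {peer_address: state} as seen from `host`."""
        out = subprocess.run(
            ["nodetool", "-h", host, "status"],
            capture_output=True, text=True, check=True,
        ).stdout
        view = {}
        for line in out.splitlines():
            parts = line.split()
            # data rows look like: "UN  10.0.0.1  51.46 KB  256  ..."
            # state is Up/Down + Normal/Leaving/Joining/Moving
            if len(parts) > 1 and parts[0] in (
                "UN", "DN", "UL", "DL", "UJ", "DJ", "UM", "DM"
            ):
                view[parts[1]] = parts[0]
        return view

    views = {host: status_view(host) for host in NODES}
    peers = set().union(*views.values())
    for peer in sorted(peers):
        if len({views[h].get(peer) for h in NODES}) > 1:  # views disagree
            print(peer, {h: views[h].get(peer) for h in NODES})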

We could not reproduce this in loadtest, although loadtest covers only eu and
us (but is IPv6).
