Hello,

We are using C* version 2.1.6, and lately we are seeing an issue where
nodetool removenode causes the schema to go out of sync and clients to
fail for 2-3 minutes.

The C* cluster spans 8 datacenters with RF=3 and has 50 nodes.
We have 130 keyspaces and 500 CFs in the cluster.

Here is the sequence of actions that were performed:

1. One node in the cluster failed abruptly due to a hardware issue.
2. We removed the node from the cluster by running nodetool removenode
from a live node.
3. Immediately, the schema on all nodes went out of sync, and in the logs
of all the C* nodes I see them mark a few other (random) nodes as down.
They eventually recover after 2 minutes.

Logs from the nodes:

INFO  [GossipTasks:1] 2017-06-27 20:34:39,707 Gossiper.java:1008 - InetAddress /10.10.10.20 is now DOWN
INFO  [GossipTasks:1] 2017-06-27 20:34:39,714 Gossiper.java:1008 - InetAddress /10.10.11.14 is now DOWN
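
To compare which nodes each node marked as down, and when, something like the following Python sketch could be run over the collected log text. It is only an illustration: the function names down_events and down_counts are made up here, and the pattern assumes the Gossiper line format shown in the excerpt above (other C* versions may log differently). The \s+ parts tolerate lines that a mail client has wrapped.

```python
import re
from collections import Counter

# Assumed line format, taken from the log excerpt:
#   INFO  [GossipTasks:1] <date> <time>,<ms> Gossiper.java:<line> - InetAddress /<ip> is now DOWN
DOWN_RE = re.compile(
    r"(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),\d{3}\s+Gossiper\.java:\d+\s+-\s+"
    r"InetAddress\s+/(\S+)\s+is now DOWN"
)


def down_events(log_text):
    """Return (timestamp, address) pairs for every 'is now DOWN' entry."""
    return DOWN_RE.findall(log_text)


def down_counts(log_text):
    """Count how many times each address was marked DOWN."""
    return Counter(addr for _, addr in down_events(log_text))
```

Running down_events on each node's log and comparing the timestamps against the time removenode was issued should show whether the DOWN markings line up with the removal.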

Does anyone have an idea why removenode causes the cluster to go out of sync?
