On 2017-06-28 18:51 (-0700), Jai Bheemsen Rao Dhanwada <jaibheem...@gmail.com> wrote:
> Hello,
>
> We are using C* version 2.1.6 and lately we are seeing an issue where,
> nodetool removenode causing the schema to go out of sync and causing client
> to fail for 2-3 minutes.
>
> C* cluster is in 8 Datacenters with RF=3 and has 50 nodes.
> We have 130 Keyspaces and 500 CF in the cluster.
>
> Here are the sequence of actions that were performed.
>
> 1. One node failed abruptly in the cluster due to hardware issue
> 2. Remove the node from the cluster using nodetool removenode from a live
>    node.
> 3. Immediately I see all the nodes schema go out of sync and on the logs of
>    all the C* nodes, I see they mark few other (random) nodes as down. and
>    eventually recover after 2 minutes
>
> Logs in the nodes:
>
> INFO [GossipTasks:1] 2017-06-27 20:34:39,707 Gossiper.java:1008 - InetAddress /10.10.10.20 is now DOWN
> INFO [GossipTasks:1] 2017-06-27 20:34:39,714 Gossiper.java:1008 - InetAddress /10.10.11.14 is now DOWN
>
> Any one have an idea why, removenode causing the cluster to go out of sync?

That's not really expected; I've never seen behavior like that. However, 2.1.6 is pretty old (about two years, give or take), and there have been hundreds or (more likely) thousands of fixes since then.

Is the Gossiper line the only thing logged? Anything about invalid generations?
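
If it helps to answer those two questions across all 50 nodes, here is a rough Python sketch for scanning a node's log. The regex is modeled only on the two Gossiper lines quoted above. The function name `scan_log` is made up for illustration, and the check for the word "generation" is an assumption, since the exact wording of any generation-related message is not given in this thread.

```python
import re
from collections import Counter

# Pattern modeled on the two Gossiper lines quoted in this thread.
DOWN_RE = re.compile(r"InetAddress /(\S+) is now DOWN")


def scan_log(lines):
    """Count DOWN events per address and collect lines mentioning 'generation'."""
    down = Counter()
    generation_lines = []
    for line in lines:
        match = DOWN_RE.search(line)
        if match:
            down[match.group(1)] += 1
        # Assumption: a generation-related message contains the word
        # "generation"; the exact log text is not known from the thread.
        if "generation" in line.lower():
            generation_lines.append(line.rstrip())
    return down, generation_lines


if __name__ == "__main__":
    # The two log lines from the original report, used as sample input.
    sample = [
        "INFO [GossipTasks:1] 2017-06-27 20:34:39,707 Gossiper.java:1008 - "
        "InetAddress /10.10.10.20 is now DOWN",
        "INFO [GossipTasks:1] 2017-06-27 20:34:39,714 Gossiper.java:1008 - "
        "InetAddress /10.10.11.14 is now DOWN",
    ]
    down, generation_lines = scan_log(sample)
    for address, count in sorted(down.items()):
        print(address, count)
    print(len(generation_lines), "line(s) mention 'generation'")
```

To run it against a real log, pass an open file object to `scan_log` instead of the sample list. Comparing the per-address counts between nodes should show whether every node marked the same peers down or a different random set.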