On 2017-06-28 18:51 (-0700), Jai Bheemsen Rao Dhanwada <jaibheem...@gmail.com> 
wrote: 
> Hello,
> 
> We are using C* version 2.1.6, and lately we are seeing an issue where
> nodetool removenode causes the schema to go out of sync and clients to
> fail for 2-3 minutes.
> 
> The C* cluster spans 8 datacenters with RF=3 and has 50 nodes.
> We have 130 keyspaces and 500 CFs in the cluster.
> 
> Here is the sequence of actions that were performed:
> 
> 1. One node failed abruptly in the cluster due to a hardware issue.
> 2. We removed the node from the cluster using nodetool removenode from a
> live node.
> 3. Immediately, I see the schema on all the nodes go out of sync, and in
> the logs of all the C* nodes I see them mark a few other (random) nodes as
> down. They eventually recover after 2 minutes.
> 
> Logs in the nodes:
> 
> INFO  [GossipTasks:1] 2017-06-27 20:34:39,707 Gossiper.java:1008 -
> InetAddress /10.10.10.20 is now DOWN
> INFO  [GossipTasks:1] 2017-06-27 20:34:39,714 Gossiper.java:1008 -
> InetAddress /10.10.11.14 is now DOWN
> 
> Does anyone have an idea why removenode causes the cluster to go out of sync?
> 

That's not really expected - I've never seen behavior like that. However, 2.1.6
is pretty old (just about 2 years, give or take), and there have been hundreds or
(more likely) thousands of fixes since then.

Is the gossiper line the only thing logged? Anything about invalid generations?
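
If it helps to answer that, here is a rough Python sketch for scanning a node's log. It counts the "is now DOWN" Gossiper lines per address and collects any line that mentions "generation". The regex is based only on the two log lines quoted above (unwrapped onto one line each), and the plain substring match on "generation" is an assumption on my part, since I am not quoting the exact wording of that message. The function name summarize is just illustrative.

```python
import re
from collections import Counter

# Pattern derived from the Gossiper lines quoted above, for example:
# INFO  [GossipTasks:1] 2017-06-27 20:34:39,707 Gossiper.java:1008 - InetAddress /10.10.10.20 is now DOWN
STATUS_RE = re.compile(
    r"(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})\s+Gossiper\.java:\d+\s+-\s+"
    r"InetAddress /(?P<ip>\S+) is now (?P<state>UP|DOWN)"
)


def summarize(lines):
    """Return (DOWN count per address, lines that mention 'generation')."""
    down = Counter()
    generation_lines = []
    for line in lines:
        match = STATUS_RE.search(line)
        if match and match.group("state") == "DOWN":
            down[match.group("ip")] += 1
        # Assumption: any generation-related message contains this word.
        if "generation" in line.lower():
            generation_lines.append(line.rstrip("\n"))
    return down, generation_lines


# The two lines from the original report, joined back onto single lines.
sample = [
    "INFO  [GossipTasks:1] 2017-06-27 20:34:39,707 Gossiper.java:1008 - "
    "InetAddress /10.10.10.20 is now DOWN\n",
    "INFO  [GossipTasks:1] 2017-06-27 20:34:39,714 Gossiper.java:1008 - "
    "InetAddress /10.10.11.14 is now DOWN\n",
]

down, generation_lines = summarize(sample)
for ip, count in sorted(down.items()):
    print(ip, count)
print(len(generation_lines), "line(s) mentioning generation")
```

To run it against a real node, pass an open file handle for that node's system.log instead of the sample list (the location depends on how Cassandra was installed).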

