Nodes need about a 60-90 second delay before they can start accepting connections as seed nodes. A seed node also needs time to accept a node starting up and syncing to other nodes (on 10 Gbit the max is only 1 or 2 new nodes; on 1 Gbit it can handle at least 3-4 new nodes connecting). In a large cluster (500 nodes) I see this weird condition where nodetool status shows overlapping subsets of nodes, and the problem does not go away even after an hour on a 10 Gbit network.
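The pacing described above can be sketched as a dry-run script. Hostnames, the start command, and the split between seeds and other nodes are all assumptions for illustration; swap the echo for the real ssh/service calls to use it:

```shell
# Dry-run sketch of a staggered rolling start (hypothetical hostnames).
# Commands are echoed, not executed -- replace echo with real ssh calls.
SEEDS="seed1 seed2"
OTHERS="node3 node4 node5"

for h in $SEEDS; do
  # seeds: 60s minimum, plus a 30s buffer before starting the next node
  echo "start cassandra on $h, then wait 90s"
done
for h in $OTHERS; do
  # non-seed nodes join one at a time, 60s apart
  echo "start cassandra on $h, then wait 60s"
done
```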
*.......*

*“Life should not be a journey to the grave with the intention of arriving safely in a pretty and well preserved body, but rather to skid in broadside in a cloud of smoke, thoroughly used up, totally worn out, and loudly proclaiming “Wow! What a Ride!” - Hunter Thompson*
*Daemeon C.M. Reiydelle*
*USA (+1) 415.501.0198*
*London (+44) (0) 20 8144 9872*

On Fri, Jan 15, 2016 at 9:17 AM, Adil <adil.cha...@gmail.com> wrote:

> Hi,
> we did a full restart of the cluster, but nodetool status is still giving
> incoherent info from different nodes: some nodes appear UP from one node
> but DOWN from another, and the log still shows the message "received an
> invalid gossip generation for peer /x.x.x.x". The Cassandra version is
> 2.1.2. We want to execute the purge operation as explained here:
> https://docs.datastax.com/en/cassandra/2.1/cassandra/operations/ops_gossip_purge.html
> but we didn't find the peers folder. Should we do it via CQL by deleting
> the peers content? Should we do it on all nodes?
>
> thanks
>
> 2016-01-12 17:42 GMT+01:00 Jack Krupansky <jack.krupan...@gmail.com>:
>
>> Sometimes you may have to clear out the saved Gossip state:
>>
>> https://docs.datastax.com/en/cassandra/2.0/cassandra/operations/ops_gossip_purge.html
>>
>> Note the instruction about bringing up the seed nodes first. Normally
>> seed nodes are only relevant when initially joining a node to a cluster
>> (and then the Gossip state will be persisted locally), but if you clear
>> the persisted Gossip state the seed nodes will again be needed to find
>> the rest of the cluster.
>>
>> I'm not sure whether a power outage is the same as stopping and
>> restarting an instance (AWS) in terms of whether the restarted instance
>> retains its current public IP address.
>>
>> -- Jack Krupansky
>>
>> On Tue, Jan 12, 2016 at 10:02 AM, daemeon reiydelle <daeme...@gmail.com>
>> wrote:
>>
>>> This happens when there is insufficient time for nodes coming up to
>>> join a network.
>>> It takes a few seconds for a node to come up, e.g. your seed node. If
>>> you tell a node to join a cluster, you can get this scenario because of
>>> high network utilization as well. I wait 90 seconds after the first
>>> (i.e. my first seed) node comes up before starting the next one. Any
>>> nodes that are seeds need some 60 seconds, so the additional 30 seconds
>>> is a buffer. Additional nodes each wait 60 seconds before joining
>>> (although this is a parallel tree for large clusters).
>>>
>>> On Tue, Jan 12, 2016 at 6:56 AM, Adil <adil.cha...@gmail.com> wrote:
>>>
>>>> Hi,
>>>>
>>>> we have two DCs with 5 nodes in each cluster. Yesterday there was an
>>>> electricity outage causing all nodes to go down. We restarted the
>>>> clusters, but when we run nodetool status on DC1 it reports some nodes
>>>> as DN; the strange thing is that running the command from different
>>>> nodes in DC1 doesn't give the same view of the DC. We have noticed this
>>>> message in the log: "received an invalid gossip generation for peer".
>>>> Does anyone know how to resolve this problem? Should we purge the
>>>> gossip?
>>>>
>>>> thanks
>>>>
>>>> Adil
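For reference, the gossip-state purge Adil asks about can be sketched as a dry run, one node at a time. Everything here is an assumption to be checked against the linked DataStax page and your own install: the paths assume a packaged install under /var/lib/cassandra and /etc/cassandra, and in 2.1 the on-disk system.peers directory carries a table-UUID suffix (peers-<uuid>), which may be why a bare "peers" folder was not found:

```shell
# Dry-run sketch of the gossip-state purge described in the docs linked
# above. Commands are echoed, not executed; paths and service name are
# assumptions. Drop the run wrapper to execute for real.
run() { echo "$@"; }

run sudo service cassandra stop
# In 2.1 the directory has a table-UUID suffix, e.g. peers-<uuid>:
run sudo rm -r '/var/lib/cassandra/data/system/peers-*/*'
# Skip loading the saved ring state on the next start; remove this JVM
# option again once the node has rejoined and gossip has settled:
run 'append JVM_OPTS="$JVM_OPTS -Dcassandra.load_ring_state=false" to /etc/cassandra/cassandra-env.sh'
run sudo service cassandra start
```

Per the quoted advice from Jack Krupansky, the seed nodes would need to come up first, since a node with cleared gossip state has to rediscover the cluster through its seeds.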