Our case is not about accepting connections: some nodes receive a gossip generation number greater than the local one. I looked at the system peers and local tables but couldn't find where the local generation is stored.
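For reference, the locally persisted generation can be inspected directly; a minimal sketch, assuming the Cassandra 2.1 system schema, where (as I understand it) system.local carries a gossip_generation column:

```shell
# Hedged sketch: read the generation this node persisted for itself.
# Assumes cqlsh can reach the local node on the default port.
cqlsh -e "SELECT gossip_generation FROM system.local WHERE key = 'local';"
```

The peers table only stores information about other nodes, which would explain why the local generation does not appear there.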
2016-01-15 17:54 GMT+01:00 daemeon reiydelle <daeme...@gmail.com>:

> Nodes need about a 60-90 second delay before they can start accepting
> connections as a seed node. A seed node also needs time to accept a node
> starting up and syncing to other nodes (on 10 gigabit the max is only 1 or
> 2 new nodes; on 1 gigabit it can handle at least 3-4 new nodes connecting).
> In a large cluster (500 nodes) I see this weird condition where nodetool
> status shows overlapping subsets of nodes, and the problem does not go away
> even after an hour on a 10 gigabit network.
>
> *“Life should not be a journey to the grave with the intention of arriving
> safely in a pretty and well preserved body, but rather to skid in broadside
> in a cloud of smoke, thoroughly used up, totally worn out, and loudly
> proclaiming “Wow! What a Ride!” - Hunter Thompson*
> Daemeon C.M. Reiydelle
> USA (+1) 415.501.0198
> London (+44) (0) 20 8144 9872
>
> On Fri, Jan 15, 2016 at 9:17 AM, Adil <adil.cha...@gmail.com> wrote:
>
>> Hi,
>> we did a full restart of the cluster, but nodetool status is still giving
>> incoherent info from different nodes: some nodes appear UP from one node
>> but DOWN from another, and the log still shows the message "received an
>> invalid gossip generation for peer /x.x.x.x". The Cassandra version is
>> 2.1.2. We want to execute the purge operation as explained here:
>> https://docs.datastax.com/en/cassandra/2.1/cassandra/operations/ops_gossip_purge.html
>> but we can't find the peers folder. Should we do it via CQL, deleting the
>> peers content? And should we do it on all nodes?
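On the missing peers folder: one possible explanation is that in 2.1 the on-disk table directories carry an ID suffix, so there is no plain "peers" directory. A minimal sketch of the purge on one node, assuming the default data directory /var/lib/cassandra/data and a package install (adjust paths and service commands to your layout):

```shell
# Hedged sketch of clearing saved gossip state on one node.
sudo service cassandra stop
# In 2.1 the system.peers directory has an ID suffix, e.g. peers-<uuid>,
# which is likely why a bare "peers" folder was not found:
ls -d /var/lib/cassandra/data/system/peers-*
sudo rm -rf /var/lib/cassandra/data/system/peers-*/*
# Restart ignoring the saved ring state so the node re-learns it from the
# seeds; in cassandra-env.sh:
#   JVM_OPTS="$JVM_OPTS -Dcassandra.load_ring_state=false"
sudo service cassandra start
```

Per the linked DataStax page, the seed nodes should be brought up first. Deleting rows from system.peers via CQL is a different operation and, as far as I know, would not clear the saved gossip state on its own.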
>>
>> thanks
>>
>> 2016-01-12 17:42 GMT+01:00 Jack Krupansky <jack.krupan...@gmail.com>:
>>
>>> Sometimes you may have to clear out the saved gossip state:
>>>
>>> https://docs.datastax.com/en/cassandra/2.0/cassandra/operations/ops_gossip_purge.html
>>>
>>> Note the instruction about bringing up the seed nodes first. Normally
>>> seed nodes are only relevant when initially joining a node to a cluster
>>> (the gossip state is then persisted locally), but if you clear the
>>> persisted gossip state, the seed nodes will again be needed to find the
>>> rest of the cluster.
>>>
>>> I'm not sure whether a power outage is the same as stopping and
>>> restarting an instance (AWS) in terms of whether the restarted instance
>>> retains its current public IP address.
>>>
>>> -- Jack Krupansky
>>>
>>> On Tue, Jan 12, 2016 at 10:02 AM, daemeon reiydelle <daeme...@gmail.com> wrote:
>>>
>>>> This happens when there is insufficient time for nodes coming up to
>>>> join a network. It takes a few seconds for a node to come up, e.g. your
>>>> seed node. If you tell a node to join a cluster, you can get this
>>>> scenario because of high network utilization as well. I wait 90 seconds
>>>> after the first (i.e. my first seed) node comes up before starting the
>>>> next one. Any nodes that are seeds need some 60 seconds, so the
>>>> additional 30 seconds is a buffer. Additional nodes each wait 60 seconds
>>>> before joining (although this is a parallel tree for large clusters).
>>>>
>>>> On Tue, Jan 12, 2016 at 6:56 AM, Adil <adil.cha...@gmail.com> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> we have two DCs with 5 nodes in each cluster. Yesterday there was an
>>>>> electricity outage that took all nodes down. We restarted the clusters,
>>>>> but when we run nodetool status on DC1 it shows some nodes as DN, and
>>>>> the strange thing is that running the command from different nodes in
>>>>> DC1 doesn't give the same view of the DC. We have noticed this message
>>>>> in the log: "received an invalid gossip generation for peer". Does
>>>>> anyone know how to resolve this problem? Should we purge the gossip?
>>>>>
>>>>> thanks
>>>>>
>>>>> Adil
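To see the disagreement concretely, the ring and gossip views of each node can be compared side by side; a minimal sketch, assuming nodetool is on the PATH and JMX is reachable on each host (the host names are placeholders, substitute your own):

```shell
# Hedged sketch: compare ring views and per-peer gossip state across nodes.
for h in node1 node2 node3; do
  echo "== $h =="
  nodetool -h "$h" status
  # gossipinfo prints one block per peer, starting with its /address line
  nodetool -h "$h" gossipinfo | grep -A1 '^/'
done
```

If the generations shown by gossipinfo differ between nodes for the same peer, that matches the "received an invalid gossip generation for peer" warning, which (as I understand it) fires when a peer advertises a generation implausibly far ahead of what the local node expects.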