Our case is not about accepting connections: some nodes receive a gossip generation number greater than the local one. I looked at the system peers and local tables but couldn't find where the local generation is stored.
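For reference, the locally persisted generation can be inspected directly; a minimal sketch, assuming the Cassandra 2.1 system schema, where (as I understand it) system.local carries a gossip_generation column:

```shell
# Hedged sketch: read the generation this node persisted for itself.
# Assumes cqlsh can reach the local node on the default port.
cqlsh -e "SELECT gossip_generation FROM system.local WHERE key = 'local';"
```

The peers table only stores information about other nodes, which would explain why the local generation does not appear there.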
2016-01-15 17:54 GMT+01:00 daemeon reiydelle <daeme...@gmail.com>:

> Nodes need about a 60-90 second delay before they can start accepting
> connections as a seed node. A seed node also needs time to accept a node
> starting up and syncing to other nodes (on 10 gigabit the max is only 1 or
> 2 new nodes; on 1 gigabit it can handle at least 3-4 new nodes connecting).
> In a large cluster (500 nodes) I see this weird condition where nodetool
> status shows overlapping subsets of nodes, and the problem does not go away
> even after an hour on a 10 gigabit network.
>
> *“Life should not be a journey to the grave with the intention of arriving
> safely in a pretty and well preserved body, but rather to skid in broadside
> in a cloud of smoke, thoroughly used up, totally worn out, and loudly
> proclaiming “Wow! What a Ride!” - Hunter Thompson*
> Daemeon C.M. Reiydelle
> USA (+1) 415.501.0198
> London (+44) (0) 20 8144 9872
>
> On Fri, Jan 15, 2016 at 9:17 AM, Adil <adil.cha...@gmail.com> wrote:
>
>> Hi,
>> we did a full restart of the cluster, but nodetool status is still giving
>> incoherent info from different nodes: some nodes appear UP from one node
>> but DOWN from another, and the log still shows the message "received an
>> invalid gossip generation for peer /x.x.x.x". The Cassandra version is
>> 2.1.2. We want to execute the purge operation as explained here:
>> https://docs.datastax.com/en/cassandra/2.1/cassandra/operations/ops_gossip_purge.html
>> but we can't find the peers folder. Should we do it via CQL, deleting the
>> peers content? And should we do it on all nodes?
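On the missing peers folder: one possible explanation is that in 2.1 the on-disk table directories carry an ID suffix, so there is no plain "peers" directory. A minimal sketch of the purge on one node, assuming the default data directory /var/lib/cassandra/data and a package install (adjust paths and service commands to your layout):

```shell
# Hedged sketch of clearing saved gossip state on one node.
sudo service cassandra stop
# In 2.1 the system.peers directory has an ID suffix, e.g. peers-<uuid>,
# which is likely why a bare "peers" folder was not found:
ls -d /var/lib/cassandra/data/system/peers-*
sudo rm -rf /var/lib/cassandra/data/system/peers-*/*
# Restart ignoring the saved ring state so the node re-learns it from the
# seeds; in cassandra-env.sh:
#   JVM_OPTS="$JVM_OPTS -Dcassandra.load_ring_state=false"
sudo service cassandra start
```

Per the linked DataStax page, the seed nodes should be brought up first. Deleting rows from system.peers via CQL is a different operation and, as far as I know, would not clear the saved gossip state on its own.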
>>
>> thanks
>>
>> 2016-01-12 17:42 GMT+01:00 Jack Krupansky <jack.krupan...@gmail.com>:
>>
>>> Sometimes you may have to clear out the saved gossip state:
>>>
>>> https://docs.datastax.com/en/cassandra/2.0/cassandra/operations/ops_gossip_purge.html
>>>
>>> Note the instruction about bringing up the seed nodes first. Normally
>>> seed nodes are only relevant when initially joining a node to a cluster
>>> (the gossip state is then persisted locally), but if you clear the
>>> persisted gossip state, the seed nodes will again be needed to find the
>>> rest of the cluster.
>>>
>>> I'm not sure whether a power outage is the same as stopping and
>>> restarting an instance (AWS) in terms of whether the restarted instance
>>> retains its current public IP address.
>>>
>>> -- Jack Krupansky
>>>
>>> On Tue, Jan 12, 2016 at 10:02 AM, daemeon reiydelle <daeme...@gmail.com> wrote:
>>>
>>>> This happens when there is insufficient time for nodes coming up to
>>>> join a network. It takes a few seconds for a node to come up, e.g. your
>>>> seed node. If you tell a node to join a cluster, you can get this
>>>> scenario because of high network utilization as well. I wait 90 seconds
>>>> after the first (i.e. my first seed) node comes up before starting the
>>>> next one. Any nodes that are seeds need some 60 seconds, so the
>>>> additional 30 seconds is a buffer. Additional nodes each wait 60 seconds
>>>> before joining (although this is a parallel tree for large clusters).
>>>>
>>>> On Tue, Jan 12, 2016 at 6:56 AM, Adil <adil.cha...@gmail.com> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> we have two DCs with 5 nodes in each cluster. Yesterday there was an
>>>>> electricity outage that took all nodes down. We restarted the clusters,
>>>>> but when we run nodetool status on DC1 it shows some nodes as DN, and
>>>>> the strange thing is that running the command from different nodes in
>>>>> DC1 doesn't give the same view of the DC. We have noticed this message
>>>>> in the log: "received an invalid gossip generation for peer". Does
>>>>> anyone know how to resolve this problem? Should we purge the gossip?
>>>>>
>>>>> thanks
>>>>>
>>>>> Adil
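To see the disagreement concretely, the ring and gossip views of each node can be compared side by side; a minimal sketch, assuming nodetool is on the PATH and JMX is reachable on each host (the host names are placeholders, substitute your own):

```shell
# Hedged sketch: compare ring views and per-peer gossip state across nodes.
for h in node1 node2 node3; do
  echo "== $h =="
  nodetool -h "$h" status
  # gossipinfo prints one block per peer, starting with its /address line
  nodetool -h "$h" gossipinfo | grep -A1 '^/'
done
```

If the generations shown by gossipinfo differ between nodes for the same peer, that matches the "received an invalid gossip generation for peer" warning, which (as I understand it) fires when a peer advertises a generation implausibly far ahead of what the local node expects.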