Sorry, disregard the schema ID comment. It's too early in the morning here ;)

On Tue, Nov 26, 2019 at 7:58 AM Shalom Sagges <[email protected]> wrote:
Hi Paul,

From the gossipinfo output, it looks like the node's IP address and rpc_address are different: /192.168.187.121 vs RPC_ADDRESS:192.168.185.121 (note 187 vs 185).

You can also see that there's a schema disagreement between nodes, e.g. the schema_id on node001 is fd2dcb4b-ca62-30df-b8f2-d3fd774f2801 and on node002 it is fd2dcb4b-ca62-30df-b8f2-d3fd774f2801. You can run nodetool describecluster to see it as well.

So I suggest changing the rpc_address to the node's IP address, or setting it to 0.0.0.0, and that should resolve the issue.

Hope this helps!

On Tue, Nov 26, 2019 at 4:05 AM Inquistive allen <[email protected]> wrote:

Hello,

Check and compare the following parameters:

1. The Java version should ideally match across all nodes in the cluster.
2. Check whether port 7000 is open between the nodes (use telnet or nc).
3. You should see some clues in the system logs as to why gossip is failing.

Do confirm the above things.

Thanks
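For reference, a quick way to run the checks above from a shell on each node; the hostnames, log location, and cassandra.yaml path below are illustrative and vary by install (e.g. /etc/cassandra/conf/ on some package installs):

    # 1. Java version -- should ideally match on every node
    java -version

    # 2. Is the gossip/storage port (7000) reachable from this node to a peer?
    nc -vz node002.intra.myorg.org 7000

    # 3. Recent gossip-related entries in the system log
    grep -i gossip /var/log/cassandra/system.log | tail -20

    # Per Shalom's suggestion, also compare the configured addresses
    # and check schema agreement across the cluster
    grep -E '^(listen_address|rpc_address):' /etc/cassandra/cassandra.yaml
    nodetool describecluster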
On Tue, 26 Nov, 2019, 2:50 AM Paul Mena <[email protected]> wrote:

NTP was restarted on the Cassandra nodes, but unfortunately I'm still getting the same result: the restarted node does not appear to be rejoining the cluster.

Here's another data point: "nodetool gossipinfo", when run from the restarted node ("node001"), shows a status of "normal":

user@node001=> nodetool gossipinfo
/192.168.187.121
  generation:1574364410
  heartbeat:209150
  NET_VERSION:8
  RACK:rack1
  STATUS:NORMAL,-104847506331695918
  RELEASE_VERSION:2.1.9
  SEVERITY:0.0
  LOAD:5.78684155614E11
  HOST_ID:c99cf581-f4ae-4aa9-ab37-1a114ab2429b
  SCHEMA:fd2dcb4b-ca62-30df-b8f2-d3fd774f2801
  DC:datacenter1
  RPC_ADDRESS:192.168.185.121

When run from one of the other nodes, however, node001's status is shown as "shutdown":

user@node002=> nodetool gossipinfo
/192.168.187.121
  generation:1491825076
  heartbeat:2147483647
  STATUS:shutdown,true
  RACK:rack1
  NET_VERSION:8
  LOAD:5.78679987693E11
  RELEASE_VERSION:2.1.9
  DC:datacenter1
  SCHEMA:fd2dcb4b-ca62-30df-b8f2-d3fd774f2801
  HOST_ID:c99cf581-f4ae-4aa9-ab37-1a114ab2429b
  RPC_ADDRESS:192.168.185.121
  SEVERITY:0.0

From: Paul Mena
Sent: Monday, November 25, 2019 9:29 AM
To: [email protected]
Subject: RE: Cassandra is not showing a node up hours after restart

I've just discovered that NTP is not running on any of these Cassandra nodes, and that the timestamps are all over the map. Could this be causing my issue?

user@remote=> ansible pre-prod-cassandra -a date
node001.intra.myorg.org | CHANGED | rc=0 >>
Mon Nov 25 13:58:17 UTC 2019

node004.intra.myorg.org | CHANGED | rc=0 >>
Mon Nov 25 14:07:20 UTC 2019

node003.intra.myorg.org | CHANGED | rc=0 >>
Mon Nov 25 13:57:06 UTC 2019

node001.intra.myorg.org | CHANGED | rc=0 >>
Mon Nov 25 14:07:22 UTC 2019
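As a sketch, clock health across the cluster can be checked and corrected with the same ansible inventory used above. The NTP service name is an assumption here (it is typically "ntp" on Debian/Ubuntu and "ntpd" on RHEL/CentOS):

    # Is the NTP daemon running, and is it actually synced to peers?
    ansible pre-prod-cassandra -b -a 'service ntp status'
    ansible pre-prod-cassandra -a 'ntpq -p'

    # Restart it where needed, then re-check the clocks
    ansible pre-prod-cassandra -b -a 'service ntp restart'
    ansible pre-prod-cassandra -a date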
From: Inquistive allen <[email protected]>
Sent: Monday, November 25, 2019 2:46 AM
To: [email protected]
Subject: Re: Cassandra is not showing a node up hours after restart

Hello team,

Just to add to the discussion: one may run nodetool disablebinary, followed by nodetool disablethrift, followed by nodetool drain. nodetool drain also does the work of nodetool flush, plus declaring to the cluster that the node is down and not accepting traffic.

Thanks

On Mon, 25 Nov, 2019, 12:55 AM Surbhi Gupta <[email protected]> wrote:

Before shutting down Cassandra, nodetool drain should be executed first. As soon as you run nodetool drain, the other nodes will see this node as down and no new traffic will come to it. I generally give a 10-second gap between nodetool drain and the Cassandra stop.
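Putting those two suggestions together, a graceful single-node restart might look like the sketch below, assuming the SysV-style service commands used elsewhere in this thread (all three nodetool subcommands are available on Cassandra 2.1):

    # Stop accepting client traffic, then flush and announce shutdown
    nodetool disablebinary    # stop native-protocol (CQL) clients
    nodetool disablethrift    # stop Thrift clients
    nodetool drain            # flush memtables; peers mark this node down
    sleep 10                  # brief pause, per Surbhi's suggestion
    service cassandra stop

    # After starting it back up, verify from a DIFFERENT node that it rejoined
    service cassandra start
    nodetool status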
On Sun, Nov 24, 2019 at 9:52 AM Paul Mena <[email protected]> wrote:

Thank you for the replies. I had made no changes to the config before the rolling restart.

I can try another restart but was wondering if I should do it differently. I had simply done "service cassandra stop" followed by "service cassandra start". Since then I've seen some suggestions to precede the shutdown with "nodetool disablegossip" and/or "nodetool drain". Are these commands advisable? Are any other commands recommended, either before the shutdown or after the startup?

Thanks again!

Paul

From: Naman Gupta <[email protected]>
Sent: Sunday, November 24, 2019 11:18:14 AM
To: [email protected]
Subject: Re: Cassandra is not showing a node up hours after restart

Did you change the name of the datacenter, or make any other config changes, before the rolling restart?

On Sun, Nov 24, 2019 at 8:49 PM Paul Mena <[email protected]> wrote:

I am in the process of doing a rolling restart on a 4-node cluster running Cassandra 2.1.9. I stopped and started Cassandra on node 1 via "service cassandra stop/start", and noted nothing unusual in either system.log or cassandra.log. Doing a "nodetool status" from node 1 shows all four nodes up:

user@node001=> nodetool status
Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address          Load       Tokens  Owns  Host ID                               Rack
UN  192.168.187.121  538.95 GB  256     ?     c99cf581-f4ae-4aa9-ab37-1a114ab2429b  rack1
UN  192.168.187.122  630.72 GB  256     ?     bfa07f47-7e37-42b4-9c0b-024b3c02e93f  rack1
UN  192.168.187.123  572.73 GB  256     ?     273df9f3-e496-4c65-a1f2-325ed288a992  rack1
UN  192.168.187.124  625.05 GB  256     ?     b8639cf1-5413-4ece-b882-2161bbb8a9c3  rack1

But the same command from any of the other 3 nodes shows node 1 still down:

user@node002=> nodetool status
Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address          Load       Tokens  Owns  Host ID                               Rack
DN  192.168.187.121  538.94 GB  256     ?     c99cf581-f4ae-4aa9-ab37-1a114ab2429b  rack1
UN  192.168.187.122  630.72 GB  256     ?     bfa07f47-7e37-42b4-9c0b-024b3c02e93f  rack1
UN  192.168.187.123  572.73 GB  256     ?     273df9f3-e496-4c65-a1f2-325ed288a992  rack1
UN  192.168.187.124  625.04 GB  256     ?     b8639cf1-5413-4ece-b882-2161bbb8a9c3  rack1

Is there something I can do to remedy the current situation, so that I can continue with the rolling restart?