Sorry, disregard the schema ID comment. It's too early in the morning here ;)

On Tue, Nov 26, 2019 at 7:58 AM Shalom Sagges <[email protected]> wrote:
Hi Paul,

From the gossipinfo output, it looks like the node's IP address and rpc_address are different: /192.168.187.121 vs RPC_ADDRESS:192.168.185.121 (note 187 vs 185).

You can also see that there's a schema disagreement between nodes, e.g. the schema_id on node001 is fd2dcb4b-ca62-30df-b8f2-d3fd774f2801 and on node002 it is fd2dcb4b-ca62-30df-b8f2-d3fd774f2801. You can run nodetool describecluster to see it as well.

So I suggest changing the rpc_address to the node's IP address, or setting it to 0.0.0.0, and that should resolve the issue.

Hope this helps!

On Tue, Nov 26, 2019 at 4:05 AM Inquistive allen <[email protected]> wrote:

Hello,

Check and compare the following parameters:

1. The Java version should ideally match across all nodes in the cluster.
2. Check whether port 7000 is open between the nodes (use telnet or nc).
3. You should see some clues in the system logs as to why gossip is failing.

Do confirm the above things.

Thanks
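For reference, a quick way to run the checks above from a shell on each node; the hostnames, log location, and cassandra.yaml path below are illustrative and vary by install (e.g. /etc/cassandra/conf/ on some package installs):

    # 1. Java version -- should ideally match on every node
    java -version

    # 2. Is the gossip/storage port (7000) reachable from this node to a peer?
    nc -vz node002.intra.myorg.org 7000

    # 3. Recent gossip-related entries in the system log
    grep -i gossip /var/log/cassandra/system.log | tail -20

    # Per Shalom's suggestion, also compare the configured addresses
    # and check schema agreement across the cluster
    grep -E '^(listen_address|rpc_address):' /etc/cassandra/cassandra.yaml
    nodetool describecluster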
On Tue, 26 Nov, 2019, 2:50 AM Paul Mena <[email protected]> wrote:

NTP was restarted on the Cassandra nodes, but unfortunately I'm still getting the same result: the restarted node does not appear to be rejoining the cluster.

Here's another data point: "nodetool gossipinfo", when run from the restarted node ("node001"), shows a status of "normal":

user@node001=> nodetool gossipinfo
/192.168.187.121
  generation:1574364410
  heartbeat:209150
  NET_VERSION:8
  RACK:rack1
  STATUS:NORMAL,-104847506331695918
  RELEASE_VERSION:2.1.9
  SEVERITY:0.0
  LOAD:5.78684155614E11
  HOST_ID:c99cf581-f4ae-4aa9-ab37-1a114ab2429b
  SCHEMA:fd2dcb4b-ca62-30df-b8f2-d3fd774f2801
  DC:datacenter1
  RPC_ADDRESS:192.168.185.121

When run from one of the other nodes, however, node001's status is shown as "shutdown":

user@node002=> nodetool gossipinfo
/192.168.187.121
  generation:1491825076
  heartbeat:2147483647
  STATUS:shutdown,true
  RACK:rack1
  NET_VERSION:8
  LOAD:5.78679987693E11
  RELEASE_VERSION:2.1.9
  DC:datacenter1
  SCHEMA:fd2dcb4b-ca62-30df-b8f2-d3fd774f2801
  HOST_ID:c99cf581-f4ae-4aa9-ab37-1a114ab2429b
  RPC_ADDRESS:192.168.185.121
  SEVERITY:0.0

From: Paul Mena
Sent: Monday, November 25, 2019 9:29 AM
To: [email protected]
Subject: RE: Cassandra is not showing a node up hours after restart

I've just discovered that NTP is not running on any of these Cassandra nodes, and that the timestamps are all over the map. Could this be causing my issue?

user@remote=> ansible pre-prod-cassandra -a date
node001.intra.myorg.org | CHANGED | rc=0 >>
Mon Nov 25 13:58:17 UTC 2019

node004.intra.myorg.org | CHANGED | rc=0 >>
Mon Nov 25 14:07:20 UTC 2019

node003.intra.myorg.org | CHANGED | rc=0 >>
Mon Nov 25 13:57:06 UTC 2019

node001.intra.myorg.org | CHANGED | rc=0 >>
Mon Nov 25 14:07:22 UTC 2019
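As a sketch, clock health across the cluster can be checked and corrected with the same ansible inventory used above. The NTP service name is an assumption here (it is typically "ntp" on Debian/Ubuntu and "ntpd" on RHEL/CentOS):

    # Is the NTP daemon running, and is it actually synced to peers?
    ansible pre-prod-cassandra -b -a 'service ntp status'
    ansible pre-prod-cassandra -a 'ntpq -p'

    # Restart it where needed, then re-check the clocks
    ansible pre-prod-cassandra -b -a 'service ntp restart'
    ansible pre-prod-cassandra -a date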
From: Inquistive allen <[email protected]>
Sent: Monday, November 25, 2019 2:46 AM
To: [email protected]
Subject: Re: Cassandra is not showing a node up hours after restart

Hello team,

Just to add to the discussion: one may run nodetool disablebinary, followed by nodetool disablethrift, followed by nodetool drain. nodetool drain also does the work of nodetool flush, plus declaring to the cluster that the node is down and not accepting traffic.

Thanks

On Mon, 25 Nov, 2019, 12:55 AM Surbhi Gupta <[email protected]> wrote:

Before shutting down Cassandra, nodetool drain should be executed first. As soon as you run nodetool drain, the other nodes will see this node as down and no new traffic will come to it. I generally give a 10-second gap between nodetool drain and the Cassandra stop.
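Putting those two suggestions together, a graceful single-node restart might look like the sketch below, assuming the SysV-style service commands used elsewhere in this thread (all three nodetool subcommands are available on Cassandra 2.1):

    # Stop accepting client traffic, then flush and announce shutdown
    nodetool disablebinary    # stop native-protocol (CQL) clients
    nodetool disablethrift    # stop Thrift clients
    nodetool drain            # flush memtables; peers mark this node down
    sleep 10                  # brief pause, per Surbhi's suggestion
    service cassandra stop

    # After starting it back up, verify from a DIFFERENT node that it rejoined
    service cassandra start
    nodetool status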
On Sun, Nov 24, 2019 at 9:52 AM Paul Mena <[email protected]> wrote:

Thank you for the replies. I had made no changes to the config before the rolling restart.

I can try another restart but was wondering if I should do it differently. I had simply done "service cassandra stop" followed by "service cassandra start". Since then I've seen some suggestions to precede the shutdown with "nodetool disablegossip" and/or "nodetool drain". Are these commands advisable? Are any other commands recommended, either before the shutdown or after the startup?

Thanks again!

Paul

From: Naman Gupta <[email protected]>
Sent: Sunday, November 24, 2019 11:18:14 AM
To: [email protected]
Subject: Re: Cassandra is not showing a node up hours after restart

Did you change the name of the datacenter, or make any other config changes, before the rolling restart?

On Sun, Nov 24, 2019 at 8:49 PM Paul Mena <[email protected]> wrote:

I am in the process of doing a rolling restart on a 4-node cluster running Cassandra 2.1.9. I stopped and started Cassandra on node 1 via "service cassandra stop/start", and noted nothing unusual in either system.log or cassandra.log. Doing a "nodetool status" from node 1 shows all four nodes up:

user@node001=> nodetool status
Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address          Load       Tokens  Owns  Host ID                               Rack
UN  192.168.187.121  538.95 GB  256     ?     c99cf581-f4ae-4aa9-ab37-1a114ab2429b  rack1
UN  192.168.187.122  630.72 GB  256     ?     bfa07f47-7e37-42b4-9c0b-024b3c02e93f  rack1
UN  192.168.187.123  572.73 GB  256     ?     273df9f3-e496-4c65-a1f2-325ed288a992  rack1
UN  192.168.187.124  625.05 GB  256     ?     b8639cf1-5413-4ece-b882-2161bbb8a9c3  rack1

But the same command from any of the other 3 nodes shows node 1 still down:

user@node002=> nodetool status
Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address          Load       Tokens  Owns  Host ID                               Rack
DN  192.168.187.121  538.94 GB  256     ?     c99cf581-f4ae-4aa9-ab37-1a114ab2429b  rack1
UN  192.168.187.122  630.72 GB  256     ?     bfa07f47-7e37-42b4-9c0b-024b3c02e93f  rack1
UN  192.168.187.123  572.73 GB  256     ?     273df9f3-e496-4c65-a1f2-325ed288a992  rack1
UN  192.168.187.124  625.04 GB  256     ?     b8639cf1-5413-4ece-b882-2161bbb8a9c3  rack1

Is there something I can do to remedy the current situation, so that I can continue with the rolling restart?