RE: Cassandra is not showing a node up hours after restart

2019-12-06 Thread Paul Mena
As we are still without a functional Cassandra cluster in our development 
environment, I thought I’d try restarting the same node (one of 4 in the 
cluster) with the following command:

ip=$(cat /etc/hostname)
nodetool disablethrift && nodetool disablebinary && sleep 5 &&
  nodetool disablegossip && nodetool drain && sleep 10 &&
  sudo service cassandra restart &&
  until echo "SELECT * FROM system.peers LIMIT 1;" | cqlsh $ip > /dev/null 2>&1; do
    echo "Node $ip is still DOWN"; sleep 10
  done &&
  echo "Node $ip is now UP"

The above command returned “Node is now UP” after about 40 seconds, confirmed 
on “node001” via “nodetool status”:

user@node001=> nodetool status
Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address          Load       Tokens  Owns    Host ID                               Rack
UN  192.168.187.121  539.43 GB  256     ?       c99cf581-f4ae-4aa9-ab37-1a114ab2429b  rack1
UN  192.168.187.122  633.92 GB  256     ?       bfa07f47-7e37-42b4-9c0b-024b3c02e93f  rack1
UN  192.168.187.123  576.31 GB  256     ?       273df9f3-e496-4c65-a1f2-325ed288a992  rack1
UN  192.168.187.124  628.5 GB   256     ?       b8639cf1-5413-4ece-b882-2161bbb8a9c3  rack1

As was the case before, running “nodetool status” on any of the other nodes 
shows that “node001” is still down:

user@node002=> nodetool status
Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address          Load       Tokens  Owns    Host ID                               Rack
DN  192.168.187.121  538.94 GB  256     ?       c99cf581-f4ae-4aa9-ab37-1a114ab2429b  rack1
UN  192.168.187.122  634.04 GB  256     ?       bfa07f47-7e37-42b4-9c0b-024b3c02e93f  rack1
UN  192.168.187.123  576.42 GB  256     ?       273df9f3-e496-4c65-a1f2-325ed288a992  rack1
UN  192.168.187.124  628.56 GB  256     ?       b8639cf1-5413-4ece-b882-2161bbb8a9c3  rack1

Is it inadvisable to continue with the rolling restart?

Paul Mena
Senior Application Administrator
WHOI - Information Services
508-289-3539

From: Shalom Sagges 
Sent: Tuesday, November 26, 2019 12:59 AM
To: user@cassandra.apache.org
Subject: Re: Cassandra is not showing a node up hours after restart

Hi Paul,

From the gossipinfo output, it looks like the node's IP address and rpc_address 
are different.
/192.168.187.121 vs RPC_ADDRESS:192.168.185.121
You can also see that there's a schema disagreement between nodes, e.g. 
schema_id on node001 is fd2dcb4b-ca62-30df-b8f2-d3fd774f2801 and on node002 it 
is fd2dcb4b-ca62-30df-b8f2-d3fd774f2801.
You can run nodetool describecluster to see it as well.
So I suggest changing the rpc_address to the node's own IP address, or setting it to 
0.0.0.0; that should resolve the issue.
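
For example, a quick way to check and correct this on node001 might look like the 
following (the cassandra.yaml path is the default for a package install and is an 
assumption; adjust for your environment):

# Show the current rpc_address setting (default config path assumed)
grep -E '^(rpc_address|broadcast_rpc_address)' /etc/cassandra/cassandra.yaml

# Either point rpc_address at the node's own IP ...
sudo sed -i 's/^rpc_address:.*/rpc_address: 192.168.187.121/' /etc/cassandra/cassandra.yaml
# ... or bind all interfaces (0.0.0.0), which also requires broadcast_rpc_address to be set

sudo service cassandra restart

# Afterwards, confirm that every node reports the same schema version
nodetool describecluster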

Hope this helps!


On Tue, Nov 26, 2019 at 4:05 AM Inquistive allen <inquial...@gmail.com> wrote:
Hello,

Check and compare the following parameters:

1. The Java version should ideally match across all nodes in the cluster.
2. Check that port 7000 is open between the nodes, using telnet or nc.
3. Look for clues in the system logs as to why gossip is failing.

Do confirm the above; a quick sketch of these checks is below.
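
A minimal sketch, assuming the node hostnames node001 through node004 used elsewhere 
in this thread and the default package log location:

# 1. Compare Java versions across the nodes
for host in node001 node002 node003 node004; do
    echo "== $host =="
    ssh "$host" 'java -version' 2>&1 | head -2
done

# 2. Verify the gossip port (7000) is reachable between nodes
for host in node001 node002 node003 node004; do
    nc -zv -w 5 "$host" 7000
done

# 3. Look for gossip-related clues in the Cassandra system log
grep -iE 'gossip|handshake|outboundtcpconnection' /var/log/cassandra/system.log | tail -50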

Thanks


On Tue, 26 Nov 2019, 2:50 AM Paul Mena <pm...@whoi.edu> wrote:
NTP was restarted on the Cassandra nodes, but unfortunately I’m still getting 
the same result: the restarted node does not appear to be rejoining the cluster.

Here’s another data point: “nodetool gossipinfo”, when run from the restarted 
node (“node001”) shows a status of “normal”:

user@node001=> nodetool -u gossipinfo
/192.168.187.121
  generation:1574364410
  heartbeat:209150
  NET_VERSION:8
  RACK:rack1
  STATUS:NORMAL,-104847506331695918
  RELEASE_VERSION:2.1.9
  SEVERITY:0.0
  LOAD:5.78684155614E11
  HOST_ID:c99cf581-f4ae-4aa9-ab37-1a114ab2429b
  SCHEMA:fd2dcb4b-ca62-30df-b8f2-d3fd774f2801
  DC:datacenter1
  RPC_ADDRESS:192.168.185.121

When run from one of the other nodes, however, node001’s status is shown as 
“shutdown”:

user@node002=> nodetool gossipinfo
/192.168.187.121
  generation:1491825076
  heartbeat:2147483647
  STATUS:shutdown,true
  RACK:rack1
  NET_VERSION:8
  LOAD:5.78679987693E11
  RELEASE_VERSION:2.1.9
  DC:datacenter1
  SCHEMA:fd2dcb4b-ca62-30df-b8f2-d3fd774f2801
  HOST_ID:c99cf581-f4ae-4aa9-ab37-1a114ab2429b
  RPC_ADDRESS:192.168.185.121
  SEVERITY:0.0


Paul Mena
Senior Application Administrator
WHOI - Information Services
508-289-3539

From: Paul Mena
Sent: Monday, November 25, 2019 9:29 AM
To: user@cassandra.apache.org
Subject: RE: Cassandra is not showing a node up hours after restart

I’ve just discovered that NTP is not running on any of these Cassandra nodes, 
and that the timestamps are all over the map. Could this be causing my issue?

user@remote=> ansible pre-prod-cassandra -a date
node001.intra.myorg.org | CHANGED | rc=0 >>
Mon Nov 25 13:58:17 
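
For reference, one way to get NTP running again across the same inventory (assuming the 
service is named "ntp" under systemd and that Ansible can sudo; both are assumptions) 
might be:

# Make sure the ntp service is running and enabled on every Cassandra node
ansible pre-prod-cassandra -b -m service -a "name=ntp state=started enabled=yes"

# Then re-check that the clocks agree
ansible pre-prod-cassandra -a date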

Re: AWS ephemeral instances + backup

2019-12-06 Thread Reid Pinchback
Correction:  “most of your database will be in chunk cache, or buffer cache anyways.”

From: Reid Pinchback 
Reply-To: "user@cassandra.apache.org" 
Date: Friday, December 6, 2019 at 10:16 AM
To: "user@cassandra.apache.org" 
Subject: Re: AWS ephemeral instances + backup

If you’re only going to have a small storage footprint per node like 100gb, 
another option comes to mind. Use an instance type with large ram.  Use an EBS 
storage volume on an EBS-optimized instance type, and take EBS snapshots. Most 
of your database will be in chunk cache anyways, so you only need to make sure 
that the dirty background writer is keeping up.  I’d take a look at iowait 
during a snapshot and see if the results are acceptable for a running node.  
Even if it is marginal, if you’re only snapshotting one node at a time, then 
speculative retry would just skip over the temporary slowpoke.
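
A rough sketch of that check, with the AWS CLI and sysstat assumed to be installed and a 
placeholder volume ID:

# Kick off an EBS snapshot of the node's data volume (placeholder volume ID)
aws ec2 create-snapshot --volume-id vol-0123456789abcdef0 \
    --description "cassandra data volume snapshot"

# Watch %iowait on the node while the snapshot is in progress
iostat -x 5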

From: Carl Mueller 
Reply-To: "user@cassandra.apache.org" 
Date: Thursday, December 5, 2019 at 3:21 PM
To: "user@cassandra.apache.org" 
Subject: AWS ephemeral instances + backup

Does anyone have experience with tooling written to support this strategy:

Use case: run cassandra on i3 instances on ephemerals but synchronize the 
sstables and commitlog files to the cheapest EBS volume type (those have bad 
IOPS but decent enough throughput)

On node replace, the startup script for the node back-copies the sstables and commitlog 
state from the EBS volume to the ephemeral storage.

As can be seen at 
https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/EBSVolumeTypes.html, 
the (presumably) spinning rust tops out at 2375 MB/sec (presumably spread across multiple 
EBS volumes), which would incur roughly a ten-minute delay for node replacement on a 1 TB 
node. But I imagine this would only be used on higher-IOPS read/write nodes with smaller 
densities, so 100 GB would mean only about a minute of delay, already within the timeframe 
of an AWS node replacement/instance restart.
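
A minimal sketch of both halves of that strategy, assuming the ephemeral is mounted at 
/var/lib/cassandra and the cheap EBS volume at /mnt/ebs-backup (both paths are 
assumptions):

# Periodic sync from the ephemeral to the cheap EBS volume (run from cron or a sidecar)
rsync -a --delete /var/lib/cassandra/data/      /mnt/ebs-backup/data/
rsync -a --delete /var/lib/cassandra/commitlog/ /mnt/ebs-backup/commitlog/

# On node replace: back-copy from EBS to the fresh ephemeral before Cassandra starts
rsync -a /mnt/ebs-backup/data/      /var/lib/cassandra/data/
rsync -a /mnt/ebs-backup/commitlog/ /var/lib/cassandra/commitlog/
sudo service cassandra start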



