The thing to look for in the GC logs would be signs that you’re bumping against 
your memory limits and spending a lot of time in full GCs.

I’m not sure at what phase it kicks in, but there is definitely the potential 
for memory issues when you have large column families (large in the number of 
columns, I mean). Your mention that the situation gets worse in proportion to 
the number of tables is what brought GC to mind. I’m not sure about the 
proportion to the number of nodes; I think there are thread counts that 
increase with the number of nodes, and more threads can also add to GC load, 
particularly with G1GC.

I’m speculating a bit on possible causes, but basically the idea is to look 
for GC load during those 3 minutes. If you see it, then you’re not hunting for 
a timeout setting or anything like that; you’re hunting for a resource 
allocation tuning.
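
A rough way to check this from system.log, as a sketch only: the Python below measures the gap between the "No gossip backlog" line and the "Starting listening for CQL clients" line (both taken from the log excerpt in this thread) and adds up any GC pauses logged in between. It assumes GC pauses show up as GCInspector lines of the form "... GC in 245ms"; that message shape and the function name startup_gc_report are assumptions, so adjust the pattern to whatever your logs actually contain.

```python
import re
from datetime import datetime

# Prefix of a Cassandra system.log line, e.g.
# "INFO  [main] 2020-05-31 23:51:15,555 Gossiper.java:1723 - No gossip backlog; proceeding"
LINE_RE = re.compile(
    r"^(?P<level>[A-Z]+)\s+\[(?P<thread>[^\]]+)\]\s+"
    r"(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})\s+(?P<rest>.*)$"
)

# ASSUMPTION: GC pauses are logged as "GCInspector.java:NNN - <collector> GC in <N>ms".
GC_RE = re.compile(r"GCInspector\.java:\d+ - (?P<name>.+?) GC in (?P<ms>\d+)ms")


def parse_ts(text):
    """Parse a timestamp such as '2020-05-31 23:51:15,555'."""
    return datetime.strptime(text, "%Y-%m-%d %H:%M:%S,%f")


def startup_gc_report(lines):
    """Return gap and GC totals between gossip settling and CQL port opening.

    Returns None if either marker line is missing from the input.
    """
    start = end = None
    pauses = []
    for line in lines:
        m = LINE_RE.match(line)
        if not m:
            continue  # skip continuation lines of multi-line messages
        ts = parse_ts(m.group("ts"))
        rest = m.group("rest")
        if "No gossip backlog" in rest:
            start = ts
            pauses = []  # restart the window at the latest gossip-settled line
        elif "Starting listening for CQL clients" in rest:
            if start is not None:
                end = ts
                break
        elif start is not None:
            gc = GC_RE.search(rest)
            if gc:
                pauses.append((ts, gc.group("name"), int(gc.group("ms"))))
    if start is None or end is None:
        return None
    gap_s = (end - start).total_seconds()
    gc_s = sum(ms for _, _, ms in pauses) / 1000.0
    return {
        "gap_seconds": gap_s,
        "gc_seconds": gc_s,
        "gc_events": len(pauses),
        "gc_fraction": gc_s / gap_s if gap_s else 0.0,
    }
```

Calling startup_gc_report(open("system.log")) on a node's log gives the numbers for the first startup window it finds. If gc_fraction is small, GC is probably not where the time goes; if it is large, heap sizing is the more likely thing to tune.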

From: Jai Bheemsen Rao Dhanwada <jaibheem...@gmail.com>
Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Date: Monday, June 1, 2020 at 7:15 PM
To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Subject: Re: Cassandra Bootstrap Sequence

Is there anything specific to look for in the GC logs?
BTW, this delay always happens, whenever I bootstrap a node or restart the C* 
process.

I don't believe it's a GC issue. Also, a correction to my initial question: 
it's not just bootstrap; every restart of the C* process causes this.

On Mon, Jun 1, 2020 at 3:22 PM Reid Pinchback 
<rpinchb...@tripadvisor.com> wrote:
That gap seems like a long time.  Have you checked GC logs around the timeframe?

From: Jai Bheemsen Rao Dhanwada <jaibheem...@gmail.com>
Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Date: Monday, June 1, 2020 at 3:52 PM
To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Subject: Cassandra Bootstrap Sequence

Hello Team,

When I am bootstrapping/restarting a Cassandra node, there is a delay between 
gossip settling and the port opening. Can someone please explain to me where 
this delay is configured and whether it can be changed? I don't see any 
information in the logs.

In my case, as you can see, there is a ~3 minute delay, and it increases as I 
increase the number of tables, nodes, and DCs.

INFO  [main] 2020-05-31 23:51:07,554 Gossiper.java:1692 - Waiting for gossip to 
settle...
INFO  [main] 2020-05-31 23:51:15,555 Gossiper.java:1723 - No gossip backlog; 
proceeding
INFO  [main] 2020-05-31 23:54:06,867 NativeTransportService.java:70 - Netty 
using native Epoll event loop
INFO  [main] 2020-05-31 23:54:06,913 Server.java:155 - Using Netty Version: 
[netty-buffer=netty-buffer-4.0.44.Final.452812a, 
netty-codec=netty-codec-4.0.44.Final.452812a, 
netty-codec-haproxy=netty-codec-haproxy-4.0.44.Final.452812a, 
netty-codec-http=netty-codec-http-4.0.44.Final.452812a, 
netty-codec-socks=netty-codec-socks-4.0.44.Final.452812a, 
netty-common=netty-common-4.0.44.Final.452812a, 
netty-handler=netty-handler-4.0.44.Final.452812a, 
netty-tcnative=netty-tcnative-1.1.33.Fork26.142ecbb, 
netty-transport=netty-transport-4.0.44.Final.452812a, 
netty-transport-native-epoll=netty-transport-native-epoll-4.0.44.Final.452812a, 
netty-transport-rxtx=netty-transport-rxtx-4.0.44.Final.452812a, 
netty-transport-sctp=netty-transport-sctp-4.0.44.Final.452812a, 
netty-transport-udt=netty-transport-udt-4.0.44.Final.452812a]
INFO  [main] 2020-05-31 23:54:06,913 Server.java:156 - Starting listening for 
CQL clients on /x.x.x.x:9042 (encrypted)...

Also, during this 3-minute delay, I am losing all metrics from the C* nodes 
(basically, the metrics are not returned within 10s).

Can someone please help me understand the delay here?

Cassandra Version: 3.11.3
Metrics: Using telegraf to collect metrics.
