Service still running. No errors showing.
The latest info is in debug.log:
DEBUG [Streaming-EventLoop-4-3] 2022-07-11 12:00:38,902
NettyStreamingMessageSender.java:258 - [Stream
#befbc5d0-00e7-11ed-860a-a139feb6a78a channel: 053f2911] Sending keep-alive
DEBUG [Stream-Deserializer-/10.1.146.174:7000-053f2911] 2022-07-11 12:00:39,790
StreamingInboundHandler.java:179 - [Stream
#befbc5d0-00e7-11ed-860a-a139feb6a78a channel: 053f2911] Received keep-alive
DEBUG [ScheduledTasks:1] 2022-07-11 12:00:44,688 StorageService.java:2398 -
Ignoring application state LOAD from /x.x.x.64:7000 because it is not a member
in token metadata
DEBUG [ScheduledTasks:1] 2022-07-11 12:01:44,689 StorageService.java:2398 -
Ignoring application state LOAD from /x.x.x.64:7000 because it is not a member
in token metadata
DEBUG [ScheduledTasks:1] 2022-07-11 12:02:44,690 StorageService.java:2398 -
Ignoring application state LOAD from /x.x.x.64:7000 because it is not a member
in token metadata
And from gc.log.1.current:
2022-07-11T12:08:40.562+0200: 11122.837: [GC (Allocation Failure)
2022-07-11T12:08:40.562+0200: 11122.838: [ParNew
Desired survivor size 41943040 bytes, new threshold 1 (max 1)
- age 1: 57264 bytes, 57264 total
: 655440K->74K(737280K), 0.0289143 secs] 2575800K->1920436K(8128512K),
0.0291355 secs] [Times: user=0.23 sys=0.00, real=0.03 secs]
Heap after GC invocations=6532 (full 1):
par new generation total 737280K, used 74K [0x00000005cae00000,
0x00000005fce00000, 0x00000005fce00000)
eden space 655360K, 0% used [0x00000005cae00000, 0x00000005cae00000,
0x00000005f2e00000)
from space 81920K, 0% used [0x00000005f2e00000, 0x00000005f2e12848,
0x00000005f7e00000)
to space 81920K, 0% used [0x00000005f7e00000, 0x00000005f7e00000,
0x00000005fce00000)
concurrent mark-sweep generation total 7391232K, used 1920362K
[0x00000005fce00000, 0x00000007c0000000, 0x00000007c0000000)
Metaspace used 53255K, capacity 56387K, committed 56416K, reserved
1097728K
class space used 6926K, capacity 7550K, committed 7576K, reserved 1048576K
}
2022-07-11T12:08:40.591+0200: 11122.867: Total time for which application
threads were stopped: 0.0309913 seconds, Stopping threads took: 0.0012599
seconds
{Heap before GC invocations=6532 (full 1):
par new generation total 737280K, used 655434K [0x00000005cae00000,
0x00000005fce00000, 0x00000005fce00000)
eden space 655360K, 100% used [0x00000005cae00000, 0x00000005f2e00000,
0x00000005f2e00000)
from space 81920K, 0% used [0x00000005f2e00000, 0x00000005f2e12848,
0x00000005f7e00000)
to space 81920K, 0% used [0x00000005f7e00000, 0x00000005f7e00000,
0x00000005fce00000)
concurrent mark-sweep generation total 7391232K, used 1920362K
[0x00000005fce00000, 0x00000007c0000000, 0x00000007c0000000)
Metaspace used 53255K, capacity 56387K, committed 56416K, reserved
1097728K
class space used 6926K, capacity 7550K, committed 7576K, reserved 1048576K
2022-07-11T12:08:42.163+0200: 11124.438: [GC (Allocation Failure)
2022-07-11T12:08:42.163+0200: 11124.438: [ParNew
Desired survivor size 41943040 bytes, new threshold 1 (max 1)
- age 1: 54984 bytes, 54984 total
: 655434K->80K(737280K), 0.0291754 secs] 2575796K->1920445K(8128512K),
0.0293884 secs] [Times: user=0.22 sys=0.00, real=0.03 secs]
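For reference, a quick way to pull the longest stop-the-world pauses out of the
whole log is something along these lines (just a rough sketch, assuming all the
gc.log files follow the same format as above, where the pause in seconds is the
11th whitespace-separated field of those lines):

  grep -h 'Total time for which application threads were stopped' gc.log* \
    | awk '{print $11}' | sort -rn | head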
From: Bowen Song via user <[email protected]>
Sent: Monday, July 11, 2022 11:56 AM
To: [email protected]
Subject: Re: Adding nodes
Checking on multiple nodes won't help if the joining node suffers from any of
the issues I described, as it will likely be flipping up and down frequently,
and the existing nodes in the cluster may never reach an agreement before the
joining node stays up (or stays down) for a while. However, it would be very
strange if this were persistent behaviour. If the 'nodetool status' output on
each node remained unchanged for hours and the outputs aren't the same between
nodes, it could be an indicator of something else having gone wrong.
Does the strange behaviour go away after the joining node completes the
streaming and fully joins the cluster?
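If it helps, the comparison can be scripted along these lines (just a rough
sketch; the host names are placeholders for your own nodes):

  for h in node01 node02 node03; do
      echo "== $h =="
      ssh "$h" nodetool status | grep -E '^(U|D)[NJLM]'
  done

Running that a few times, a minute or two apart, should make it obvious whether
the joining node's line is flapping or whether one particular node simply never
lists it.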
On 11/07/2022 10:46, Marc Hoppins wrote:
I am beginning to wonder…
If you recall, I stated that I had checked status on a bunch of other nodes
from both datacentres and the joining node shows up. No errors are occurring
anywhere; data is streaming; node is joining…but, as I also stated, on the
initial node which I only used to run the nodetool status, the new node is no
longer showing up. Thus the new node has not disappeared from the cluster,
only from nodetool status on that particular node – which is already in the
cluster, has been so for several weeks, and is also functioning without error.
From: Bowen Song via user <[email protected]>
Sent: Monday, July 11, 2022 11:40 AM
To: [email protected]
Subject: Re: Adding nodes
A node in joining state can disappear from the cluster from other nodes'
perspective if the joining node stops sending/receiving gossip messages to
other nodes. This can happen when the joining node is severely overloaded, has
bad network connectivity, or is stuck in long STW GC pauses. Regardless of the
reason behind it, the state shown on the joining node itself will remain as
joining unless the streaming process has failed.
The node state is propagated between nodes via gossip, and there may be a delay
before all existing nodes agree on the fact that the joining node is no longer
in the cluster. Within that delay, different nodes in the cluster may show
different results in 'nodetool status'.
You should check the logs on the existing nodes and the joining node to find
out why it is happening, and make appropriate changes if needed.
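For example, on each existing node something along these lines (a rough sketch,
assuming the default log location; the address is a placeholder for the joining
node) will show whether gossip has been repeatedly marking that node down and
up:

  grep -E 'is now (DOWN|UP)' /var/log/cassandra/system.log | grep '<joining-node-ip>'

and the gc.log on the joining node will show whether long stop-the-world pauses
line up with those timestamps.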
On 11/07/2022 09:23, Marc Hoppins wrote:
Further oddities…
I was sitting here watching our new new node being added (nodetool status being
run from one of the seed nodes) and all was going well. Then I noticed that
our new new node was no longer visible. I checked the service on the new new
node and it was still running. So I checked status from this node and it shows
up in the status report (still UJ and streaming data), but it takes a little
longer to get the results than it did when it was visible from the seed.
I checked status from a few different nodes in both datacentres (including
other seeds) and the new new node shows up; but from the original seed node, it
does not appear in the nodetool status output. Can anyone shed any light on
this phenomenon?
From: Marc Hoppins <[email protected]>
Sent: Monday, July 11, 2022 10:02 AM
To: [email protected]
Cc: Bowen Song <[email protected]>
Subject: RE: Adding nodes
Well then…
I left this on Friday (still running) and came back to it today (Monday) to
find the service stopped. So, I blitzed this node from the ring and began anew
with a different new node.
I rather suspect the problem was with trying to use Ansible to add these
initially - despite the fact that I had a serial limit of 1 and a pause of 90s
for starting the service on each new node (based on the time taken when setting
up this Cassandra cluster).
So…moving forward…
From what I read, it is recommended to only add one new node at a time. This
leads me to the following:
Although I see the new node LOAD is progressing far faster than the previous
failure, it is still going to take several hours to move from UJ to UN, which
means I’ll be at this all week for the 12 new nodes. If our LOAD per node is
around 400-600GB, is there any practical method to speed up adding multiple new
nodes that is unlikely to cause problems? After all, in the modern world of
big (how big is big?) data, 600G per node is far less than the real BIG
big-data.
Marc
From: Jeff Jirsa <[email protected]>
Sent: Friday, July 8, 2022 5:46 PM
To: cassandra <[email protected]>
Cc: Bowen Song <[email protected]>
Subject: Re: Adding nodes
Having a node UJ but not sending/receiving other streams is an invalid state
(unless 4.0 moved the streaming data out of netstats? I'm not 100% sure, but
I'm 99% sure it should be there).
It likely stopped the bootstrap process long ago with an error (which you may
not have seen), and is running without being in the ring, but also not trying
to join the ring.
145GB vs 1.1T could be bits vs bytes (that's a factor of 8), or it could be
that you streamed data and compacted it away. Hard to say, but less important -
the fact that it's UJ but not streaming means there's a different problem.
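(For what it's worth, 1.1T divided by 8 is roughly 137G, which is at least in
the same ballpark as the 145GiB that nodetool status reports.)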
If it's me, I do this (not guaranteed to work, your mileage may vary, etc):
1) Look for errors in the logs of ALL hosts. In the joining host, look for an
exception that stops bootstrap. In the others, look for messages about errors
streaming, and/or exceptions around file access. In all of those hosts, check
to see if any of them think they're streaming (nodetool netstats again).
2) Stop the joining host. It's almost certainly not going to finish now. Remove
data directories, commitlog directory, saved caches, hints. Wait 2 minutes.
Make sure every other host in the cluster sees it disappear from the ring.
Then, start it fresh and let it bootstrap again. (you could alternatively try
the resumable bootstrap option, but I never use it).
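In shell terms, step 2 is roughly the following (only a sketch; the paths are
the usual package-install defaults, adjust them to your own layout):

  sudo systemctl stop cassandra
  sudo rm -rf /var/lib/cassandra/data/* \
              /var/lib/cassandra/commitlog/* \
              /var/lib/cassandra/saved_caches/* \
              /var/lib/cassandra/hints/*
  sleep 120   # give the rest of the cluster time to notice it is gone
  # check from another node that it has left the ring: nodetool status
  sudo systemctl start cassandra   # starts a fresh bootstrap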
On Fri, Jul 8, 2022 at 2:56 AM Marc Hoppins
<[email protected]> wrote:
Ifconfig shows RX of 1.1T. This doesn't seem to fit with the LOAD of 145GiB
(nodetool status), unless I am reading that wrong...and the fact that this node
still has a status of UJ.
Netstats on this node shows (other than :
Read Repair Statistics:
Attempted: 0
Mismatch (Blocking): 0
Mismatch (Background): 0
Pool Name          Active  Pending   Completed   Dropped
Large messages        n/a        0           0         0
Small messages        n/a       53   569755545  15740262
Gossip messages       n/a        0      288878         2
None of this addresses the issue of not being able to add more nodes.
-----Original Message-----
From: Bowen Song via user <[email protected]>
Sent: Friday, July 8, 2022 11:47 AM
To: [email protected]
Subject: Re: Adding nodes
I would assume that's 85 GB (i.e. gigabytes) then, which is approximately 79
GiB (i.e. gibibytes). This still sounds awfully slow - less than 1MB/s over a
full day (24 hours).
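(Roughly: 85 GB is 85,000 MB, and 85,000 MB / 86,400 s ≈ 0.98 MB/s.)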
You said CPU and network aren't the bottleneck. Have you checked the disk IO?
Also, be mindful of CPU usage. It can still be a bottleneck if one thread
uses 100% of a CPU core while all other cores are idle.
On 08/07/2022 07:09, Marc Hoppins wrote:
> Thank you for pointing that out.
>
> 85 gigabytes/gibibytes/GIGABYTES/GIBIBYTES/whatever name you care to
> give it
>
> CPU and bandwidth are not the problem.
>
> Version 4.0.3 but, as I stated, all nodes use the same version so the version
> is not important either.
>
> Existing nodes have 350-400+(choose whatever you want to call a
> gigabyte)
>
> The problem appears to be that adding new nodes is a serial process, which is
> fine when there is no data and each node is added within 2 minutes. It is
> hardly practical in production.
>
> -----Original Message-----
> From: Bowen Song via user <[email protected]>
> Sent: Thursday, July 7, 2022 8:43 PM
> To: [email protected]
> Subject: Re: Adding nodes
>
>
> 86Gb (that's gigabits, which is 10.75GB, i.e. gigabytes) taking an entire day
> seems obviously too long. I would check the network bandwidth, disk IO and CPU
> usage to find out what the bottleneck is.
>
> On 07/07/2022 15:48, Marc Hoppins wrote:
>> Hi all,
>>
>> Cluster of 2 DC and 24 nodes
>>
>> DC1 (RF3) = 12 nodes, 16 tokens each
>> DC2 (RF3) = 12 nodes, 16 tokens each
>>
>> Adding 12 more nodes to DC1: I installed Cassandra (the version is the same
>> across all nodes) but, after the first node was added, I couldn't seem to add
>> any further nodes.
>>
>> I checked nodetool status and the newly added node is UJ. It remains this way
>> all day and only 86Gb of data has been added to the node over the entire day
>> (probably not yet complete). This seems a little slow, and more than a
>> little inconvenient, to only be able to add one node at a time - or at least
>> one node every 2 minutes. When the cluster was created, I timed each node
>> from service start to status UJ (having a UUID) and it was around 120
>> seconds. Of course, there was no data then.
>>
>> Is it possible I have some setting not correctly tuned?
>>
>> Thanks
>>
>> Marc