Re: Adding nodes

2022-07-11 Thread Bowen Song via user
I've noticed the joining node is in a different rack from the rest of the nodes. Is this intended? Will you add all new nodes to this rack and have RF=2 in that DC? In principle, you should have an equal number of servers (vnodes) in each rack, and the number of racks should be either equal to RF or 1. On 11/07/2022

RE: Adding nodes

2022-07-11 Thread Marc Hoppins
Sorry if this appears spammy but, being new to this, there are always questions. This is on the joining node: (prod) marc@ba-freddy14:/var/log/cassandra $ /opt/cassandra/bin/nodetool netstats -H|grep -i receiving

RE: Adding nodes

2022-07-11 Thread Marc Hoppins
All clocks are fine. Why would time sync affect whether or not a node appears in the nodetool status when running the command on a different node? Either the node is up and visible or not. From 24 other nodes (including ba-freddy14 itself), it shows in the status. For those other 23

Re: Adding nodes

2022-07-11 Thread Joe Obernberger
I too came from HBase and discovered adding several nodes at a time doesn't work.  Are you absolutely sure that the clocks are in sync across the nodes?  This has bitten me several times. -Joe On 7/11/2022 6:23 AM, Bowen Song via user wrote: You should look for warning and error level logs

Re: Adding nodes

2022-07-11 Thread Bowen Song via user
You should look for warning and error level logs in the system.log, not the debug.log or gc.log, and certainly not only the latest lines. BTW, you may want to spend some time investigating potential GC issues based on the GC logs you provided. I can see 1 full GC in the 3 hours since the node
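As a rough illustration of the advice above, the following Python sketch pulls only the WARN and ERROR entries out of a log. It assumes each entry starts with its level name, as the DEBUG line quoted elsewhere in this thread does; the function name `warnings_and_errors` is hypothetical, and continuation lines such as stack traces are not captured.

```python
import re

# Assumption: each log entry begins with its level name (WARN, ERROR, ...),
# matching the layout of the DEBUG line quoted elsewhere in this thread.
LEVEL_RE = re.compile(r"^(WARN|ERROR)\b")


def warnings_and_errors(lines):
    """Return the lines that start a WARN or ERROR entry, without trailing newlines."""
    return [line.rstrip("\n") for line in lines if LEVEL_RE.match(line)]
```

To scan the whole file rather than only its latest lines, pass an open file object for system.log to `warnings_and_errors`; the location of that file depends on the installation.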

RE: Adding nodes

2022-07-11 Thread Marc Hoppins
Maybe I am not being clear enough. The 90/120 seconds was for NEW NODES TO A NEW CLUSTER WITH NO DATA. Being that this tool/suite/application is new to both the database folk and us support folk and, given that we are currently using HBASE and thus can add several nodes at a time to a new

Re: Adding nodes

2022-07-11 Thread Bowen Song via user
How long does it take to add a new node? I'm 100% sure neither 90s nor 120s is the answer. The answer is that it varies. If you want to wait for a new node to finish joining, be explicit about it: wait until the node has fully joined the cluster. Don't put a fixed number of seconds in there. You can
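One way to wait explicitly instead of sleeping for a fixed time is sketched below in Python. It polls `nodetool status` until the joining node's row shows `UN` (Up/Normal). The helper names, the polling interval and the optional timeout are assumptions made for illustration, not anything prescribed in this thread, and `nodetool` is assumed to be on the PATH.

```python
import subprocess
import time


def node_state(status_output, address):
    """Return the two-letter code (e.g. 'UN', 'UJ') for address, or None if absent."""
    for line in status_output.splitlines():
        fields = line.split()
        # Node rows begin with a status letter (U/D) followed by a state
        # letter (N/L/J/M), then the node's address.
        if (
            len(fields) >= 2
            and len(fields[0]) == 2
            and fields[0][0] in "UD"
            and fields[0][1] in "NLJM"
            and fields[1] == address
        ):
            return fields[0]
    return None


def wait_until_joined(address, poll_seconds=30, timeout_seconds=None, nodetool="nodetool"):
    """Poll `nodetool status` until address reports UN; return False on timeout."""
    deadline = None if timeout_seconds is None else time.monotonic() + timeout_seconds
    while True:
        result = subprocess.run([nodetool, "status"], capture_output=True, text=True)
        if result.returncode == 0 and node_state(result.stdout, address) == "UN":
            return True
        if deadline is not None and time.monotonic() >= deadline:
            return False
        time.sleep(poll_seconds)
```

An automation step could call `wait_until_joined` with the new node's address before starting the next node, so that the wait ends when the join does rather than after an arbitrary delay.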

RE: Adding nodes

2022-07-11 Thread Marc Hoppins
Service still running. No errors showing. The latest info is in debug.log DEBUG [Streaming-EventLoop-4-3] 2022-07-11 12:00:38,902 NettyStreamingMessageSender.java:258 - [Stream #befbc5d0-00e7-11ed-860a-a139feb6a78a channel: 053f2911] Sending keep-alive DEBUG

Re: Adding nodes

2022-07-11 Thread Bowen Song via user
Checking on multiple nodes won't help if the joining node suffers from any of the issues I described, as it will likely be flipping up and down frequently, and the existing nodes in the cluster may not reach an agreement until the joining node stays up (or stays down) for a while. However,

RE: Adding nodes

2022-07-11 Thread Marc Hoppins
I am beginning to wonder… If you recall, I stated that I had checked status on a bunch of other nodes from both datacentres and the joining node shows up. No errors are occurring anywhere; data is streaming; node is joining…but, as I also stated, on the initial node which I only used to run

RE: Adding nodes

2022-07-11 Thread Marc Hoppins
“Where did you come up with the 90 seconds number?” The database folk came up with THAT number. For myself, I timed adding a new node at 120 seconds for the initial setup with no data in the cluster. “What exactly are you waiting for by doing that?” I wanted to see for myself how long it took

Re: Adding nodes

2022-07-11 Thread Bowen Song via user
A node in the joining state can disappear from the cluster from other nodes' perspective if the joining node stops sending/receiving gossip messages to/from other nodes. This can happen when the joining node is severely overloaded, has bad network connectivity, or is stuck in long STW GC pauses.

Re: Adding nodes

2022-07-11 Thread Bowen Song via user
Sleeping/pausing for a fixed amount of time between operations is at best a hack to work around an unknown issue, and it's almost always better to be explicit about what you are waiting for. Where did you come up with the 90 seconds number? What exactly are you waiting for by doing that? If you

RE: Adding nodes

2022-07-11 Thread Marc Hoppins
Further oddities… I was sitting here watching our new new node being added (nodetool status being run from one of the seed nodes) and all was going well. Then I noticed that our new new node was no longer visible. I checked the service on the new new node and it was still running. So I

RE: Adding nodes

2022-07-11 Thread Marc Hoppins
Well then… I left this on Friday (still running) and came back to it today (Monday) to find the service stopped. So, I blitzed this node from the ring and began anew with a different new node. I rather suspect the problem was with trying to use Ansible to add these initially - despite the