The dead node being replaced went back to the DN state, which indicates the
replacement node failed to join the cluster, usually because streaming was
interrupted (e.g. by network issues or long stop-the-world GC pauses). I
would start by looking for red flags in the logs on the new node, and on
the other nodes in the cluster too: Cassandra's own logs, GC logs, dmesg,
the systemd journal, etc. I would also try `nodetool bootstrap resume` on
the replacement node.
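For reference, a rough sketch of the checks I mean, run on the replacement node. The log paths and the `cassandra` unit name assume the Debian/Ubuntu package layout; adjust them for your install.

```shell
# Scan Cassandra's log for errors and stream/gossip failures
# (path assumes the Debian/Ubuntu package default)
grep -iE 'ERROR|StreamSession|FailureDetector' /var/log/cassandra/system.log

# GC pauses long enough to break streaming usually show up in the GC log
grep -i 'pause' /var/log/cassandra/gc.log*

# Kernel-level trouble (OOM killer, NIC resets) and service restarts
dmesg -T | tail -n 100
journalctl -u cassandra --since "1 hour ago"

# If the process is still up but bootstrap stalled, try resuming streaming
nodetool bootstrap resume
```

Worth running the dmesg/journal checks on the nodes that were streaming to the replacement as well, since the interruption can originate on either side.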
On 12/05/2025 09:53, Courtney wrote:
Hello everyone,
I have a cluster with two datacenters, running Cassandra 4.1.8 with
GossipingPropertyFileSnitch as the endpoint snitch. One datacenter is
entirely Ubuntu 24.04 with OpenJDK 11; the other is Ubuntu 20.04 with
OpenJDK 8. A seed node died in the second DC (the Ubuntu 20.04 hosts). I
ordered a new dedicated server, updated my seeds list to remove the dead
seed node, and followed the steps to replace a dead node:
JVM_OPTS="$JVM_OPTS $JVM_EXTRA_OPTS
-Dcassandra.replace_address_first_boot=<dead_node_ip>"
The configs on the old and new nodes are identical apart from the IP
addresses and the line above in the env file. I started the node, it began
replacing the old one, and it showed up in the `UJ` state. Not long into
the process, the new node stops processing data, and the cluster forgets
the new node and remembers the old one in its `DN` state (the old machine
is powered off). There are no errors in the logs. I've tried several times,
hoping the issue would resolve. I raised the ROOT logging level to DEBUG
and also set `org.apache.cassandra.gms.Gossiper` to TRACE. Still no errors.
With TRACE set for the Gossiper, I notice that gossip and data streaming
stop at about the same time. I cannot run any nodetool commands on the new
node. The process doesn't die; it keeps open connections to the nodes that
were streaming data, but I don't see any data actually streaming.
I've thought through a lot. Disk space isn't an issue, and ulimits are set
high in /etc/security/limits.conf; checking /proc/<pid>/limits confirms the
values are high. I've replaced nodes this way before without issue, but
this one is causing me grief. Is there anything more I can try?
Courtney