ANSHUL SAINI created CASSANDRA-18771:
----------------------------------------

             Summary: Cassandra 4.0.5 nodes fail to start when replacing a dead node
                 Key: CASSANDRA-18771
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-18771
             Project: Cassandra
          Issue Type: Bug
          Components: Cluster/Gossip
            Reporter: ANSHUL SAINI


When replacing a down node using the {_}*replace_address*{_} property, the new 
node fails to start.

The following messages appear continuously in the system log:
{noformat}
WARN  [Messaging-EventLoop-3-2] 2023-08-16 14:18:58,565 NoSpamLogger.java:95 - /xxx.xxx.xxx.xxx:7000->/yyy.yyy.yyy.yyy:7000-URGENT_MESSAGES-[no-channel] dropping message of type GOSSIP_DIGEST_SYN whose timeout expired before reaching the network

INFO  [Messaging-EventLoop-3-2] 2023-08-16 14:19:23,910 NoSpamLogger.java:92 - /xxx.xxx.xxx.xxx->/yyy.yyy.yyy.yyy:7000-URGENT_MESSAGES-[no-channel] failed to connect
io.netty.channel.ConnectTimeoutException: connection timed out: /xxx.xxx.xxx.xxx:7000
    at io.netty.channel.epoll.AbstractEpollChannel$AbstractEpollUnsafe$2.run(AbstractEpollChannel.java:576)
    at io.netty.util.concurrent.PromiseTask.runTask(PromiseTask.java:98)
    at io.netty.util.concurrent.ScheduledFutureTask.run(ScheduledFutureTask.java:170)
    at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:164)
    at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:472)
    at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:384)
    at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
    at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
    at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
    at java.lang.Thread.run(Thread.java:748)
{noformat}
xxx.xxx.xxx.xxx - IP of down node
yyy.yyy.yyy.yyy - IP of new node
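
For anyone triaging similar logs, the connection identifier in the messages above can be pulled apart mechanically. The following is a minimal sketch, not part of the original report: the function name parse_connection_event and the field names are hypothetical, and the pattern is inferred only from the two log lines quoted here, so other Cassandra versions or message shapes may not match. It deliberately does not assume which of the two addresses is the local node.

```python
import re
from typing import NamedTuple, Optional

# Pattern inferred from the two quoted log lines only (an assumption):
#   /<addr>[:port]->/<addr>[:port]-<CONNECTION_TYPE>-[<channel>]
CONNECTION_ID = re.compile(
    r"/(?P<first>[^\s>]+?)->/(?P<second>\S+?)"
    r"-(?P<kind>[A-Z_]+)-\[(?P<channel>[^\]]+)\]"
)
# Present only on "dropping message of type ..." lines.
MESSAGE_TYPE = re.compile(r"message of type (?P<verb>\w+)")


class ConnectionEvent(NamedTuple):
    first_addr: str       # address printed before "->"
    second_addr: str      # address printed after "->"
    kind: str             # e.g. URGENT_MESSAGES
    channel: str          # e.g. no-channel
    verb: Optional[str]   # e.g. GOSSIP_DIGEST_SYN, or None if absent


def parse_connection_event(line: str) -> Optional[ConnectionEvent]:
    """Return the connection details found in one log line, or None."""
    conn = CONNECTION_ID.search(line)
    if conn is None:
        return None
    verb = MESSAGE_TYPE.search(line)
    return ConnectionEvent(
        first_addr=conn["first"],
        second_addr=conn["second"],
        kind=conn["kind"],
        channel=conn["channel"],
        verb=verb["verb"] if verb else None,
    )
```

The sketch expects each log entry on a single line, so entries that were hard-wrapped in transit need to be rejoined first. Counting the parsed events per address pair and message type is one way to confirm that only gossip traffic towards the down node is affected.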

No other ERROR or WARN messages appear in the logs. The node goes into the UJ 
state but never joins the ring.

This does not happen every time, but we have seen it more often since 
upgrading from 3.11.9 to 4.0.5.

The configuration is fine, since we mitigate this by terminating the node and 
spawning a new one with the same configuration.



