Sharing how we resolved this issue, the known problem documented in the Jira
referenced earlier in this thread.

The automation that configures and starts C* when an EC2 node is replaced
had a race condition: C* attempted to start before one of its dependencies
was available, so the very first start of the C* process failed. The node
was then left in HIBERNATE state, and every subsequent attempt to restart
C* failed with 'Unable to contact any seeds'.
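To confirm a node is stuck like this, check its gossip STATUS, as in the
gossipinfo output quoted further down. Below is a minimal Python sketch (not
part of our automation; the function names and the sample IP are
hypothetical) that parses `nodetool gossipinfo` output and reports each
endpoint's state. It assumes the `KEY:version:value` line layout shown in
that output and that `nodetool` is on the PATH.

```python
import subprocess
from typing import Dict, List, Optional


def parse_gossip_status(gossipinfo_text: str) -> Dict[str, str]:
    """Map each endpoint in `nodetool gossipinfo` output to its gossip state.

    Prefers STATUS_WITH_PORT over the legacy STATUS field when both exist.
    """
    statuses: Dict[str, str] = {}
    endpoint: Optional[str] = None
    for raw in gossipinfo_text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if raw.startswith("/"):
            # Unindented lines such as "/10.0.0.5" start a new endpoint section.
            endpoint = line
            continue
        if endpoint is None:
            continue  # ignore anything before the first endpoint header
        parts = line.split(":", 2)  # KEY:version:value
        if len(parts) != 3:
            continue  # e.g. "generation:1762808084" or "heartbeat:16"
        key, _version, value = parts
        state = value.split(",", 1)[0]  # "hibernate,true" -> "hibernate"
        if key == "STATUS_WITH_PORT":
            statuses[endpoint] = state
        elif key == "STATUS":
            statuses.setdefault(endpoint, state)
    return statuses


def hibernating_endpoints() -> List[str]:
    """Run `nodetool gossipinfo` locally and return endpoints in hibernate."""
    out = subprocess.run(
        ["nodetool", "gossipinfo"], capture_output=True, text=True, check=True
    ).stdout
    return [ep for ep, st in parse_gossip_status(out).items() if st == "hibernate"]


# Hypothetical sample trimmed from the gossipinfo output quoted below.
SAMPLE = """/10.0.0.5
  generation:1762808084
  heartbeat:16
  STATUS:3:hibernate,true
  HOST_ID:5:28fcc0eb-f7aa-4626-a46b-5551438b6b4e
  STATUS_WITH_PORT:2:hibernate,true
"""
print(parse_gossip_status(SAMPLE))  # {'/10.0.0.5': 'hibernate'}
```

An automation step could call `hibernating_endpoints()` after the first
start and alert instead of retrying a restart that will keep failing.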

To work around it, we used a systemd timer to delay C* startup by a few
minutes. With that change the node bootstrapped successfully, and we saw
the 'bootstrap completed!' message.
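For reference, a timer along these lines produces the delay. It is a
sketch, not our exact unit: the timer name `cassandra-delayed-start.timer`,
the 5-minute value and the service name `cassandra.service` are
assumptions; adjust them to whatever unit your automation actually starts.

```ini
# /etc/systemd/system/cassandra-delayed-start.timer (hypothetical name)
[Unit]
Description=Start Cassandra a few minutes after boot

[Timer]
# Fire once, 5 minutes after boot (assumed value; tune for your dependency)
OnBootSec=5min
# Activate the assumed Cassandra service unit rather than a same-named .service
Unit=cassandra.service

[Install]
WantedBy=timers.target
```

For the timer to be the only thing starting C*, the service itself should
not also be enabled at boot; enable the timer instead (for example with
`systemctl enable cassandra-delayed-start.timer`). A fixed delay only
narrows the race; if the dependency has its own systemd unit, ordering
cassandra.service after it with `After=` and `Requires=` would likely be
more robust.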

It only takes one failed bootstrap after the EC2 instance comes online to
put the node in this state. Since we were testing in a lab environment, we
got the node out of this situation by disabling auto_bootstrap and then
running nodetool rebuild. The alternative was to run nodetool removenode
and add the node back to the cluster.

Hope this helps.


On Tue, Nov 11, 2025 at 10:57 AM dbms-tech <[email protected]> wrote:

> Thanks.
> The Jira describes a workaround that involves copying the system.peers
> table, which is not an appetizing option. I reviewed CHANGES.txt for
> 5.0.6, but nothing in it addresses the issue. What is the C* dev team
> recommending to get out of this situation? Avoid 5.0.5 for now? Go back
> to 5.0.4?
>
> https://github.com/apache/cassandra/blob/cassandra-5.0.6/CHANGES.txt
>
> On Tue, Nov 11, 2025 at 10:37 AM manish khandelwal <
> [email protected]> wrote:
>
>> Hi
>>
>> Generally a decommissioned node goes into hibernate state and, if I am
>> not wrong, stays there for 72 hours. I have not seen the same for dead
>> node replacement, though I have not tested 5.0.5 either.
>>
>> Regards
>> Manish
>>
>> On Tue, 11 Nov 2025 at 3:13 AM, dbms-tech <[email protected]> wrote:
>>
>>> We are testing 5.0.5 as a viable upgrade target in our lab environment.
>>> We noticed that when we replace a dead node, bootstrapping fails with the
>>> error below.
>>>
>>> Some facts ...
>>> 1- We didn't see this failure with 5.0.4; it started with 5.0.5.
>>> 2- We ruled out networking issues. The seed IPs are reachable from the node.
>>>
>>> This issue looks very similar to CASSANDRA-19580. Other nodes will not
>>> send SYN messages to a node in hibernate state.
>>>
>>>
>>> https://issues.apache.org/jira/browse/CASSANDRA-19580?page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel&focusedCommentId=17930471
>>>
>>>
>>> The Jira above describes some workarounds, but they appear unreasonable,
>>> such as copying the system.peers table.
>>>
>>> 1- Is there a behavioral change in 5.0.5?
>>> 2- Is there an operational workaround?
>>>
>>> Thanks
>>>
>>> ############################
>>> Bootstrap failure messages
>>> ############################
>>> INFO  [MemtableFlushWriter:3] 2025-11-10 20:01:06,450 LogTransaction.java:249 - Unfinished transaction log, deleting /data/cassandra/system/peers-37f71aca7dc2383ba70672528af04d4f/bti-da_txn_flush_fe335670-be6f-11f0-87fb-35b5e3b8e93a.log
>>> Exception (java.lang.IllegalStateException) encountered during startup: Unable to contact any seeds: [/10.xxx:7000, /10.xxxx:7000, /10.xxx:7000]
>>> java.lang.IllegalStateException: Unable to contact any seeds: [/10.xxx:7000, /10.xxx:7000, /10.xxx:7000]
>>>         at org.apache.cassandra.service.StorageService.bootstrap(StorageService.java:2196)
>>>         at org.apache.cassandra.service.StorageService.joinTokenRing(StorageService.java:1291)
>>>         at org.apache.cassandra.service.StorageService.joinTokenRing(StorageService.java:1251)
>>>         at org.apache.cassandra.service.StorageService.initServer(StorageService.java:1032)
>>>         at org.apache.cassandra.service.StorageService.initServer(StorageService.java:947)
>>> ---------------------------------------------------------------------
>>>
>>> ############################
>>> The bootstrapping node is in hibernate state
>>> ############################
>>> [ec2-user@ip-10-xxxx conf]$ nodetool gossipinfo
>>> /10.xxx
>>>   generation:1762808084
>>>   heartbeat:16
>>>   STATUS:3:hibernate,true
>>>   SCHEMA:15:59adb24e-f3cd-3e02-97f0-5b395827453f
>>>   DC:11:us-east-1
>>>   RACK:13:us-east-1a
>>>   RELEASE_VERSION:8:5.0.5
>>>   RPC_ADDRESS:7:10.xxxx
>>>   NET_VERSION:4:13
>>>   HOST_ID:5:28fcc0eb-f7aa-4626-a46b-5551438b6b4e
>>>   NATIVE_ADDRESS_AND_PORT:6:10.xxxxx:9042
>>>   STATUS_WITH_PORT:2:hibernate,true
>>>   SSTABLE_VERSIONS:9:bti-da
>>>   TOKENS:1:<hidden>
>>> ---------------------------------------------------------------------
>>>
>>> ----------------------------------------
>>> Thank you
>>>
>>>
>>>
>
