Sharing how we resolved this known issue, which is documented in the Jira referenced earlier in this thread.
The automation that configures and starts C* when an EC2 node is replaced had a race condition: C* attempted to start before one of its dependencies was available, so the very first start of the C* process failed. That left the node in HIBERNATE state, and every later attempt to restart C* returned the error 'Unable to contact any seeds'.

To work around it, we used a systemd timer to delay C* startup by a few minutes. With the delay in place, the node bootstrapped successfully and we saw the 'bootstrap completed!' message. Note that a single failed bootstrap after the EC2 instance comes online is enough to trigger the problem.

Since we were testing this in a lab environment, we were able to get the node out of this situation by disabling auto_bootstrap and then running nodetool rebuild. The alternative was to run nodetool removenode and add the node back to the cluster.

Hope this helps.

On Tue, Nov 11, 2025 at 10:57 AM dbms-tech <[email protected]> wrote:

> Thanks.
> The Jira describes a workaround by copying system.peers table. Not an
> appetizing option. I reviewed the changes.txt for 5.0.6, but it does not
> address the issue. What is the C* dev team recommending to get out of this
> situation? Avoid 5.0.5 for now? Go back to 5.0.4?
>
> https://github.com/apache/cassandra/blob/cassandra-5.0.6/CHANGES.txt
>
> On Tue, Nov 11, 2025 at 10:37 AM manish khandelwal <
> [email protected]> wrote:
>
>> Hi
>>
>> Generally a decommissioned node goes in hibernate state and if not wrong
>> it is there for 72 hours. Not seen same for dead node replacement though
>> not tested 5.0.5 as well.
>>
>> Regards
>> Manish
>>
>> On Tue, 11 Nov 2025 at 3:13 AM, dbms-tech <[email protected]> wrote:
>>
>>> We are testing 5.0.5 as a viable upgrade target in our lab environment.
>>> We noticed that when we replace a dead node, bootstrapping fails with the
>>> error below.
>>>
>>> Some facts ...
>>> 1- We didn't see this failure with 5.0.4. Started with 5.0.5.
>>> 2- Ruled out networking issues. Seeds IP's are reachable by the node.
>>>
>>> This issue looks very similar to CASSANDRA-19580. Other nodes will not
>>> send SYN messages to a node in hibernate state.
>>>
>>>
>>> https://issues.apache.org/jira/browse/CASSANDRA-19580?page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel&focusedCommentId=17930471
>>>
>>>
>>> The Jira above describes some workarounds, but they appear unreasonable
>>> like copying system.peers table.
>>>
>>> 1- Is there a behavioral change in 5.0.5?
>>> 2- Is there an operational workaround?
>>>
>>> Thanks
>>>
>>> ############################
>>> Bootstrap failure messages
>>> ############################
>>> INFO [MemtableFlushWriter:3] 2025-11-10 20:01:06,450
>>> LogTransaction.java:249 - Unfinished transaction log, deleting
>>> /data/cassandra/system/peers-37f71aca7dc2383ba70672528af04d4f/bti-da_txn_flush_fe335670-be6f-11f0-87fb-35b5e3b8e93a.log
>>> Exception (java.lang.IllegalStateException) encountered during startup:
>>> Unable to contact any seeds: [/10.xxx:7000, /10.xxxx:7000, /10.xxx:7000]
>>> java.lang.IllegalStateException: Unable to contact any seeds:
>>> [/10.xxx:7000, /10.xxx:7000, /10.xxx:7000]
>>> at
>>> org.apache.cassandra.service.StorageService.bootstrap(StorageService.java:2196)
>>> at
>>> org.apache.cassandra.service.StorageService.joinTokenRing(StorageService.java:1291)
>>> at
>>> org.apache.cassandra.service.StorageService.joinTokenRing(StorageService.java:1251)
>>> at
>>> org.apache.cassandra.service.StorageService.initServer(StorageService.java:1032)
>>> at
>>> org.apache.cassandra.service.StorageService.initServer(StorageService.java:947)
>>> ---------------------------------------------------------------------
>>>
>>> ############################
>>> The boostrapping node is in hibernate state
>>> ############################
>>> [ec2-user@ip-10-xxxx conf]$ nodetool gossipinfo
>>> /10.xxx
>>> generation:1762808084
>>> heartbeat:16
>>> STATUS:3:hibernate,true
>>> SCHEMA:15:59adb24e-f3cd-3e02-97f0-5b395827453f
>>> DC:11:us-east-1
>>> RACK:13:us-east-1a
>>> RELEASE_VERSION:8:5.0.5
>>> RPC_ADDRESS:7:10.xxxx
>>> NET_VERSION:4:13
>>> HOST_ID:5:28fcc0eb-f7aa-4626-a46b-5551438b6b4e
>>> NATIVE_ADDRESS_AND_PORT:6:10.xxxxx:9042
>>> STATUS_WITH_PORT:2:hibernate,true
>>> SSTABLE_VERSIONS:9:bti-da
>>> TOKENS:1:<hidden>
>>> ---------------------------------------------------------------------
>>>
>>> ----------------------------------------
>>> Thank you
>>>
>
> --
>
> ----------------------------------------
> Thank you
>

--

----------------------------------------
Thank you
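
Footnote for anyone automating the same node-replacement flow: the startup race described at the top of this message was worked around with a fixed systemd timer delay. A different approach, not the one used above, is to have the automation wait explicitly until the dependency is ready and only then start C*. The following is a minimal Python sketch under stated assumptions: the thread does not say what the dependency is, so probing it as a TCP port is an assumption, and the helper names `wait_until`, `tcp_port_open`, and `start_cassandra_when_ready`, as well as the `cassandra` systemd unit name, are hypothetical.

```python
import socket
import subprocess
import time
from typing import Callable


def wait_until(check: Callable[[], bool], timeout_s: float, interval_s: float = 5.0,
               sleep: Callable[[float], None] = time.sleep,
               clock: Callable[[], float] = time.monotonic) -> bool:
    """Poll check() until it returns True or timeout_s has elapsed."""
    deadline = clock() + timeout_s
    while True:
        if check():
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)


def tcp_port_open(host: str, port: int, connect_timeout_s: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds (assumed readiness probe)."""
    try:
        with socket.create_connection((host, port), timeout=connect_timeout_s):
            return True
    except OSError:
        return False


def start_cassandra_when_ready(host: str, port: int, timeout_s: float = 300.0) -> None:
    """Start the (hypothetically named) cassandra unit only once the dependency answers."""
    if not wait_until(lambda: tcp_port_open(host, port), timeout_s=timeout_s):
        # Refuse to start: a single failed first start is what leaves the node in hibernate.
        raise RuntimeError("dependency not ready; not starting Cassandra")
    subprocess.run(["systemctl", "start", "cassandra"], check=True)
```

The point of the design is that the process is never started while the dependency is down, so the very first start cannot fail for that reason; a fixed delay only makes that outcome less likely.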
