One other thing that might be interesting to clarify is whether this occurs on a rolling restart as well as a complete bring-down/bring-up.
On Wed, Jan 21, 2026 at 3:54 PM Caleb Rackliffe <[email protected]> wrote:

> Hi Ashaman,
>
> Would you be willing to create a Jira with a full description of the
> problem and assign it to me? (Caleb Rackliffe/maedhroz)
>
> Thanks!
>
> On Wed, Jan 21, 2026 at 9:14 AM Henry Pan (HP) <[email protected]>
> wrote:
>
>> You’ve written a *very* solid technical analysis already, and what
>> you’re describing *is not expected behavior in Cassandra 5.0.x*. Based
>> on the symptoms, stack trace, and the version‑gating logic you inspected,
>> what you’re hitting is almost certainly a *real bug* in the 5.0.x SAI
>> gossip serialization path — specifically in the *pre‑convergence version
>> detection logic*.
>>
>> Let me walk you through what’s actually happening, why your cluster gets
>> stuck, and what the Cassandra community would typically recommend.
>>
>> 🧠 *What’s Actually Going Wrong (Root Cause)*
>>
>> *1. Cassandra 5.0.3 introduced a fix (CASSANDRA‑20058)*
>>
>> That fix added a *compressed SAI index‑status gossip format* to avoid
>> blowing up the gossip payload size.
>>
>> *2. But the fix is gated on minVersion*
>>
>> The new format is only used when:
>>
>> - Gossip has converged
>> - All nodes report version ≥ 5.0.3
>>
>> *3. During startup, minVersion is null*
>>
>> This is the key problem.
>>
>> On a cold restart:
>>
>> - Nodes have *no* gossip state yet
>> - minVersion is treated as *unknown*
>> - Cassandra falls back to the *legacy (pre‑5.0.3) format*
>> - That format serializes *all* SAI index metadata uncompressed
>> - With many keyspaces/tables/indexes, the payload becomes huge
>> - TypeSizes.sizeof() asserts because the payload exceeds the expected
>> bounds
>>
>> *4. The node never joins gossip → gossip never converges → new format
>> never activates*
>>
>> This creates a *deadlock*:
>>
>> Needs gossip convergence → to use compact format
>> Needs compact format → to join gossip
>>
>> This is exactly why only the *first* node starts — it has no peers to
>> gossip with, so it never receives a large payload.
>>
>> 🟠 *Is this expected behavior in 5.0.x?*
>>
>> *No.*
>> This is not expected, and your analysis is correct: the version‑gating
>> logic is flawed for clusters with large SAI metadata sets.
>>
>> The fix in 5.0.3 solved the *symptom* (payload size) but not the *bootstrap
>> ordering problem*.
>>
>> 🔴 *Is this a known limitation or bug?*
>>
>> It is *not widely documented*, but it *is* a bug.
>>
>> You are essentially hitting a *gossip‑serialization deadlock* caused by:
>>
>> - Large SAI metadata
>> - Legacy serialization path
>> - Version detection not initialized early enough
>>
>> Your observation that the fix is “incomplete” is accurate — the gating
>> logic should not depend on gossip convergence during bootstrap.
>>
>> This should be escalated upstream.
>>
>> 🟢 *Is your workaround valid?*
>>
>> Yes — forcing the compact format:
>>
>> return false;
>>
>> …is the correct behavior for any *homogeneous 5.0.4+ cluster*.
>>
>> Your patch proves the issue is not data corruption, but *incorrect
>> version‑gating logic*.
>>
>> 🧭 *Recommended Workarounds (Until Upstream Fix)*
>>
>> *1. Force compact format during startup (your patch)*
>>
>> This is the cleanest workaround for now.
>>
>> *2. Start nodes one at a time with gossip disabled*
>>
>> Not ideal, but possible:
>>
>> JVM_OPTS="$JVM_OPTS -Dcassandra.join_ring=false"
>>
>> Then:
>>
>> 1. Start node
>> 2. Wait for it to settle
>> 3. Enable join
>> 4. Repeat
>>
>> This avoids large gossip payloads during initial handshake.
>>
>> *3. Reduce SAI index metadata temporarily*
>>
>> If possible:
>>
>> - Drop unused SAI indexes
>> - Reduce index count per table
>> - Restart cluster
>> - Recreate indexes
>>
>> Not ideal, but works in emergency situations.
>>
>> *4. Patch the version‑gating logic upstream*
>>
>> The correct fix would be:
>>
>> - If all nodes are ≥ 5.0.3 *by configuration*, use compact format
>> immediately
>> - Do not rely on gossip convergence to activate the new format
>>
>> This is exactly what your patch does.
>>
>> 🧩 *Why this only happens on restart (not initial provisioning)*
>>
>> During initial provisioning:
>>
>> - Nodes join one by one
>> - Gossip state is small
>> - SAI metadata is not yet fully populated
>> - No large payloads are exchanged
>>
>> After provisioning:
>>
>> - SAI metadata grows
>> - Gossip payloads grow
>> - Legacy format becomes too large
>> - Restart triggers the deadlock
>>
>> This is consistent with your observations.
>>
>> 🟦 *What you should do next*
>>
>> If you want this fixed upstream, you should:
>>
>> - Open a JIRA ticket referencing CASSANDRA‑20058
>> - Include your analysis + stack trace
>> - Include your patch
>> - Describe the bootstrap deadlock scenario
>>
>> This is the kind of issue the Cassandra dev team will want to fix before
>> 5.0.x becomes widely adopted.
>>
>> 🎯 *Short Answers to Your Questions*
>>
>> *Is this startup/version‑gating behavior expected?*
>>
>> No — this is unintended behavior.
>>
>> *Is this a known limitation or bug?*
>>
>> It is a bug. The fix in 5.0.3 is incomplete for large SAI metadata
>> clusters.
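>> To make workaround 4 concrete, here is a rough sketch of what
>> configuration-based gating could look like. The system property name and
>> the stand-in types below are invented for illustration; this is not
>> actual Cassandra source (the real change would live in
>> IndexStatusManager):

```java
// Rough sketch of configuration-based gating. The system property name,
// the class, and the Version record are invented for illustration and do
// not exist in Cassandra.
public class ConfigGatedFormat
{
    // Stand-in for org.apache.cassandra.utils.CassandraVersion
    record Version(int major, int minor, int patch) {}

    static boolean shouldWriteLegacyStatusFormat(Version minVersion)
    {
        // Operator assertion via a startup flag: every node is already
        // >= 5.0.3, so the compact format is safe before gossip converges.
        if (Boolean.getBoolean("cassandra.sai.assume_compact_index_status"))
            return false;

        // Otherwise keep the existing convergence-based behavior.
        return minVersion == null
               || (minVersion.major() == 5 && minVersion.minor() == 0 && minVersion.patch() < 3);
    }

    public static void main(String[] args)
    {
        System.setProperty("cassandra.sai.assume_compact_index_status", "true");
        // With the flag set, even an unknown (null) minVersion no longer
        // forces the legacy format.
        System.out.println(shouldWriteLegacyStatusFormat(null)); // false
    }
}
```

>> With a flag like this, the operator of a known-homogeneous cluster could
>> opt into the compact format at startup instead of waiting for gossip
>> convergence.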
>>
>> *Is there a recommended way to bootstrap or restart clusters in this
>> state?*
>>
>> Workarounds include:
>>
>> - Forcing compact format (best)
>> - Staggered startup with join_ring=false
>> - Reducing SAI metadata temporarily
>>
>> If you want, I can help you:
>>
>> - Draft the JIRA ticket
>> - Write a minimal reproducible test case
>> - Produce a clean patch proposal
>> - Review the relevant Cassandra code paths with you
>>
>> Just tell me how deep you want to go.
>>
>> Thanks & Best Regards
>>
>> Henry PAN
>> Sr. Lead Cloud Architect
>> (425) 802-3975
>> https://www.linkedin.com/in/henrypan1
>>
>>
>> On Wed, Jan 21, 2026 at 7:07 AM Ashaman Kingpin <[email protected]>
>> wrote:
>>
>>> Hi all,
>>>
>>> I’m looking for some guidance on a Cassandra 5.0.x startup issue we’re
>>> seeing and wanted to ask the user list if this behavior is expected or
>>> already known.
>>>
>>> We’re running a homogeneous 5.0.4 (also tested with 5.0.6) cluster with
>>> a relatively large number of keyspaces, tables, and SAI indexes. On initial
>>> cluster creation and provisioning of multiple keyspaces, everything
>>> operates as expected. However, after stopping the cluster and restarting
>>> all nodes, only the first node comes up successfully. Subsequent nodes fail
>>> during startup with an assertion in the gossip thread while serializing the
>>> SAI index status metadata.
>>>
>>> ERROR [GossipStage:1] 2025-12-22 17:20:10,365 JVMStabilityInspector.java:70
>>> - Exception in thread Thread[GossipStage:1,5,GossipStage]
>>> java.lang.RuntimeException: java.lang.AssertionError
>>> at org.apache.cassandra.net.InboundSink.accept(InboundSink.java:108)
>>> at org.apache.cassandra.net.InboundSink.accept(InboundSink.java:45)
>>> at org.apache.cassandra.net.InboundMessageHandler$ProcessMessage.run(InboundMessageHandler.java:430)
>>> at org.apache.cassandra.concurrent.ExecutionFailure$1.run(ExecutionFailure.java:133)
>>> at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
>>> at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
>>> at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
>>> at java.base/java.lang.Thread.run(Thread.java:834)
>>> Caused by: java.lang.AssertionError: null
>>> at org.apache.cassandra.db.TypeSizes.sizeof(TypeSizes.java:44)
>>> at org.apache.cassandra.gms.VersionedValue$VersionedValueSerializer.serializedSize(VersionedValue.java:381)
>>> at org.apache.cassandra.gms.VersionedValue$VersionedValueSerializer.serializedSize(VersionedValue.java:359)
>>> at org.apache.cassandra.gms.EndpointStateSerializer.serializedSize(EndpointState.java:344)
>>> at org.apache.cassandra.gms.EndpointStateSerializer.serializedSize(EndpointState.java:300)
>>> at org.apache.cassandra.gms.GossipDigestAckSerializer.serializedSize(GossipDigestAck.java:96)
>>> at org.apache.cassandra.gms.GossipDigestAckSerializer.serializedSize(GossipDigestAck.java:61)
>>> at org.apache.cassandra.net.Message$Serializer.payloadSize(Message.java:1088)
>>> at org.apache.cassandra.net.Message.payloadSize(Message.java:1131)
>>> at org.apache.cassandra.net.Message$Serializer.serializedSize(Message.java:769)
>>>
>>> It seems there was a fix to this same issue as reported in this DBA
>>> Stack Exchange post
>>> <https://dba.stackexchange.com/questions/343389/schema-changes-on-5-0-result-in-gossip-failures-o-a-c-db-db-typesizes-sizeof>
>>> (CASSANDRA-20058
>>> <https://issues.apache.org/jira/browse/CASSANDRA-20058>). It seems to
>>> me, though, that the fix described in that post and ticket, included in
>>> Cassandra 5.0.3, is incomplete. From what I can tell, the fix is only
>>> activated once the gossip state of the cluster has converged, but the
>>> error occurs before that happens. At the point of the error, the minimum
>>> cluster version appears to be treated as unknown, which causes Cassandra
>>> to fall back to the legacy (pre-5.0.3) index-status serialization format.
>>> In our case, that legacy representation becomes large enough to trigger
>>> the assertion, preventing the node from joining. Because the node never
>>> joins, gossip never converges, and the newer 5.0.3+ compressed format is
>>> never enabled.
>>>
>>> This effectively leaves the cluster stuck in a startup loop where only
>>> the first node can come up.
>>>
>>> As a sanity check, I locally modified the version-gating logic in
>>> *IndexStatusManager.java* for the index-status serialization to always
>>> use the newer compact format during startup, and with that change the
>>> cluster started successfully.
>>>
>>> private static boolean shouldWriteLegacyStatusFormat(CassandraVersion minVersion)
>>> {
>>>     return false; // return minVersion == null || (minVersion.major == 5 && minVersion.minor == 0 && minVersion.patch < 3);
>>> }
>>>
>>> This makes me suspect the issue is related to bootstrap ordering or
>>> version detection rather than data corruption or configuration.
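>>> For anyone who wants to reproduce the gating decision in isolation, it
>>> can be modeled with simplified stand-ins (these are not the real
>>> Cassandra classes, just a sketch of the check I patched):

```java
// Simplified stand-ins: Version models o.a.c.utils.CassandraVersion, and
// the method mirrors the original (unpatched) gating check. Illustrative
// only -- not the actual 5.0.x source.
public class GatingSketch
{
    record Version(int major, int minor, int patch) {}

    static boolean shouldWriteLegacyStatusFormat(Version minVersion)
    {
        // On a cold restart gossip has not converged, so the minimum
        // cluster version is unknown (null) -- and null selects the legacy,
        // uncompressed format even on a homogeneous 5.0.4+ cluster.
        return minVersion == null
               || (minVersion.major() == 5 && minVersion.minor() == 0 && minVersion.patch() < 3);
    }

    public static void main(String[] args)
    {
        System.out.println(shouldWriteLegacyStatusFormat(null));                 // true: legacy format at startup
        System.out.println(shouldWriteLegacyStatusFormat(new Version(5, 0, 4))); // false: compact once converged
    }
}
```

>>> The null case is the one every restarting node hits, which is why the
>>> legacy format (and the oversized gossip payload) gets chosen.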
>>>
>>> I posted a more detailed write-up
>>> <https://dba.stackexchange.com/questions/349488/cassandra-5-0-4-startup-deadlock-gossip-uses-pre-5-0-3-encoding-due-to-version>
>>> (with stack traces and code references) on DBA Stack Exchange a few
>>> weeks ago but haven’t received any feedback yet, so I wanted to ask here:
>>>
>>> - Is this startup/version-gating behavior expected in 5.0.x?
>>>
>>> - Is this a known limitation or bug?
>>>
>>> - Is there a recommended way to bootstrap or restart clusters in this
>>> state?
>>>
>>> Any insight would be appreciated. Happy to provide logs or additional
>>> details if helpful.
>>>
>>> Thanks,
>>>
>>> Nicholas
>>
