Hi Ashaman,

Would you be willing to create a Jira with a full description of the problem and assign it to me? (Caleb Rackliffe/maedhroz)
Thanks!

On Wed, Jan 21, 2026 at 9:14 AM Henry Pan (HP) <[email protected]> wrote:

> You’ve written a *very* solid technical analysis already, and what you’re describing *is not expected behavior in Cassandra 5.0.x*. Based on the symptoms, stack trace, and the version‑gating logic you inspected, what you’re hitting is almost certainly a *real bug* in the 5.0.x SAI gossip serialization path — specifically in the *pre‑convergence version detection logic*.
>
> Let me walk you through what’s actually happening, why your cluster gets stuck, and what the Cassandra community would typically recommend.
>
> 🧠 *What’s Actually Going Wrong (Root Cause)*
>
> *1. Cassandra 5.0.3 introduced a fix (CASSANDRA‑20058)*
>
> That fix added a *compressed SAI index‑status gossip format* to avoid blowing up the gossip payload size.
>
> *2. But the fix is gated on minVersion*
>
> The new format is only used when:
>
> - Gossip has converged
> - All nodes report version ≥ 5.0.3
>
> *3. During startup, minVersion is null*
>
> This is the key problem.
>
> On a cold restart:
>
> - Nodes have *no* gossip state yet
> - minVersion is treated as *unknown*
> - Cassandra falls back to the *legacy (pre‑5.0.3) format*
> - That format serializes *all* SAI index metadata uncompressed
> - With many keyspaces/tables/indexes, the payload becomes huge
> - TypeSizes.sizeof() asserts because the payload exceeds the expected bounds
>
> *4. The node never joins gossip → gossip never converges → new format never activates*
>
> This creates a *deadlock*:
>
> Needs gossip convergence → to use compact format
> Needs compact format → to join gossip
>
> This is exactly why only the *first* node starts — it has no peers to gossip with, so it never receives a large payload.
>
> 🟠 *Is this expected behavior in 5.0.x?*
>
> *No.*
> This is not expected, and your analysis is correct: the version‑gating logic is flawed for clusters with large SAI metadata sets.
>
> The fix in 5.0.3 solved the *symptom* (payload size) but not the *bootstrap ordering problem*.
>
> 🔴 *Is this a known limitation or bug?*
>
> It is *not widely documented*, but it *is* a bug.
>
> You are essentially hitting a *gossip‑serialization deadlock* caused by:
>
> - Large SAI metadata
> - Legacy serialization path
> - Version detection not initialized early enough
>
> Your observation that the fix is “incomplete” is accurate — the gating logic should not depend on gossip convergence during bootstrap.
>
> This should be escalated upstream.
>
> 🟢 *Is your workaround valid?*
>
> Yes — forcing the compact format:
>
> return false;
>
> …is the correct behavior for any *homogeneous 5.0.4+ cluster*.
>
> Your patch proves the issue is not data corruption, but *incorrect version‑gating logic*.
>
> 🧭 *Recommended Workarounds (Until Upstream Fix)*
>
> *1. Force compact format during startup (your patch)*
>
> This is the cleanest workaround for now.
>
> *2. Start nodes one at a time with gossip disabled*
>
> Not ideal, but possible:
>
> JVM_OPTS="$JVM_OPTS -Dcassandra.join_ring=false"
>
> Then:
>
> 1. Start node
> 2. Wait for it to settle
> 3. Enable join
> 4. Repeat
>
> This avoids large gossip payloads during initial handshake.
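For the second workaround, a minimal sketch of the per-node loop might look like this. The start command and the readiness check are placeholders for whatever your deployment uses, and nodetool is assumed to be on the PATH:

    #!/usr/bin/env bash
    # Sketch only: run on one node at a time, waiting for each to settle before the next.

    # 1. Start the node with ring join deferred, e.g. by adding the option above to
    #    cassandra-env.sh:  JVM_OPTS="$JVM_OPTS -Dcassandra.join_ring=false"
    sudo systemctl start cassandra        # placeholder start command

    # 2. Wait until the daemon is up and answering over JMX.
    until nodetool info > /dev/null 2>&1; do
        sleep 5
    done

    # 3. Join the ring once startup has settled.
    nodetool join

    # 4. Confirm the node shows as Up/Normal before moving on to the next one.
    nodetool status

Here nodetool join is the concrete form of step 3 ("Enable join"); it only applies to a node that was started with join_ring=false.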
> *3. Reduce SAI index metadata temporarily*
>
> If possible:
>
> - Drop unused SAI indexes
> - Reduce index count per table
> - Restart cluster
> - Recreate indexes
>
> Not ideal, but works in emergency situations.
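For the third workaround, the drop/recreate steps can be scripted with cqlsh. A rough sketch, with the keyspace, table, column, and index names as placeholders:

    #!/usr/bin/env bash
    # Sketch only: shrink the SAI metadata carried in gossip by dropping indexes
    # that are safe to rebuild, then recreate them once the cluster is healthy.

    # Before taking the cluster down: drop indexes you can afford to rebuild later.
    cqlsh -e "DROP INDEX IF EXISTS my_ks.my_table_value_idx;"

    # ... restart the cluster ...

    # After all nodes are back Up/Normal: recreate the SAI indexes (each one will
    # trigger an index build, which can take a while on large tables).
    cqlsh -e "CREATE INDEX IF NOT EXISTS my_table_value_idx ON my_ks.my_table (value) USING 'sai';"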
> *4. Patch the version‑gating logic upstream*
>
> The correct fix would be:
>
> - If all nodes are ≥ 5.0.3 *by configuration*, use compact format immediately
> - Do not rely on gossip convergence to activate the new format
>
> This is exactly what your patch does.
>
> 🧩 *Why this only happens on restart (not initial provisioning)*
>
> During initial provisioning:
>
> - Nodes join one by one
> - Gossip state is small
> - SAI metadata is not yet fully populated
> - No large payloads are exchanged
>
> After provisioning:
>
> - SAI metadata grows
> - Gossip payloads grow
> - Legacy format becomes too large
> - Restart triggers the deadlock
>
> This is consistent with your observations.
>
> 🟦 *What you should do next*
>
> If you want this fixed upstream, you should:
>
> - Open a JIRA ticket referencing CASSANDRA‑20058
> - Include your analysis + stack trace
> - Include your patch
> - Describe the bootstrap deadlock scenario
>
> This is the kind of issue the Cassandra dev team will want to fix before 5.0.x becomes widely adopted.
>
> 🎯 *Short Answers to Your Questions*
>
> *Is this startup/version‑gating behavior expected?*
>
> No — this is unintended behavior.
>
> *Is this a known limitation or bug?*
>
> It is a bug. The fix in 5.0.3 is incomplete for large SAI metadata clusters.
>
> *Is there a recommended way to bootstrap or restart clusters in this state?*
>
> Workarounds include:
>
> - Forcing compact format (best)
> - Staggered startup with join_ring=false
> - Reducing SAI metadata temporarily
>
> If you want, I can help you:
>
> - Draft the JIRA ticket
> - Write a minimal reproducible test case
> - Produce a clean patch proposal
> - Review the relevant Cassandra code paths with you
>
> Just tell me how deep you want to go.
>
> Thanks & Best Regards
>
> Henry PAN
> Sr. Lead Cloud Architect
> (425) 802-3975
> https://www.linkedin.com/in/henrypan1
>
>
> On Wed, Jan 21, 2026 at 7:07 AM Ashaman Kingpin <[email protected]> wrote:
>
>> Hi all,
>>
>> I’m looking for some guidance on a Cassandra 5.0.x startup issue we’re seeing and wanted to ask the user list if this behavior is expected or already known.
>>
>> We’re running a homogeneous 5.0.4 (also tested with 5.0.6) cluster with a relatively large number of keyspaces, tables, and SAI indexes. On initial cluster creation and provisioning of multiple keyspaces, everything operates as expected. However, after stopping the cluster and restarting all nodes, only the first node comes up successfully. Subsequent nodes fail during startup with an assertion in the gossip thread while serializing the SAI index status metadata.
>>
>> ERROR [GossipStage:1] 2025-12-22 17:20:10,365 JVMStabilityInspector.java:70 - Exception in thread Thread[GossipStage:1,5,GossipStage]
>> java.lang.RuntimeException: java.lang.AssertionError
>>   at org.apache.cassandra.net.InboundSink.accept(InboundSink.java:108)
>>   at org.apache.cassandra.net.InboundSink.accept(InboundSink.java:45)
>>   at org.apache.cassandra.net.InboundMessageHandler$ProcessMessage.run(InboundMessageHandler.java:430)
>>   at org.apache.cassandra.concurrent.ExecutionFailure$1.run(ExecutionFailure.java:133)
>>   at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
>>   at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
>>   at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
>>   at java.base/java.lang.Thread.run(Thread.java:834)
>> Caused by: java.lang.AssertionError: null
>>   at org.apache.cassandra.db.TypeSizes.sizeof(TypeSizes.java:44)
>>   at org.apache.cassandra.gms.VersionedValue$VersionedValueSerializer.serializedSize(VersionedValue.java:381)
>>   at org.apache.cassandra.gms.VersionedValue$VersionedValueSerializer.serializedSize(VersionedValue.java:359)
>>   at org.apache.cassandra.gms.EndpointStateSerializer.serializedSize(EndpointState.java:344)
>>   at org.apache.cassandra.gms.EndpointStateSerializer.serializedSize(EndpointState.java:300)
>>   at org.apache.cassandra.gms.GossipDigestAckSerializer.serializedSize(GossipDigestAck.java:96)
>>   at org.apache.cassandra.gms.GossipDigestAckSerializer.serializedSize(GossipDigestAck.java:61)
>>   at org.apache.cassandra.net.Message$Serializer.payloadSize(Message.java:1088)
>>   at org.apache.cassandra.net.Message.payloadSize(Message.java:1131)
>>   at org.apache.cassandra.net.Message$Serializer.serializedSize(Message.java:769)
>>
>> It seems there was a fix to this same issue as reported in this DBA Stack Exchange post <https://dba.stackexchange.com/questions/343389/schema-changes-on-5-0-result-in-gossip-failures-o-a-c-db-db-typesizes-sizeof> (CASSANDRA-20058 <https://issues.apache.org/jira/browse/CASSANDRA-20058>). It seems to me, though, that the fix described in that post and ticket, included in Cassandra 5.0.3, is incomplete. From what I can tell, the fix seems to only be activated once the gossip state of the cluster has converged, but the error seems to occur before this happens. At the point of the error, the minimum cluster version appears to be treated as unknown, which causes Cassandra to fall back to the legacy (pre-5.0.3) index-status serialization format. In our case, that legacy representation becomes large enough to trigger the assertion, preventing the node from joining. Because the node never joins, gossip never converges, and the newer 5.0.3+ compressed format is never enabled.
>>
>> This effectively leaves the cluster stuck in a startup loop where only the first node can come up.
>>
>> As a sanity check, I locally modified the version-gating logic in *IndexStatusManager.java* for the index-status serialization to always use the newer compact format during startup, and with that change the cluster started successfully.
>> private static boolean shouldWriteLegacyStatusFormat(CassandraVersion minVersion)
>> {
>>     return false; // return minVersion == null || (minVersion.major == 5 && minVersion.minor == 0 && minVersion.patch < 3);
>> }
>>
>> This makes me suspect the issue is related to bootstrap ordering or version detection rather than data corruption or configuration.
>>
>> I posted a more detailed write-up <https://dba.stackexchange.com/questions/349488/cassandra-5-0-4-startup-deadlock-gossip-uses-pre-5-0-3-encoding-due-to-version> (with stack traces and code references) on DBA StackExchange a few weeks ago but haven’t received any feedback yet, so I wanted to ask here:
>>
>> - Is this startup/version-gating behavior expected in 5.0.x?
>> - Is this a known limitation or bug?
>> - Is there a recommended way to bootstrap or restart clusters in this state?
>>
>> Any insight would be appreciated. Happy to provide logs or additional details if helpful.
>>
>> Thanks,
>>
>> Nicholas
>
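A small practical note on the force-compact override and Henry's homogeneous-cluster caveat: before restarting with that patch in place, it may be worth confirming that every node really reports 5.0.3 or later. A rough sketch, with the host list as a placeholder:

    #!/usr/bin/env bash
    # Sketch only: confirm the cluster is homogeneous (>= 5.0.3) before forcing
    # the compact index-status format.
    for host in node1 node2 node3; do
        echo -n "$host: "
        ssh "$host" nodetool version      # prints e.g. "ReleaseVersion: 5.0.4"
    done

    # On a node that is still up, the versions peers last advertised via gossip:
    nodetool gossipinfo | grep -E '^[^ ]|RELEASE_VERSION'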
