Good luck!

Thanks & Best Regards
Henry PAN
Sr. Lead Cloud Architect
(425) 802-3975
https://www.linkedin.com/in/henrypan1


On Wed, Jan 21, 2026 at 3:33 PM Ashaman Kingpin <[email protected]> wrote:

> Thanks Henry and Caleb. I will create the Jira ticket tomorrow.
>
> On Jan 21, 2026, at 5:34 PM, Caleb Rackliffe <[email protected]> wrote:
>
> One other thing that might be interesting to clarify is whether this occurs on a rolling restart as well as a complete bring-down/bring-up.
>
> On Wed, Jan 21, 2026 at 3:54 PM Caleb Rackliffe <[email protected]> wrote:
>
>> Hi Ashaman,
>>
>> Would you be willing to create a Jira with a full description of the problem and assign it to me? (Caleb Rackliffe/maedhroz)
>>
>> Thanks!
>>
>> On Wed, Jan 21, 2026 at 9:14 AM Henry Pan (HP) <[email protected]> wrote:
>>
>>> You’ve written a *very* solid technical analysis already, and what you’re describing *is not expected behavior in Cassandra 5.0.x*. Based on the symptoms, stack trace, and the version-gating logic you inspected, what you’re hitting is almost certainly a *real bug* in the 5.0.x SAI gossip serialization path — specifically in the *pre-convergence version detection logic*.
>>>
>>> Let me walk you through what’s actually happening, why your cluster gets stuck, and what the Cassandra community would typically recommend.
>>>
>>> 🧠 *What’s Actually Going Wrong (Root Cause)*
>>>
>>> *1. Cassandra 5.0.3 introduced a fix (CASSANDRA-20058)*
>>>
>>> That fix added a *compressed SAI index-status gossip format* to avoid blowing up the gossip payload size.
>>>
>>> *2. But the fix is gated on minVersion*
>>>
>>> The new format is only used when:
>>>
>>> - Gossip has converged
>>> - All nodes report version ≥ 5.0.3
>>>
>>> *3. During startup, minVersion is null*
>>>
>>> This is the key problem.
>>>
>>> On a cold restart:
>>>
>>> - Nodes have *no* gossip state yet
>>> - minVersion is treated as *unknown*
>>> - Cassandra falls back to the *legacy (pre-5.0.3) format*
>>> - That format serializes *all* SAI index metadata uncompressed
>>> - With many keyspaces/tables/indexes, the payload becomes huge
>>> - TypeSizes.sizeof() asserts because the payload exceeds the expected bounds
>>>
>>> *4. The node never joins gossip → gossip never converges → new format never activates*
>>>
>>> This creates a *deadlock*:
>>>
>>> Needs gossip convergence → to use compact format
>>> Needs compact format → to join gossip
>>>
>>> This is exactly why only the *first* node starts — it has no peers to gossip with, so it never receives a large payload.
>>>
>>> 🟠 *Is this expected behavior in 5.0.x?*
>>>
>>> *No.*
>>> This is not expected, and your analysis is correct: the version-gating logic is flawed for clusters with large SAI metadata sets.
>>>
>>> The fix in 5.0.3 solved the *symptom* (payload size) but not the *bootstrap ordering problem*.
>>>
>>> 🔴 *Is this a known limitation or bug?*
>>>
>>> It is *not widely documented*, but it *is* a bug.
>>>
>>> You are essentially hitting a *gossip-serialization deadlock* caused by:
>>>
>>> - Large SAI metadata
>>> - Legacy serialization path
>>> - Version detection not initialized early enough
>>>
>>> Your observation that the fix is “incomplete” is accurate — the gating logic should not depend on gossip convergence during bootstrap.
>>>
>>> This should be escalated upstream.
>>>
>>> 🟢 *Is your workaround valid?*
>>>
>>> Yes — forcing the compact format:
>>>
>>> return false;
>>>
>>> …is the correct behavior for any *homogeneous 5.0.4+ cluster*.
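>>>
>>> To make the gate concrete, here is roughly the shape of the check your patch touches, reconstructed from the snippet in your original mail (quoted below); it is a simplified sketch, not the verbatim upstream method:
>>>
>>> // Simplified sketch of the 5.0.3+ gating logic in IndexStatusManager
>>> // (not the exact upstream source).
>>> private static boolean shouldWriteLegacyStatusFormat(CassandraVersion minVersion)
>>> {
>>>     // minVersion stays null until gossip has converged, e.g. during a
>>>     // cold restart, so the node falls back to the legacy format and a
>>>     // large SAI metadata set trips the assertion in TypeSizes.sizeof().
>>>     return minVersion == null
>>>            || (minVersion.major == 5 && minVersion.minor == 0 && minVersion.patch < 3);
>>> }
>>>
>>> In other words, the very state a restarting node is guaranteed to be in (minVersion == null) is the state that selects the oversized legacy encoding.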
>>>
>>> Your patch proves the issue is not data corruption, but *incorrect version-gating logic*.
>>>
>>> 🧭 *Recommended Workarounds (Until Upstream Fix)*
>>>
>>> *1. Force compact format during startup (your patch)*
>>>
>>> This is the cleanest workaround for now.
>>>
>>> *2. Start nodes one at a time with gossip disabled*
>>>
>>> Not ideal, but possible:
>>>
>>> JVM_OPTS="$JVM_OPTS -Dcassandra.join_ring=false"
>>>
>>> Then:
>>>
>>> 1. Start node
>>> 2. Wait for it to settle
>>> 3. Enable join
>>> 4. Repeat
>>>
>>> This avoids large gossip payloads during the initial handshake.
>>>
>>> *3. Reduce SAI index metadata temporarily*
>>>
>>> If possible:
>>>
>>> - Drop unused SAI indexes
>>> - Reduce index count per table
>>> - Restart cluster
>>> - Recreate indexes
>>>
>>> Not ideal, but works in emergency situations.
>>>
>>> *4. Patch the version-gating logic upstream*
>>>
>>> The correct fix would be:
>>>
>>> - If all nodes are ≥ 5.0.3 *by configuration*, use compact format immediately
>>> - Do not rely on gossip convergence to activate the new format
>>>
>>> This is exactly what your patch does.
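>>>
>>> As a rough illustration of that approach, the gate could consult an explicit operator override before ever falling back to the legacy format. Note that cassandra.sai.force_compact_index_status below is a hypothetical system property I am using purely for illustration; it does not exist in Cassandra today:
>>>
>>> // Illustrative sketch only: an operator-supplied override ahead of the
>>> // existing version check, so bootstrap never defaults to the legacy format.
>>> private static boolean shouldWriteLegacyStatusFormat(CassandraVersion minVersion)
>>> {
>>>     // Hypothetical property: operator asserts the whole cluster is >= 5.0.3.
>>>     if (Boolean.getBoolean("cassandra.sai.force_compact_index_status"))
>>>         return false;
>>>
>>>     // Unchanged fallback for genuinely mixed-version clusters.
>>>     return minVersion == null
>>>            || (minVersion.major == 5 && minVersion.minor == 0 && minVersion.patch < 3);
>>> }
>>>
>>> A real upstream fix would probably not use a flag at all, but the shape is the same: do not let an unknown minVersion silently select the legacy encoding.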
>>>
>>> 🧩 *Why this only happens on restart (not initial provisioning)*
>>>
>>> During initial provisioning:
>>>
>>> - Nodes join one by one
>>> - Gossip state is small
>>> - SAI metadata is not yet fully populated
>>> - No large payloads are exchanged
>>>
>>> After provisioning:
>>>
>>> - SAI metadata grows
>>> - Gossip payloads grow
>>> - Legacy format becomes too large
>>> - Restart triggers the deadlock
>>>
>>> This is consistent with your observations.
>>>
>>> 🟦 *What you should do next*
>>>
>>> If you want this fixed upstream, you should:
>>>
>>> - Open a JIRA ticket referencing CASSANDRA-20058
>>> - Include your analysis + stack trace
>>> - Include your patch
>>> - Describe the bootstrap deadlock scenario
>>>
>>> This is the kind of issue the Cassandra dev team will want to fix before 5.0.x becomes widely adopted.
>>>
>>> 🎯 *Short Answers to Your Questions*
>>>
>>> *Is this startup/version-gating behavior expected?*
>>>
>>> No — this is unintended behavior.
>>>
>>> *Is this a known limitation or bug?*
>>>
>>> It is a bug. The fix in 5.0.3 is incomplete for large SAI metadata clusters.
>>>
>>> *Is there a recommended way to bootstrap or restart clusters in this state?*
>>>
>>> Workarounds include:
>>>
>>> - Forcing compact format (best)
>>> - Staggered startup with join_ring=false
>>> - Reducing SAI metadata temporarily
>>>
>>> If you want, I can help you:
>>>
>>> - Draft the JIRA ticket
>>> - Write a minimal reproducible test case
>>> - Produce a clean patch proposal
>>> - Review the relevant Cassandra code paths with you
>>>
>>> Just tell me how deep you want to go.
>>>
>>> Thanks & Best Regards
>>>
>>> Henry PAN
>>> Sr. Lead Cloud Architect
>>> (425) 802-3975
>>> https://www.linkedin.com/in/henrypan1
>>>
>>>
>>> On Wed, Jan 21, 2026 at 7:07 AM Ashaman Kingpin <[email protected]> wrote:
>>>
>>>> Hi all,
>>>>
>>>> I’m looking for some guidance on a Cassandra 5.0.x startup issue we’re seeing and wanted to ask the user list if this behavior is expected or already known.
>>>>
>>>> We’re running a homogeneous 5.0.4 (also tested with 5.0.6) cluster with a relatively large number of keyspaces, tables, and SAI indexes. On initial cluster creation and provisioning of multiple keyspaces, everything operates as expected. However, after stopping the cluster and restarting all nodes, only the first node comes up successfully. Subsequent nodes fail during startup with an assertion in the gossip thread while serializing the SAI index status metadata.
>>>>
>>>> ERROR [GossipStage:1] 2025-12-22 17:20:10,365 JVMStabilityInspector.java:70 - Exception in thread Thread[GossipStage:1,5,GossipStage]
>>>> java.lang.RuntimeException: java.lang.AssertionError
>>>>     at org.apache.cassandra.net.InboundSink.accept(InboundSink.java:108)
>>>>     at org.apache.cassandra.net.InboundSink.accept(InboundSink.java:45)
>>>>     at org.apache.cassandra.net.InboundMessageHandler$ProcessMessage.run(InboundMessageHandler.java:430)
>>>>     at org.apache.cassandra.concurrent.ExecutionFailure$1.run(ExecutionFailure.java:133)
>>>>     at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
>>>>     at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
>>>>     at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
>>>>     at java.base/java.lang.Thread.run(Thread.java:834)
>>>> Caused by: java.lang.AssertionError: null
>>>>     at org.apache.cassandra.db.TypeSizes.sizeof(TypeSizes.java:44)
>>>>     at org.apache.cassandra.gms.VersionedValue$VersionedValueSerializer.serializedSize(VersionedValue.java:381)
>>>>     at org.apache.cassandra.gms.VersionedValue$VersionedValueSerializer.serializedSize(VersionedValue.java:359)
>>>>     at org.apache.cassandra.gms.EndpointStateSerializer.serializedSize(EndpointState.java:344)
>>>>     at org.apache.cassandra.gms.EndpointStateSerializer.serializedSize(EndpointState.java:300)
>>>>     at org.apache.cassandra.gms.GossipDigestAckSerializer.serializedSize(GossipDigestAck.java:96)
>>>>     at org.apache.cassandra.gms.GossipDigestAckSerializer.serializedSize(GossipDigestAck.java:61)
>>>>     at org.apache.cassandra.net.Message$Serializer.payloadSize(Message.java:1088)
>>>>     at org.apache.cassandra.net.Message.payloadSize(Message.java:1131)
>>>>     at org.apache.cassandra.net.Message$Serializer.serializedSize(Message.java:769)
>>>>
>>>> It seems there was a fix for this same issue, as reported in this DBA Stack Exchange post <https://dba.stackexchange.com/questions/343389/schema-changes-on-5-0-result-in-gossip-failures-o-a-c-db-db-typesizes-sizeof> (CASSANDRA-20058 <https://issues.apache.org/jira/browse/CASSANDRA-20058>). It seems to me, though, that the fix described in that post and ticket, included in Cassandra 5.0.3, is incomplete. From what I can tell, the fix seems to be activated only once the gossip state of the cluster has converged, but the error occurs before this happens. At the point of the error, the minimum cluster version appears to be treated as unknown, which causes Cassandra to fall back to the legacy (pre-5.0.3) index-status serialization format. In our case, that legacy representation becomes large enough to trigger the assertion, preventing the node from joining. Because the node never joins, gossip never converges, and the newer 5.0.3+ compressed format is never enabled.
>>>>
>>>> This effectively leaves the cluster stuck in a startup loop where only the first node can come up.
>>>>
>>>> As a sanity check, I locally modified the version-gating logic in *IndexStatusManager.java* for the index-status serialization to always use the newer compact format during startup, and with that change the cluster started successfully.
>>>>
>>>> private static boolean shouldWriteLegacyStatusFormat(CassandraVersion minVersion)
>>>> {
>>>>     return false; // return minVersion == null || (minVersion.major == 5 && minVersion.minor == 0 && minVersion.patch < 3);
>>>> }
>>>>
>>>> This makes me suspect the issue is related to bootstrap ordering or version detection rather than data corruption or configuration.
>>>>
>>>> I posted a more detailed write-up <https://dba.stackexchange.com/questions/349488/cassandra-5-0-4-startup-deadlock-gossip-uses-pre-5-0-3-encoding-due-to-version> (with stack traces and code references) on DBA StackExchange a few weeks ago but haven’t received any feedback yet, so I wanted to ask here:
>>>>
>>>> - Is this startup/version-gating behavior expected in 5.0.x?
>>>> - Is this a known limitation or bug?
>>>> - Is there a recommended way to bootstrap or restart clusters in this state?
>>>>
>>>> Any insight would be appreciated. Happy to provide logs or additional details if helpful.
>>>>
>>>> Thanks,
>>>>
>>>> Nicholas
