Ticket created and assigned, thanks! https://issues.apache.org/jira/browse/CASSANDRA-21132

On Wed, Jan 21, 2026 at 8:15 PM Henry Pan (HP) <[email protected]> wrote:

> Good luck
>
> Thanks & Best Regards
>
> Henry PAN
> Sr. Lead Cloud Architect
> (425) 802--3975
> https://www.linkedin.com/in/henrypan1
>
> On Wed, Jan 21, 2026 at 3:33 PM Ashaman Kingpin <[email protected]> wrote:
>
>> Thanks Henry and Caleb. I will create the Jira ticket tomorrow.
>>
>> On Jan 21, 2026, at 5:34 PM, Caleb Rackliffe <[email protected]> wrote:
>>
>> One other thing that might be interesting to clarify is whether this
>> occurs on a rolling restart as well as a complete bring-down/bring-up.
>>
>> On Wed, Jan 21, 2026 at 3:54 PM Caleb Rackliffe <[email protected]> wrote:
>>
>>> Hi Ashaman,
>>>
>>> Would you be willing to create a Jira with a full description of the
>>> problem and assign it to me? (Caleb Rackliffe/maedhroz)
>>>
>>> Thanks!
>>>
>>> On Wed, Jan 21, 2026 at 9:14 AM Henry Pan (HP) <[email protected]> wrote:
>>>
>>>> You’ve written a *very* solid technical analysis already, and what
>>>> you’re describing *is not expected behavior in Cassandra 5.0.x*. Based
>>>> on the symptoms, stack trace, and the version-gating logic you inspected,
>>>> what you’re hitting is almost certainly a *real bug* in the 5.0.x SAI
>>>> gossip serialization path, specifically in the *pre-convergence
>>>> version detection logic*.
>>>>
>>>> Let me walk you through what’s actually happening, why your cluster
>>>> gets stuck, and what the Cassandra community would typically recommend.
>>>>
>>>> 🧠 *What’s Actually Going Wrong (Root Cause)*
>>>>
>>>> *1. Cassandra 5.0.3 introduced a fix (CASSANDRA-20058)*
>>>>
>>>> That fix added a *compressed SAI index-status gossip format* to avoid
>>>> blowing up the gossip payload size.
>>>>
>>>> *2. But the fix is gated on minVersion*
>>>>
>>>> The new format is only used when:
>>>>
>>>> - Gossip has converged
>>>> - All nodes report version ≥ 5.0.3
>>>>
>>>> *3. During startup, minVersion is null*
>>>>
>>>> This is the key problem.
>>>>
>>>> On a cold restart:
>>>>
>>>> - Nodes have *no* gossip state yet
>>>> - minVersion is treated as *unknown*
>>>> - Cassandra falls back to the *legacy (pre-5.0.3) format*
>>>> - That format serializes *all* SAI index metadata uncompressed
>>>> - With many keyspaces/tables/indexes, the payload becomes huge
>>>> - TypeSizes.sizeof() asserts because the payload exceeds the
>>>>   expected bounds
>>>>
>>>> *4. The node never joins gossip → gossip never converges → new format
>>>> never activates*
>>>>
>>>> This creates a *deadlock*:
>>>>
>>>> Needs gossip convergence → to use compact format
>>>> Needs compact format → to join gossip
>>>>
>>>> This is exactly why only the *first* node starts: it has no peers to
>>>> gossip with, so it never receives a large payload.
>>>>
>>>> 🟠 *Is this expected behavior in 5.0.x?*
>>>>
>>>> *No.*
>>>> This is not expected, and your analysis is correct: the version-gating
>>>> logic is flawed for clusters with large SAI metadata sets.
>>>>
>>>> The fix in 5.0.3 solved the *symptom* (payload size) but not the
>>>> *bootstrap ordering problem*.
>>>>
>>>> 🔴 *Is this a known limitation or bug?*
>>>>
>>>> It is *not widely documented*, but it *is* a bug.
>>>>
>>>> You are essentially hitting a *gossip-serialization deadlock* caused
>>>> by:
>>>>
>>>> - Large SAI metadata
>>>> - Legacy serialization path
>>>> - Version detection not initialized early enough
>>>>
>>>> Your observation that the fix is “incomplete” is accurate: the gating
>>>> logic should not depend on gossip convergence during bootstrap.
>>>>
>>>> This should be escalated upstream.
>>>>
>>>> 🟢 *Is your workaround valid?*
>>>>
>>>> Yes, forcing the compact format:
>>>>
>>>> return false;
>>>>
>>>> …is the correct behavior for any *homogeneous 5.0.4+ cluster*.
>>>>
>>>> Your patch proves the issue is not data corruption, but *incorrect
>>>> version-gating logic*.
>>>>
>>>> 🧭 *Recommended Workarounds (Until Upstream Fix)*
>>>>
>>>> *1. Force compact format during startup (your patch)*
>>>>
>>>> This is the cleanest workaround for now.
>>>>
>>>> *2. Start nodes one at a time with gossip disabled*
>>>>
>>>> Not ideal, but possible:
>>>>
>>>> JVM_OPTS="$JVM_OPTS -Dcassandra.join_ring=false"
>>>>
>>>> Then:
>>>>
>>>> 1. Start node
>>>> 2. Wait for it to settle
>>>> 3. Enable join
>>>> 4. Repeat
>>>>
>>>> This avoids large gossip payloads during initial handshake.
>>>>
>>>> *3. Reduce SAI index metadata temporarily*
>>>>
>>>> If possible:
>>>>
>>>> - Drop unused SAI indexes
>>>> - Reduce index count per table
>>>> - Restart cluster
>>>> - Recreate indexes
>>>>
>>>> Not ideal, but works in emergency situations.
>>>>
>>>> *4. Patch the version-gating logic upstream*
>>>>
>>>> The correct fix would be:
>>>>
>>>> - If all nodes are ≥ 5.0.3 *by configuration*, use compact format
>>>>   immediately
>>>> - Do not rely on gossip convergence to activate the new format
>>>>
>>>> This is exactly what your patch does.
>>>>
>>>> 🧩 *Why this only happens on restart (not initial provisioning)*
>>>>
>>>> During initial provisioning:
>>>>
>>>> - Nodes join one by one
>>>> - Gossip state is small
>>>> - SAI metadata is not yet fully populated
>>>> - No large payloads are exchanged
>>>>
>>>> After provisioning:
>>>>
>>>> - SAI metadata grows
>>>> - Gossip payloads grow
>>>> - Legacy format becomes too large
>>>> - Restart triggers the deadlock
>>>>
>>>> This is consistent with your observations.
>>>>
>>>> 🟦 *What you should do next*
>>>>
>>>> If you want this fixed upstream, you should:
>>>>
>>>> - Open a JIRA ticket referencing CASSANDRA-20058
>>>> - Include your analysis + stack trace
>>>> - Include your patch
>>>> - Describe the bootstrap deadlock scenario
>>>>
>>>> This is the kind of issue the Cassandra dev team will want to fix
>>>> before 5.0.x becomes widely adopted.
>>>>
>>>> 🎯 *Short Answers to Your Questions*
>>>>
>>>> *Is this startup/version-gating behavior expected?*
>>>>
>>>> No, this is unintended behavior.
>>>>
>>>> *Is this a known limitation or bug?*
>>>>
>>>> It is a bug. The fix in 5.0.3 is incomplete for large SAI metadata
>>>> clusters.
>>>>
>>>> *Is there a recommended way to bootstrap or restart clusters in this
>>>> state?*
>>>>
>>>> Workarounds include:
>>>>
>>>> - Forcing compact format (best)
>>>> - Staggered startup with join_ring=false
>>>> - Reducing SAI metadata temporarily
>>>>
>>>> If you want, I can help you:
>>>>
>>>> - Draft the JIRA ticket
>>>> - Write a minimal reproducible test case
>>>> - Produce a clean patch proposal
>>>> - Review the relevant Cassandra code paths with you
>>>>
>>>> Just tell me how deep you want to go.
>>>>
>>>> Thanks & Best Regards
>>>>
>>>> Henry PAN
>>>> Sr. Lead Cloud Architect
>>>> (425) 802--3975
>>>> https://www.linkedin.com/in/henrypan1
>>>>
>>>> On Wed, Jan 21, 2026 at 7:07 AM Ashaman Kingpin <[email protected]> wrote:
>>>>
>>>>> Hi all,
>>>>>
>>>>> I’m looking for some guidance on a Cassandra 5.0.x startup issue we’re
>>>>> seeing and wanted to ask the user list if this behavior is expected or
>>>>> already known.
>>>>>
>>>>> We’re running a homogeneous 5.0.4 (also tested with 5.0.6) cluster
>>>>> with a relatively large number of keyspaces, tables, and SAI indexes. On
>>>>> initial cluster creation and provisioning of multiple keyspaces, everything
>>>>> operates as expected. However, after stopping the cluster and restarting
>>>>> all nodes, only the first node comes up successfully. Subsequent nodes fail
>>>>> during startup with an assertion in the gossip thread while serializing the
>>>>> SAI index status metadata.
>>>>>
>>>>> ERROR [GossipStage:1] 2025-12-22 17:20:10,365 JVMStabilityInspector.java:70 - Exception in thread Thread[GossipStage:1,5,GossipStage]
>>>>> java.lang.RuntimeException: java.lang.AssertionError
>>>>>     at org.apache.cassandra.net.InboundSink.accept(InboundSink.java:108)
>>>>>     at org.apache.cassandra.net.InboundSink.accept(InboundSink.java:45)
>>>>>     at org.apache.cassandra.net.InboundMessageHandler$ProcessMessage.run(InboundMessageHandler.java:430)
>>>>>     at org.apache.cassandra.concurrent.ExecutionFailure$1.run(ExecutionFailure.java:133)
>>>>>     at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
>>>>>     at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
>>>>>     at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
>>>>>     at java.base/java.lang.Thread.run(Thread.java:834)
>>>>> Caused by: java.lang.AssertionError: null
>>>>>     at org.apache.cassandra.db.TypeSizes.sizeof(TypeSizes.java:44)
>>>>>     at org.apache.cassandra.gms.VersionedValue$VersionedValueSerializer.serializedSize(VersionedValue.java:381)
>>>>>     at org.apache.cassandra.gms.VersionedValue$VersionedValueSerializer.serializedSize(VersionedValue.java:359)
>>>>>     at org.apache.cassandra.gms.EndpointStateSerializer.serializedSize(EndpointState.java:344)
>>>>>     at org.apache.cassandra.gms.EndpointStateSerializer.serializedSize(EndpointState.java:300)
>>>>>     at org.apache.cassandra.gms.GossipDigestAckSerializer.serializedSize(GossipDigestAck.java:96)
>>>>>     at org.apache.cassandra.gms.GossipDigestAckSerializer.serializedSize(GossipDigestAck.java:61)
>>>>>     at org.apache.cassandra.net.Message$Serializer.payloadSize(Message.java:1088)
>>>>>     at org.apache.cassandra.net.Message.payloadSize(Message.java:1131)
>>>>>     at org.apache.cassandra.net.Message$Serializer.serializedSize(Message.java:769)
>>>>>
>>>>> It seems there was a fix to this same issue as reported in this DBA
>>>>> Stack Exchange post
>>>>> <https://dba.stackexchange.com/questions/343389/schema-changes-on-5-0-result-in-gossip-failures-o-a-c-db-db-typesizes-sizeof>
>>>>> (CASSANDRA-20058
>>>>> <https://issues.apache.org/jira/browse/CASSANDRA-20058>). It seems
>>>>> to me though that the fix described in that post and ticket, included in
>>>>> Cassandra 5.0.3, is incomplete? From what I can tell, the fix seems to
>>>>> only be activated once the gossip state of the cluster has converged but
>>>>> the error seems to occur before this happens. At the point of the error,
>>>>> the minimum cluster version appears to be treated as unknown, which causes
>>>>> Cassandra to fall back to the legacy (pre-5.0.3) index-status serialization
>>>>> format. In our case, that legacy representation becomes large enough to
>>>>> trigger the assertion, preventing the node from joining. Because the node
>>>>> never joins, gossip never converges, and the newer 5.0.3+ compressed format
>>>>> is never enabled.
>>>>>
>>>>> This effectively leaves the cluster stuck in a startup loop where only
>>>>> the first node can come up.
>>>>>
>>>>> As a sanity check, I locally modified the version-gating logic in
>>>>> *IndexStatusManager.java* for the index-status serialization to
>>>>> always use the newer compact format during startup, and with that change
>>>>> the cluster started successfully.
>>>>>
>>>>> private static boolean shouldWriteLegacyStatusFormat(CassandraVersion minVersion)
>>>>> {
>>>>>     return false; // return minVersion == null || (minVersion.major == 5 && minVersion.minor == 0 && minVersion.patch < 3);
>>>>> }
>>>>>
>>>>> This makes me suspect the issue is related to bootstrap ordering or
>>>>> version detection rather than data corruption or configuration.
>>>>>
>>>>> I posted a more detailed write-up
>>>>> <https://dba.stackexchange.com/questions/349488/cassandra-5-0-4-startup-deadlock-gossip-uses-pre-5-0-3-encoding-due-to-version>
>>>>> (with stack traces and code references) on DBA StackExchange a few weeks
>>>>> ago but haven’t received any feedback yet, so I wanted to ask here:
>>>>>
>>>>> - Is this startup/version-gating behavior expected in 5.0.x?
>>>>> - Is this a known limitation or bug?
>>>>> - Is there a recommended way to bootstrap or restart clusters in
>>>>>   this state?
>>>>>
>>>>> Any insight would be appreciated. Happy to provide logs or additional
>>>>> details if helpful.
>>>>>
>>>>> Thanks,
>>>>>
>>>>> Nicholas
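
For anyone following the thread, here is a minimal standalone sketch of the version gating discussed above. It is not Cassandra code. `Version` is a hypothetical stand-in for `CassandraVersion` that models only the three fields the predicate reads. The first method copies the condition quoted from IndexStatusManager.java in the original report. The second method, `shouldWriteLegacyStatusFormatNullAsCompact`, is an illustrative variant invented for this sketch and should not be read as the upstream fix.

```java
// Hypothetical stand-in for CassandraVersion; only the fields the
// predicate reads are modeled here.
final class Version
{
    final int major;
    final int minor;
    final int patch;

    Version(int major, int minor, int patch)
    {
        this.major = major;
        this.minor = minor;
        this.patch = patch;
    }
}

final class GatingSketch
{
    // Condition as quoted in the thread: an unknown (null) minimum version
    // selects the legacy, pre-5.0.3 format.
    static boolean shouldWriteLegacyStatusFormat(Version minVersion)
    {
        return minVersion == null
               || (minVersion.major == 5 && minVersion.minor == 0 && minVersion.patch < 3);
    }

    // Illustrative variant (an assumption, not an upstream patch): an unknown
    // minimum version selects the compact format instead of the legacy one.
    static boolean shouldWriteLegacyStatusFormatNullAsCompact(Version minVersion)
    {
        return minVersion != null
               && minVersion.major == 5 && minVersion.minor == 0 && minVersion.patch < 3;
    }

    public static void main(String[] args)
    {
        Version unknown = null;                 // cold restart, no gossip state yet
        Version old = new Version(5, 0, 2);     // peer older than 5.0.3
        Version current = new Version(5, 0, 4); // homogeneous 5.0.4 cluster

        System.out.println("quoted,  unknown: " + shouldWriteLegacyStatusFormat(unknown));
        System.out.println("quoted,  5.0.2:   " + shouldWriteLegacyStatusFormat(old));
        System.out.println("quoted,  5.0.4:   " + shouldWriteLegacyStatusFormat(current));
        System.out.println("variant, unknown: " + shouldWriteLegacyStatusFormatNullAsCompact(unknown));
        System.out.println("variant, 5.0.2:   " + shouldWriteLegacyStatusFormatNullAsCompact(old));
    }
}
```

The two methods differ only in the null case, which is the case a cold restart hits. Whether treating an unknown version as "compact" is safe during a rolling upgrade from a pre-5.0.3 release is a separate question that this sketch does not answer.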
