Hi Ashaman,

Would you be willing to create a Jira with a full description of the
problem and assign it to me? (Caleb Rackliffe/maedhroz)

Thanks!

On Wed, Jan 21, 2026 at 9:14 AM Henry Pan (HP) <[email protected]> wrote:

> You’ve written a *very* solid technical analysis already, and what you’re
> describing *is not expected behavior in Cassandra 5.0.x*. Based on the
> symptoms, stack trace, and the version‑gating logic you inspected, what
> you’re hitting is almost certainly a *real bug* in the 5.0.x SAI gossip
> serialization path — specifically in the *pre‑convergence version
> detection logic*.
>
> Let me walk you through what’s actually happening, why your cluster gets
> stuck, and what the Cassandra community would typically recommend.
>
> 🧠 *What’s Actually Going Wrong (Root Cause)*
>
> *1. Cassandra 5.0.3 introduced a fix (CASSANDRA‑20058)*
>
> That fix added a *compressed SAI index‑status gossip format* to avoid
> blowing up the gossip payload size.
>
> *2. But the fix is gated on minVersion*
>
> The new format is only used when:
>
>    - Gossip has converged
>    - All nodes report version ≥ 5.0.3
>
> *3. During startup, minVersion is null*
>
> This is the key problem.
>
> On a cold restart:
>
>    - Nodes have *no* gossip state yet
>    - minVersion is treated as *unknown*
>    - Cassandra falls back to the *legacy (pre‑5.0.3) format*
>    - That format serializes *all* SAI index metadata uncompressed
>    - With many keyspaces/tables/indexes, the payload becomes huge
>    - TypeSizes.sizeof() asserts because the payload exceeds the expected
>    bounds
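>
> The fallback described above can be sketched as follows. This is a
> minimal illustrative model with assumed names, not the actual Cassandra
> source:

```java
// Minimal sketch (assumed names, not the real Cassandra API) of the gate
// described above: an unknown cluster-minimum version falls back to the
// legacy, uncompressed index-status format.
public class LegacyFormatFallback {
    // minVersion as {major, minor, patch}; null models "gossip has not
    // converged yet, so the minimum cluster version is unknown".
    static boolean useLegacyFormat(int[] minVersion) {
        if (minVersion == null)
            return true; // unknown -> conservative -> legacy (the cold-restart branch)
        // Legacy is only needed below 5.0.3, where the compact format did not exist.
        return minVersion[0] == 5 && minVersion[1] == 0 && minVersion[2] < 3;
    }

    public static void main(String[] args) {
        System.out.println(useLegacyFormat(null));               // prints true
        System.out.println(useLegacyFormat(new int[]{5, 0, 4})); // prints false
    }
}
```

> With many keyspaces, tables, and indexes, that unconditional legacy
> branch on a cold restart is what produces the oversized payload.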
>
> *4. The node never joins gossip → gossip never converges → new format
> never activates*
>
> This creates a *deadlock*:
>
> Needs gossip convergence → to use the compact format
> Needs the compact format → to join gossip
>
> This is exactly why only the *first* node starts — it has no peers to
> gossip with, so it never receives a large payload.
>
> 🟠 *Is this expected behavior in 5.0.x?*
>
> *No.*
> This is not expected, and your analysis is correct: the version‑gating
> logic is flawed for clusters with large SAI metadata sets.
>
> The fix in 5.0.3 solved the *symptom* (payload size) but not the *bootstrap
> ordering problem*.
>
> 🔴 *Is this a known limitation or bug?*
>
> It is *not widely documented*, but it *is* a bug.
>
> You are essentially hitting a *gossip‑serialization deadlock* caused by:
>
>    - Large SAI metadata
>    - Legacy serialization path
>    - Version detection not initialized early enough
>
> Your observation that the fix is “incomplete” is accurate — the gating
> logic should not depend on gossip convergence during bootstrap.
>
> This should be escalated upstream.
>
> 🟢 *Is your workaround valid?*
>
> Yes — forcing *shouldWriteLegacyStatusFormat* to always return false:
>
> return false;
>
> …is correct behavior for any *homogeneous 5.0.4+ cluster*, because every
> node can already read the compact format.
>
> Your patch proves the issue is not data corruption, but *incorrect
> version‑gating logic*.
>
> 🧭 *Recommended Workarounds (Until Upstream Fix)*
>
> *1. Force compact format during startup (your patch)*
>
> This is the cleanest workaround for now.
>
> *2. Start nodes one at a time with gossip disabled*
>
> Not ideal, but possible:
>
> JVM_OPTS="$JVM_OPTS -Dcassandra.join_ring=false"
>
> Then:
>
>    1. Start the node
>    2. Wait for it to settle
>    3. Enable join (run *nodetool join*)
>    4. Repeat on the next node
>
> This avoids large gossip payloads during initial handshake.
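>
> Concretely, a staggered restart could look like the sketch below. The
> hostnames, the SSH loop, and the pid-file path are assumptions; adapt
> them to your service manager and environment:

```shell
# Illustrative staggered-restart loop; hostnames and SSH orchestration
# are placeholders, not a definitive procedure.
for host in node1 node2 node3; do
    # Start Cassandra without joining the ring, so no large gossip
    # payload is exchanged during the initial handshake.
    ssh "$host" 'JVM_OPTS="$JVM_OPTS -Dcassandra.join_ring=false" cassandra -p /tmp/cassandra.pid'

    # Wait until the daemon answers before proceeding.
    until ssh "$host" 'nodetool info > /dev/null 2>&1'; do
        sleep 5
    done

    # Let the node join the ring once it has settled.
    ssh "$host" 'nodetool join'
done
```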
>
> *3. Reduce SAI index metadata temporarily*
>
> If possible:
>
>    - Drop unused SAI indexes
>    - Reduce index count per table
>    - Restart cluster
>    - Recreate indexes
>
> Not ideal, but works in emergency situations.
>
> *4. Patch the version‑gating logic upstream*
>
> The correct fix would be:
>
>    - If all nodes are ≥ 5.0.3 *by configuration*, use compact format
>    immediately
>    - Do not rely on gossip convergence to activate the new format
>
> This is exactly what your patch does.
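>
> One possible shape for that fix, sketched with assumed names (this is
> not the actual Cassandra API): when gossip has not converged, consult an
> operator-configured minimum version instead of defaulting to the legacy
> format:

```java
// Hedged sketch of the fix direction described above; names are
// illustrative, not Cassandra's.
public class ConfiguredFloorGate {
    static final int[] COMPACT_SINCE = {5, 0, 3}; // first release with the compact format

    static int compare(int[] a, int[] b) {
        for (int i = 0; i < 3; i++)
            if (a[i] != b[i]) return Integer.compare(a[i], b[i]);
        return 0;
    }

    // gossipMin == null models pre-convergence; configuredFloor is an
    // operator-supplied guarantee (e.g. "every node here runs >= 5.0.4").
    static boolean writeLegacyFormat(int[] gossipMin, int[] configuredFloor) {
        int[] effective = (gossipMin != null) ? gossipMin : configuredFloor;
        return effective == null || compare(effective, COMPACT_SINCE) < 0;
    }

    public static void main(String[] args) {
        int[] floor = {5, 0, 4};
        System.out.println(writeLegacyFormat(null, floor)); // prints false: compact from the start
        System.out.println(writeLegacyFormat(null, null));  // prints true: no guarantee, stay legacy
    }
}
```

> A configured floor like this keeps mixed-version upgrades safe while
> letting a homogeneous cluster skip the legacy path entirely at startup.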
>
> 🧩 *Why this only happens on restart (not initial provisioning)*
>
> During initial provisioning:
>
>    - Nodes join one by one
>    - Gossip state is small
>    - SAI metadata is not yet fully populated
>    - No large payloads are exchanged
>
> After provisioning:
>
>    - SAI metadata grows
>    - Gossip payloads grow
>    - Legacy format becomes too large
>    - Restart triggers the deadlock
>
> This is consistent with your observations.
>
> 🟦 *What you should do next*
>
> If you want this fixed upstream, you should:
>
>    - Open a JIRA ticket referencing CASSANDRA‑20058
>    - Include your analysis + stack trace
>    - Include your patch
>    - Describe the bootstrap deadlock scenario
>
> This is the kind of issue the Cassandra dev team will want to fix before
> 5.0.x becomes widely adopted.
>
> 🎯 *Short Answers to Your Questions*
>
> *Is this startup/version‑gating behavior expected?*
>
> No — this is unintended behavior.
>
> *Is this a known limitation or bug?*
>
> It is a bug. The fix in 5.0.3 is incomplete for large SAI metadata
> clusters.
>
> *Is there a recommended way to bootstrap or restart clusters in this
> state?*
>
> Workarounds include:
>
>    - Forcing compact format (best)
>    - Staggered startup with join_ring=false
>    - Reducing SAI metadata temporarily
>
> If you want, I can help you:
>
>    - Draft the JIRA ticket
>    - Write a minimal reproducible test case
>    - Produce a clean patch proposal
>    - Review the relevant Cassandra code paths with you
>
> Just tell me how deep you want to go.
>
> Thanks & Best Regards
>
> Henry PAN
> Sr. Lead Cloud Architect
> (425) 802--3975
> https://www.linkedin.com/in/henrypan1
>
>
>
> On Wed, Jan 21, 2026 at 7:07 AM Ashaman Kingpin <[email protected]>
> wrote:
>
>> Hi all,
>>
>> I’m looking for some guidance on a Cassandra 5.0.x startup issue we’re
>> seeing and wanted to ask the user list if this behavior is expected or
>> already known.
>>
>> We’re running a homogeneous 5.0.4 (also tested with 5.0.6) cluster with a
>> relatively large number of keyspaces, tables, and SAI indexes. On initial
>> cluster creation and provisioning of multiple keyspaces, everything
>> operates as expected. However, after stopping the cluster and restarting
>> all nodes, only the first node comes up successfully. Subsequent nodes fail
>> during startup with an assertion in the gossip thread while serializing the
>> SAI index status metadata.
>>
>> ERROR [GossipStage:1] 2025-12-22 17:20:10,365 JVMStabilityInspector.java:70 - Exception in thread Thread[GossipStage:1,5,GossipStage]
>> java.lang.RuntimeException: java.lang.AssertionError
>>         at org.apache.cassandra.net.InboundSink.accept(InboundSink.java:108)
>>         at org.apache.cassandra.net.InboundSink.accept(InboundSink.java:45)
>>         at org.apache.cassandra.net.InboundMessageHandler$ProcessMessage.run(InboundMessageHandler.java:430)
>>         at org.apache.cassandra.concurrent.ExecutionFailure$1.run(ExecutionFailure.java:133)
>>         at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
>>         at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
>>         at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
>>         at java.base/java.lang.Thread.run(Thread.java:834)
>> Caused by: java.lang.AssertionError: null
>>         at org.apache.cassandra.db.TypeSizes.sizeof(TypeSizes.java:44)
>>         at org.apache.cassandra.gms.VersionedValue$VersionedValueSerializer.serializedSize(VersionedValue.java:381)
>>         at org.apache.cassandra.gms.VersionedValue$VersionedValueSerializer.serializedSize(VersionedValue.java:359)
>>         at org.apache.cassandra.gms.EndpointStateSerializer.serializedSize(EndpointState.java:344)
>>         at org.apache.cassandra.gms.EndpointStateSerializer.serializedSize(EndpointState.java:300)
>>         at org.apache.cassandra.gms.GossipDigestAckSerializer.serializedSize(GossipDigestAck.java:96)
>>         at org.apache.cassandra.gms.GossipDigestAckSerializer.serializedSize(GossipDigestAck.java:61)
>>         at org.apache.cassandra.net.Message$Serializer.payloadSize(Message.java:1088)
>>         at org.apache.cassandra.net.Message.payloadSize(Message.java:1131)
>>         at org.apache.cassandra.net.Message$Serializer.serializedSize(Message.java:769)
>>
>> It seems there was a fix for this same issue, as reported in this DBA
>> Stack Exchange post
>> <https://dba.stackexchange.com/questions/343389/schema-changes-on-5-0-result-in-gossip-failures-o-a-c-db-db-typesizes-sizeof>
>> (CASSANDRA-20058
>> <https://issues.apache.org/jira/browse/CASSANDRA-20058>). It seems to me,
>> though, that the fix described in that post and ticket, included in
>> Cassandra 5.0.3, is incomplete. From what I can tell, the fix seems to
>> only be activated once the gossip state of the cluster has converged but
>> the error seems to occur before this happens.  At the point of the error,
>> the minimum cluster version appears to be treated as unknown, which causes
>> Cassandra to fall back to the legacy (pre-5.0.3) index-status serialization
>> format. In our case, that legacy representation becomes large enough to
>> trigger the assertion, preventing the node from joining. Because the node
>> never joins, gossip never converges, and the newer 5.0.3+ compressed format
>> is never enabled.
>>
>> This effectively leaves the cluster stuck in a startup loop where only
>> the first node can come up.
>>
>> As a sanity check, I locally modified the version-gating logic in
>> *IndexStatusManager.java* for the index-status serialization to always
>> use the newer compact format during startup, and with that change the
>> cluster started successfully.
>>
>> private static boolean shouldWriteLegacyStatusFormat(CassandraVersion minVersion)
>> {
>>     // was: return minVersion == null
>>     //     || (minVersion.major == 5 && minVersion.minor == 0 && minVersion.patch < 3);
>>     return false;
>> }
>>
>> This makes me suspect the issue is related to bootstrap ordering or
>> version detection rather than data corruption or configuration.
>>
>> I posted a more detailed write-up
>> <https://dba.stackexchange.com/questions/349488/cassandra-5-0-4-startup-deadlock-gossip-uses-pre-5-0-3-encoding-due-to-version>
>>  (with
>> stack traces and code references) on DBA StackExchange a few weeks ago but
>> haven’t received any feedback yet, so I wanted to ask here:
>>
>>    - Is this startup/version-gating behavior expected in 5.0.x?
>>    - Is this a known limitation or bug?
>>    - Is there a recommended way to bootstrap or restart clusters in this
>>    state?
>>
>> Any insight would be appreciated. Happy to provide logs or additional
>> details if helpful.
>>
>> Thanks,
>>
>> Nicholas
>>
>
