One other thing that might be interesting to clarify is whether this occurs on a rolling restart as well as a complete bring-down/bring-up.
On Wed, Jan 21, 2026 at 3:54 PM Caleb Rackliffe <[email protected]> wrote:

> Hi Ashaman,
>
> Would you be willing to create a Jira with a full description of the
> problem and assign it to me? (Caleb Rackliffe/maedhroz)
>
> Thanks!
>
> On Wed, Jan 21, 2026 at 9:14 AM Henry Pan (HP) <[email protected]>
> wrote:
>
>> You’ve written a *very* solid technical analysis already, and what
>> you’re describing *is not expected behavior in Cassandra 5.0.x*. Based
>> on the symptoms, stack trace, and the version‑gating logic you inspected,
>> what you’re hitting is almost certainly a *real bug* in the 5.0.x SAI
>> gossip serialization path — specifically in the *pre‑convergence version
>> detection logic*.
>>
>> Let me walk you through what’s actually happening, why your cluster gets
>> stuck, and what the Cassandra community would typically recommend.
>>
>> 🧠 *What’s Actually Going Wrong (Root Cause)*
>>
>> *1. Cassandra 5.0.3 introduced a fix (CASSANDRA‑20058)*
>>
>> That fix added a *compressed SAI index‑status gossip format* to avoid
>> blowing up the gossip payload size.
>>
>> *2. But the fix is gated on minVersion*
>>
>> The new format is only used when:
>>
>> - Gossip has converged
>> - All nodes report version ≥ 5.0.3
>>
>> *3. During startup, minVersion is null*
>>
>> This is the key problem.
>>
>> On a cold restart:
>>
>> - Nodes have *no* gossip state yet
>> - minVersion is treated as *unknown*
>> - Cassandra falls back to the *legacy (pre‑5.0.3) format*
>> - That format serializes *all* SAI index metadata uncompressed
>> - With many keyspaces/tables/indexes, the payload becomes huge
>> - TypeSizes.sizeof() asserts because the payload exceeds the expected
>> bounds
>>
>> *4. The node never joins gossip → gossip never converges → new format
>> never activates*
>>
>> This creates a *deadlock*:
>>
>> Needs gossip convergence → to use compact format
>> Needs compact format → to join gossip
>>
>> This is exactly why only the *first* node starts — it has no peers to
>> gossip with, so it never receives a large payload.
>>
>> 🟠 *Is this expected behavior in 5.0.x?*
>>
>> *No.*
>> This is not expected, and your analysis is correct: the version‑gating
>> logic is flawed for clusters with large SAI metadata sets.
>>
>> The fix in 5.0.3 solved the *symptom* (payload size) but not the *bootstrap
>> ordering problem*.
>>
>> 🔴 *Is this a known limitation or bug?*
>>
>> It is *not widely documented*, but it *is* a bug.
>>
>> You are essentially hitting a *gossip‑serialization deadlock* caused by:
>>
>> - Large SAI metadata
>> - Legacy serialization path
>> - Version detection not initialized early enough
>>
>> Your observation that the fix is “incomplete” is accurate — the gating
>> logic should not depend on gossip convergence during bootstrap.
>>
>> This should be escalated upstream.
>>
>> 🟢 *Is your workaround valid?*
>>
>> Yes — forcing the compact format:
>>
>> return false;
>>
>> …is the correct behavior for any *homogeneous 5.0.4+ cluster*.
>>
>> Your patch proves the issue is not data corruption, but *incorrect
>> version‑gating logic*.
>>
>> 🧭 *Recommended Workarounds (Until Upstream Fix)*
>>
>> *1. Force compact format during startup (your patch)*
>>
>> This is the cleanest workaround for now.
>>
>> *2. Start nodes one at a time with gossip disabled*
>>
>> Not ideal, but possible:
>>
>> JVM_OPTS="$JVM_OPTS -Dcassandra.join_ring=false"
>>
>> Then:
>>
>> 1. Start node
>> 2. Wait for it to settle
>> 3. Enable join
>> 4. Repeat
>>
>> This avoids large gossip payloads during initial handshake.
>>
>> *3. Reduce SAI index metadata temporarily*
>>
>> If possible:
>>
>> - Drop unused SAI indexes
>> - Reduce index count per table
>> - Restart cluster
>> - Recreate indexes
>>
>> Not ideal, but works in emergency situations.
>>
>> *4. Patch the version‑gating logic upstream*
>>
>> The correct fix would be:
>>
>> - If all nodes are ≥ 5.0.3 *by configuration*, use compact format
>> immediately
>> - Do not rely on gossip convergence to activate the new format
>>
>> This is exactly what your patch does.
>>
>> 🧩 *Why this only happens on restart (not initial provisioning)*
>>
>> During initial provisioning:
>>
>> - Nodes join one by one
>> - Gossip state is small
>> - SAI metadata is not yet fully populated
>> - No large payloads are exchanged
>>
>> After provisioning:
>>
>> - SAI metadata grows
>> - Gossip payloads grow
>> - Legacy format becomes too large
>> - Restart triggers the deadlock
>>
>> This is consistent with your observations.
>>
>> 🟦 *What you should do next*
>>
>> If you want this fixed upstream, you should:
>>
>> - Open a JIRA ticket referencing CASSANDRA‑20058
>> - Include your analysis + stack trace
>> - Include your patch
>> - Describe the bootstrap deadlock scenario
>>
>> This is the kind of issue the Cassandra dev team will want to fix before
>> 5.0.x becomes widely adopted.
>>
>> 🎯 *Short Answers to Your Questions*
>>
>> *Is this startup/version‑gating behavior expected?*
>>
>> No — this is unintended behavior.
>>
>> *Is this a known limitation or bug?*
>>
>> It is a bug. The fix in 5.0.3 is incomplete for large SAI metadata
>> clusters.
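>> To make workaround 4 concrete, here is a rough sketch of what
>> configuration-based gating could look like. The system property name and
>> the stand-in types below are invented for illustration; this is not
>> actual Cassandra source (the real change would live in
>> IndexStatusManager):

```java
// Rough sketch of configuration-based gating. The system property name,
// the class, and the Version record are invented for illustration and do
// not exist in Cassandra.
public class ConfigGatedFormat
{
    // Stand-in for org.apache.cassandra.utils.CassandraVersion
    record Version(int major, int minor, int patch) {}

    static boolean shouldWriteLegacyStatusFormat(Version minVersion)
    {
        // Operator assertion via a startup flag: every node is already
        // >= 5.0.3, so the compact format is safe before gossip converges.
        if (Boolean.getBoolean("cassandra.sai.assume_compact_index_status"))
            return false;

        // Otherwise keep the existing convergence-based behavior.
        return minVersion == null
               || (minVersion.major() == 5 && minVersion.minor() == 0 && minVersion.patch() < 3);
    }

    public static void main(String[] args)
    {
        System.setProperty("cassandra.sai.assume_compact_index_status", "true");
        // With the flag set, even an unknown (null) minVersion no longer
        // forces the legacy format.
        System.out.println(shouldWriteLegacyStatusFormat(null)); // false
    }
}
```

>> With a flag like this, the operator of a known-homogeneous cluster could
>> opt into the compact format at startup instead of waiting for gossip
>> convergence.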
>>
>> *Is there a recommended way to bootstrap or restart clusters in this
>> state?*
>>
>> Workarounds include:
>>
>> - Forcing compact format (best)
>> - Staggered startup with join_ring=false
>> - Reducing SAI metadata temporarily
>>
>> If you want, I can help you:
>>
>> - Draft the JIRA ticket
>> - Write a minimal reproducible test case
>> - Produce a clean patch proposal
>> - Review the relevant Cassandra code paths with you
>>
>> Just tell me how deep you want to go.
>>
>> Thanks & Best Regards
>>
>> Henry PAN
>> Sr. Lead Cloud Architect
>> (425) 802-3975
>> https://www.linkedin.com/in/henrypan1
>>
>>
>> On Wed, Jan 21, 2026 at 7:07 AM Ashaman Kingpin <[email protected]>
>> wrote:
>>
>>> Hi all,
>>>
>>> I’m looking for some guidance on a Cassandra 5.0.x startup issue we’re
>>> seeing and wanted to ask the user list if this behavior is expected or
>>> already known.
>>>
>>> We’re running a homogeneous 5.0.4 (also tested with 5.0.6) cluster with
>>> a relatively large number of keyspaces, tables, and SAI indexes. On initial
>>> cluster creation and provisioning of multiple keyspaces, everything
>>> operates as expected. However, after stopping the cluster and restarting
>>> all nodes, only the first node comes up successfully. Subsequent nodes fail
>>> during startup with an assertion in the gossip thread while serializing the
>>> SAI index status metadata.
>>>
>>> ERROR [GossipStage:1] 2025-12-22 17:20:10,365 JVMStabilityInspector.java:70
>>> - Exception in thread Thread[GossipStage:1,5,GossipStage]
>>> java.lang.RuntimeException: java.lang.AssertionError
>>> at org.apache.cassandra.net.InboundSink.accept(InboundSink.java:108)
>>> at org.apache.cassandra.net.InboundSink.accept(InboundSink.java:45)
>>> at org.apache.cassandra.net.InboundMessageHandler$ProcessMessage.run(InboundMessageHandler.java:430)
>>> at org.apache.cassandra.concurrent.ExecutionFailure$1.run(ExecutionFailure.java:133)
>>> at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
>>> at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
>>> at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
>>> at java.base/java.lang.Thread.run(Thread.java:834)
>>> Caused by: java.lang.AssertionError: null
>>> at org.apache.cassandra.db.TypeSizes.sizeof(TypeSizes.java:44)
>>> at org.apache.cassandra.gms.VersionedValue$VersionedValueSerializer.serializedSize(VersionedValue.java:381)
>>> at org.apache.cassandra.gms.VersionedValue$VersionedValueSerializer.serializedSize(VersionedValue.java:359)
>>> at org.apache.cassandra.gms.EndpointStateSerializer.serializedSize(EndpointState.java:344)
>>> at org.apache.cassandra.gms.EndpointStateSerializer.serializedSize(EndpointState.java:300)
>>> at org.apache.cassandra.gms.GossipDigestAckSerializer.serializedSize(GossipDigestAck.java:96)
>>> at org.apache.cassandra.gms.GossipDigestAckSerializer.serializedSize(GossipDigestAck.java:61)
>>> at org.apache.cassandra.net.Message$Serializer.payloadSize(Message.java:1088)
>>> at org.apache.cassandra.net.Message.payloadSize(Message.java:1131)
>>> at org.apache.cassandra.net.Message$Serializer.serializedSize(Message.java:769)
>>>
>>> It seems there was a fix to this same issue as reported in this DBA
>>> Stack Exchange post
>>> <https://dba.stackexchange.com/questions/343389/schema-changes-on-5-0-result-in-gossip-failures-o-a-c-db-db-typesizes-sizeof>
>>> (CASSANDRA-20058
>>> <https://issues.apache.org/jira/browse/CASSANDRA-20058>). It seems to
>>> me, though, that the fix described in that post and ticket, included in
>>> Cassandra 5.0.3, is incomplete. From what I can tell, the fix is only
>>> activated once the gossip state of the cluster has converged, but the
>>> error occurs before that happens. At the point of the error, the minimum
>>> cluster version appears to be treated as unknown, which causes Cassandra
>>> to fall back to the legacy (pre-5.0.3) index-status serialization format.
>>> In our case, that legacy representation becomes large enough to trigger
>>> the assertion, preventing the node from joining. Because the node never
>>> joins, gossip never converges, and the newer 5.0.3+ compressed format is
>>> never enabled.
>>>
>>> This effectively leaves the cluster stuck in a startup loop where only
>>> the first node can come up.
>>>
>>> As a sanity check, I locally modified the version-gating logic in
>>> *IndexStatusManager.java* for the index-status serialization to always
>>> use the newer compact format during startup, and with that change the
>>> cluster started successfully.
>>>
>>> private static boolean shouldWriteLegacyStatusFormat(CassandraVersion minVersion)
>>> {
>>>     return false; // return minVersion == null || (minVersion.major == 5 && minVersion.minor == 0 && minVersion.patch < 3);
>>> }
>>>
>>> This makes me suspect the issue is related to bootstrap ordering or
>>> version detection rather than data corruption or configuration.
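>>> For anyone who wants to reproduce the gating decision in isolation, it
>>> can be modeled with simplified stand-ins (these are not the real
>>> Cassandra classes, just a sketch of the check I patched):

```java
// Simplified stand-ins: Version models o.a.c.utils.CassandraVersion, and
// the method mirrors the original (unpatched) gating check. Illustrative
// only -- not the actual 5.0.x source.
public class GatingSketch
{
    record Version(int major, int minor, int patch) {}

    static boolean shouldWriteLegacyStatusFormat(Version minVersion)
    {
        // On a cold restart gossip has not converged, so the minimum
        // cluster version is unknown (null) -- and null selects the legacy,
        // uncompressed format even on a homogeneous 5.0.4+ cluster.
        return minVersion == null
               || (minVersion.major() == 5 && minVersion.minor() == 0 && minVersion.patch() < 3);
    }

    public static void main(String[] args)
    {
        System.out.println(shouldWriteLegacyStatusFormat(null));                 // true: legacy format at startup
        System.out.println(shouldWriteLegacyStatusFormat(new Version(5, 0, 4))); // false: compact once converged
    }
}
```

>>> The null case is the one every restarting node hits, which is why the
>>> legacy format (and the oversized gossip payload) gets chosen.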
>>>
>>> I posted a more detailed write-up
>>> <https://dba.stackexchange.com/questions/349488/cassandra-5-0-4-startup-deadlock-gossip-uses-pre-5-0-3-encoding-due-to-version>
>>> (with stack traces and code references) on DBA Stack Exchange a few
>>> weeks ago but haven’t received any feedback yet, so I wanted to ask here:
>>>
>>> - Is this startup/version-gating behavior expected in 5.0.x?
>>>
>>> - Is this a known limitation or bug?
>>>
>>> - Is there a recommended way to bootstrap or restart clusters in this
>>> state?
>>>
>>> Any insight would be appreciated. Happy to provide logs or additional
>>> details if helpful.
>>>
>>> Thanks,
>>>
>>> Nicholas
>>
