Thanks Henry and Caleb. I will create the Jira ticket tomorrow.  

On Jan 21, 2026, at 5:34 PM, Caleb Rackliffe <[email protected]> wrote:


One other thing that might be interesting to clarify is whether this occurs on a rolling restart as well as a complete bring-down/bring-up.

On Wed, Jan 21, 2026 at 3:54 PM Caleb Rackliffe <[email protected]> wrote:
Hi Ashaman,

Would you be willing to create a Jira with a full description of the problem and assign it to me? (Caleb Rackliffe/maedhroz)

Thanks!

On Wed, Jan 21, 2026 at 9:14 AM Henry Pan (HP) <[email protected]> wrote:

You’ve written a very solid technical analysis already, and what you’re describing is not expected behavior in Cassandra 5.0.x. Based on the symptoms, stack trace, and the version‑gating logic you inspected, what you’re hitting is almost certainly a real bug in the 5.0.x SAI gossip serialization path — specifically in the pre‑convergence version detection logic.

Let me walk you through what’s actually happening, why your cluster gets stuck, and what the Cassandra community would typically recommend.

🧠 What’s Actually Going Wrong (Root Cause)

1. Cassandra 5.0.3 introduced a fix (CASSANDRA‑20058)

That fix added a compressed SAI index‑status gossip format to avoid blowing up the gossip payload size.

2. But the fix is gated on minVersion

The new format is only used when:

  • Gossip has converged
  • All nodes report version ≥ 5.0.3
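
Concretely, the gate is the method you quote from IndexStatusManager.java at the bottom of this thread; paraphrased as a sketch:

// Paraphrase of the gating method quoted below (IndexStatusManager.java).
// Note the null branch: an unknown cluster-minimum version falls back to
// the legacy, uncompressed format.
private static boolean shouldWriteLegacyStatusFormat(CassandraVersion minVersion)
{
    return minVersion == null
           || (minVersion.major == 5 && minVersion.minor == 0 && minVersion.patch < 3);
}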

3. During startup, minVersion is null

This is the key problem.

On a cold restart:

  • Nodes have no gossip state yet
  • minVersion is treated as unknown
  • Cassandra falls back to the legacy (pre‑5.0.3) format
  • That format serializes all SAI index metadata uncompressed
  • With many keyspaces/tables/indexes, the payload becomes huge
  • TypeSizes.sizeof() asserts because the encoded value no longer fits in the two-byte length prefix gossip uses for string values (roughly a 64 KB cap; see the demo below)
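
If I'm reading the serializer right, that assertion corresponds to the two-byte length prefix used when gossip string values are written with writeUTF. Here's a tiny standalone demo of the cap (illustrative only; plain JDK code, not Cassandra's):

import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;

// Standalone illustration of the ~64 KB cap: writeUTF uses a two-byte length
// prefix, so a value whose UTF-8 encoding exceeds 65535 bytes cannot be
// written. A large legacy SAI status blob hits the same wall; Cassandra's
// size computation just asserts up front instead of throwing here.
public class GossipValueSizeDemo
{
    public static void main(String[] args) throws Exception
    {
        String oversized = "x".repeat(70_000); // stand-in for a big index-status payload
        DataOutputStream out = new DataOutputStream(new ByteArrayOutputStream());
        out.writeUTF(oversized); // throws java.io.UTFDataFormatException
    }
}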

4. The node never joins gossip → gossip never converges → new format never activates

This creates a deadlock:

Needs gossip convergence → to use the compact format
Needs the compact format → to join gossip

This is exactly why only the first node starts — it has no peers to gossip with, so it never receives a large payload.

🟠 Is this expected behavior in 5.0.x?

No.
This is not expected, and your analysis is correct: the version‑gating logic is flawed for clusters with large SAI metadata sets.

The fix in 5.0.3 solved the symptom (payload size) but not the bootstrap ordering problem.

🔴 Is this a known limitation or bug?

It is not widely documented, but it is a bug.

You are essentially hitting a gossip‑serialization deadlock caused by:

  • Large SAI metadata
  • Legacy serialization path
  • Version detection not initialized early enough

Your observation that the fix is “incomplete” is accurate — the gating logic should not depend on gossip convergence during bootstrap.

This should be escalated upstream.

🟢 Is your workaround valid?

Yes — forcing the compact format:

return false;

…is the correct behavior for any homogeneous 5.0.4+ cluster.

Your patch proves the issue is not data corruption, but incorrect version‑gating logic.
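
If you'd rather not hard-code the return, a slightly safer variant (sketch only; the system property name below is hypothetical, not an existing Cassandra flag) is to make the override opt-in:

// Sketch of an opt-in override; "cassandra.force_compact_sai_status" is a
// hypothetical property name, not a real Cassandra flag. Operators who know
// the cluster is homogeneous 5.0.3+ can force the compact format, while
// everyone else keeps the existing gate.
private static boolean shouldWriteLegacyStatusFormat(CassandraVersion minVersion)
{
    if (Boolean.getBoolean("cassandra.force_compact_sai_status"))
        return false;
    return minVersion == null
           || (minVersion.major == 5 && minVersion.minor == 0 && minVersion.patch < 3);
}

That keeps mixed-version clusters on the conservative path while still giving you a clean escape hatch.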

🧭 Recommended Workarounds (Until Upstream Fix)

1. Force compact format during startup (your patch)

This is the cleanest workaround for now.

2. Start nodes one at a time with gossip disabled

Not ideal, but possible:

JVM_OPTS="$JVM_OPTS -Dcassandra.join_ring=false"

Then:

  1. Start the node with join_ring=false
  2. Wait for it to settle
  3. Run nodetool join to enable joining
  4. Repeat on the next node

This avoids exchanging large gossip payloads during the initial handshake.

3. Reduce SAI index metadata temporarily

If possible:

  • Drop unused SAI indexes
  • Reduce index count per table
  • Restart cluster
  • Recreate indexes

Not ideal, but works in emergency situations.

4. Patch the version‑gating logic upstream

The correct fix would be:

  • If all nodes are ≥ 5.0.3 by configuration, use compact format immediately
  • Do not rely on gossip convergence to activate the new format

This is exactly what your patch does.
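
In sketch form (configuredMinimumClusterVersion() is a hypothetical helper; the real fix would plumb the configured floor through however upstream prefers):

// Sketch of the proposed fix: when gossip hasn't converged, fall back to an
// operator-declared floor version rather than straight to the legacy format.
// configuredMinimumClusterVersion() is hypothetical, not real Cassandra code.
private static boolean shouldWriteLegacyStatusFormat(CassandraVersion minVersion)
{
    CassandraVersion effective = minVersion != null
                               ? minVersion
                               : configuredMinimumClusterVersion();
    return effective == null
           || (effective.major == 5 && effective.minor == 0 && effective.patch < 3);
}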

🧩 Why this only happens on restart (not initial provisioning)

During initial provisioning:

  • Nodes join one by one
  • Gossip state is small
  • SAI metadata is not yet fully populated
  • No large payloads are exchanged

After provisioning:

  • SAI metadata grows
  • Gossip payloads grow
  • Legacy format becomes too large
  • Restart triggers the deadlock

This is consistent with your observations.

🟦 What you should do next

If you want this fixed upstream, you should:

  • Open a JIRA ticket referencing CASSANDRA‑20058
  • Include your analysis + stack trace
  • Include your patch
  • Describe the bootstrap deadlock scenario

This is the kind of issue the Cassandra dev team will want to fix before 5.0.x becomes widely adopted.

🎯 Short Answers to Your Questions

Is this startup/version‑gating behavior expected?

No — this is unintended behavior.

Is this a known limitation or bug?

It is a bug. The fix in 5.0.3 is incomplete for large SAI metadata clusters.

Is there a recommended way to bootstrap or restart clusters in this state?

Workarounds include:

  • Forcing compact format (best)
  • Staggered startup with join_ring=false
  • Reducing SAI metadata temporarily

If you want, I can help you:

  • Draft the JIRA ticket
  • Write a minimal reproducible test case
  • Produce a clean patch proposal
  • Review the relevant Cassandra code paths with you

Just tell me how deep you want to go.


Thanks & Best Regards
 
Henry PAN
Sr. Lead Cloud Architect
(425) 802-3975
 


On Wed, Jan 21, 2026 at 7:07 AM Ashaman Kingpin <[email protected]> wrote:

Hi all,

I’m looking for some guidance on a Cassandra 5.0.x startup issue we’re seeing and wanted to ask the user list if this behavior is expected or already known.

We’re running a homogeneous 5.0.4 (also tested with 5.0.6) cluster with a relatively large number of keyspaces, tables, and SAI indexes. On initial cluster creation and provisioning of multiple keyspaces, everything operates as expected. However, after stopping the cluster and restarting all nodes, only the first node comes up successfully. Subsequent nodes fail during startup with an assertion in the gossip thread while serializing the SAI index status metadata.

ERROR [GossipStage:1] 2025-12-22 17:20:10,365 JVMStabilityInspector.java:70 - Exception in thread Thread[GossipStage:1,5,GossipStage]
java.lang.RuntimeException: java.lang.AssertionError
        at org.apache.cassandra.net.InboundSink.accept(InboundSink.java:108)
        at org.apache.cassandra.net.InboundSink.accept(InboundSink.java:45)
        at org.apache.cassandra.net.InboundMessageHandler$ProcessMessage.run(InboundMessageHandler.java:430)
        at org.apache.cassandra.concurrent.ExecutionFailure$1.run(ExecutionFailure.java:133)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
        at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
        at java.base/java.lang.Thread.run(Thread.java:834)
Caused by: java.lang.AssertionError: null
        at org.apache.cassandra.db.TypeSizes.sizeof(TypeSizes.java:44)
        at org.apache.cassandra.gms.VersionedValue$VersionedValueSerializer.serializedSize(VersionedValue.java:381)
        at org.apache.cassandra.gms.VersionedValue$VersionedValueSerializer.serializedSize(VersionedValue.java:359)
        at org.apache.cassandra.gms.EndpointStateSerializer.serializedSize(EndpointState.java:344)
        at org.apache.cassandra.gms.EndpointStateSerializer.serializedSize(EndpointState.java:300)
        at org.apache.cassandra.gms.GossipDigestAckSerializer.serializedSize(GossipDigestAck.java:96)
        at org.apache.cassandra.gms.GossipDigestAckSerializer.serializedSize(GossipDigestAck.java:61)
        at org.apache.cassandra.net.Message$Serializer.payloadSize(Message.java:1088)
        at org.apache.cassandra.net.Message.payloadSize(Message.java:1131)
        at org.apache.cassandra.net.Message$Serializer.serializedSize(Message.java:769)

It seems there was a fix for this same issue, as reported in a DBA Stack Exchange post (CASSANDRA-20058). The fix described in that post and ticket, included in Cassandra 5.0.3, appears to be incomplete, though: from what I can tell, it only activates once the gossip state of the cluster has converged, but the error occurs before that happens. At the point of the error, the minimum cluster version appears to be treated as unknown, which causes Cassandra to fall back to the legacy (pre-5.0.3) index-status serialization format. In our case, that legacy representation becomes large enough to trigger the assertion, preventing the node from joining. Because the node never joins, gossip never converges, and the newer 5.0.3+ compressed format is never enabled.

This effectively leaves the cluster stuck in a startup loop where only the first node can come up.

As a sanity check, I locally modified the version-gating logic for index-status serialization in IndexStatusManager.java to always use the newer compact format during startup, and with that change the cluster started successfully.

private static boolean shouldWriteLegacyStatusFormat(CassandraVersion minVersion)
{
    return false; // was: return minVersion == null || (minVersion.major == 5 && minVersion.minor == 0 && minVersion.patch < 3);
}

This makes me suspect the issue is related to bootstrap ordering or version detection rather than data corruption or configuration.  

I posted a more detailed write-up (with stack traces and code references) on DBA StackExchange a few weeks ago but haven’t received any feedback yet, so I wanted to ask here:

  • Is this startup/version-gating behavior expected in 5.0.x?

  • Is this a known limitation or bug?

  • Is there a recommended way to bootstrap or restart clusters in this state?

Any insight would be appreciated. Happy to provide logs or additional details if helpful.

Thanks,

Nicholas
