Hi Cassandra Community,

I am working with Cassandra 5.0.7 and am noticing unexpected behaviour after
executing an ALTER TABLE. While the cluster is under active read/write load, I
see intermittent read failures for a short window after an ALTER TABLE ADD
COLUMN. I'm trying to determine whether this is expected behaviour or whether
something is wrong with my setup.

Setup

  *   Cassandra 5.0.7, 3 datacenters with one node per DC, table replicated 
across all DCs
  *   Continuous mixed read/write load at QUORUM
  *   A schema change is issued from a separate session while traffic runs: 
ALTER TABLE test.users ADD test_column text;

For a short window after the ALTER, a few hundred reads fail on the node where
the ALTER was executed.


ERROR (users): Operation failed - received 1 responses and 1 failures: UNKNOWN 
from /<ip of node b>:7000
  ERROR (users): Operation failed - received 1 responses and 2 failures: 
UNKNOWN from /<ip of node b>:7000, UNKNOWN from /<ip of node c>:7000



and the replica logs show:

ERROR [Messaging-EventLoop-3-17] 2026-06-05T09:05:22,990 InboundMessageHandler.java:180 - /<ip of node a>:7000->/<ip of node b>:7000-SMALL_MESSAGES-c828f193 unexpected exception caught while deserializing a message
java.lang.RuntimeException: Unknown column test_column during deserialization
    at org.apache.cassandra.db.Columns$Serializer.deserialize(Columns.java:491)
    at org.apache.cassandra.db.filter.ColumnFilter$Serializer.deserializeRegularAndStaticColumns(ColumnFilter.java:933)
    at org.apache.cassandra.db.filter.ColumnFilter$Serializer.deserialize(ColumnFilter.java:906)
    at org.apache.cassandra.db.ReadCommand$Serializer.deserialize(ReadCommand.java:1176)
    ...

The errors stop after a couple hundred milliseconds, presumably once the schema
has propagated to all nodes.

Notably, this only happens the first time a given column is added. If I drop 
the column and add it again, the errors do not reappear on the subsequent ADD.

My explanation is that the coordinator (which has already applied the schema)
builds a read command that enumerates all of the table's regular columns,
including the newly added one, even though the new column is not queried in any
way and we are not using SELECT *. It then forwards that command to a replica
that hasn't received the schema update yet, and that replica can't deserialize
the unknown column name. I'm not sure how to explain why re-adding a previously
dropped column doesn't reproduce the problem. Possibly the column is retained in
system_schema.dropped_columns, or somewhere else, after the first cycle?
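To make the hypothesis concrete, here is a toy Python model of it. It is only a
sketch of my guess, not Cassandra's implementation: the coordinator names every
regular column it knows, and a replica with an older schema rejects any name it
can't resolve. TableSchema, both functions, and the "id"/"name" columns are
hypothetical, and the fallback to dropped column names only models my
unconfirmed guess about system_schema.dropped_columns.

```python
# Toy model of my hypothesis, NOT Cassandra's actual implementation.
# TableSchema and both functions are hypothetical; "id" and "name" are
# made-up stand-ins for the real columns of test.users.


class TableSchema:
    def __init__(self, regular_columns, dropped_columns=()):
        self.regular_columns = set(regular_columns)
        # Assumed stand-in for system_schema.dropped_columns (unconfirmed).
        self.dropped_columns = set(dropped_columns)


def serialize_column_filter(schema):
    """Coordinator side: name every regular column the coordinator knows,
    even ones the query does not select."""
    return sorted(schema.regular_columns)


def deserialize_column_filter(names, schema):
    """Replica side: resolve each name against the replica's local schema."""
    for name in names:
        if name not in schema.regular_columns and name not in schema.dropped_columns:
            raise RuntimeError(f"Unknown column {name} during deserialization")
    return list(names)


# Coordinator has already applied the ADD.
coordinator = TableSchema({"id", "name", "test_column"})

# First ADD: the stale replica has never heard of test_column.
stale_replica = TableSchema({"id", "name"})
try:
    deserialize_column_filter(serialize_column_filter(coordinator), stale_replica)
except RuntimeError as err:
    print(err)  # Unknown column test_column during deserialization

# Re-ADD after DROP: the stale replica still records test_column as dropped,
# so (under this assumption) the name resolves and the read proceeds.
stale_after_drop = TableSchema({"id", "name"}, dropped_columns={"test_column"})
print(deserialize_column_filter(serialize_column_filter(coordinator), stale_after_drop))
# ['id', 'name', 'test_column']
```

In this model the re-add case succeeds only because the stale replica can still
resolve the name; whether Cassandra actually behaves this way is exactly what
I'm unsure about.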

I am now wondering whether this is expected behaviour and, if so, whether there
is a way to work around it. I feel like this should be a very common issue, but
I couldn't find any source describing this exact problem, so I assume something
is wrong with my setup.

Thanks in advance for your help!
