Hi Cassandra Community,
I am working with Cassandra 5.0.7 and am noticing unexpected behaviour after
executing an ALTER TABLE. While the cluster is under active read/write load, I see
intermittent read failures for a short window after an ALTER TABLE ADD COLUMN.
I'm trying to determine whether this is expected behaviour or if something is
wrong with my setup.
Setup
* Cassandra 5.0.7, 3 datacenters with one node per DC, table replicated
across all DCs
* Continuous mixed read/write load at QUORUM
* A schema change is issued from a separate session while traffic runs:
ALTER TABLE test.users ADD test_column text;
For a short window after the ALTER, a few hundred reads fail on the node where
the ALTER was executed. The client returns:
ERROR (users): Operation failed - received 1 responses and 1 failures: UNKNOWN from /<ip of node b>:7000
ERROR (users): Operation failed - received 1 responses and 2 failures: UNKNOWN from /<ip of node b>:7000, UNKNOWN from /<ip of node c>:7000
and the replica logs show:
ERROR [Messaging-EventLoop-3-17] 2026-06-05T09:05:22,990 InboundMessageHandler.java:180 - /<ip of node a>:7000->/<ip of node b>:7000-SMALL_MESSAGES-c828f193 unexpected exception caught while deserializing a message
java.lang.RuntimeException: Unknown column test_column during deserialization
    at org.apache.cassandra.db.Columns$Serializer.deserialize(Columns.java:491)
    at org.apache.cassandra.db.filter.ColumnFilter$Serializer.deserializeRegularAndStaticColumns(ColumnFilter.java:933)
    at org.apache.cassandra.db.filter.ColumnFilter$Serializer.deserialize(ColumnFilter.java:906)
    at org.apache.cassandra.db.ReadCommand$Serializer.deserialize(ReadCommand.java:1176)
...
The errors stop after a couple of hundred milliseconds, presumably once the
schema has propagated to all nodes.
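To check that assumption, the schema version each node reports could be
compared during the error window. Below is a minimal sketch of the comparison
logic only, assuming the (address, schema_version) pairs have already been
collected, for example from the schema_version columns of system.local and
system.peers, or from nodetool describecluster. The function names and the
sample values are hypothetical and not taken from my cluster.

```python
from collections import defaultdict


def group_by_schema_version(rows):
    """Group node addresses by the schema version each node reports.

    rows is an iterable of (address, schema_version) pairs.
    """
    groups = defaultdict(list)
    for address, version in rows:
        groups[version].append(address)
    return dict(groups)


def in_agreement(rows):
    """Return True when every node reports the same, non-null schema version."""
    versions = {version for _, version in rows}
    return len(versions) == 1 and None not in versions


# Hypothetical snapshot during the error window: node a has applied the
# ALTER, nodes b and c still report the previous schema version.
during_window = [("node-a", "v2"), ("node-b", "v1"), ("node-c", "v1")]
# Hypothetical snapshot after propagation has finished.
after_window = [("node-a", "v2"), ("node-b", "v2"), ("node-c", "v2")]
```

If the errors line up with a period in which in_agreement(...) is False and
stop as soon as it becomes True, that would support the propagation theory.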
Notably, this only happens the first time a given column is added. If I drop
the column and add it again, the errors do not reappear on the subsequent ADD.
My explanation is that the coordinator, which has already applied the schema
change, builds a read command that enumerates all of the table's regular
columns, including the newly added one, and forwards it to a replica that has
not yet received the schema update. The replica then cannot deserialize the
unknown column name. This happens even though our queries never reference the
new column and do not use SELECT *.
I'm not sure how to explain why re-adding a previously dropped column doesn't
reproduce it (possibly because the column is retained in
system_schema.dropped_columns or somewhere else after the first cycle?).
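To make the two guesses above concrete, here is a toy model of what I think
is happening. It is only a sketch of my hypothesis, not Cassandra's actual
code: the ReplicaSchema class, its methods, and the rule that a replica
accepts column names it remembers as dropped are all assumptions on my part.

```python
class UnknownColumnError(RuntimeError):
    """Raised when a replica cannot resolve a column name in a read command."""


class ReplicaSchema:
    """Toy stand-in for the schema a replica currently knows about."""

    def __init__(self, columns, dropped=()):
        self.columns = set(columns)
        self.dropped = set(dropped)

    def add_column(self, name):
        self.columns.add(name)

    def drop_column(self, name):
        # Assumption: a dropped column is remembered rather than forgotten.
        self.columns.discard(name)
        self.dropped.add(name)

    def deserialize_columns(self, names):
        """Resolve every column name sent by the coordinator, or fail."""
        resolved = []
        for name in names:
            # Assumption: live columns and remembered dropped columns both resolve.
            if name in self.columns or name in self.dropped:
                resolved.append(name)
            else:
                raise UnknownColumnError(
                    f"Unknown column {name} during deserialization"
                )
        return resolved


# The coordinator has applied the ALTER and enumerates all regular columns.
coordinator_columns = ["id", "name", "test_column"]

# First ADD: the replica has never seen test_column, so the read fails.
replica = ReplicaSchema({"id", "name"})
try:
    replica.deserialize_columns(coordinator_columns)
    first_add_failed = False
except UnknownColumnError:
    first_add_failed = True

# The schema then propagates, and the column is dropped again everywhere.
replica.add_column("test_column")
replica.drop_column("test_column")

# Second ADD: the replica lags again, but it remembers the dropped column.
second_add_resolved = replica.deserialize_columns(coordinator_columns)
```

In this model the first ADD fails on the lagging replica and the second one
does not, which matches what I observe, but I have not confirmed that
Cassandra behaves this way internally.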
I am now wondering whether this is expected behaviour and, if so, whether
there is a way to work around it. I feel like this should be a very common
issue for users, but I couldn't find any source describing this exact problem,
so I assume something is wrong with my setup.
Thanks in advance for your help!