Hi,

Today, in our production environment, we came across the following scenario:

   1. We run a 100-node Cassandra cluster on 4.0.6, and our client uses a
   PreparedStatement such as "SELECT * FROM T1 WHERE PK=?"
   2. We applied a schema change to add a *regular* column: "ALTER TABLE
   T1 ADD COLUMN c1 UUID"
   3. Around 20 of the 100 Cassandra nodes started throwing the error
   "Successfully prepared, but could not find prepared statement for "
   (code path: QueryEvents.java
   <https://github.com/apache/cassandra/blob/cassandra-4.0/src/java/org/apache/cassandra/cql3/QueryEvents.java#L225C80-L225C145>)
   4. The error continued for around 2 hours, and only restarting those 20
   nodes resolved the issue.

Our current hypothesis is that there is a race condition in
QueryProcessor::prepare
<https://github.com/apache/cassandra/blob/cassandra-4.0/src/java/org/apache/cassandra/cql3/QueryProcessor.java#L575>:
one thread keeps evicting prepared statements while another keeps adding
them, and the cycle never settles. A similar hypothesis was raised in a
2022 ticket: https://issues.apache.org/jira/browse/CASSANDRA-17401
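
To make the suspected interleaving concrete, here is a minimal sketch. It is
not Cassandra's code: the class PrepareRaceSketch, its map, and the Runnable
hook are hypothetical stand-ins, and the eviction is forced at a fixed point
instead of being raced by a real second thread. It only shows how a
"store, then read back" sequence can fail if an eviction lands in between.

```java
import java.util.concurrent.ConcurrentHashMap;

class PrepareRaceSketch {
    // Hypothetical stand-in for a prepared-statement cache (assumption, not Cassandra's structure).
    static final ConcurrentHashMap<String, String> cache = new ConcurrentHashMap<>();

    // Stores the statement, runs the hook, then reads the statement back.
    // The hook marks the window in which another thread's eviction could land.
    static String prepare(String query, Runnable betweenStoreAndLookup) {
        String id = Integer.toHexString(query.hashCode());
        cache.put(id, query);                 // step 1: store the prepared statement
        betweenStoreAndLookup.run();          // window for a concurrent eviction
        String found = cache.get(id);         // step 2: read it back
        if (found == null) {
            throw new IllegalStateException(
                "Successfully prepared, but could not find prepared statement for " + id);
        }
        return id;
    }

    // Returns true when prepare() hits the "could not find" path.
    static boolean prepareFails(String query, Runnable betweenStoreAndLookup) {
        try {
            prepare(query, betweenStoreAndLookup);
            return false;
        } catch (IllegalStateException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        String query = "SELECT * FROM T1 WHERE PK=?";
        // No eviction in the window: the lookup finds the entry.
        System.out.println("fails without eviction: " + prepareFails(query, () -> {}));
        // Eviction in the window (simulated by clearing the map): the lookup misses.
        System.out.println("fails with eviction:    " + prepareFails(query, cache::clear));
    }
}
```

If the real code has a comparable window, a client that re-prepares after
every failure while something keeps invalidating entries would see the error
repeat until the node is restarted, which matches what we observed.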

Has anyone ever experienced this? Are there any quick pointers on what
could have gone wrong?

Thanks in advance!

Jaydeep
