tolbertam commented on code in PR #3917:
URL: https://github.com/apache/cassandra/pull/3917#discussion_r2076730354


##########
src/java/org/apache/cassandra/db/SystemKeyspace.java:
##########
@@ -1863,16 +1864,37 @@ public static void resetPreparedStatements()
     }
 
     public static int loadPreparedStatements(TriFunction<MD5Digest, String, 
String, Boolean> onLoaded)
+    {
+        return loadPreparedStatements(onLoaded, 
QueryProcessor.PRELOAD_PREPARED_STATEMENTS_FETCH_SIZE);
+    }
+
+    public static int loadPreparedStatements(TriFunction<MD5Digest, String, 
String, Boolean> onLoaded, int pageSize)
     {
         String query = String.format("SELECT prepared_id, logged_keyspace, 
query_string FROM %s.%s", SchemaConstants.SYSTEM_KEYSPACE_NAME, 
PREPARED_STATEMENTS);
-        UntypedResultSet resultSet = executeOnceInternal(query);
+        UntypedResultSet resultSet = executeOnceInternalWithPaging(query, 
pageSize);
         int counter = 0;
+        long initialStatementsEvicted = 
QueryProcessor.metrics.preparedStatementsEvicted.getCount();
         for (UntypedResultSet.Row row : resultSet)
         {
             if 
(onLoaded.accept(MD5Digest.wrap(row.getByteArray("prepared_id")),
                                 row.getString("query_string"),
                                 row.has("logged_keyspace") ? 
row.getString("logged_keyspace") : null))
                 counter++;
+
+            if (counter % pageSize == 0 && 
QueryProcessor.isPreparedStatementCacheFull())
+            {
+                // In the event that we detect that the prepared statement 
cache has filled up, return early to prevent

Review Comment:
   > Aren't we missing a test we stop loading pstmnts once cache is full? Maybe 
you could use byteman to capture the call to the log warn method. That with a 
modified yaml to have a smaller pstmnts cache somehow should do the trick. I 
think it's one important feature we're not testing unless I am missing sthg.
   
   I'm glad you brought this up as it pushed me to exercise how effective this 
logic could be, and apparently it was not very effective as is!
   
   We deployed an early version of this fix in late 2024/early 2025, at which 
time we didn't have any short circuiting logic, so we would just page 
indefinitely (which was better than OOMing but still not great).  When we 
encountered this, we would simply just truncate the 
`system.prepared_statements` table.  So this particular piece of logic hadn't 
really been put through its paces.
   
   After writing a test (`testPreloadPreparedStatementsUntilCacheFull`) I found 
that it was *very* difficult to encounter a full cache as Caffeine, while it 
allows going past the maximum weight, is very effective at evicting quickly 
below it.  Therefore, I think we need to add some kind of negative buffer.   I 
found that testing against 99% of the cache size was very effective.
   
   To avoid potentially logging that there might be a leak when we previously 
just had a nearly full cache, we now allow paging one addition time 
(`possibleLeakDetected`), at which point if we still haven't exhausted the 
result set, we'll log the message and exit earlier.
   
   In terms of validating where we stop once the cache is full, I found that we 
could easily detect this by updating `preloadPreparedStatements` to return an 
`int` and then comparing the total number of rows returned against the maximum 
amount we should expect (I added some github comments to the test I added to 
expand on this more).  I think this is more precise than testing against the 
log entry, which also does indeed get logged in the test:
   
   > WARN  [main] 2025-05-06 22:36:42,581 SystemKeyspace.java:1888 - Detected 
prepared statement cache filling up during preload after preparing 492 
statements. This could be an indication that prepared statements leaked prior 
to CASSANDRA-19703 being fixed. Returning early to prevent indefinite startup. 
Consider truncating system.prepared_statements to clear out leaked prepared 
statements.
   
   Hopefully that all sounds right!
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: pr-unsubscr...@cassandra.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: pr-unsubscr...@cassandra.apache.org
For additional commands, e-mail: pr-h...@cassandra.apache.org

Reply via email to