samueldlightfoot commented on code in PR #4606:
URL: https://github.com/apache/cassandra/pull/4606#discussion_r2816736324
##########
src/java/org/apache/cassandra/db/compaction/CursorCompactor.java:
##########
@@ -1553,6 +1547,33 @@ private static String mergeHistogramToString(long[] histogram)
return sb.toString();
}
+    /**
+     * Closes scanner-opened readers before opening cursor-specific readers with the configured disk access mode.
+     * In cursor compaction, scanners are only used for metadata; closing them avoids holding redundant file
+     * descriptors and prevents conflicts when scan and non-scan readers for the same file share thread-local
+     * buffer state on the same thread.
+     */
+    private static StatefulCursor[] convertScannersToCursors(List<ISSTableScanner> scanners,
+                                                             ImmutableSet<SSTableReader> sstables,
Review Comment:
Agree there's a smell - the cursor compaction code has been tailored to fit the existing compaction interface: it accepts scanners and pulls just metadata from them, whereas raw SSTables would have been a better fit. The ideal refactor I see is to let each compaction pipeline open the readers it requires, but that may be better done in a follow-up.
The disk access mode is not actually the issue here; the issue is that two readers (one **scan** and one **non-scan**) exist for the same SSTable at the same time on the same thread. On trunk this is a bug that causes cursor compaction to use scan reads (with a read-ahead buffer) rather than the intended random-access reads, because ScanCompressedReader's allocated() check is based on shared static state, which looks like the below:
```
@Override
public boolean allocated()
{
    // this checks the static block map, which is inherently shared
    // between the two readers (scan + non-scan)
    return readAheadBuffer.hasBuffer();
}
```
Closing the scanner before opening the cursor reader deallocates the block map, so allocated() returns false when the cursor readers are opened, and the intended random-access reader ('reader') is chosen:
```
// How readChunk picks the reader (scan vs random)
CompressedReader readFrom = (scanReader != null && scanReader.allocated()) ? scanReader : reader;
```
This is perhaps slightly brittle from CompressedChunkReader's allocated() perspective, but the precondition that a given file is not opened by two readers on the same thread concurrently does not seem unreasonable (and it now holds in cursor compaction). I did look into guarding instantiation for the same file, but it caused a large number of test failures (I cannot remember the full details).
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]