samueldlightfoot commented on code in PR #4606:
URL: https://github.com/apache/cassandra/pull/4606#discussion_r2816736324


##########
src/java/org/apache/cassandra/db/compaction/CursorCompactor.java:
##########
@@ -1553,6 +1547,33 @@ private static String mergeHistogramToString(long[] histogram)
         return sb.toString();
     }
 
+    /**
+     * Closes scanner-opened readers before opening cursor-specific readers with the configured disk access mode.
+     * In cursor compaction, scanners are only used for metadata; closing them avoids holding redundant file
+     * descriptors and prevents conflicts when scan and non-scan readers for the same file share thread-local
+     * buffer state on the same thread.
+     */
+    private static StatefulCursor[] convertScannersToCursors(List<ISSTableScanner> scanners, ImmutableSet<SSTableReader> sstables,

Review Comment:
   Agree there's a smell - the cursor compaction code has been tailored to fit the existing compaction interface: it accepts scanners and pulls just metadata from them, whereas raw SSTables would have suited better. The changes to do this would sprawl across a few classes.
   
   The disk access mode is not actually the issue here; it is the fact that two readers (one **scan** and one **non-scan**) exist for the same SSTable at the same time on the same thread. On trunk this is a bug causing cursor compaction to use scan reads (with a read-ahead buffer) rather than the intended random-access reads, because ScanCompressedReader's allocated() check is based on shared static state, as below:
   
   ```
           @Override
           public boolean allocated()
           {
               // this checks the static block map, which is inherently shared
               // between the two readers (scan + non-scan)
               return readAheadBuffer.hasBuffer();
           }
   ```
   
   Closing the scanner before opening the cursor reader deallocates the block map, and thus allocated() returns false when the cursor readers are opened:
   
   ```
   // How readChunk picks the reader (scan vs random)
   CompressedReader readFrom = (scanReader != null && scanReader.allocated()) ? scanReader : reader;
   ```
   
   This is perhaps slightly brittle from the perspective of CompressedChunkReader's allocated(), but the precondition that a given file is not opened **and** read by two readers on the same thread concurrently does not seem unreasonable (and with this change that is how cursor compaction behaves).



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

