samueldlightfoot commented on code in PR #4606:
URL: https://github.com/apache/cassandra/pull/4606#discussion_r2816736324
##########
src/java/org/apache/cassandra/db/compaction/CursorCompactor.java:
##########
@@ -1553,6 +1547,33 @@ private static String mergeHistogramToString(long[] histogram)
return sb.toString();
}
+    /**
+     * Closes scanner-opened readers before opening cursor-specific readers with the configured disk access mode.
+     * In cursor compaction, scanners are only used for metadata; closing them avoids holding redundant file
+     * descriptors and prevents conflicts when scan and non-scan readers for the same file share thread-local
+     * buffer state on the same thread.
+     */
+    private static StatefulCursor[] convertScannersToCursors(List<ISSTableScanner> scanners,
+                                                             ImmutableSet<SSTableReader> sstables,
Review Comment:
Agree there's a smell - the cursor compaction code has been tailored to fit the existing compaction interface: it accepts scanners and pulls just metadata from them, whereas raw SSTables would have been a better fit. The ideal refactor I see is to let each compaction pipeline open the readers it requires, but that may be better done in a follow-up.
The disk access mode is not actually the issue here; the issue is that two readers (one **scan** and one **non-scan**) exist for the same SSTable at the same time on the same thread. On trunk this is a bug that causes cursor compaction to use scan reads (with a read-ahead buffer) rather than the intended random-access reads, because ScanCompressedReader's allocated() check is based on shared static state, which looks like the below:
```
@Override
public boolean allocated()
{
    // this checks the static block map, which is inherently shared
    // between the two readers (scan + non-scan)
    return readAheadBuffer.hasBuffer();
}
```
Closing the scanner before opening the cursor reader deallocates the block map, so allocated() returns false when the cursor readers are opened, and the intended random-access reader ('reader') is chosen:
```
// How readChunk picks the reader (scan vs random)
CompressedReader readFrom = (scanReader != null && scanReader.allocated()) ? scanReader : reader;
```
This is perhaps slightly brittle from CompressedChunkReader's allocated() perspective, but the precondition that a given file is not opened by two readers on the same thread concurrently does not seem unreasonable (and it now holds in cursor compaction). I did look into guarding instantiation for the same file, but it caused a large number of test failures (I cannot remember the full details).
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]