Re: [PR] [CASSANDRA-21083][trunk] Optimize memtable flush logic [cassandra]

via GitHub Fri, 09 Jan 2026 07:21:34 -0800


blambov commented on code in PR #4536:
URL: https://github.com/apache/cassandra/pull/4536#discussion_r2676204100



##########
src/java/org/apache/cassandra/io/sstable/format/SortedTableWriter.java:
##########
@@ -197,20 +209,21 @@ private void addStaticRow(DecoratedKey key, Row row) 
throws IOException
         onStaticRow(row);
     }
 
-    private void addUnfiltered(DecoratedKey key, Unfiltered unfiltered) throws 
IOException
+    private void addUnfiltered(DecoratedKey key, Unfiltered unfiltered, 
boolean isRowFirstOrLast) throws IOException
     {
         if (unfiltered.isRow())
-            addRow(key, (Row) unfiltered);
+            addRow(key, (Row) unfiltered, isRowFirstOrLast);
         else
             addRangeTomstoneMarker((RangeTombstoneMarker) unfiltered);

Review Comment:
   This optimization can also apply to range tombstones.



##########
src/java/org/apache/cassandra/db/memtable/Flushing.java:
##########
@@ -245,7 +245,8 @@ public static SSTableMultiWriter 
createFlushWriter(ColumnFamilyStore cfs,
                                             new SerializationHeader(true,
                                                                     
flushSet.metadata(),
                                                                     
flushSet.columns(),
-                                                                    
flushSet.encodingStats()),
+                                                                    
flushSet.encodingStats(),
+                                                                    
flushSet.columnsChangedAfterCreation()),

Review Comment:
   As we are using the flush set's columns, I don't think the metadata's 
definition (and the question whether it's changed) is relevant at all. Is it 
really possible to serialize with a different set than the one the rows are 
already using?



##########
src/java/org/apache/cassandra/db/rows/Cell.java:
##########
@@ -280,11 +280,47 @@ public static class Serializer
         public <T> void serialize(Cell<T> cell, ColumnMetadata column, 
DataOutputPlus out, LivenessInfo rowLiveness, SerializationHeader header) 
throws IOException
         {
             assert cell != null;
-            boolean hasValue = cell.valueSize() > 0;
-            boolean isDeleted = cell.isTombstone();
-            boolean isExpiring = cell.isExpiring();
-            boolean useRowTimestamp = !rowLiveness.isEmpty() && 
cell.timestamp() == rowLiveness.timestamp();
-            boolean useRowTTL = isExpiring && rowLiveness.isExpiring() && 
cell.ttl() == rowLiveness.ttl() && cell.localDeletionTime() == 
rowLiveness.localExpirationTime();
+            int valueSize;
+            boolean hasValue;
+            boolean isDeleted;
+            boolean isExpiring;
+            long cellTimestamp;
+            long localDeletionTime;
+            int ttl;
+            T value;
+            ValueAccessor<T> accessor;
+            // to avoid megamorphic calls we split call sites
+            // we have ArrayCell, BufferCell and NativeCell and all of them 
can be here in different scenarios
+            if (cell.getClass() == NativeCell.class)

Review Comment:
   I'm curious if you can you quantify the effect of this?
   
   (If you don't already have a way to do it, the easiest is probably to run 
`ReadSmallPartitionsBench` with flush=YES and get the flush time log line.)



##########
src/java/org/apache/cassandra/db/ClusteringPrefix.java:
##########
@@ -258,6 +258,23 @@ default boolean isEmpty()
      */
     public V get(int i);
 
+
+    default void writeValue(AbstractType<?> type, int i, DataOutputPlus out) 
throws IOException

Review Comment:
   The skipping of empty/null values is quite surprising given that we not only 
want them, but we also make a distinction between the two kinds. Could you 
rename the two methods in a way that states this is being done? (E.g. add 
`SkipNullAndEmpty` to the name)



##########
src/java/org/apache/cassandra/io/sstable/metadata/MetadataCollector.java:
##########
@@ -245,10 +246,27 @@ public void update(LivenessInfo newInfo)
     public void update(Cell<?> cell)
     {
         ++currentPartitionCells;
-        updateTimestamp(cell.timestamp());
-        updateTTL(cell.ttl());
-        updateLocalDeletionTime(cell.localDeletionTime());
-        if (!cell.isLive(nowInSec))
+        long timestamp;
+        int ttl;
+        long localDeletionTime;
+        if (cell.getClass() == ArrayCell.class)

Review Comment:
   Aren't we more interested in `NativeCell.class`? I.e. shouldn't the checks 
here and in `serialize` be consistent?



##########
src/java/org/apache/cassandra/db/partitions/AbstractBTreePartition.java:
##########
@@ -171,6 +171,12 @@ public UnfilteredRowIterator unfilteredIterator()
         return unfilteredIterator(ColumnFilter.selection(columns()), 
Slices.ALL, false);
     }
 
+    public UnfilteredRowIterator flushingIterator(TableMetadata tableMetadata)
+    {
+        return unfilteredIterator(ColumnFilter.all(columns()), Slices.ALL, 
false);

Review Comment:
   Why do we do `selection` in the method above if `columns()` must contain all 
the iterator has? Can we change the method above so every non-restricted 
`unfilteredIterator` call also benefit?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Re: [PR] [CASSANDRA-21083][trunk] Optimize memtable flush logic [cassandra]

Reply via email to