Re: [PR] [CASSANDRA-21083][trunk] Optimize memtable flush logic [cassandra]

via GitHub Fri, 09 Jan 2026 08:47:23 -0800


netudima commented on code in PR #4536:
URL: https://github.com/apache/cassandra/pull/4536#discussion_r2676885841



##########
src/java/org/apache/cassandra/db/rows/Cell.java:
##########
@@ -280,11 +280,47 @@ public static class Serializer
         public <T> void serialize(Cell<T> cell, ColumnMetadata column, 
DataOutputPlus out, LivenessInfo rowLiveness, SerializationHeader header) 
throws IOException
         {
             assert cell != null;
-            boolean hasValue = cell.valueSize() > 0;
-            boolean isDeleted = cell.isTombstone();
-            boolean isExpiring = cell.isExpiring();
-            boolean useRowTimestamp = !rowLiveness.isEmpty() && 
cell.timestamp() == rowLiveness.timestamp();
-            boolean useRowTTL = isExpiring && rowLiveness.isExpiring() && 
cell.ttl() == rowLiveness.ttl() && cell.localDeletionTime() == 
rowLiveness.localExpirationTime();
+            int valueSize;
+            boolean hasValue;
+            boolean isDeleted;
+            boolean isExpiring;
+            long cellTimestamp;
+            long localDeletionTime;
+            int ttl;
+            T value;
+            ValueAccessor<T> accessor;
+            // to avoid megamorphic calls we split call sites
+            // we have ArrayCell, BufferCell and NativeCell and all of them 
can be here in different scenarios
+            if (cell.getClass() == NativeCell.class)

Review Comment:
   yes, I have interim results and profiles captured for each commit in his PR.
   
   before this commit:
   ```
   INFO  [PerDiskMemtableFlushWriter_0:20] 2025-12-18T13:29:20,419 
Flushing.java:196 - Completed flushing 
/u02/dmko_cassandra/data/stress/stress_table-1b255f4def2540a60000000000000005/pa-46-big-Data.db
 (226.600MiB) for commitlog position CommitLogPosition(segmentId=1766064398832, 
position=234105), time spent: 5540 ms, bytes flushed: 237607365/(47521473 per 
sec), partitions flushed: 258735/(51747 per sec), rows: 177129833/(35425966 per 
sec), cpu time: 3930 ms, allocated: 764.749MiB
   ```
   after this commit:
   
   ```
   INFO  [PerDiskMemtableFlushWriter_0:11] 2025-12-18T14:32:46,404 
Flushing.java:196 - Completed flushing 
/u02/dmko_cassandra/data/stress/stress_table-1b255f4def2540a60000000000000005/pa-36-big-Data.db
 (227.192MiB) for commitlog position CommitLogPosition(segmentId=1766068252348, 
position=233003), time spent: 4959 ms, bytes flushed: 238228477/(59557119 per 
sec), partitions flushed: 259283/(64820 per sec), rows: 177545115/(44386278 per 
sec), cpu time: 3485 ms, allocated: 766.370MiB
   ```
   so, the delta for cpu time is quite visible: 3930 vs 3485 
   
   
[flush_2_cell_stats.log](https://github.com/user-attachments/files/24531447/flush_2_cell_stats.log)
   
[flush_1_stat_stats.log](https://github.com/user-attachments/files/24531448/flush_1_stat_stats.log)
   
   
[flush_2_cell_cpu.html](https://github.com/user-attachments/files/24531305/flush_2_cell_cpu.html)
   
[flush_1_stat_cpu.html](https://github.com/user-attachments/files/24531306/flush_1_stat_cpu.html)
   
   In my test I have 5 value text columns. So, for a write rate 2'000'000 rows 
per second (which I am close to now), we have 10 mln cells, 6 calls per cell 
object in the serialization logic-> 60 mln calls per second for flushing only 
only, on this level even a small overhead matters.
   
   This change also slightly improve serialization for Mutation, so the overall 
win is about 1-1.1% of CPU in total, based on async profiler CPU profile.
   
   Obviously, for tables with just a single value column the result will be not 
that impressive but in my experience tables with 10 or even more than 100 cells 
(UDTs/collections) per row are not rare.
   
   Notes:
   1) I prefer e2e test for this kind of overheads validation due to a JIT 
profile pollution: when we do not have serialization or/and compaction executed 
within the same JVM run we may get too optimistic results because the related 
logic was not invoked for different cell classes in a microbenchmark and JVM 
can do monomorphic inlining. The cell seriialization logic is invoked for 
commit log/server-2-server, for flushing, for reads and for compaction, so we 
may have all 3 Cell types used in a worst case within the same call site and we 
get metamorphic calls (async profile shows then as `vtable stub` 
https://github.com/async-profiler/async-profiler/blob/master/docs/AdvancedStacktraceFeatures.md#display-actual-implementation-in-vtable)
   2) in this logic we invoke cell methods several times per cell, so the cost 
of the type check I introduced is amortised 
   
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Re: [PR] [CASSANDRA-21083][trunk] Optimize memtable flush logic [cassandra]

Reply via email to