netudima commented on code in PR #4536:
URL: https://github.com/apache/cassandra/pull/4536#discussion_r2676885841
##########
src/java/org/apache/cassandra/db/rows/Cell.java:
##########
@@ -280,11 +280,47 @@ public static class Serializer
public <T> void serialize(Cell<T> cell, ColumnMetadata column,
DataOutputPlus out, LivenessInfo rowLiveness, SerializationHeader header)
throws IOException
{
assert cell != null;
- boolean hasValue = cell.valueSize() > 0;
- boolean isDeleted = cell.isTombstone();
- boolean isExpiring = cell.isExpiring();
- boolean useRowTimestamp = !rowLiveness.isEmpty() &&
cell.timestamp() == rowLiveness.timestamp();
- boolean useRowTTL = isExpiring && rowLiveness.isExpiring() &&
cell.ttl() == rowLiveness.ttl() && cell.localDeletionTime() ==
rowLiveness.localExpirationTime();
+ int valueSize;
+ boolean hasValue;
+ boolean isDeleted;
+ boolean isExpiring;
+ long cellTimestamp;
+ long localDeletionTime;
+ int ttl;
+ T value;
+ ValueAccessor<T> accessor;
+ // to avoid megamorphic calls we split call sites
+ // we have ArrayCell, BufferCell and NativeCell and all of them
can be here in different scenarios
+ if (cell.getClass() == NativeCell.class)
Review Comment:
yes, I have interim results and profiles captured for each commit in his PR.
before this commit:
```
INFO [PerDiskMemtableFlushWriter_0:20] 2025-12-18T13:29:20,419
Flushing.java:196 - Completed flushing
/u02/dmko_cassandra/data/stress/stress_table-1b255f4def2540a60000000000000005/pa-46-big-Data.db
(226.600MiB) for commitlog position CommitLogPosition(segmentId=1766064398832,
position=234105), time spent: 5540 ms, bytes flushed: 237607365/(47521473 per
sec), partitions flushed: 258735/(51747 per sec), rows: 177129833/(35425966 per
sec), cpu time: 3930 ms, allocated: 764.749MiB
```
after this commit:
```
INFO [PerDiskMemtableFlushWriter_0:11] 2025-12-18T14:32:46,404
Flushing.java:196 - Completed flushing
/u02/dmko_cassandra/data/stress/stress_table-1b255f4def2540a60000000000000005/pa-36-big-Data.db
(227.192MiB) for commitlog position CommitLogPosition(segmentId=1766068252348,
position=233003), time spent: 4959 ms, bytes flushed: 238228477/(59557119 per
sec), partitions flushed: 259283/(64820 per sec), rows: 177545115/(44386278 per
sec), cpu time: 3485 ms, allocated: 766.370MiB
```
so, the delta for cpu time is quite visible: 3930 vs 3485
[flush_2_cell_stats.log](https://github.com/user-attachments/files/24531447/flush_2_cell_stats.log)
[flush_1_stat_stats.log](https://github.com/user-attachments/files/24531448/flush_1_stat_stats.log)
[flush_2_cell_cpu.html](https://github.com/user-attachments/files/24531305/flush_2_cell_cpu.html)
[flush_1_stat_cpu.html](https://github.com/user-attachments/files/24531306/flush_1_stat_cpu.html)
In my test I have 5 value text columns. So, for a write rate 2'000'000 rows
per second (which I am close to now), we have 10 mln cells, 6 calls per cell
object in the serialization logic-> 60 mln calls per second for flushing only
only, on this level even a small overhead matters.
This change also slightly improve serialization for Mutation, so the overall
win is about 1-1.1% of CPU in total, based on async profiler CPU profile.
Obviously, for tables with just a single value column the result will be not
that impressive but in my experience tables with 10 or even more than 100 cells
(UDTs/collections) per row are not rare.
Notes:
1) I prefer e2e test for this kind of overheads validation due to a JIT
profile pollution: when we do not have serialization or/and compaction executed
within the same JVM run we may get too optimistic results because the related
logic was not invoked for different cell classes in a microbenchmark and JVM
can do monomorphic inlining. The cell seriialization logic is invoked for
commit log/server-2-server, for flushing, for reads and for compaction, so we
may have all 3 Cell types used in a worst case within the same call site and we
get metamorphic calls (async profile shows then as `vtable stub`
https://github.com/async-profiler/async-profiler/blob/master/docs/AdvancedStacktraceFeatures.md#display-actual-implementation-in-vtable)
2) in this logic we invoke cell methods several times per cell, so the cost
of the type check I introduced is amortised
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]