Alexey Serbin has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/23348 )

Change subject: [compaction] include undo size while picking rowsets
......................................................................


Patch Set 3:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/23348/3//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/23348/3//COMMIT_MSG@10
PS3, Line 10: taking into consideration the size of undo deltas while picking 
rowsets
            : during rowset compaction. I could not find any historical reason 
to why
            : it was not done before. Maybe there was some analysis done that 
ended
            : in a conclusion that considering undo deltas may not be right 
approach
            : when estimating upper bound fractional solution in the knapsack.
I don't think that was something to do with the nature of the fractional 
knapsack algorithm and the boundaries for a solution.

IIUC, the same algorithm is used to get the most bang on the same budget among 
all possible maintenance ops on the existing rowsets.  In other words, this is 
used not only for rowset merge compaction, but also for minor and major delta 
compaction.

In this context, it's quite clear why to include only redo deltas, and not undo 
deltas.  This is related to heuristics on the amount of IO necessary to build 
the current value given the base data and the redo deltas to apply. The most 
frequement requests are for the very latest data, not some data in the past, 
and a part of that information (i.e. whether a row for given key is present) is 
also necessary when processing DML operations like INSERT/UPDATE/DELETE/UPSERT 
and their xxx_IGNORE flavors.  IIUC, that's the reason why it makes sense to 
use only the size of redo deltas when trying to find the 'worst' rowsets to 
perform minor/major delta compaction on them.

Does this make sense?

If yes, then I'm curious what are your thoughts on the IO-related heuristics in 
the context of selecting the 'worst' rowset for minor/major delta compaction, 
and whether those heuristics are now skewed once introducing a long tail of 
redo deltas with the introduction of this update (IIRC, by default the system 
keeps 7 days of delta history).

Thanks!


http://gerrit.cloudera.org:8080/#/c/23348/3//COMMIT_MSG@38
PS3, Line 38:   [ ] RowSet(217)(  1M) [0.0027, 0.1953]
I think this is not exactly related to the essence of the patch, but do you 
have any reasonable explanation why RowSet 217 wasn't picked up with 1024MB 
budget, but was picked up when the budget was set to 2048MB?


http://gerrit.cloudera.org:8080/#/c/23348/3/src/kudu/tablet/rowset_info.h
File src/kudu/tablet/rowset_info.h:

http://gerrit.cloudera.org:8080/#/c/23348/3/src/kudu/tablet/rowset_info.h@92
PS3, Line 92: redos_and_undos
nit for  here and elsewhere: similar to method names in other interfaces, 
consider changing this 'redos_and_undos' --> 'and_deltas'


http://gerrit.cloudera.org:8080/#/c/23348/3/src/kudu/tablet/rowset_info.h@142
PS3, Line 142: base_redos_and_undos_size_mb_
nit: base_and_deltas_size_mb_


http://gerrit.cloudera.org:8080/#/c/23348/3/src/kudu/tablet/rowset_info.h@166
PS3, Line 166: base_redos_and_undos_size_bytes
nit: ditto



--
To view, visit http://gerrit.cloudera.org:8080/23348
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I351c0ba96a02e6ded5153adf9d981083a8c40592
Gerrit-Change-Number: 23348
Gerrit-PatchSet: 3
Gerrit-Owner: Ashwani Raina <[email protected]>
Gerrit-Reviewer: Alexey Serbin <[email protected]>
Gerrit-Reviewer: Ashwani Raina <[email protected]>
Gerrit-Reviewer: Attila Bukor <[email protected]>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Kurt Deschler <[email protected]>
Gerrit-Comment-Date: Wed, 14 Jan 2026 04:24:28 +0000
Gerrit-HasComments: Yes

Reply via email to