Ashwani Raina has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/23348 )

Change subject: [compaction] include undo size while picking rowsets
......................................................................


Patch Set 3:

(5 comments)

Apologies for being late on this. I thought I had responded already. Either, I 
missed to hit send button or responses didn't get saved.

http://gerrit.cloudera.org:8080/#/c/23348/3//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/23348/3//COMMIT_MSG@10
PS3, Line 10: taking into consideration the size of undo deltas while picking 
rowsets
            : during rowset compaction. I could not find any historical reason 
to why
            : it was not done before. Maybe there was some analysis done that 
ended
            : in a conclusion that considering undo deltas may not be right 
approach
            : when estimating upper bound fractional solution in the knapsack.
> I don't think that was something to do with the nature of the fractional kn
This algorithm is only applicable to rowset merge compaction. I don't see this 
having any role in major or minor delta compaction ops.

Even if there was any partial application or use somewhere in major or minor 
delta compaction, I would expect it would be different for both, such as 
inclusion of just the redo deltas for minor delta compaction and inclusion of 
both base data as well redo deltas for major delta compaction.

For major delta compaction op, the scheduling decision based calculation makes 
use of redo_deltas_size and base_data_size directly without considering undo 
deltas at all.
Similarly, for minor delta compaction op, delta store count is taken into 
consideration instead of redo/undo delta size or base data size.

I hope this clarifies the confusion.


http://gerrit.cloudera.org:8080/#/c/23348/3//COMMIT_MSG@38
PS3, Line 38:   [ ] RowSet(217)(  1M) [0.0027, 0.1953]
> I think this is not exactly related to the essence of the patch, but do you
Basically, what is happening here is that rowset 217 is ranging from 0.0027 to 
0.1953 which is increasing the overall width (by almost 100 times) of the 
resultant rowset with possibility of increased overlaps in future. From 
compaction policy goal, this is not desirable and highly inefficient. If rowset 
217 is not part of selected rowsets, the resultant rowset would have high 
number of overlaps (i.e. 216, 218 and 219) for small width/range which is 
highly efficient.

We don't want to end up with unnecessarily wide rowsets with average height 1, 
instead a number of non-overlapping rowsets with average height 1 is better.

For below case (2048 MB) where budget is beyond the total size of deltas, 
rowset 211 can be picked even. That changes the equation a little bit. Now, the 
overlap between 211 and 217 makes it an attractive candidate. Even though 
resultant rowset will be wide, the average rowset height (consulted during 
future ops) would remain minimum. Not to mention, 211 overlaps with all the 
other rowsets as well.


http://gerrit.cloudera.org:8080/#/c/23348/3/src/kudu/tablet/rowset_info.h
File src/kudu/tablet/rowset_info.h:

http://gerrit.cloudera.org:8080/#/c/23348/3/src/kudu/tablet/rowset_info.h@92
PS3, Line 92: redos_and_undos
> nit for  here and elsewhere: similar to method names in other interfaces, c
Done


http://gerrit.cloudera.org:8080/#/c/23348/3/src/kudu/tablet/rowset_info.h@142
PS3, Line 142: base_redos_and_undos_size_mb_
> nit: base_and_deltas_size_mb_
Done


http://gerrit.cloudera.org:8080/#/c/23348/3/src/kudu/tablet/rowset_info.h@166
PS3, Line 166: base_redos_and_undos_size_bytes
> nit: ditto
Done



--
To view, visit http://gerrit.cloudera.org:8080/23348
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I351c0ba96a02e6ded5153adf9d981083a8c40592
Gerrit-Change-Number: 23348
Gerrit-PatchSet: 3
Gerrit-Owner: Ashwani Raina <[email protected]>
Gerrit-Reviewer: Alexey Serbin <[email protected]>
Gerrit-Reviewer: Ashwani Raina <[email protected]>
Gerrit-Reviewer: Attila Bukor <[email protected]>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Kurt Deschler <[email protected]>
Gerrit-Comment-Date: Thu, 22 Jan 2026 17:52:12 +0000
Gerrit-HasComments: Yes

Reply via email to