Ashwani Raina has posted comments on this change. ( http://gerrit.cloudera.org:8080/23348 )
Change subject: [compaction] include undo size while picking rowsets ...................................................................... Patch Set 3: (5 comments) Apologies for being late on this. I thought I had responded already. Either, I missed to hit send button or responses didn't get saved. http://gerrit.cloudera.org:8080/#/c/23348/3//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/23348/3//COMMIT_MSG@10 PS3, Line 10: taking into consideration the size of undo deltas while picking rowsets : during rowset compaction. I could not find any historical reason to why : it was not done before. Maybe there was some analysis done that ended : in a conclusion that considering undo deltas may not be right approach : when estimating upper bound fractional solution in the knapsack. > I don't think that was something to do with the nature of the fractional kn This algorithm is only applicable to rowset merge compaction. I don't see this having any role in major or minor delta compaction ops. Even if there was any partial application or use somewhere in major or minor delta compaction, I would expect it would be different for both, such as inclusion of just the redo deltas for minor delta compaction and inclusion of both base data as well redo deltas for major delta compaction. For major delta compaction op, the scheduling decision based calculation makes use of redo_deltas_size and base_data_size directly without considering undo deltas at all. Similarly, for minor delta compaction op, delta store count is taken into consideration instead of redo/undo delta size or base data size. I hope this clarifies the confusion. http://gerrit.cloudera.org:8080/#/c/23348/3//COMMIT_MSG@38 PS3, Line 38: [ ] RowSet(217)( 1M) [0.0027, 0.1953] > I think this is not exactly related to the essence of the patch, but do you Basically, what is happening here is that rowset 217 is ranging from 0.0027 to 0.1953 which is increasing the overall width (by almost 100 times) of the resultant rowset with possibility of increased overlaps in future. From compaction policy goal, this is not desirable and highly inefficient. If rowset 217 is not part of selected rowsets, the resultant rowset would have high number of overlaps (i.e. 216, 218 and 219) for small width/range which is highly efficient. We don't want to end up with unnecessarily wide rowsets with average height 1, instead a number of non-overlapping rowsets with average height 1 is better. For below case (2048 MB) where budget is beyond the total size of deltas, rowset 211 can be picked even. That changes the equation a little bit. Now, the overlap between 211 and 217 makes it an attractive candidate. Even though resultant rowset will be wide, the average rowset height (consulted during future ops) would remain minimum. Not to mention, 211 overlaps with all the other rowsets as well. http://gerrit.cloudera.org:8080/#/c/23348/3/src/kudu/tablet/rowset_info.h File src/kudu/tablet/rowset_info.h: http://gerrit.cloudera.org:8080/#/c/23348/3/src/kudu/tablet/rowset_info.h@92 PS3, Line 92: redos_and_undos > nit for here and elsewhere: similar to method names in other interfaces, c Done http://gerrit.cloudera.org:8080/#/c/23348/3/src/kudu/tablet/rowset_info.h@142 PS3, Line 142: base_redos_and_undos_size_mb_ > nit: base_and_deltas_size_mb_ Done http://gerrit.cloudera.org:8080/#/c/23348/3/src/kudu/tablet/rowset_info.h@166 PS3, Line 166: base_redos_and_undos_size_bytes > nit: ditto Done -- To view, visit http://gerrit.cloudera.org:8080/23348 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I351c0ba96a02e6ded5153adf9d981083a8c40592 Gerrit-Change-Number: 23348 Gerrit-PatchSet: 3 Gerrit-Owner: Ashwani Raina <[email protected]> Gerrit-Reviewer: Alexey Serbin <[email protected]> Gerrit-Reviewer: Ashwani Raina <[email protected]> Gerrit-Reviewer: Attila Bukor <[email protected]> Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Kurt Deschler <[email protected]> Gerrit-Comment-Date: Thu, 22 Jan 2026 17:52:12 +0000 Gerrit-HasComments: Yes
