Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/19246 )
Change subject: WIP [compact] Increase chances of compaction for large number of deltas
......................................................................


Patch Set 1:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/19246/1//COMMIT_MSG
Commit Message:

PS1:
It would be great to conduct at least a manual test to see how this change affects the behavior of the Kudu tablet server for workloads that generate many deltas. Adding something automated would be an even better option. Taking a look at HeavyUpdateCompactionITest might provide some inspiration.

http://gerrit.cloudera.org:8080/#/c/19246/1/src/kudu/tablet/delta_tracker.cc
File src/kudu/tablet/delta_tracker.cc:

http://gerrit.cloudera.org:8080/#/c/19246/1/src/kudu/tablet/delta_tracker.cc@513
PS1, Line 513: undo_redo->EstimateSize();
Should we somehow account for compression here, if present? At the very least, applying a factor of 2 (or maybe even having a flag for that) wouldn't hurt if the file is compressed.

http://gerrit.cloudera.org:8080/#/c/19246/1/src/kudu/tablet/mock-rowsets.h
File src/kudu/tablet/mock-rowsets.h:

http://gerrit.cloudera.org:8080/#/c/19246/1/src/kudu/tablet/mock-rowsets.h@162
PS1, Line 162: EstimateBytesInDeltas
Does it make sense to override this for MockDiskRowSet() and add a test similar to what we currently have for BudgetedCompactionPolicy::PickRowSets() in compaction_policy.cc?

http://gerrit.cloudera.org:8080/#/c/19246/1/src/kudu/tablet/tablet.cc
File src/kudu/tablet/tablet.cc:

http://gerrit.cloudera.org:8080/#/c/19246/1/src/kudu/tablet/tablet.cc@2469
PS1, Line 2469: double workload_score = 0;
Does it make sense to expose a run-time flag to tweak this parameter? On the other hand, we could make it dynamic based on the amount of memory available before the process hits its hard memory limit.
http://gerrit.cloudera.org:8080/#/c/19246/1/src/kudu/tablet/tablet.cc@2482
PS1, Line 2482: workload_score = std::min(FLAGS_workload_score_upper_bound,
             :     tablet_bytes / max_limit_deltas_size);
I guess we need to allow this to climb at least a bit higher than the (anchored_mb >= threshold_mb) case in FlushOpPerfImprovementPolicy::SetPerfImprovementForFlush(). That's to allow a very heavy compaction operation to be picked over flush operations. I'd think that if the amount of memory needed to run a compaction operation climbs as high as the difference between the hard memory limit and the server's current memory usage, the perf score should increase quite aggressively, so that eventually it gets even higher than the perf score of flush operations.

On the other hand, maybe we should normalize the metrics instead (i.e. have 1 as the 'normal' maximum) and update the calculations to make sure the scores of memory flush operations don't go stratospherically high?


--
To view, visit http://gerrit.cloudera.org:8080/19246
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2199ae3b777e75b15b60d8ad818cc6adc4f5fa3b
Gerrit-Change-Number: 19246
Gerrit-PatchSet: 1
Gerrit-Owner: Ashwani Raina <[email protected]>
Gerrit-Reviewer: Alexey Serbin <[email protected]>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Comment-Date: Tue, 15 Nov 2022 02:43:57 +0000
Gerrit-HasComments: Yes
