Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/19246 )
Change subject: WIP [compact] Increase chances of compaction for large number of deltas
......................................................................


Patch Set 1:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/19246/1//COMMIT_MSG
Commit Message:

PS1:
It would be great to conduct at least a manual test to see how this change affects the behavior of the Kudu tablet server for workloads that generate many deltas. Adding something automated would be an even better option. Taking a look at HeavyUpdateCompactionITest might provide some inspiration.

http://gerrit.cloudera.org:8080/#/c/19246/1/src/kudu/tablet/delta_tracker.cc
File src/kudu/tablet/delta_tracker.cc:

http://gerrit.cloudera.org:8080/#/c/19246/1/src/kudu/tablet/delta_tracker.cc@513
PS1, Line 513: undo_redo->EstimateSize();
Should we somehow account for compression here, if present? At the very least, applying a factor of 2 (or maybe even having a flag for that) wouldn't hurt if the file is compressed.

http://gerrit.cloudera.org:8080/#/c/19246/1/src/kudu/tablet/mock-rowsets.h
File src/kudu/tablet/mock-rowsets.h:

http://gerrit.cloudera.org:8080/#/c/19246/1/src/kudu/tablet/mock-rowsets.h@162
PS1, Line 162: EstimateBytesInDeltas
Does it make sense to override this for MockDiskRowSet() and add a test similar to what we currently have for BudgetedCompactionPolicy::PickRowSets() in compaction_policy.cc?

http://gerrit.cloudera.org:8080/#/c/19246/1/src/kudu/tablet/tablet.cc
File src/kudu/tablet/tablet.cc:

http://gerrit.cloudera.org:8080/#/c/19246/1/src/kudu/tablet/tablet.cc@2469
PS1, Line 2469: double workload_score = 0;
Does it make sense to expose a run-time flag to tweak this parameter? On the other hand, we could make it dynamic based on the amount of memory available before the process hits its hard memory limit.
http://gerrit.cloudera.org:8080/#/c/19246/1/src/kudu/tablet/tablet.cc@2482
PS1, Line 2482: workload_score = std::min(FLAGS_workload_score_upper_bound,
             :     tablet_bytes / max_limit_deltas_size);
I guess we need to allow this to climb at least a bit higher than the (anchored_mb >= threshold_mb) case in FlushOpPerfImprovementPolicy::SetPerfImprovementForFlush(). That's to allow a very heavy compaction operation to be picked over flush operations. I'd think that if the amount of memory needed to run a compaction operation climbs as high as the difference between the hard memory limit and the server's current memory usage, the perf score should increase quite aggressively, so that eventually it gets even higher than the perf score of flush operations.

On the other hand, maybe we should normalize the metrics instead (i.e. have 1 as the 'normal' maximum) and update the calculations to make sure the scores of memory flush operations don't go stratospherically high?


--
To view, visit http://gerrit.cloudera.org:8080/19246
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2199ae3b777e75b15b60d8ad818cc6adc4f5fa3b
Gerrit-Change-Number: 19246
Gerrit-PatchSet: 1
Gerrit-Owner: Ashwani Raina <[email protected]>
Gerrit-Reviewer: Alexey Serbin <[email protected]>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Comment-Date: Tue, 15 Nov 2022 02:43:57 +0000
Gerrit-HasComments: Yes
