Todd Lipcon has submitted this change and it was merged. Change subject: KUDU-1582. Optimize budgeted compaction policy with an approximation ......................................................................
KUDU-1582. Optimize budgeted compaction policy with an approximation On a server with ~5.5TB of data, I noticed that the maintenance manager was spending tens of seconds computing optimal compactions, and most of the time was in knapsack solving. This patch implements a "first pass" solution to the compaction policy using an approximation algorithm that is significantly faster than the actual knapsack solver. The approximation algorithm gives a lower bound solution which is at least 50% as good as the optimal solution (i.e. it is a '2-approximation'). In practice, it is often optimal or nearly optimal. To test, I dumped the bounds and sizes of rowsets from a 200+GB YCSB tablet from a real cluster that has been inserting for a few days, and wrote a test which imported that layout back in as mock rowsets in order to run the policy. The result (with the default 5% max approximation error) is between a 6x and 20x speedup (depending on the budget size). The resulting loss in solution quality is less than 1%. Before: Time spent Computing compaction with 128MB budget: real 0.965s user 0.964s sys 0.000s quality=0.118006 Time spent Computing compaction with 256MB budget: real 1.202s user 1.204s sys 0.000s quality=0.254289 Time spent Computing compaction with 512MB budget: real 2.233s user 2.232s sys 0.000s quality=0.524391 Time spent Computing compaction with 1024MB budget: real 4.202s user 4.200s sys 0.000s quality=1.06674 After: Time spent Computing compaction with 128MB budget: real 0.148s user 0.148s sys 0.000s quality=0.117985 Time spent Computing compaction with 256MB budget: real 0.162s user 0.160s sys 0.000s quality=0.252118 Time spent Computing compaction with 512MB budget: real 0.174s user 0.176s sys 0.000s quality=0.524391 Time spent Computing compaction with 1024MB budget: real 0.206s user 0.208s sys 0.000s quality=1.06674 Change-Id: I4e611f161d66ddc47e97e3b5328bc1778a4ac958 Reviewed-on: http://gerrit.cloudera.org:8080/4153 Reviewed-by: David Ribeiro Alves <dral...@apache.org> Tested-by: Kudu Jenkins --- M src/kudu/tablet/CMakeLists.txt M src/kudu/tablet/compaction_policy-test.cc M src/kudu/tablet/compaction_policy.cc M src/kudu/tablet/compaction_policy.h A src/kudu/tablet/ycsb-test-rowsets.tsv 5 files changed, 7,161 insertions(+), 116 deletions(-) Approvals: David Ribeiro Alves: Looks good to me, approved Kudu Jenkins: Verified -- To view, visit http://gerrit.cloudera.org:8080/4153 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: merged Gerrit-Change-Id: I4e611f161d66ddc47e97e3b5328bc1778a4ac958 Gerrit-PatchSet: 5 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Todd Lipcon <t...@apache.org> Gerrit-Reviewer: Anonymous Coward #174 Gerrit-Reviewer: David Ribeiro Alves <dral...@apache.org> Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon <t...@apache.org>