Todd Lipcon has submitted this change and it was merged.

Change subject: KUDU-1582. Optimize budgeted compaction policy with an 
approximation
......................................................................


KUDU-1582. Optimize budgeted compaction policy with an approximation

On a server with ~5.5TB of data, I noticed that the maintenance manager
was spending tens of seconds computing optimal compactions, and most of
the time was in knapsack solving.

This patch implements a "first pass" solution to the compaction policy
using an approximation algorithm that is significantly faster than the
actual knapsack solver. The approximation algorithm gives a lower bound
solution which is at least 50% as good as the optimal solution (i.e. it
is a '2-approximation'). In practice, it is often optimal or nearly
optimal.

To test, I dumped the bounds and sizes of rowsets from a 200+GB YCSB
tablet from a real cluster that has been inserting for a few days, and
wrote a test which imported that layout back in as mock rowsets in order
to run the policy.

The result (with the default 5% max approximation error) is between a 6x
and 20x speedup (depending on the budget size). The resulting loss in
solution quality is less than 1%.

Before:
  Time spent Computing compaction with 128MB budget: real 0.965s        user 
0.964s     sys 0.000s
   quality=0.118006
  Time spent Computing compaction with 256MB budget: real 1.202s        user 
1.204s     sys 0.000s
   quality=0.254289
  Time spent Computing compaction with 512MB budget: real 2.233s        user 
2.232s     sys 0.000s
   quality=0.524391
  Time spent Computing compaction with 1024MB budget: real 4.202s       user 
4.200s     sys 0.000s
   quality=1.06674

After:
  Time spent Computing compaction with 128MB budget: real 0.148s        user 
0.148s     sys 0.000s
   quality=0.117985
  Time spent Computing compaction with 256MB budget: real 0.162s        user 
0.160s     sys 0.000s
   quality=0.252118
  Time spent Computing compaction with 512MB budget: real 0.174s        user 
0.176s     sys 0.000s
   quality=0.524391
  Time spent Computing compaction with 1024MB budget: real 0.206s       user 
0.208s     sys 0.000s
   quality=1.06674

Change-Id: I4e611f161d66ddc47e97e3b5328bc1778a4ac958
Reviewed-on: http://gerrit.cloudera.org:8080/4153
Reviewed-by: David Ribeiro Alves <dral...@apache.org>
Tested-by: Kudu Jenkins
---
M src/kudu/tablet/CMakeLists.txt
M src/kudu/tablet/compaction_policy-test.cc
M src/kudu/tablet/compaction_policy.cc
M src/kudu/tablet/compaction_policy.h
A src/kudu/tablet/ycsb-test-rowsets.tsv
5 files changed, 7,161 insertions(+), 116 deletions(-)

Approvals:
  David Ribeiro Alves: Looks good to me, approved
  Kudu Jenkins: Verified



-- 
To view, visit http://gerrit.cloudera.org:8080/4153
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: I4e611f161d66ddc47e97e3b5328bc1778a4ac958
Gerrit-PatchSet: 5
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon <t...@apache.org>
Gerrit-Reviewer: Anonymous Coward #174
Gerrit-Reviewer: David Ribeiro Alves <dral...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <t...@apache.org>

Reply via email to