Hello Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

    http://gerrit.cloudera.org:8080/4153

to look at the new patch set (#3).

Change subject: KUDU-1582. Optimize budgeted compaction policy with an 
approximation
......................................................................

KUDU-1582. Optimize budgeted compaction policy with an approximation

On a server with ~5.5TB of data, I noticed that the maintenance manager
was spending tens of seconds computing optimal compactions, and most of
the time was in knapsack solving.

This patch implements a "first pass" solution to the compaction policy
using an approximation algorithm that is significantly faster than the
actual knapsack solver. The approximation algorithm gives a lower bound
solution which is at least 50% as good as the optimal solution (i.e. it
is a '2-approximation'). In practice, it is often optimal or nearly
optimal.

To test, I dumped the bounds and sizes of rowsets from a 200+GB YCSB
tablet from a real cluster that has been inserting for a few days, and
wrote a test which imported that layout back in as mock rowsets in order
to run the policy.

The result (with the default 5% max approximation error) is between a 6x
and 20x speedup (depending on the budget size). The resulting loss in
solution quality is less than 1%.

Before:
  Time spent Computing compaction with 128MB budget: real 0.965s        user 
0.964s     sys 0.000s
   quality=0.118006
  Time spent Computing compaction with 256MB budget: real 1.202s        user 
1.204s     sys 0.000s
   quality=0.254289
  Time spent Computing compaction with 512MB budget: real 2.233s        user 
2.232s     sys 0.000s
   quality=0.524391
  Time spent Computing compaction with 1024MB budget: real 4.202s       user 
4.200s     sys 0.000s
   quality=1.06674

After:
  Time spent Computing compaction with 128MB budget: real 0.148s        user 
0.148s     sys 0.000s
   quality=0.117985
  Time spent Computing compaction with 256MB budget: real 0.162s        user 
0.160s     sys 0.000s
   quality=0.252118
  Time spent Computing compaction with 512MB budget: real 0.174s        user 
0.176s     sys 0.000s
   quality=0.524391
  Time spent Computing compaction with 1024MB budget: real 0.206s       user 
0.208s     sys 0.000s
   quality=1.06674

Change-Id: I4e611f161d66ddc47e97e3b5328bc1778a4ac958
---
M src/kudu/tablet/CMakeLists.txt
M src/kudu/tablet/compaction_policy-test.cc
M src/kudu/tablet/compaction_policy.cc
M src/kudu/tablet/compaction_policy.h
A src/kudu/tablet/ycsb-test-rowsets.tsv
5 files changed, 7,154 insertions(+), 116 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/53/4153/3
-- 
To view, visit http://gerrit.cloudera.org:8080/4153
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I4e611f161d66ddc47e97e3b5328bc1778a4ac958
Gerrit-PatchSet: 3
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon <t...@apache.org>
Gerrit-Reviewer: Anonymous Coward #174
Gerrit-Reviewer: David Ribeiro Alves <dral...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <t...@apache.org>

Reply via email to