Hello Kudu Jenkins,
I'd like you to reexamine a change. Please visit
http://gerrit.cloudera.org:8080/4153
to look at the new patch set (#3).
Change subject: KUDU-1582. Optimize budgeted compaction policy with an
approximation
......................................................................
KUDU-1582. Optimize budgeted compaction policy with an approximation
On a server with ~5.5TB of data, I noticed that the maintenance manager
was spending tens of seconds computing optimal compactions, and most of
the time was in knapsack solving.
This patch implements a "first pass" solution to the compaction policy
using an approximation algorithm that is significantly faster than the
actual knapsack solver. The approximation algorithm gives a lower bound
solution which is at least 50% as good as the optimal solution (i.e. it
is a '2-approximation'). In practice, it is often optimal or nearly
optimal.
To test, I dumped the bounds and sizes of rowsets from a 200+GB YCSB
tablet from a real cluster that has been inserting for a few days, and
wrote a test which imported that layout back in as mock rowsets in order
to run the policy.
The result (with the default 5% max approximation error) is between a 6x
and 20x speedup (depending on the budget size). The resulting loss in
solution quality is less than 1%.
Before:
Time spent Computing compaction with 128MB budget: real 0.965s user
0.964s sys 0.000s
quality=0.118006
Time spent Computing compaction with 256MB budget: real 1.202s user
1.204s sys 0.000s
quality=0.254289
Time spent Computing compaction with 512MB budget: real 2.233s user
2.232s sys 0.000s
quality=0.524391
Time spent Computing compaction with 1024MB budget: real 4.202s user
4.200s sys 0.000s
quality=1.06674
After:
Time spent Computing compaction with 128MB budget: real 0.148s user
0.148s sys 0.000s
quality=0.117985
Time spent Computing compaction with 256MB budget: real 0.162s user
0.160s sys 0.000s
quality=0.252118
Time spent Computing compaction with 512MB budget: real 0.174s user
0.176s sys 0.000s
quality=0.524391
Time spent Computing compaction with 1024MB budget: real 0.206s user
0.208s sys 0.000s
quality=1.06674
Change-Id: I4e611f161d66ddc47e97e3b5328bc1778a4ac958
---
M src/kudu/tablet/CMakeLists.txt
M src/kudu/tablet/compaction_policy-test.cc
M src/kudu/tablet/compaction_policy.cc
M src/kudu/tablet/compaction_policy.h
A src/kudu/tablet/ycsb-test-rowsets.tsv
5 files changed, 7,154 insertions(+), 116 deletions(-)
git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/53/4153/3
--
To view, visit http://gerrit.cloudera.org:8080/4153
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I4e611f161d66ddc47e97e3b5328bc1778a4ac958
Gerrit-PatchSet: 3
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon <[email protected]>
Gerrit-Reviewer: Anonymous Coward #174
Gerrit-Reviewer: David Ribeiro Alves <[email protected]>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <[email protected]>