Andrew Wong has uploaded this change for review. (
http://gerrit.cloudera.org:8080/12539 )


Change subject: KUDU-2704: Rowsets that are much bigger than the target size 
discourage compactions
......................................................................

KUDU-2704: Rowsets that are much bigger than the target size discourage 
compactions

If rowsets are flushed that are much bigger than the target rowset size,
then they may get a negative contribution to their score from the
size-based portion of their valuation in the compaction knapsack
problem. This is a problem for two reasons:

1. It can cause fruitful height-based compactions not to run even though
   the compaction is under budget.
2. In an extreme case, the value of the rowset can become negative,
   which breaks an invariant of the knapsack problem that item values
   be nonnegative.
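
To illustrate the first point, here is a minimal sketch of a generic 0/1
knapsack solver. It is not Kudu's actual solver: the Item struct, the
SolveKnapsack function, and the numbers used with it are hypothetical and
chosen only for illustration. It shows that an item whose value is zero or
negative is never selected, even when the budget has plenty of room for it.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical knapsack item: not a Kudu type.
struct Item {
  int weight_mb;  // cost charged against the budget
  double value;   // benefit of including the item
};

// Classic 0/1 knapsack by dynamic programming.
// Returns the indices of the chosen items, highest index first.
std::vector<std::size_t> SolveKnapsack(const std::vector<Item>& items,
                                       int budget_mb) {
  assert(budget_mb >= 0);
  const std::size_t n = items.size();
  // best[i][w] = best total value using the first i items within weight w.
  std::vector<std::vector<double>> best(
      n + 1, std::vector<double>(budget_mb + 1, 0.0));
  for (std::size_t i = 1; i <= n; ++i) {
    const Item& item = items[i - 1];
    for (int w = 0; w <= budget_mb; ++w) {
      best[i][w] = best[i - 1][w];  // default: skip the item
      if (item.weight_mb <= w) {
        const double with_item = best[i - 1][w - item.weight_mb] + item.value;
        // Take the item only if it strictly improves the total value.
        if (with_item > best[i][w]) best[i][w] = with_item;
      }
    }
  }
  // Walk the table backwards to recover which items were taken.
  std::vector<std::size_t> chosen;
  int w = budget_mb;
  for (std::size_t i = n; i >= 1; --i) {
    if (best[i][w] != best[i - 1][w]) {
      chosen.push_back(i - 1);
      w -= items[i - 1].weight_mb;
    }
  }
  return chosen;
}
```

With a 128MB budget, an 85MB item valued at -0.2 and a 20MB item valued at
0.5 both fit, yet only the second is chosen. If the first item's value were
0.3 instead, both would be chosen.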

This fixes the issue by flooring the size-based contribution at 0. A
regression test is included that is based on the real-world example that
I saw. I also tested that the real-life case I observed was fixed by
this patch.
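
As a rough sketch of what flooring at 0 means, consider the following. The
scoring formula here is an assumption made up for illustration and is not
the formula used in rowset_info.cc; RawSizeScore and FlooredSizeScore are
hypothetical names. Only the std::max clamp reflects the idea described
above.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical size-based term: positive when the rowset is smaller than
// the target size, negative when it is larger. NOT Kudu's actual formula.
double RawSizeScore(double rowset_size_mb, double target_size_mb) {
  assert(target_size_mb > 0);
  return 1.0 - rowset_size_mb / target_size_mb;
}

// Same term, floored at 0 so that an oversized rowset contributes nothing
// instead of subtracting from the rest of its score.
double FlooredSizeScore(double rowset_size_mb, double target_size_mb) {
  return std::max(0.0, RawSizeScore(rowset_size_mb, target_size_mb));
}
```

Under this assumed formula, an 85MB rowset with a 32MB target has a negative
raw term, while the floored term is exactly 0. A 16MB rowset keeps its
positive term unchanged.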

Why do rowsets get flushed "too big"? It could be because the target
size was changed after they were flushed, but I also see almost all
rowsets flushed with a size that is much too big when the number of
columns becomes large. For example, on the cluster where I discovered
this problem, a table with 279 columns was flushing 85MB rowsets even
though the target size is 32MB. That issue ought to be investigated, but
in the meantime this is a workable fix. The underlying problem has
existed for a long time; the KUDU-2701 fix just made it apparent because
it increased how much the rowsets exceed the target size in many cases.

Change-Id: I1771cd3dbbb17c87160a4bc38b48b3fbc7307676
Reviewed-on: http://gerrit.cloudera.org:8080/12538
Reviewed-by: Andrew Wong <[email protected]>
Tested-by: Kudu Jenkins
(cherry picked from commit fad69bb5104ebb8cf335f345475ffb02cec71329)
---
M src/kudu/tablet/compaction_policy-test.cc
M src/kudu/tablet/rowset_info.cc
2 files changed, 37 insertions(+), 1 deletion(-)



  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/39/12539/1
--
To view, visit http://gerrit.cloudera.org:8080/12539
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: branch-1.9.x
Gerrit-MessageType: newchange
Gerrit-Change-Id: I1771cd3dbbb17c87160a4bc38b48b3fbc7307676
Gerrit-Change-Number: 12539
Gerrit-PatchSet: 1
Gerrit-Owner: Andrew Wong <[email protected]>
