Will Berkeley has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/12538 )

Change subject: KUDU-2704: Rowsets that are much bigger than the target size 
discourage compactions
......................................................................

KUDU-2704: Rowsets that are much bigger than the target size discourage 
compactions

If rowsets are flushed that are much bigger than the target rowset size,
then they may get a negative contribution to their score from the
size-based portion of their valuation in the compaction knapsack
problem. This is a problem for two reasons:

1. It can cause fruitful height-based compactions not to run even though
   the compaction is under budget.
2. In an extreme case, the value of the rowset can become negative,
   which breaks an invariant of the knapsack problem that item values
   be nonnegative.

This patch fixes the issue by flooring the size-based contribution at 0.
It includes a regression test based on the real-world example that I
saw. I also verified that this patch fixes the real-life case I
observed.
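
As a rough illustration of the flooring described above, here is a minimal
sketch. It is not the actual code from rowset_info.cc: the function name
RowsetValue, its parameters, and the formula width + gamma * (1 - size / target)
are assumptions made for the example.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical helper, not the real Kudu implementation. It models a rowset's
// value as a knapsack item: a height-based term (width) plus a size-based
// term weighted by gamma. The formula itself is an assumption.
double RowsetValue(double width, uint64_t size_bytes,
                   uint64_t target_size_bytes, double gamma) {
  assert(target_size_bytes > 0);
  // Positive for rowsets smaller than the target, negative for larger ones.
  const double size_score =
      1.0 - static_cast<double>(size_bytes) /
                static_cast<double>(target_size_bytes);
  // Floor the size-based contribution at 0 so an oversized rowset is never
  // penalized and the total value cannot go negative because of its size.
  return width + gamma * std::max(0.0, size_score);
}
```

Under this assumed formula, an 85MB rowset with a 32MB target would have an
unfloored size term of 1 - 85/32, roughly -1.66; with the floor, the term
contributes 0 and only the height-based part remains.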

Why do rowsets get flushed "too big"? It could be because the target
size was changed after they were flushed, but I also see almost all
rowsets flushed with a size that is much too big when the number of
columns becomes large. For example, on the cluster where I discovered
this problem, a table with 279 columns was flushing 85MB rowsets even
though the target size is 32MB. That issue ought to be investigated, but
in the meantime this is a workable fix. The problem has existed for a
long time; the KUDU-2701 fix just made it apparent because it increased
how much the rowsets exceed the target size in many cases.

Change-Id: I1771cd3dbbb17c87160a4bc38b48b3fbc7307676
Reviewed-on: http://gerrit.cloudera.org:8080/12538
Reviewed-by: Andrew Wong <[email protected]>
Tested-by: Kudu Jenkins
---
M src/kudu/tablet/compaction_policy-test.cc
M src/kudu/tablet/rowset_info.cc
2 files changed, 37 insertions(+), 1 deletion(-)

Approvals:
  Andrew Wong: Looks good to me, approved
  Kudu Jenkins: Verified

--
To view, visit http://gerrit.cloudera.org:8080/12538
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I1771cd3dbbb17c87160a4bc38b48b3fbc7307676
Gerrit-Change-Number: 12538
Gerrit-PatchSet: 2
Gerrit-Owner: Will Berkeley <[email protected]>
Gerrit-Reviewer: Alexey Serbin <[email protected]>
Gerrit-Reviewer: Andrew Wong <[email protected]>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Will Berkeley <[email protected]>
