Andrew Wong has uploaded this change for review. ( http://gerrit.cloudera.org:8080/12539 )

Change subject: KUDU-2704: Rowsets that are much bigger than the target size discourage compactions
......................................................................

KUDU-2704: Rowsets that are much bigger than the target size discourage compactions

If flushed rowsets are much bigger than the target rowset size, they
may get a negative contribution to their score from the size-based
portion of their valuation in the compaction knapsack problem. This is
a problem for two reasons:

1. It can cause fruitful height-based compactions not to run even
   though the compaction is under budget.
2. In an extreme case, the value of the rowset can become negative,
   which breaks an invariant of the knapsack problem that item values
   be nonnegative.

This fixes the issue by flooring the size-based contribution at 0.

A regression test is included that is based on the real-world example
that I saw. I also tested that the real-life case I observed was fixed
by this patch.

Why do rowsets get flushed "too big"? It could be because the target
size was changed after they were flushed, but I also see almost all
rowsets flushed with a size that is much too big when the number of
columns becomes large. For example, on the cluster where I discovered
this problem, a table with 279 columns was flushing 85MB rowsets even
though the target size is 32MB. That issue ought to be investigated,
but in the meantime this is a workable fix. The problem has existed for
a long time; the KUDU-2701 fix just made it apparent because it
increased how much the rowsets exceed the target size in many cases.

Change-Id: I1771cd3dbbb17c87160a4bc38b48b3fbc7307676
Reviewed-on: http://gerrit.cloudera.org:8080/12538
Reviewed-by: Andrew Wong <[email protected]>
Tested-by: Kudu Jenkins
(cherry picked from commit fad69bb5104ebb8cf335f345475ffb02cec71329)
---
M src/kudu/tablet/compaction_policy-test.cc
M src/kudu/tablet/rowset_info.cc
2 files changed, 37 insertions(+), 1 deletion(-)

  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/39/12539/1
--
To view, visit http://gerrit.cloudera.org:8080/12539
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: branch-1.9.x
Gerrit-MessageType: newchange
Gerrit-Change-Id: I1771cd3dbbb17c87160a4bc38b48b3fbc7307676
Gerrit-Change-Number: 12539
Gerrit-PatchSet: 1
Gerrit-Owner: Andrew Wong <[email protected]>
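
For illustration only, here is a minimal sketch of what "flooring the
size-based contribution at 0" could look like. It is not the code in
rowset_info.cc. The function names and the formula (1 - size / target)
are assumptions chosen so that a rowset larger than the target gets a
negative raw score, as the commit message describes.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical helper, not Kudu's actual code. Raw size-based score under
// the assumed formula: positive for rowsets smaller than the target size,
// negative for rowsets larger than it.
double RawSizeScore(double rowset_mb, double target_mb) {
  assert(target_mb > 0);
  return 1.0 - rowset_mb / target_mb;
}

// Hypothetical helper. Floors the size-based score at 0 so that an
// oversized rowset neither discourages a compaction nor ends up with a
// negative value in the knapsack problem.
double FlooredSizeScore(double rowset_mb, double target_mb) {
  return std::max(0.0, RawSizeScore(rowset_mb, target_mb));
}
```

With the figures from the commit message (an 85MB rowset against a 32MB
target), the raw score under this assumed formula is -1.65625, and the
floored score is 0.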
