Hello Kudu Jenkins, Andrew Wong, Adar Dembo,

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/12488

to look at the new patch set (#2).

Change subject: KUDU-2701 Fix compaction loop due to using wrong rowset size
......................................................................

KUDU-2701 Fix compaction loop due to using wrong rowset size

Compaction policy originally evaluated the size of a rowset to be the
size of the rowset's base data and its REDOs. This size is used to
calculate the probability mass of the rowset and the weight of the
rowset in the compaction knapsack problem. Mistakenly, it was also used
as the size of a rowset for KUDU-1400 small rowset compaction policy.
This is wrong: the size of the whole rowset should be used. Small rowset
compaction compacts rowsets when doing so reduces the number of rowsets,
and the number of rowsets produced depends on the total size of the data
when written. If a
partial size of the rowset is used, this can lead to bad decisions being
made about compaction. For example, I discovered a case where a tablet
had 8 rowsets of "size" 16MB. In reality, this size counted only the base
data and REDOs. Small rowset compaction policy determined that it would
be good to compact these 8 rowsets, believing that it would produce 4
rowsets of size 32MB. However, the rowsets were actually 32MB in size,
and compacting them produced 8 rowsets of size 32MB identical to the
previous 8, and therefore 8 rowsets that appeared to be of size 16MB to
compaction policy. Thus these 8 were chosen to be compacted, and so
on...
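
The arithmetic behind this loop can be sketched as follows. This is only an
illustration, not the code in compaction_policy.cc: the helper names are
hypothetical, and the sketch assumes that output rowsets are cut at a 32MB
target and that a compaction is considered worthwhile only if it lowers the
rowset count.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not Kudu's API): estimated number of rowsets written
// when total_input_mb of data is rewritten into rowsets of target_mb each.
constexpr std::int64_t EstimatedOutputRowsets(std::int64_t total_input_mb,
                                              std::int64_t target_mb) {
  return (total_input_mb + target_mb - 1) / target_mb;  // ceiling division
}

// Hypothetical decision rule: compacting is worthwhile only if the estimated
// output has fewer rowsets than the input.
constexpr bool ReducesRowsetCount(std::int64_t num_inputs,
                                  std::int64_t per_rowset_mb,
                                  std::int64_t target_mb) {
  return EstimatedOutputRowsets(num_inputs * per_rowset_mb, target_mb) <
         num_inputs;
}

// Counts how many times in a row the policy would pick the rowsets, capped at
// max_passes. The decision uses apparent_mb, but the number of rowsets
// actually written depends on full_mb. Assumes every output rowset again
// reports apparent_mb to the policy.
inline int PassesUntilStable(std::int64_t num_rowsets,
                             std::int64_t apparent_mb, std::int64_t full_mb,
                             std::int64_t target_mb, int max_passes) {
  assert(max_passes >= 0);
  int passes = 0;
  while (passes < max_passes &&
         ReducesRowsetCount(num_rowsets, apparent_mb, target_mb)) {
    num_rowsets = EstimatedOutputRowsets(num_rowsets * full_mb, target_mb);
    ++passes;
  }
  return passes;
}

// Base data + REDOs (16MB): the policy expects 8 rowsets to become 4.
static_assert(EstimatedOutputRowsets(8 * 16, 32) == 4, "expected 4 outputs");
static_assert(ReducesRowsetCount(8, 16, 32), "looks like a win");
// Full size (32MB): 8 rowsets in, 8 rowsets out, so no compaction is chosen.
static_assert(EstimatedOutputRowsets(8 * 32, 32) == 8, "expected 8 outputs");
static_assert(!ReducesRowsetCount(8, 32, 32), "not a win");
```

Under these assumptions, PassesUntilStable(8, 16, 32, 32, n) returns n for any
cap n, which is the loop described above, while using the full size for the
decision, PassesUntilStable(8, 32, 32, 32, n), returns 0.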

This patch changes the small rowset compaction policy code to use the
full size of the rowsets. All other uses of size remain the same. It's
not totally clear to me why just base data and REDOs are used, but in
any case changing it would change CDFs and change knapsack calculations,
which could lead to pretty big differences in compaction behavior.
Since, outside this bug, compaction is working fine, and since this
patch is targeted to unblock 1.9, I opted to make a minimal change to
fix the bug and not evaluate any larger change to compaction policy.

Change-Id: I21b7ff6333137aaf1e98ef4849691dd08e24e007
---
M src/kudu/tablet/compaction_policy.cc
M src/kudu/tablet/rowset_info.cc
M src/kudu/tablet/rowset_info.h
3 files changed, 28 insertions(+), 17 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/88/12488/2
--
To view, visit http://gerrit.cloudera.org:8080/12488
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I21b7ff6333137aaf1e98ef4849691dd08e24e007
Gerrit-Change-Number: 12488
Gerrit-PatchSet: 2
Gerrit-Owner: Will Berkeley <[email protected]>
Gerrit-Reviewer: Adar Dembo <[email protected]>
Gerrit-Reviewer: Andrew Wong <[email protected]>
Gerrit-Reviewer: Kudu Jenkins (120)
