Hello Will Berkeley, Grant Henke, Todd Lipcon,

I'd like you to do a code review. Please visit

    http://gerrit.cloudera.org:8080/8982

to review the following change.


Change subject: KUDU-2253 Deltafile on-disk size is 3x larger than expected
......................................................................

KUDU-2253 Deltafile on-disk size is 3x larger than expected

While looking into the performance of the integration test written for
KUDU-2251 (https://gerrit.cloudera.org/#/c/8951/ revision 6), Todd and I
found that the deltafiles written to disk are about 3x larger than
expected. The culprit is an optimization in the CFile value index that
is turned off for delta files. The optimization truncates each index key
just after the first byte at which it differs from the preceding value,
so large values are not copied into the index in full. In this
integration test, each deltafile value consists of a small DeltaKey
followed by the 8KiB updated value. Without the optimization, each BTree
interior node (32KiB cblock size by default) is completely filled by only
~4 keys. This makes the BTree far less effective and means that the full
updated data is written many times. We expect that fixing this will
improve performance for update-heavy workloads with large values (for
example, YCSB).
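A minimal C++ sketch of the key truncation described above, assuming plain
byte-string keys compared lexicographically. ShortestSeparator is a
hypothetical helper for illustration, not the CFile writer's actual code:
given two sorted, distinct keys, it keeps the later key only up to and
including the first byte where the two differ, which still sorts strictly
after the earlier key.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper (not Kudu's API): returns the shortest prefix of `cur`
// that sorts strictly after `prev`. Requires prev < cur, as consecutive keys
// in a sorted index are.
std::string ShortestSeparator(const std::string& prev, const std::string& cur) {
  assert(prev < cur);  // index keys must be strictly increasing
  std::size_t i = 0;
  // Skip the common prefix shared by the two keys.
  while (i < prev.size() && i < cur.size() && prev[i] == cur[i]) {
    ++i;
  }
  // Because prev < cur, either the keys differ at byte i, or prev is a proper
  // prefix of cur (i == prev.size()). Either way cur[i] exists, and keeping
  // bytes [0, i] yields a key greater than prev and no greater than cur.
  return cur.substr(0, i + 1);
}
```

For instance, with two hypothetical values made of a short key prefix
("row1", "row2") followed by 8192 bytes of payload, ShortestSeparator
returns "row2": a 4-byte index entry in place of an ~8KiB one.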

Enabling the optimization changes the on-disk format of delta files, so
we have to proceed in steps. This commit makes the deltafile reader
compatible with the optimization, but does not yet enable it by default
when writing delta files. A new experimental flag,
deltafile_optimize_index_keys, controls whether deltafiles are written
with the optimization. We should change the default to true after
waiting at least one release, so that Kudu can still be downgraded by
one minor release.
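The sketch below, again in C++ and with hypothetical names (SeparatorKey,
BuildIndex, FindBlock), models a single-level value index to show how one
lookup routine can serve both formats. The `optimize` argument stands in for
the choice that deltafile_optimize_index_keys controls, and the lookup relies
only on index keys being ordered, not on them being complete stored values.
It is an illustration under those assumptions, not the deltafile reader's
implementation.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Shortest prefix of `cur` that sorts strictly after `prev` (requires prev < cur).
std::string SeparatorKey(const std::string& prev, const std::string& cur) {
  assert(prev < cur);
  std::size_t i = 0;
  while (i < prev.size() && i < cur.size() && prev[i] == cur[i]) ++i;
  return cur.substr(0, i + 1);
}

// Builds one index key per block of sorted keys (each block must be non-empty).
// optimize == false: the block's first key in full.
// optimize == true: every entry after the first is truncated to the shortest
// prefix that still sorts after the previous block's last key.
std::vector<std::string> BuildIndex(
    const std::vector<std::vector<std::string>>& blocks, bool optimize) {
  std::vector<std::string> index;
  for (std::size_t b = 0; b < blocks.size(); ++b) {
    const std::string& first = blocks[b].front();
    if (!optimize || b == 0) {
      index.push_back(first);
    } else {
      index.push_back(SeparatorKey(blocks[b - 1].back(), first));
    }
  }
  return index;
}

// "Seek at or before": returns the last block whose index key is <= target.
// Works the same way whether index keys are full or truncated.
std::size_t FindBlock(const std::vector<std::string>& index_keys,
                      const std::string& target) {
  assert(!index_keys.empty() && index_keys.front() <= target);
  auto it = std::upper_bound(index_keys.begin(), index_keys.end(), target);
  return static_cast<std::size_t>(it - index_keys.begin()) - 1;
}
```

With blocks {"apple", "apricot"} and {"banana", "berry"}, the full index is
{"apple", "banana"} and the truncated one is {"apple", "b"}; both route every
stored key to the same block. They can disagree for a target that falls
between blocks: "ba" maps to block 0 with full keys and to block 1 with
truncated keys, so in this sketch a reader that treats index keys as exact
stored values would need to account for that.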

Testing: I've added basic forward and backward compatibility tests. I plan
to add a more intensive test of the optimization as part of the
integration test in KUDU-2251.

Change-Id: I4cea3371fcf57f89fe10a3b9262bc152023cb04c
---
M src/kudu/cfile/cfile_util.h
M src/kudu/cfile/cfile_writer.cc
M src/kudu/tablet/deltafile-test.cc
M src/kudu/tablet/deltafile.cc
M src/kudu/tablet/deltafile.h
5 files changed, 63 insertions(+), 60 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/82/8982/1
--

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I4cea3371fcf57f89fe10a3b9262bc152023cb04c
Gerrit-Change-Number: 8982
Gerrit-PatchSet: 1
Gerrit-Owner: Dan Burkert <danburk...@apache.org>
Gerrit-Reviewer: Grant Henke <granthe...@gmail.com>
Gerrit-Reviewer: Todd Lipcon <t...@apache.org>
Gerrit-Reviewer: Will Berkeley <wdberke...@gmail.com>
