Hello Will Berkeley, Grant Henke, Todd Lipcon, I'd like you to do a code review. Please visit
http://gerrit.cloudera.org:8080/8982 to review the following change.

Change subject: KUDU-2253 Deltafile on-disk size is 3x larger than expected
......................................................................

KUDU-2253 Deltafile on-disk size is 3x larger than expected

While looking into the performance of the integration test written for
KUDU-2251 (https://gerrit.cloudera.org/#/c/8951/ revision 6), Todd and I
found that the on-disk deltafiles written are about 3x larger than
expected. The culprit is an optimization in the CFile value index that is
turned off for delta files. The optimization truncates large keys after
the first unique byte between sequential values. The deltafile values, in
the case of this integration test, include the small DeltaKey and the
8KiB updated value. As a result, each BTree interior node is completely
filled by only ~4 values (32KiB cblock size by default). This makes the
BTree far less effective, and means that the full updated data is written
many times. We expect that fixing this will improve performance for
update-heavy workloads with large values (for example, YCSB).

Enabling the optimization changes the on-disk format of delta files, so
we have to proceed in steps. This commit enables deltafile reader
compatibility with the optimization, but doesn't yet default to using it
while writing delta files. A new experimental flag,
deltafile_optimize_index_keys, controls whether to write deltafiles with
the optimization. We should change the default to true after waiting a
minimum of one release, so that Kudu can still be downgraded by one minor
release.

Testing: I've added basic forwards/backwards compatibility tests. I plan
to add a more intensive test of the optimization as part of the
integration test in KUDU-2251.

Change-Id: I4cea3371fcf57f89fe10a3b9262bc152023cb04c
---
M src/kudu/cfile/cfile_util.h
M src/kudu/cfile/cfile_writer.cc
M src/kudu/tablet/deltafile-test.cc
M src/kudu/tablet/deltafile.cc
M src/kudu/tablet/deltafile.h
5 files changed, 63 insertions(+), 60 deletions(-)

  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/82/8982/1

--
To view, visit http://gerrit.cloudera.org:8080/8982
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I4cea3371fcf57f89fe10a3b9262bc152023cb04c
Gerrit-Change-Number: 8982
Gerrit-PatchSet: 1
Gerrit-Owner: Dan Burkert <danburk...@apache.org>
Gerrit-Reviewer: Grant Henke <granthe...@gmail.com>
Gerrit-Reviewer: Todd Lipcon <t...@apache.org>
Gerrit-Reviewer: Will Berkeley <wdberke...@gmail.com>
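
For reviewers who want a concrete picture of the index-key optimization described in the commit message, here is a minimal Python sketch. It is not the Kudu code (which is C++ and lives under src/kudu/cfile). The function name truncate_index_key, the 4-byte row prefix standing in for the DeltaKey, and the choice to ignore per-entry pointer overhead are all assumptions made for illustration. It only shows the idea of cutting an index key right after the first byte that differs from the previous key, and what that does to how many keys fit in a 32KiB index block.

```python
BLOCK_SIZE = 32 * 1024   # default cblock size mentioned in the commit message
VALUE_SIZE = 8 * 1024    # size of the updated value in the integration test


def truncate_index_key(prev_key: bytes, key: bytes) -> bytes:
    """Return the shortest prefix of `key` that still sorts after `prev_key`.

    Hypothetical helper for illustration; assumes prev_key < key bytewise.
    """
    common = 0
    limit = min(len(prev_key), len(key))
    while common < limit and prev_key[common] == key[common]:
        common += 1
    # Keep the shared prefix plus the first byte that differs.
    return key[:common + 1]


def make_entry(row: int) -> bytes:
    # Hypothetical layout: a 4-byte big-endian row index standing in for the
    # DeltaKey, followed by the 8KiB updated value.
    return row.to_bytes(4, "big") + bytes(VALUE_SIZE)


prev_entry = make_entry(41)
cur_entry = make_entry(42)
separator = truncate_index_key(prev_entry, cur_entry)

# Keys per index block, ignoring pointer and header overhead (an assumption).
full_keys_per_block = BLOCK_SIZE / len(cur_entry)
short_keys_per_block = BLOCK_SIZE / len(separator)

print("full key bytes:", len(cur_entry))
print("truncated key bytes:", len(separator))
print("full keys per block (approx):", round(full_keys_per_block))
print("truncated keys per block (approx):", round(short_keys_per_block))
```

The truncated key still orders correctly between the two entries, so an index lookup lands on the same block, but the interior node no longer has to carry a copy of the 8KiB value for every entry.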