Dan Burkert has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/8982 )
Change subject: KUDU-2253 Deltafile on-disk size is 3x larger than expected
......................................................................

KUDU-2253 Deltafile on-disk size is 3x larger than expected

While looking into the performance of the integration test written for
KUDU-2251 (https://gerrit.cloudera.org/#/c/8951/ revision 6), Todd and I
found that the on-disk deltafiles written are about 3x larger than
expected. The culprit is an optimization in the CFile value index which
is turned off for deltafiles. The optimization truncates large keys
after the first unique byte between sequential values. The deltafile
values, in the case of this integration test, include the small DeltaKey
and the 8KiB updated value. As a result, the BTree interior nodes are
being completely filled by only ~4 values (32KiB cblock size by
default). This makes the BTree far less effective, and means that the
full updated data is written many times. We expect fixing this will
improve performance for update-heavy workloads with large values (for
example, YCSB).

Unfortunately, fixing the issue is not quite as simple as enabling the
optimization for deltafiles, since in the normal course of seeking
through deltafiles during a scan, we deserialize the value index keys
into a DeltaKey. If the values are truncated, this deserialization step
can fail. Instead, this patch adds overridable value index key encoding
to CFileWriter, and deltafile overrides it to only encode the delta key,
which is a pair of variable-length integers.
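To make the reasoning above concrete, here is a minimal Python sketch. It is not Kudu's C++ code. It assumes a LEB128-style varint, a hypothetical delta key made of a row index and a timestamp, and a simplified model of the truncation described above; all function and field names are illustrative, and the fanout estimate ignores per-entry pointers and block headers.

```python
CBLOCK_SIZE = 32 * 1024  # default cblock size quoted in the commit message
UPDATE_SIZE = 8 * 1024   # updated value size used by the integration test


def encode_varint(n: int) -> bytes:
    """LEB128-style unsigned varint (assumed format, for illustration only)."""
    if n < 0:
        raise ValueError("varint must be non-negative")
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def decode_varint(buf: bytes, pos: int):
    """Return (value, next_pos); raise ValueError if the buffer ends early."""
    result, shift = 0, 0
    while True:
        if pos >= len(buf):
            raise ValueError("truncated varint")
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def encode_delta_key(row_idx: int, timestamp: int) -> bytes:
    """Hypothetical delta key: two varints back to back."""
    return encode_varint(row_idx) + encode_varint(timestamp)


def decode_delta_key(buf: bytes):
    row_idx, pos = decode_varint(buf, 0)
    timestamp, _ = decode_varint(buf, pos)
    return row_idx, timestamp


def truncate_after_first_unique_byte(prev: bytes, cur: bytes) -> bytes:
    """Simplified model of the index optimization: keep `cur` up to and
    including the first byte at which it differs from `prev`."""
    for i, (a, b) in enumerate(zip(prev, cur)):
        if a != b:
            return cur[: i + 1]
    return cur[: len(prev) + 1]


def entries_per_block(key_len: int, block_size: int = CBLOCK_SIZE) -> int:
    """Rough fanout estimate: how many keys of this length fit in one block."""
    return block_size // key_len


if __name__ == "__main__":
    key = encode_delta_key(1, 300)
    full = key + bytes(UPDATE_SIZE)  # index key that also carries the update
    print("full-value key fanout:", entries_per_block(len(full)))
    print("delta-key-only fanout:", entries_per_block(len(key)))
    short = truncate_after_first_unique_byte(key, encode_delta_key(2, 300))
    try:
        decode_delta_key(short)
    except ValueError as err:
        print("decoding truncated key failed:", err)
```

Under these assumptions, an index key that carries the 8KiB update lets only a few entries fit in a 32KiB block, while the delta-key-only form fits thousands. The last step shows why generic truncation is unsafe here: the shortened key can end before the second varint, so decoding it as a delta key fails.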
Change-Id: I4cea3371fcf57f89fe10a3b9262bc152023cb04c
Reviewed-on: http://gerrit.cloudera.org:8080/8982
Tested-by: Kudu Jenkins
Reviewed-by: Dan Burkert <danburk...@apache.org>
---
M src/kudu/cfile/bloomfile.cc
M src/kudu/cfile/cfile_util.cc
M src/kudu/cfile/cfile_util.h
M src/kudu/cfile/cfile_writer.cc
M src/kudu/cfile/cfile_writer.h
M src/kudu/tablet/deltafile.cc
M src/kudu/tablet/diskrowset.cc
M src/kudu/tablet/multi_column_writer.cc
8 files changed, 73 insertions(+), 36 deletions(-)

Approvals:
  Kudu Jenkins: Verified
  Dan Burkert: Looks good to me, approved

--
To view, visit http://gerrit.cloudera.org:8080/8982
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I4cea3371fcf57f89fe10a3b9262bc152023cb04c
Gerrit-Change-Number: 8982
Gerrit-PatchSet: 7
Gerrit-Owner: Dan Burkert <danburk...@apache.org>
Gerrit-Reviewer: Dan Burkert <danburk...@apache.org>
Gerrit-Reviewer: Grant Henke <granthe...@gmail.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <t...@apache.org>
Gerrit-Reviewer: Will Berkeley <wdberke...@gmail.com>