[kudu-CR] KUDU-2253 Deltafile on-disk size is 3x larger than expected
Dan Burkert has posted comments on this change. ( http://gerrit.cloudera.org:8080/8982 ) Change subject: KUDU-2253 Deltafile on-disk size is 3x larger than expected .. Patch Set 6: Code-Review+2 Carrying over Todd's +2 (just an IWYU change) -- To view, visit http://gerrit.cloudera.org:8080/8982 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I4cea3371fcf57f89fe10a3b9262bc152023cb04c Gerrit-Change-Number: 8982 Gerrit-PatchSet: 6 Gerrit-Owner: Dan Burkert Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Grant Henke Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Will Berkeley Gerrit-Comment-Date: Sat, 13 Jan 2018 00:59:59 + Gerrit-HasComments: No
[kudu-CR] KUDU-2253 Deltafile on-disk size is 3x larger than expected
Dan Burkert has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/8982 ) Change subject: KUDU-2253 Deltafile on-disk size is 3x larger than expected ..

KUDU-2253 Deltafile on-disk size is 3x larger than expected

While looking into the performance of the integration test written for KUDU-2251 (https://gerrit.cloudera.org/#/c/8951/ revision 6), Todd and I found that the on-disk deltafiles written are about 3x larger than expected. The culprit is an optimization in the CFile value index that is turned off for delta files. The optimization truncates large keys after the first unique byte between sequential values. The deltafile values, in the case of this integration test, include the small DeltaKey and the 8KiB updated value. As a result, the BTree interior nodes are completely filled by only ~4 values (32KiB cblock size by default). This makes the BTree far less effective, and means that the full updated data is written many times. We expect fixing this will improve performance for update-heavy workloads with large values (for example, YCSB).

Unfortunately, fixing the issue is not quite as simple as enabling the optimization for deltafiles, since in the normal course of seeking through deltafiles during a scan, we deserialize the value index keys into a DeltaKey. If the values are truncated, this deserialization step can fail. Instead, this patch adds overridable value index key encoding to CFileWriter, and delta file overrides it to only encode the delta key, which is a pair of variable-length integers.
Change-Id: I4cea3371fcf57f89fe10a3b9262bc152023cb04c Reviewed-on: http://gerrit.cloudera.org:8080/8982 Tested-by: Kudu Jenkins Reviewed-by: Dan Burkert --- M src/kudu/cfile/bloomfile.cc M src/kudu/cfile/cfile_util.cc M src/kudu/cfile/cfile_util.h M src/kudu/cfile/cfile_writer.cc M src/kudu/cfile/cfile_writer.h M src/kudu/tablet/deltafile.cc M src/kudu/tablet/diskrowset.cc M src/kudu/tablet/multi_column_writer.cc 8 files changed, 73 insertions(+), 36 deletions(-) Approvals: Kudu Jenkins: Verified Dan Burkert: Looks good to me, approved -- To view, visit http://gerrit.cloudera.org:8080/8982 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: merged Gerrit-Change-Id: I4cea3371fcf57f89fe10a3b9262bc152023cb04c Gerrit-Change-Number: 8982 Gerrit-PatchSet: 7 Gerrit-Owner: Dan Burkert Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Grant Henke Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Will Berkeley
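The merged commit message describes an overridable value index key encoding, with the delta file writer encoding only the delta key. The following is a minimal C++ sketch of that idea, not the code from the patch: `WriterOptions`, `KeyEncoder`, `IndexKeyFor`, `EncodeDelta`, and the LEB128-style varint helpers are hypothetical stand-ins, and Kudu's real `DeltaKey` encoding may differ. It assumes a delta value is laid out as two varints (row index, timestamp) followed by the update payload.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <string>

// Hypothetical encoder hook: derives the index key from a full value.
using KeyEncoder = std::function<void(const std::string& value, std::string* out)>;

// Hypothetical options struct; when no encoder is set, the whole value is the key.
struct WriterOptions {
  KeyEncoder index_key_encoder;
};

// Appends v as a LEB128-style varint (assumed encoding, for illustration only).
void PutVarint64(std::string* dst, uint64_t v) {
  while (v >= 0x80) {
    dst->push_back(static_cast<char>((v & 0x7f) | 0x80));
    v >>= 7;
  }
  dst->push_back(static_cast<char>(v));
}

// Returns the byte length of the varint starting at offset pos.
std::size_t VarintLength(const std::string& s, std::size_t pos) {
  std::size_t n = 0;
  while (pos + n < s.size() && (static_cast<unsigned char>(s[pos + n]) & 0x80)) ++n;
  return n + 1;
}

// Assumed delta layout: varint row index, varint timestamp, then the update bytes.
std::string EncodeDelta(uint64_t row_idx, uint64_t timestamp, const std::string& update) {
  std::string out;
  PutVarint64(&out, row_idx);
  PutVarint64(&out, timestamp);
  out += update;
  return out;
}

// What a writer would place in the value index for a given value.
std::string IndexKeyFor(const WriterOptions& opts, const std::string& value) {
  if (!opts.index_key_encoder) return value;
  std::string key;
  opts.index_key_encoder(value, &key);
  return key;
}

// Delta-file options: keep only the two leading varints as the index key.
WriterOptions DeltaFileOptions() {
  WriterOptions opts;
  opts.index_key_encoder = [](const std::string& value, std::string* out) {
    std::size_t first = VarintLength(value, 0);
    std::size_t second = VarintLength(value, first);
    out->assign(value, 0, first + second);
  };
  return opts;
}
```

With an 8KiB update payload, the default options would index the whole value, while the delta-file options index only the few bytes holding the two integers. Because the key is always a complete pair of varints, a reader can still deserialize it, which is the property that plain byte truncation would not guarantee.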
[kudu-CR] KUDU-2253 Deltafile on-disk size is 3x larger than expected
Hello Will Berkeley, Tidy Bot, Kudu Jenkins, Grant Henke, Todd Lipcon, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/8982 to look at the new patch set (#6). Change subject: KUDU-2253 Deltafile on-disk size is 3x larger than expected ..

KUDU-2253 Deltafile on-disk size is 3x larger than expected

While looking into the performance of the integration test written for KUDU-2251 (https://gerrit.cloudera.org/#/c/8951/ revision 6), Todd and I found that the on-disk deltafiles written are about 3x larger than expected. The culprit is an optimization in the CFile value index that is turned off for delta files. The optimization truncates large keys after the first unique byte between sequential values. The deltafile values, in the case of this integration test, include the small DeltaKey and the 8KiB updated value. As a result, the BTree interior nodes are completely filled by only ~4 values (32KiB cblock size by default). This makes the BTree far less effective, and means that the full updated data is written many times. We expect fixing this will improve performance for update-heavy workloads with large values (for example, YCSB).

Unfortunately, fixing the issue is not quite as simple as enabling the optimization for deltafiles, since in the normal course of seeking through deltafiles during a scan, we deserialize the value index keys into a DeltaKey. If the values are truncated, this deserialization step can fail. Instead, this patch adds overridable value index key encoding to CFileWriter, and delta file overrides it to only encode the delta key, which is a pair of variable-length integers.
Change-Id: I4cea3371fcf57f89fe10a3b9262bc152023cb04c --- M src/kudu/cfile/bloomfile.cc M src/kudu/cfile/cfile_util.cc M src/kudu/cfile/cfile_util.h M src/kudu/cfile/cfile_writer.cc M src/kudu/cfile/cfile_writer.h M src/kudu/tablet/deltafile.cc M src/kudu/tablet/diskrowset.cc M src/kudu/tablet/multi_column_writer.cc 8 files changed, 73 insertions(+), 36 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/82/8982/6 -- To view, visit http://gerrit.cloudera.org:8080/8982 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I4cea3371fcf57f89fe10a3b9262bc152023cb04c Gerrit-Change-Number: 8982 Gerrit-PatchSet: 6 Gerrit-Owner: Dan Burkert Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Grant Henke Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Will Berkeley
[kudu-CR] KUDU-2253 Deltafile on-disk size is 3x larger than expected
Todd Lipcon has posted comments on this change. ( http://gerrit.cloudera.org:8080/8982 ) Change subject: KUDU-2253 Deltafile on-disk size is 3x larger than expected .. Patch Set 5: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/8982 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I4cea3371fcf57f89fe10a3b9262bc152023cb04c Gerrit-Change-Number: 8982 Gerrit-PatchSet: 5 Gerrit-Owner: Dan Burkert Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Grant Henke Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Will Berkeley Gerrit-Comment-Date: Fri, 12 Jan 2018 22:11:14 + Gerrit-HasComments: No
[kudu-CR] KUDU-2253 Deltafile on-disk size is 3x larger than expected
Hello Will Berkeley, Tidy Bot, Kudu Jenkins, Grant Henke, Todd Lipcon, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/8982 to look at the new patch set (#5). Change subject: KUDU-2253 Deltafile on-disk size is 3x larger than expected ..

KUDU-2253 Deltafile on-disk size is 3x larger than expected

While looking into the performance of the integration test written for KUDU-2251 (https://gerrit.cloudera.org/#/c/8951/ revision 6), Todd and I found that the on-disk deltafiles written are about 3x larger than expected. The culprit is an optimization in the CFile value index that is turned off for delta files. The optimization truncates large keys after the first unique byte between sequential values. The deltafile values, in the case of this integration test, include the small DeltaKey and the 8KiB updated value. As a result, the BTree interior nodes are completely filled by only ~4 values (32KiB cblock size by default). This makes the BTree far less effective, and means that the full updated data is written many times. We expect fixing this will improve performance for update-heavy workloads with large values (for example, YCSB).

Unfortunately, fixing the issue is not quite as simple as enabling the optimization for deltafiles, since in the normal course of seeking through deltafiles during a scan, we deserialize the value index keys into a DeltaKey. If the values are truncated, this deserialization step can fail. Instead, this patch adds overridable value index key encoding to CFileWriter, and delta file overrides it to only encode the delta key, which is a pair of variable-length integers.
Change-Id: I4cea3371fcf57f89fe10a3b9262bc152023cb04c --- M src/kudu/cfile/bloomfile.cc M src/kudu/cfile/cfile_util.cc M src/kudu/cfile/cfile_util.h M src/kudu/cfile/cfile_writer.cc M src/kudu/cfile/cfile_writer.h M src/kudu/tablet/deltafile.cc M src/kudu/tablet/diskrowset.cc M src/kudu/tablet/multi_column_writer.cc 8 files changed, 72 insertions(+), 36 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/82/8982/5 -- To view, visit http://gerrit.cloudera.org:8080/8982 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I4cea3371fcf57f89fe10a3b9262bc152023cb04c Gerrit-Change-Number: 8982 Gerrit-PatchSet: 5 Gerrit-Owner: Dan Burkert Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Grant Henke Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Will Berkeley
[kudu-CR] KUDU-2253 Deltafile on-disk size is 3x larger than expected
Todd Lipcon has posted comments on this change. ( http://gerrit.cloudera.org:8080/8982 ) Change subject: KUDU-2253 Deltafile on-disk size is 3x larger than expected .. Patch Set 4: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/8982 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I4cea3371fcf57f89fe10a3b9262bc152023cb04c Gerrit-Change-Number: 8982 Gerrit-PatchSet: 4 Gerrit-Owner: Dan Burkert Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Grant Henke Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Will Berkeley Gerrit-Comment-Date: Fri, 12 Jan 2018 21:33:30 + Gerrit-HasComments: No
[kudu-CR] KUDU-2253 Deltafile on-disk size is 3x larger than expected
Hello Will Berkeley, Tidy Bot, Kudu Jenkins, Grant Henke, Todd Lipcon, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/8982 to look at the new patch set (#4). Change subject: KUDU-2253 Deltafile on-disk size is 3x larger than expected ..

KUDU-2253 Deltafile on-disk size is 3x larger than expected

While looking into the performance of the integration test written for KUDU-2251 (https://gerrit.cloudera.org/#/c/8951/ revision 6), Todd and I found that the on-disk deltafiles written are about 3x larger than expected. The culprit is an optimization in the CFile value index that is turned off for delta files. The optimization truncates large keys after the first unique byte between sequential values. The deltafile values, in the case of this integration test, include the small DeltaKey and the 8KiB updated value. As a result, the BTree interior nodes are completely filled by only ~4 values (32KiB cblock size by default). This makes the BTree far less effective, and means that the full updated data is written many times. We expect fixing this will improve performance for update-heavy workloads with large values (for example, YCSB).

Unfortunately, fixing the issue is not quite as simple as enabling the optimization for deltafiles, since in the normal course of seeking through deltafiles during a scan, we deserialize the value index keys into a DeltaKey. If the values are truncated, this deserialization step can fail. Instead, this patch adds overridable value index key encoding to CFileWriter, and delta file overrides it to only encode the delta key, which is a pair of variable-length integers.
Change-Id: I4cea3371fcf57f89fe10a3b9262bc152023cb04c --- M src/kudu/cfile/bloomfile.cc M src/kudu/cfile/cfile_util.cc M src/kudu/cfile/cfile_util.h M src/kudu/cfile/cfile_writer.cc M src/kudu/cfile/cfile_writer.h M src/kudu/tablet/deltafile.cc M src/kudu/tablet/diskrowset.cc M src/kudu/tablet/multi_column_writer.cc 8 files changed, 71 insertions(+), 36 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/82/8982/4 -- To view, visit http://gerrit.cloudera.org:8080/8982 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I4cea3371fcf57f89fe10a3b9262bc152023cb04c Gerrit-Change-Number: 8982 Gerrit-PatchSet: 4 Gerrit-Owner: Dan Burkert Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Grant Henke Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Will Berkeley
[kudu-CR] KUDU-2253 Deltafile on-disk size is 3x larger than expected
Dan Burkert has posted comments on this change. ( http://gerrit.cloudera.org:8080/8982 ) Change subject: KUDU-2253 Deltafile on-disk size is 3x larger than expected .. Patch Set 2: (3 comments)

http://gerrit.cloudera.org:8080/#/c/8982/2/src/kudu/tablet/deltafile.cc
File src/kudu/tablet/deltafile.cc:

http://gerrit.cloudera.org:8080/#/c/8982/2/src/kudu/tablet/deltafile.cc@107
PS2, Line 107: opts.optimize_index_keys = false;
> did you consider putting the function in WriterOptions instead of a new par
Done

http://gerrit.cloudera.org:8080/#/c/8982/2/src/kudu/tablet/deltafile.cc@109
PS2, Line 109: cfile::ValidxKeyEncoder key_encoder = [] (const void* value, faststring* buffer) {
> worth a comment here explaining why we can truncate the index and why we ne
Done

http://gerrit.cloudera.org:8080/#/c/8982/2/src/kudu/tablet/deltafile.cc@117
PS2, Line 117: writer_.reset(new cfile::CFileWriter(std::move(opts),
> warning: std::move of the variable 'opts' of the trivially-copyable type 'c
Done

-- To view, visit http://gerrit.cloudera.org:8080/8982 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I4cea3371fcf57f89fe10a3b9262bc152023cb04c Gerrit-Change-Number: 8982 Gerrit-PatchSet: 2 Gerrit-Owner: Dan Burkert Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Grant Henke Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Will Berkeley Gerrit-Comment-Date: Fri, 12 Jan 2018 18:46:30 + Gerrit-HasComments: Yes
[kudu-CR] KUDU-2253 Deltafile on-disk size is 3x larger than expected
Hello Will Berkeley, Tidy Bot, Kudu Jenkins, Grant Henke, Todd Lipcon, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/8982 to look at the new patch set (#3). Change subject: KUDU-2253 Deltafile on-disk size is 3x larger than expected ..

KUDU-2253 Deltafile on-disk size is 3x larger than expected

While looking into the performance of the integration test written for KUDU-2251 (https://gerrit.cloudera.org/#/c/8951/ revision 6), Todd and I found that the on-disk deltafiles written are about 3x larger than expected. The culprit is an optimization in the CFile value index that is turned off for delta files. The optimization truncates large keys after the first unique byte between sequential values. The deltafile values, in the case of this integration test, include the small DeltaKey and the 8KiB updated value. As a result, the BTree interior nodes are completely filled by only ~4 values (32KiB cblock size by default). This makes the BTree far less effective, and means that the full updated data is written many times. We expect fixing this will improve performance for update-heavy workloads with large values (for example, YCSB).

Unfortunately, fixing the issue is not quite as simple as enabling the optimization for deltafiles, since in the normal course of seeking through deltafiles during a scan, we deserialize the value index keys into a DeltaKey. If the values are truncated, this deserialization step can fail. Instead, this patch adds overridable value index key encoding to CFileWriter, and delta file overrides it to only encode the delta key, which is a pair of variable-length integers.
Change-Id: I4cea3371fcf57f89fe10a3b9262bc152023cb04c --- M src/kudu/cfile/bloomfile.cc M src/kudu/cfile/cfile_util.cc M src/kudu/cfile/cfile_util.h M src/kudu/cfile/cfile_writer.cc M src/kudu/cfile/cfile_writer.h M src/kudu/tablet/deltafile.cc M src/kudu/tablet/diskrowset.cc M src/kudu/tablet/multi_column_writer.cc 8 files changed, 67 insertions(+), 36 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/82/8982/3 -- To view, visit http://gerrit.cloudera.org:8080/8982 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I4cea3371fcf57f89fe10a3b9262bc152023cb04c Gerrit-Change-Number: 8982 Gerrit-PatchSet: 3 Gerrit-Owner: Dan Burkert Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Grant Henke Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Will Berkeley
[kudu-CR] KUDU-2253 Deltafile on-disk size is 3x larger than expected
Todd Lipcon has posted comments on this change. ( http://gerrit.cloudera.org:8080/8982 ) Change subject: KUDU-2253 Deltafile on-disk size is 3x larger than expected .. Patch Set 2: (2 comments)

http://gerrit.cloudera.org:8080/#/c/8982/2/src/kudu/tablet/deltafile.cc
File src/kudu/tablet/deltafile.cc:

http://gerrit.cloudera.org:8080/#/c/8982/2/src/kudu/tablet/deltafile.cc@107
PS2, Line 107: opts.optimize_index_keys = false;
did you consider putting the function in WriterOptions instead of a new parameter?

http://gerrit.cloudera.org:8080/#/c/8982/2/src/kudu/tablet/deltafile.cc@109
PS2, Line 109: cfile::ValidxKeyEncoder key_encoder = [] (const void* value, faststring* buffer) {
worth a comment here explaining why we can truncate the index and why we need the whole DeltaKey

-- To view, visit http://gerrit.cloudera.org:8080/8982 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I4cea3371fcf57f89fe10a3b9262bc152023cb04c Gerrit-Change-Number: 8982 Gerrit-PatchSet: 2 Gerrit-Owner: Dan Burkert Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Grant Henke Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Will Berkeley Gerrit-Comment-Date: Fri, 12 Jan 2018 00:53:10 + Gerrit-HasComments: Yes
[kudu-CR] KUDU-2253 Deltafile on-disk size is 3x larger than expected
Dan Burkert has posted comments on this change. ( http://gerrit.cloudera.org:8080/8982 ) Change subject: KUDU-2253 Deltafile on-disk size is 3x larger than expected .. Patch Set 2: New version should be backwards compatible with previous server versions, so no cfile flag needed. -- To view, visit http://gerrit.cloudera.org:8080/8982 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I4cea3371fcf57f89fe10a3b9262bc152023cb04c Gerrit-Change-Number: 8982 Gerrit-PatchSet: 2 Gerrit-Owner: Dan Burkert Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Grant Henke Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Will Berkeley Gerrit-Comment-Date: Fri, 12 Jan 2018 00:44:50 + Gerrit-HasComments: No
[kudu-CR] KUDU-2253 Deltafile on-disk size is 3x larger than expected
Hello Will Berkeley, Kudu Jenkins, Grant Henke, Todd Lipcon, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/8982 to look at the new patch set (#2). Change subject: KUDU-2253 Deltafile on-disk size is 3x larger than expected ..

KUDU-2253 Deltafile on-disk size is 3x larger than expected

While looking into the performance of the integration test written for KUDU-2251 (https://gerrit.cloudera.org/#/c/8951/ revision 6), Todd and I found that the on-disk deltafiles written are about 3x larger than expected. The culprit is an optimization in the CFile value index that is turned off for delta files. The optimization truncates large keys after the first unique byte between sequential values. The deltafile values, in the case of this integration test, include the small DeltaKey and the 8KiB updated value. As a result, the BTree interior nodes are completely filled by only ~4 values (32KiB cblock size by default). This makes the BTree far less effective, and means that the full updated data is written many times. We expect fixing this will improve performance for update-heavy workloads with large values (for example, YCSB).

Unfortunately, fixing the issue is not quite as simple as enabling the optimization for deltafiles, since in the normal course of seeking through deltafiles during a scan, we deserialize the value index keys into a DeltaKey. If the values are truncated, this deserialization step can fail. Instead, this patch adds overridable value index key encoding to CFileWriter, and delta file overrides it to only encode the delta key, which is usually very short, and a maximum of ~18 bytes.
Change-Id: I4cea3371fcf57f89fe10a3b9262bc152023cb04c --- M src/kudu/cfile/cfile_writer.cc M src/kudu/cfile/cfile_writer.h M src/kudu/tablet/deltafile.cc 3 files changed, 40 insertions(+), 18 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/82/8982/2 -- To view, visit http://gerrit.cloudera.org:8080/8982 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I4cea3371fcf57f89fe10a3b9262bc152023cb04c Gerrit-Change-Number: 8982 Gerrit-PatchSet: 2 Gerrit-Owner: Dan Burkert Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Grant Henke Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Will Berkeley
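The fan-out figures in the patch set 2 message can be checked with back-of-the-envelope arithmetic. The sketch below assumes an index block holds nothing but keys, ignoring per-entry pointers and block headers, so the results are upper bounds rather than measurements. `EntriesPerBlock` and the constants are illustrative names, not Kudu identifiers.

```cpp
#include <cstddef>

// Sizes taken from the commit message: 32KiB default cblock size, 8KiB updated
// value used as the index key, and a delta key of at most ~18 bytes.
constexpr std::size_t kBlockBytes = 32 * 1024;
constexpr std::size_t kFullValueKeyBytes = 8 * 1024;
constexpr std::size_t kDeltaKeyMaxBytes = 18;

// Upper bound on index entries per block; key_bytes must be non-zero.
constexpr std::size_t EntriesPerBlock(std::size_t block_bytes, std::size_t key_bytes) {
  return block_bytes / key_bytes;
}

// Whole values as keys: 32768 / 8192 = 4 entries per interior node.
static_assert(EntriesPerBlock(kBlockBytes, kFullValueKeyBytes) == 4, "full-value keys");

// Delta keys only: 32768 / 18 = 1820 entries per interior node, even at maximum key size.
static_assert(EntriesPerBlock(kBlockBytes, kDeltaKeyMaxBytes) == 1820, "delta keys");
```

Under these assumptions the interior-node fan-out goes from about 4 to over a thousand, which is consistent with the message's point that the index was storing the full updated data many times over.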
[kudu-CR] KUDU-2253 Deltafile on-disk size is 3x larger than expected
Dan Burkert has posted comments on this change. ( http://gerrit.cloudera.org:8080/8982 ) Change subject: KUDU-2253 Deltafile on-disk size is 3x larger than expected .. Patch Set 1: The failures are known issues from another patch series; however, it does appear there's a bug in this patch. Setting the flag to true by default makes compaction-test CHECK fail. I'm looking into it. -- To view, visit http://gerrit.cloudera.org:8080/8982 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I4cea3371fcf57f89fe10a3b9262bc152023cb04c Gerrit-Change-Number: 8982 Gerrit-PatchSet: 1 Gerrit-Owner: Dan Burkert Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Grant Henke Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Will Berkeley Gerrit-Comment-Date: Tue, 09 Jan 2018 22:46:38 + Gerrit-HasComments: No
[kudu-CR] KUDU-2253 Deltafile on-disk size is 3x larger than expected
Hello Will Berkeley, Grant Henke, Todd Lipcon, I'd like you to do a code review. Please visit http://gerrit.cloudera.org:8080/8982 to review the following change. Change subject: KUDU-2253 Deltafile on-disk size is 3x larger than expected ..

KUDU-2253 Deltafile on-disk size is 3x larger than expected

While looking into the performance of the integration test written for KUDU-2251 (https://gerrit.cloudera.org/#/c/8951/ revision 6), Todd and I found that the on-disk deltafiles written are about 3x larger than expected. The culprit is an optimization in the CFile value index that is turned off for delta files. The optimization truncates large keys after the first unique byte between sequential values. The deltafile values, in the case of this integration test, include the small DeltaKey and the 8KiB updated value. As a result, the BTree interior nodes are completely filled by only ~4 values (32KiB cblock size by default). This makes the BTree far less effective, and means that the full updated data is written many times. We expect fixing this will improve performance for update-heavy workloads with large values (for example, YCSB).

Enabling the optimization changes the on-disk format of delta files, so we have to proceed in steps. This commit enables deltafile reader compatibility with the optimization, but doesn't yet default to using it while writing delta files. A new experimental flag, deltafile_optimize_index_keys, controls whether to write deltafiles with the optimization. We should change the default to true after waiting a minimum of one release, in order to allow downgrading Kudu one minor release.

Testing: I've added basic forwards/backwards compatibility tests. I plan to add a more intensive test of the optimization as part of the integration test in KUDU-2251.
Change-Id: I4cea3371fcf57f89fe10a3b9262bc152023cb04c --- M src/kudu/cfile/cfile_util.h M src/kudu/cfile/cfile_writer.cc M src/kudu/tablet/deltafile-test.cc M src/kudu/tablet/deltafile.cc M src/kudu/tablet/deltafile.h 5 files changed, 63 insertions(+), 60 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/82/8982/1 -- To view, visit http://gerrit.cloudera.org:8080/8982 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: I4cea3371fcf57f89fe10a3b9262bc152023cb04c Gerrit-Change-Number: 8982 Gerrit-PatchSet: 1 Gerrit-Owner: Dan Burkert Gerrit-Reviewer: Grant Henke Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Will Berkeley
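The index-key optimization that the first patch set proposed to enable is described as truncating large keys after the first unique byte between sequential values. A minimal C++ sketch of that rule follows, reading it as "keep the common prefix with the previous key plus one distinguishing byte". `TruncateAfterFirstUniqueByte` is a hypothetical helper written for illustration and is not taken from the CFile code.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>

// Returns the shortest prefix of cur that extends one byte past its common
// prefix with prev. If cur is a prefix of prev (or equal to it), cur is
// returned unchanged because there is no distinguishing byte to keep.
std::string TruncateAfterFirstUniqueByte(const std::string& prev, const std::string& cur) {
  const std::size_t limit = std::min(prev.size(), cur.size());
  std::size_t common = 0;
  while (common < limit && prev[common] == cur[common]) ++common;
  return cur.substr(0, std::min(cur.size(), common + 1));
}
```

For example, with a previous key of "apple_aaaa" and a current key of "apple_bzzzzzz", the stored index key would be "apple_b". A cut chosen this way pays no attention to where an encoded field ends, which illustrates why a reader that deserializes index keys into a structured key could fail on truncated input.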