[kudu-CR] KUDU-1398 CFile index blocks can store shortest separating prefix
Todd Lipcon has submitted this change and it was merged. Change subject: KUDU-1398 CFile index blocks can store shortest separating prefix .. KUDU-1398 CFile index blocks can store shortest separating prefix (No changes: resubmitting to trigger Jenkins build) This changes the values stored as index keys to be a shortest key between the first key of the data block and the last key of the previous data block. However, this change does not apply to deltafiles, because deltafiles expect to be able to decode an index key into a DeltaKey. The way the change works is illustrated with the example from the JIRA, extended: Block 1: apple, banana, cardamom Block 2: carrot, epazote, fennel Block 3: fig, guava, kiwi Before: ['apple' -> block 1, 'carrot' -> block 2, 'fig' -> block 3] After: ['a' -> block 1, 'carr' -> block 2, 'fi' -> block 3] Change-Id: I68ae9146fabd4a19b17d103d118d2d60e28bb315 Reviewed-on: http://gerrit.cloudera.org:8080/3304 Tested-by: Kudu Jenkins Reviewed-by: Todd Lipcon--- M src/kudu/cfile/binary_dict_block.cc M src/kudu/cfile/binary_dict_block.h M src/kudu/cfile/binary_plain_block.cc M src/kudu/cfile/binary_plain_block.h M src/kudu/cfile/binary_prefix_block.cc M src/kudu/cfile/binary_prefix_block.h M src/kudu/cfile/block_encodings.h M src/kudu/cfile/bloomfile.cc M src/kudu/cfile/bloomfile.h M src/kudu/cfile/bshuf_block.h M src/kudu/cfile/cfile-test.cc M src/kudu/cfile/cfile_util.cc M src/kudu/cfile/cfile_util.h M src/kudu/cfile/cfile_writer.cc M src/kudu/cfile/cfile_writer.h M src/kudu/cfile/gvint_block.cc M src/kudu/cfile/gvint_block.h M src/kudu/cfile/index-test.cc M src/kudu/cfile/index_block.cc M src/kudu/cfile/plain_bitmap_block.h M src/kudu/cfile/plain_block.h M src/kudu/cfile/rle_block.h M src/kudu/tablet/deltafile.cc 23 files changed, 234 insertions(+), 71 deletions(-) Approvals: Todd Lipcon: Looks good to me, approved Kudu Jenkins: Verified -- To view, visit http://gerrit.cloudera.org:8080/3304 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: merged Gerrit-Change-Id: I68ae9146fabd4a19b17d103d118d2d60e28bb315 Gerrit-PatchSet: 12 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Will Berkeley Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Will Berkeley
[kudu-CR] KUDU-1398 CFile index blocks can store shortest separating prefix
Kudu Jenkins has posted comments on this change. Change subject: KUDU-1398 CFile index blocks can store shortest separating prefix .. Patch Set 11: Build Started http://104.196.14.100/job/kudu-gerrit/1973/ -- To view, visit http://gerrit.cloudera.org:8080/3304 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I68ae9146fabd4a19b17d103d118d2d60e28bb315 Gerrit-PatchSet: 11 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Will BerkeleyGerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Will Berkeley Gerrit-HasComments: No
[kudu-CR] KUDU-1398 CFile index blocks can store shortest separating prefix
Hello Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/3304 to look at the new patch set (#10). Change subject: KUDU-1398 CFile index blocks can store shortest separating prefix .. KUDU-1398 CFile index blocks can store shortest separating prefix (No changes: resubmitting to trigger Jenkins build) This changes the values stored as index keys to be a shortest key between the first key of the data block and the last key of the previous data block. However, this change does not apply to deltafiles, because deltafiles expect to be able to decode an index key into a DeltaKey. The way the change works is illustrated with the example from the JIRA, extended: Block 1: apple, banana, cardamom Block 2: carrot, epazote, fennel Block 3: fig, guava, kiwi Before: ['apple' -> block 1, 'carrot' -> block 2, 'fig' -> block 3] After: ['a' -> block 1, 'carr' -> block 2, 'fi' -> block 3] Change-Id: I68ae9146fabd4a19b17d103d118d2d60e28bb315 --- M src/kudu/cfile/binary_dict_block.cc M src/kudu/cfile/binary_dict_block.h M src/kudu/cfile/binary_plain_block.cc M src/kudu/cfile/binary_plain_block.h M src/kudu/cfile/binary_prefix_block.cc M src/kudu/cfile/binary_prefix_block.h M src/kudu/cfile/block_encodings.h M src/kudu/cfile/bloomfile.cc M src/kudu/cfile/bloomfile.h M src/kudu/cfile/bshuf_block.h M src/kudu/cfile/cfile-test.cc M src/kudu/cfile/cfile_util.cc M src/kudu/cfile/cfile_util.h M src/kudu/cfile/cfile_writer.cc M src/kudu/cfile/cfile_writer.h M src/kudu/cfile/gvint_block.cc M src/kudu/cfile/gvint_block.h M src/kudu/cfile/index-test.cc M src/kudu/cfile/index_block.cc M src/kudu/cfile/plain_bitmap_block.h M src/kudu/cfile/plain_block.h M src/kudu/cfile/rle_block.h M src/kudu/tablet/deltafile.cc 23 files changed, 234 insertions(+), 71 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/04/3304/10 -- To view, visit http://gerrit.cloudera.org:8080/3304 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I68ae9146fabd4a19b17d103d118d2d60e28bb315 Gerrit-PatchSet: 10 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Will BerkeleyGerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Will Berkeley
[kudu-CR] KUDU-1398 CFile index blocks can store shortest separating prefix
Todd Lipcon has posted comments on this change. Change subject: KUDU-1398 CFile index blocks can store shortest separating prefix .. Patch Set 9: Looks like a small compile error here -- To view, visit http://gerrit.cloudera.org:8080/3304 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I68ae9146fabd4a19b17d103d118d2d60e28bb315 Gerrit-PatchSet: 9 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Will BerkeleyGerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Will Berkeley Gerrit-HasComments: No
[kudu-CR] KUDU-1398 CFile index blocks can store shortest separating prefix
Hello Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/3304 to look at the new patch set (#9). Change subject: KUDU-1398 CFile index blocks can store shortest separating prefix .. KUDU-1398 CFile index blocks can store shortest separating prefix (No changes: resubmitting to trigger Jenkins build) This changes the values stored as index keys to be a shortest key between the first key of the data block and the last key of the previous data block. However, this change does not apply to deltafiles, because deltafiles expect to be able to decode an index key into a DeltaKey. The way the change works is illustrated with the example from the JIRA, extended: Block 1: apple, banana, cardamom Block 2: carrot, epazote, fennel Block 3: fig, guava, kiwi Before: ['apple' -> block 1, 'carrot' -> block 2, 'fig' -> block 3] After: ['a' -> block 1, 'carr' -> block 2, 'fi' -> block 3] Change-Id: I68ae9146fabd4a19b17d103d118d2d60e28bb315 --- M src/kudu/cfile/binary_dict_block.cc M src/kudu/cfile/binary_dict_block.h M src/kudu/cfile/binary_plain_block.cc M src/kudu/cfile/binary_plain_block.h M src/kudu/cfile/binary_prefix_block.cc M src/kudu/cfile/binary_prefix_block.h M src/kudu/cfile/block_encodings.h M src/kudu/cfile/bloomfile.cc M src/kudu/cfile/bloomfile.h M src/kudu/cfile/bshuf_block.h M src/kudu/cfile/cfile-test.cc M src/kudu/cfile/cfile_util.cc M src/kudu/cfile/cfile_util.h M src/kudu/cfile/cfile_writer.cc M src/kudu/cfile/cfile_writer.h M src/kudu/cfile/gvint_block.cc M src/kudu/cfile/gvint_block.h M src/kudu/cfile/index-test.cc M src/kudu/cfile/index_block.cc M src/kudu/cfile/plain_bitmap_block.h M src/kudu/cfile/plain_block.h M src/kudu/cfile/rle_block.h M src/kudu/tablet/deltafile.cc 23 files changed, 234 insertions(+), 71 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/04/3304/9 -- To view, visit http://gerrit.cloudera.org:8080/3304 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I68ae9146fabd4a19b17d103d118d2d60e28bb315 Gerrit-PatchSet: 9 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Will BerkeleyGerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Will Berkeley
[kudu-CR] KUDU-1398 CFile index blocks can store shortest separating prefix
Kudu Jenkins has posted comments on this change. Change subject: KUDU-1398 CFile index blocks can store shortest separating prefix .. Patch Set 9: Build Started http://104.196.14.100/job/kudu-gerrit/1927/ -- To view, visit http://gerrit.cloudera.org:8080/3304 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I68ae9146fabd4a19b17d103d118d2d60e28bb315 Gerrit-PatchSet: 9 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Will BerkeleyGerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Will Berkeley Gerrit-HasComments: No
[kudu-CR] KUDU-1398 CFile index blocks can store shortest separating prefix
Kudu Jenkins has posted comments on this change. Change subject: KUDU-1398 CFile index blocks can store shortest separating prefix .. Patch Set 8: Build Started http://104.196.14.100/job/kudu-gerrit/1870/ -- To view, visit http://gerrit.cloudera.org:8080/3304 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I68ae9146fabd4a19b17d103d118d2d60e28bb315 Gerrit-PatchSet: 8 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Will BerkeleyGerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Will Berkeley Gerrit-HasComments: No
[kudu-CR] KUDU-1398 CFile index blocks can store shortest separating prefix
Will Berkeley has posted comments on this change. Change subject: KUDU-1398 CFile index blocks can store shortest separating prefix .. Patch Set 6: > Sorry for the delay on this, Will. I was on PTO the last couple of > weeks. Hope to get back to review this week. I noticed the last > build failed but the build results have been purged by this point. > Is the current rev passing tests, etc? No worries Mr Lipcon. I hope you enjoyed your vacation. IIRC it passes everything, but on the last build Jenkins choked somehow and failed the build through no fault on my own. I'll verify it passes the basic stuff locally and repush so Jenkins can (hopefully) verify. Unrelatedly, I have a patch in the works for KUDU-1227 (update-merge). It works and passes the tests + the tests I wrote for it, but definitely needs some review love. Is there interest in that for 1.0? I mostly did it for personal enrichment but obv happy to contribute it if it's wanted. -- To view, visit http://gerrit.cloudera.org:8080/3304 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I68ae9146fabd4a19b17d103d118d2d60e28bb315 Gerrit-PatchSet: 6 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Will BerkeleyGerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Will Berkeley Gerrit-HasComments: No
[kudu-CR] KUDU-1398 CFile index blocks can store shortest separating prefix
Todd Lipcon has posted comments on this change. Change subject: KUDU-1398 CFile index blocks can store shortest separating prefix .. Patch Set 4: Sorry for the delay on this, Will. I was on PTO the last couple of weeks. Hope to get back to review this week. I noticed the last build failed but the build results have been purged by this point. Is the current rev passing tests, etc? -- To view, visit http://gerrit.cloudera.org:8080/3304 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I68ae9146fabd4a19b17d103d118d2d60e28bb315 Gerrit-PatchSet: 4 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Will BerkeleyGerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Will Berkeley Gerrit-HasComments: No
[kudu-CR] KUDU-1398 CFile index blocks can store shortest separating prefix
Kudu Jenkins has posted comments on this change. Change subject: KUDU-1398 CFile index blocks can store shortest separating prefix .. Patch Set 6: Build Started http://104.196.14.100/job/kudu-gerrit/1787/ -- To view, visit http://gerrit.cloudera.org:8080/3304 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I68ae9146fabd4a19b17d103d118d2d60e28bb315 Gerrit-PatchSet: 6 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Will BerkeleyGerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Will Berkeley Gerrit-HasComments: No
[kudu-CR] KUDU-1398 CFile index blocks can store shortest separating prefix
Hello Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/3304 to look at the new patch set (#5). Change subject: KUDU-1398 CFile index blocks can store shortest separating prefix .. KUDU-1398 CFile index blocks can store shortest separating prefix This changes the values stored as index keys to be a shortest key between the first key of the data block and the last key of the previous data block. However, this change does not apply to deltafiles, because deltafiles expect to be able to decode an index key into a DeltaKey. The way the change works is illustrated with the example from the JIRA, extended: Block 1: apple, banana, cardamom Block 2: carrot, epazote, fennel Block 3: fig, guava, kiwi Before: ['apple' -> block 1, 'carrot' -> block 2, 'fig' -> block 3] After: ['a' -> block 1, 'carr' -> block 2, 'fi' -> block 3] Change-Id: I68ae9146fabd4a19b17d103d118d2d60e28bb315 --- M src/kudu/cfile/binary_dict_block.cc M src/kudu/cfile/binary_dict_block.h M src/kudu/cfile/binary_plain_block.cc M src/kudu/cfile/binary_plain_block.h M src/kudu/cfile/binary_prefix_block.cc M src/kudu/cfile/binary_prefix_block.h M src/kudu/cfile/block_encodings.h M src/kudu/cfile/bshuf_block.h M src/kudu/cfile/cfile_util.cc M src/kudu/cfile/cfile_util.h M src/kudu/cfile/cfile_writer.cc M src/kudu/cfile/cfile_writer.h M src/kudu/cfile/gvint_block.cc M src/kudu/cfile/gvint_block.h M src/kudu/cfile/index-test.cc M src/kudu/cfile/index_block.cc M src/kudu/cfile/plain_bitmap_block.h M src/kudu/cfile/plain_block.h M src/kudu/cfile/rle_block.h M src/kudu/tablet/deltafile.cc 20 files changed, 206 insertions(+), 51 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/04/3304/5 -- To view, visit http://gerrit.cloudera.org:8080/3304 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I68ae9146fabd4a19b17d103d118d2d60e28bb315 Gerrit-PatchSet: 5 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Will BerkeleyGerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Will Berkeley
[kudu-CR] KUDU-1398 CFile index blocks can store shortest separating prefix
Kudu Jenkins has posted comments on this change. Change subject: KUDU-1398 CFile index blocks can store shortest separating prefix .. Patch Set 5: Build Started http://104.196.14.100/job/kudu-gerrit/1786/ -- To view, visit http://gerrit.cloudera.org:8080/3304 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I68ae9146fabd4a19b17d103d118d2d60e28bb315 Gerrit-PatchSet: 5 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Will BerkeleyGerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Will Berkeley Gerrit-HasComments: No
[kudu-CR] KUDU-1398 CFile index blocks can store shortest separating prefix
Todd Lipcon has posted comments on this change. Change subject: KUDU-1398 CFile index blocks can store shortest separating prefix .. Patch Set 4: (15 comments) http://gerrit.cloudera.org:8080/#/c/3304/4//COMMIT_MSG Commit Message: PS4, Line 17: compatibility with bloomfile what's this mean? http://gerrit.cloudera.org:8080/#/c/3304/4/src/kudu/cfile/binary_dict_block.cc File src/kudu/cfile/binary_dict_block.cc: Line 131: last_key_.assign_copy(last->data(), last->size()); instead of copying this every time, could we use GetLast on the code word builder, then look it up in our dictionary? What you have here will probably hurt write performance http://gerrit.cloudera.org:8080/#/c/3304/4/src/kudu/cfile/binary_plain_block.cc File src/kudu/cfile/binary_plain_block.cc: Line 132: Status BinaryPlainBlockBuilder::GetLastKey(void *key_void) const { can we refactor this to share code with GetFirstKey? eg a GetKeyAtIndex(int idx)? http://gerrit.cloudera.org:8080/#/c/3304/4/src/kudu/cfile/cfile_util.cc File src/kudu/cfile/cfile_util.cc: Line 126: void SeparatingKey(const faststring& left, const faststring& right, faststring *target) { can you add some unit tests for this function? (eg the examples from the commit message) Line 127: DCHECK(memcmp(left.data(), right.data(), min(left.size(), right.size())) <= I think DCHECK_LE(Slice(left).compare(Slice(right), 0); is probably more readable. Especially if you change this function to take slices as arguments Line 130: if (left.size() == 0) { .empty() Line 131: // special case for first block: use the whole key not sure why this is the case Line 135: target->assign_copy(right.data(), cpl == right.size() ? cpl : cpl + 1); hm, this is a slightly different algorithm than the one used in gutil/strings/util.h (FindShortestSeparator). eg their example: // FindShortestSeparator("foobar", "foxhunt", ) => sep == "fop" you would return 'fox', right? Either one is the same length, but I guess theirs ends up tighter to the left bound whereas yours ends up somewhere in the middle. Can you think of any advantages/disadvantages between the two? As far as I can think, either is correct, but maybe I'm missing something. http://gerrit.cloudera.org:8080/#/c/3304/4/src/kudu/cfile/cfile_util.h File src/kudu/cfile/cfile_util.h: Line 63: bool deltafile; I think I'd prefer this be called something like 'optimize_index_keys' or something, rather than specifically referring to whether it's a deltafile. Another option is to have an int 'preserve_index_key_prefix_length' or somesuch, where for delta keys we pass the maximum possible encoded length of a delta key. This ensures that we can still decode it, but we might still be able to truncate the value (since the delta entries themselves can be arbitrarily long with user data) Line 110: // Computes the shortest key satisfying left < key < right and places it into target. should clarify whether 'target' may be the same faststring as 'left' or 'right' or not Line 111: void SeparatingKey(const faststring& left, const faststring& right, faststring *target); rename to GetSeparatingKey PS4, Line 111: const faststring& left, const faststring& right may make sense to use StringPiece or Slice here rather than specifically faststring http://gerrit.cloudera.org:8080/#/c/3304/4/src/kudu/cfile/cfile_writer.cc File src/kudu/cfile/cfile_writer.cc: Line 141: last_key_.clear(); no need (it's initialized to clear) Line 437: SeparatingKey(last_key_, tmp_buf_, _buf); maybe make SeparatingKey actually just shorten tmp_buf_ in place to avoid the extra allocation here http://gerrit.cloudera.org:8080/#/c/3304/4/src/kudu/cfile/rle_block.h File src/kudu/cfile/rle_block.h: Line 265: if (count_ > 0) { usually we put the error case first -- To view, visit http://gerrit.cloudera.org:8080/3304 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I68ae9146fabd4a19b17d103d118d2d60e28bb315 Gerrit-PatchSet: 4 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Will BerkeleyGerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Will Berkeley Gerrit-HasComments: Yes