Todd Lipcon has posted comments on this change.

Change subject: KUDU-1398 CFile index blocks can store shortest separating 

Patch Set 4:

Commit Message:

PS4, Line 17: compatibility with bloomfile
what's this mean?
File src/kudu/cfile/

Line 131:     last_key_.assign_copy(last->data(), last->size());
instead of copying this every time, could we use GetLast on the code word 
builder, then look it up in our dictionary? What you have here will probably 
hurt write performance
File src/kudu/cfile/

Line 132: Status BinaryPlainBlockBuilder::GetLastKey(void *key_void) const {
can we refactor this to share code with GetFirstKey? eg a GetKeyAtIndex(int 
File src/kudu/cfile/

Line 126: void SeparatingKey(const faststring& left, const faststring& right, 
faststring *target) {
can you add some unit tests for this function? (eg the examples from the commit 

Line 127:   DCHECK(memcmp(,, min(left.size(), 
right.size())) <=
I think DCHECK_LE(Slice(left).compare(Slice(right), 0); is probably more 
readable. Especially if you change this function to take slices as arguments

Line 130:   if (left.size() == 0) {

Line 131:     // special case for first block: use the whole key
not sure why this is the case

Line 135:     target->assign_copy(, cpl == right.size() ? cpl : cpl 
+ 1);
hm, this is a slightly different algorithm than the one used in 
gutil/strings/util.h (FindShortestSeparator). eg their example:

  // FindShortestSeparator("foobar", "foxhunt", &sep) => sep == "fop"

you would return 'fox', right? Either one is the same length, but I guess 
theirs ends up tighter to the left bound whereas yours ends up somewhere in the 
middle. Can you think of any advantages/disadvantages between the two? As far 
as I can think, either is correct, but maybe I'm missing something.
File src/kudu/cfile/cfile_util.h:

Line 63:   bool deltafile;
I think I'd prefer this be called something like 'optimize_index_keys' or 
something, rather than specifically referring to whether it's a deltafile.

Another option is to have an int 'preserve_index_key_prefix_length' or 
somesuch, where for delta keys we pass the maximum possible encoded length of a 
delta key. This ensures that we can still decode it, but we might still be able 
to truncate the value (since the delta entries themselves can be arbitrarily 
long with user data)

Line 110: // Computes the shortest key satisfying left < key < right and places 
it into target.
should clarify whether 'target' may be the same faststring as 'left' or 'right' 
or not

Line 111: void SeparatingKey(const faststring& left, const faststring& right, 
faststring *target);
rename to GetSeparatingKey

PS4, Line 111: const faststring& left, const faststring& right
may make sense to use StringPiece or Slice here rather than specifically 
File src/kudu/cfile/

Line 141:     last_key_.clear();
no need (it's initialized to clear)

Line 437:       SeparatingKey(last_key_, tmp_buf_, &sep_buf);
maybe make SeparatingKey actually just shorten tmp_buf_ in place to avoid the 
extra allocation here
File src/kudu/cfile/rle_block.h:

Line 265:     if (count_ > 0) {
usually we put the error case first

To view, visit
To unsubscribe, visit

Gerrit-MessageType: comment
Gerrit-Change-Id: I68ae9146fabd4a19b17d103d118d2d60e28bb315
Gerrit-PatchSet: 4
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Will Berkeley <>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <>
Gerrit-Reviewer: Will Berkeley <>
Gerrit-HasComments: Yes

Reply via email to