Adar Dembo has posted comments on this change. ( )

Change subject: KUDU-2645: tablet: Support deduplication of deleted rows in 

Patch Set 3:

Commit Message:
PS3, Line 19: Because of this new behavior, while the MergeIterator will 
            : deleted rows if they are included in the result set, the 
            : will not deduplicate them and instead return all instances found.
I might reword this to state that it's not possible to efficiently support this 
deduplication _without_ a merge-like process, so there's no reason to extend it 
to the UnionIterator.
PS3, Line 22:
            : One tangentially-related change in this patch is that TestMerge in
            : was modified to no longer generate 
            : non-deleted keys for merge testing
This makes sense given what you're trying to do, but it's a bit of a bummer in 
that it makes the MergeIterator less "generic".
File src/kudu/common/

Could you add a test case for include_deleted_rows=true that uses these simple 
VectorIterators? That'll help assess the impact of deduplication.

Separately, could you assess the impact of the MergeIterator change when 
!include_deleted_rows? See my dominance patch for some ideas on how to 
PS3, Line 232:       seen.insert(potential);
Nit: InsertOrDie. Or EmplaceOrDie (if it works).
File src/kudu/common/generic_iterators.h:
PS3, Line 38: schema
Nit: schemas (since this is a requirement for every sub-iterator).
File src/kudu/common/
PS3, Line 300:     : opts_(opts),
PS3, Line 419:   vector<MergeIterState*> smallest(states_.size());
Can we micro-optimize the !include_deleted_rows case such that the operations 
(and allocation) of the vector are avoided?
PS3, Line 432:     for (const auto& iter : states_) {
There's a non-obvious state machine in here; could you document it?
PS3, Line 439:         smallest = { iter.get() };
Is this is a copy assignment? For the common case where there are no 
duplicates, is this as efficient as overwriting the first element?
PS3, Line 441:         smallest.push_back(iter.get());
Nit: emplace_back for new code.
PS3, Line 465:           // We found the single live instance of the row.
In DEBUG builds, should we verify that any remaining instances are ghosts?
PS3, Line 479:       RETURN_NOT_OK(CopyRow(row_to_return->next_row(), &dst_row, 
I think this can be factored out of both sides of the if statement if you 
declare row_to_return on L448 and set it appropriately in the 
!include_deleted_rows case.
PS3, Line 487:     // Remove exhausted sub-iterators from the list we merge 
Any reason not to combine this loop with the one that advances the 
sub-iterators just above?
PS3, Line 504:
             :   // Maintain a record of the number of rows actually copied to 
the destination
             :   // RowBlock.
I think this is mostly self-explanatory from the code. What would be 
interesting to document, though, is that the possibility of deduplication is 
what makes the previous dst->Resize() call in PrepareBatch insufficient.
PS3, Line 536:   return unique_ptr<RowwiseIterator>(new MergeIterator(opts, 
File src/kudu/tablet/
PS3, Line 61:   ASSERT_TRUE(order_mode == UNORDERED || order_mode == ORDERED)
            :       << "unknown order mode: " << OrderMode_Name(order_mode);
Doesn't seem necessary to me; the list of values is just a few lines above.
PS3, Line 97:   opts.include_deleted_rows = true;
Would be cool to parameterize on this too.
File src/kudu/tablet/
PS3, Line 105:     case ORDERED: {
Why do you need the new scope here? Same for

To view, visit
To unsubscribe, visit

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I00614b3fa5c6b4e7b620bb78489e24c5ad44daee
Gerrit-Change-Number: 12205
Gerrit-PatchSet: 3
Gerrit-Owner: Mike Percy <>
Gerrit-Reviewer: Adar Dembo <>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Mike Percy <>
Gerrit-Comment-Date: Wed, 13 Feb 2019 05:22:37 +0000
Gerrit-HasComments: Yes

Reply via email to