Adar Dembo has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12205 )

Change subject: KUDU-2645: tablet: Support deduplication of deleted rows in 
MergeIterator
......................................................................


Patch Set 4:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/12205/3/src/kudu/common/generic_iterators.cc
File src/kudu/common/generic_iterators.cc:

http://gerrit.cloudera.org:8080/#/c/12205/3/src/kudu/common/generic_iterators.cc@441
PS3, Line 441:         // If we have a candidate for smallest row, compare it 
against the
> Is there a reason to prefer emplace_back() over push_back() in general? I t
My take is that if I use emplace_back(), I don't have to think about whether I 
can move or copy; it allows both, and without any overhead for copying. So if 
we consistently use emplace_back(), that's just one less thing to think about. 
That's why I started pointing it out in my code reviews (for the past year or 
two).

IIUC, push_back(std::move(foo)) always makes a copy of 'foo' but misleads 
readers into thinking that 'foo' is being moved. Hence the thought to avoid it 
in new code.


http://gerrit.cloudera.org:8080/#/c/12205/3/src/kudu/common/generic_iterators.cc@479
PS3, Line 479:           // We found the single live instance of the row.
> I don't understand this comment.
I think you can deduplicate the two calls to CopyRow (one here and the other on 
L453) into one call just after if (!opts_.include_deleted_rows) { ... } else { 
... }


http://gerrit.cloudera.org:8080/#/c/12205/4/src/kudu/common/generic_iterators.cc
File src/kudu/common/generic_iterators.cc:

http://gerrit.cloudera.org:8080/#/c/12205/4/src/kudu/common/generic_iterators.cc@474
PS4, Line 474:       int live_rows_found = 0;
Could you make sure this doesn't trigger a not-used warning in RELEASE builds?


http://gerrit.cloudera.org:8080/#/c/12205/4/src/kudu/common/generic_iterators.cc@503
PS4, Line 503:     unordered_set<MergeIterState*> exhausted;
I think this can be a simple vector; aren't we guaranteed that there are no 
duplicates _within_ a given sub-iterator?

Regardless, I still think it'd be better to avoid this intermediate step and 
combine the two loops. I get that you want to acquire states_lock_ and do 
states_.erase() just once, but consider that if there were duplicates for the 
returned row, the number of duplicates is probably very low, so the overhead of 
multiple lock acquisition/vector erases may be outweighed by the allocation of 
the set/vector here.



--
To view, visit http://gerrit.cloudera.org:8080/12205
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I00614b3fa5c6b4e7b620bb78489e24c5ad44daee
Gerrit-Change-Number: 12205
Gerrit-PatchSet: 4
Gerrit-Owner: Mike Percy <[email protected]>
Gerrit-Reviewer: Adar Dembo <[email protected]>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Mike Percy <[email protected]>
Gerrit-Comment-Date: Thu, 28 Feb 2019 06:11:56 +0000
Gerrit-HasComments: Yes

Reply via email to