Andrew Wong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16777 )

Change subject: KUDU-3108: fix invalid memory accesses in merge iterator
......................................................................


Patch Set 6:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/16777/6//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/16777/6//COMMIT_MSG@74
PS6, Line 74: This still incurs a call to set::find(), but at least it can
            : correctly erase() with the underlying RowBlock destroyed.
> I'm a bit surprised to read that erase with a iterator position would invol
To clarify, the erase() isn't using find() under the hood -- rather, we need to 
use find() to get the iterator position up front, and then call erase() on what 
we found if we've decided the iterator is no longer hot.


http://gerrit.cloudera.org:8080/#/c/16777/6/src/kudu/common/generic_iterators.cc
File src/kudu/common/generic_iterators.cc:

http://gerrit.cloudera.org:8080/#/c/16777/6/src/kudu/common/generic_iterators.cc@556
PS6, Line 556:   typedef std::set<MergeIterState*, 
MergeIterStateSmallestMaxComparator> MergeStateOrderedSet;
> Since we plan to backport this fix then it'd be better to use backward comp
I tried getting this working with boost::intrusive::set<>, but ran into issues 
that I don't fully understand yet (segfault in find()). The implementation also 
isn't pretty, since we are using boost::intrusive on pointers rather than 
objects. Regardless, I don't think it would circumvent the extra call find(), 
so I'm unsure of how much faster this will actually make things.

I've also tried going with Todd's other suggestion of moving forward with the 
two-heap variant. Adar mentioned it to me on Slack, and the algorithmic change 
boils down to comparing hot heap candidates against hot_.top()->last_row() 
instead of hotmaxes_.top()->last_row(). Initial findings show this to be much 
closer to the original perf numbers for the 1000-list overlapping case (first 
run is 10.056s).

I'm inclined to go with the two-heap variant for now, and once absl lands, 
reintroducing the hotmaxes ordered set once we can pull in absl::btree.


http://gerrit.cloudera.org:8080/#/c/16777/6/src/kudu/common/generic_iterators.cc@690
PS6, Line 690:   CHECK(hotmax_iter != hotmaxes_.end());
> CHECK_NE?
It's not a great fit for iterator types, since it requires defining an ostream 
operator.



--
To view, visit http://gerrit.cloudera.org:8080/16777
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8ec1cd3fd67ec4ea92a55b5b0ce555123748824d
Gerrit-Change-Number: 16777
Gerrit-PatchSet: 6
Gerrit-Owner: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Adar Lieber-Dembo <a...@apache.org>
Gerrit-Reviewer: Alexey Serbin <aser...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Bankim Bhavsar <ban...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Todd Lipcon <t...@apache.org>
Gerrit-Comment-Date: Thu, 26 Nov 2020 00:18:16 +0000
Gerrit-HasComments: Yes

Reply via email to