Andrew Wong has posted comments on this change. ( http://gerrit.cloudera.org:8080/16777 )
Change subject: KUDU-3108: fix invalid memory accesses in merge iterator ...................................................................... Patch Set 6: (3 comments) http://gerrit.cloudera.org:8080/#/c/16777/6//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/16777/6//COMMIT_MSG@74 PS6, Line 74: This still incurs a call to set::find(), but at least it can : correctly erase() with the underlying RowBlock destroyed. > I'm a bit surprised to read that erase with a iterator position would invol To clarify, the erase() isn't using find() under the hood -- rather, we need to use find() to get the iterator position up front, and then call erase() on what we found if we've decided the iterator is no longer hot. http://gerrit.cloudera.org:8080/#/c/16777/6/src/kudu/common/generic_iterators.cc File src/kudu/common/generic_iterators.cc: http://gerrit.cloudera.org:8080/#/c/16777/6/src/kudu/common/generic_iterators.cc@556 PS6, Line 556: typedef std::set<MergeIterState*, MergeIterStateSmallestMaxComparator> MergeStateOrderedSet; > Since we plan to backport this fix then it'd be better to use backward comp I tried getting this working with boost::intrusive::set<>, but ran into issues that I don't fully understand yet (segfault in find()). The implementation also isn't pretty, since we are using boost::intrusive on pointers rather than objects. Regardless, I don't think it would circumvent the extra call find(), so I'm unsure of how much faster this will actually make things. I've also tried going with Todd's other suggestion of moving forward with the two-heap variant. Adar mentioned it to me on Slack, and the algorithmic change boils down to comparing hot heap candidates against hot_.top()->last_row() instead of hotmaxes_.top()->last_row(). Initial findings show this to be much closer to the original perf numbers for the 1000-list overlapping case (first run is 10.056s). I'm inclined to go with the two-heap variant for now, and once absl lands, reintroducing the hotmaxes ordered set once we can pull in absl::btree. http://gerrit.cloudera.org:8080/#/c/16777/6/src/kudu/common/generic_iterators.cc@690 PS6, Line 690: CHECK(hotmax_iter != hotmaxes_.end()); > CHECK_NE? It's not a great fit for iterator types, since it requires defining an ostream operator. -- To view, visit http://gerrit.cloudera.org:8080/16777 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I8ec1cd3fd67ec4ea92a55b5b0ce555123748824d Gerrit-Change-Number: 16777 Gerrit-PatchSet: 6 Gerrit-Owner: Andrew Wong <aw...@cloudera.com> Gerrit-Reviewer: Adar Lieber-Dembo <a...@apache.org> Gerrit-Reviewer: Alexey Serbin <aser...@cloudera.com> Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com> Gerrit-Reviewer: Bankim Bhavsar <ban...@cloudera.com> Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Tidy Bot (241) Gerrit-Reviewer: Todd Lipcon <t...@apache.org> Gerrit-Comment-Date: Thu, 26 Nov 2020 00:18:16 +0000 Gerrit-HasComments: Yes