Andrew Wong has posted comments on this change. ( http://gerrit.cloudera.org:8080/11869 )
Change subject: [compaction] KUDU-1400: Improve rowset compaction policy to consider merging small DRSs ...................................................................... Patch Set 4: (7 comments) http://gerrit.cloudera.org:8080/#/c/11869/4/src/kudu/tablet/compaction_policy-test.cc File src/kudu/tablet/compaction_policy-test.cc: http://gerrit.cloudera.org:8080/#/c/11869/4/src/kudu/tablet/compaction_policy-test.cc@344 PS4, Line 344: ASSERT_TRUE(picked.empty()); : ASSERT_EQ(quality, 0.0); nit: think it's worth using EXPECT* here so we can run over all rowsets, even upon failure? Same for other tests with loops. http://gerrit.cloudera.org:8080/#/c/11869/4/src/kudu/tablet/compaction_policy-test.cc@381 PS4, Line 381: if (divisor > 2) { : ASSERT_EQ(rowsets.size(), picked.size()); > Should it also hold if the divisor is 3 in case of 3 rowsets? Or it doesn' I think that isn't what we want. As the number of rowsets increases, the (Ni-Nf) would increase the score. Take the example of 100 non-overlapping rowsets: we would definitely want to compact, right? http://gerrit.cloudera.org:8080/#/c/11869/4/src/kudu/tablet/compaction_policy.cc File src/kudu/tablet/compaction_policy.cc: http://gerrit.cloudera.org:8080/#/c/11869/4/src/kudu/tablet/compaction_policy.cc@268 PS4, Line 268: double MaybePenalizeWideSolution(double sum_width, double union_width) { : return sum_width <= union_width ? sum_width - union_width : : sum_width - kSupportAdjust * union_width; : } This is worth a comment. My attempt: In cases where we are reducing the overlap, we try to favor narrower solutions, lest we attempt to include an excessive number of rowsets for the same overlap-reduction benefit. To do so, we tweak the calculation to give narrow solutions a larger score. http://gerrit.cloudera.org:8080/#/c/11869/4/src/kudu/tablet/rowset_info.h File src/kudu/tablet/rowset_info.h: http://gerrit.cloudera.org:8080/#/c/11869/4/src/kudu/tablet/rowset_info.h@87 PS4, Line 87: by this RowSet nit: not your fault, but mind tweaking this to be "is spanned by the min/max bounds of this RowSet"? Right now, this comment seems a little nuanced, and worth clarifying. http://gerrit.cloudera.org:8080/#/c/11869/4/src/kudu/tablet/rowset_info.cc File src/kudu/tablet/rowset_info.cc: http://gerrit.cloudera.org:8080/#/c/11869/4/src/kudu/tablet/rowset_info.cc@65 PS4, Line 65: compaction_small_rowset_tradeoff > nit: is it worth marking this flag with the 'runtime' tag? +1, same above? http://gerrit.cloudera.org:8080/#/c/11869/4/src/kudu/tablet/rowset_info.cc@90 PS4, Line 90: --compaction_small_rowset_tradeoff=$0 must be less than " : "--compaction_minimum_improvement=$1 in order to prevent pointless " : "compactions; if you know what you are doing, pass " : "--compaction_force_small_rowset_tradeoff to permit this", nit: consistent -flag_dashes with the above flag? http://gerrit.cloudera.org:8080/#/c/11869/4/src/kudu/tablet/rowset_info.cc@230 PS4, Line 230: const auto size_score = : 1 - static_cast<double>(size_bytes) / target_size_bytes; I understand the sentiment behind wanting to just point to a design doc for this, but I think it'd be really helpful to at least give a one-two line comment about this. This is an approximation for the expected reduction in rowset count that the given rowset will provide, and I don't think that idea is captured by calling it a "size_score". Also, in the doc, `alpha` is `gamma`. Mind keeping that consistent here? It's kind of confusing otherwise. -- To view, visit http://gerrit.cloudera.org:8080/11869 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I7b421c6ed77d28ebab9b91a4d6fcb1e825997e6c Gerrit-Change-Number: 11869 Gerrit-PatchSet: 4 Gerrit-Owner: Will Berkeley <[email protected]> Gerrit-Reviewer: Adar Dembo <[email protected]> Gerrit-Reviewer: Alexey Serbin <[email protected]> Gerrit-Reviewer: Andrew Wong <[email protected]> Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Tidy Bot (241) Gerrit-Reviewer: Todd Lipcon <[email protected]> Gerrit-Reviewer: Will Berkeley <[email protected]> Gerrit-Comment-Date: Wed, 14 Nov 2018 20:13:44 +0000 Gerrit-HasComments: Yes
