Andrew Wong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11869 )

Change subject: [compaction] KUDU-1400: Improve rowset compaction policy to 
consider merging small DRSs
......................................................................


Patch Set 4:

(7 comments)

http://gerrit.cloudera.org:8080/#/c/11869/4/src/kudu/tablet/compaction_policy-test.cc
File src/kudu/tablet/compaction_policy-test.cc:

http://gerrit.cloudera.org:8080/#/c/11869/4/src/kudu/tablet/compaction_policy-test.cc@344
PS4, Line 344: ASSERT_TRUE(picked.empty());
             :     ASSERT_EQ(quality, 0.0);
nit: think it's worth using EXPECT* here so we can run over all rowsets, even 
upon failure? Same for other tests with loops.


http://gerrit.cloudera.org:8080/#/c/11869/4/src/kudu/tablet/compaction_policy-test.cc@381
PS4, Line 381:     if (divisor > 2) {
             :       ASSERT_EQ(rowsets.size(), picked.size());
> Should it also hold if the divisor is 3 in case of 3 rowsets?  Or it doesn'
I think that isn't what we want. As the number of rowsets increases, the 
(Ni-Nf) would increase the score. Take the example of 100 non-overlapping 
rowsets: we would definitely want to compact, right?


http://gerrit.cloudera.org:8080/#/c/11869/4/src/kudu/tablet/compaction_policy.cc
File src/kudu/tablet/compaction_policy.cc:

http://gerrit.cloudera.org:8080/#/c/11869/4/src/kudu/tablet/compaction_policy.cc@268
PS4, Line 268: double MaybePenalizeWideSolution(double sum_width, double 
union_width) {
             :   return sum_width <= union_width ? sum_width - union_width :
             :                                     sum_width - kSupportAdjust * 
union_width;
             : }
This is worth a comment. My attempt:

In cases where we are reducing the overlap, we try to favor narrower solutions, 
lest we attempt to include an excessive number of rowsets for the same 
overlap-reduction benefit. To do so, we tweak the calculation to give narrow 
solutions a larger score.


http://gerrit.cloudera.org:8080/#/c/11869/4/src/kudu/tablet/rowset_info.h
File src/kudu/tablet/rowset_info.h:

http://gerrit.cloudera.org:8080/#/c/11869/4/src/kudu/tablet/rowset_info.h@87
PS4, Line 87: by this RowSet
nit: not your fault, but mind tweaking this to be "is spanned by the min/max 
bounds of this RowSet"? Right now, this comment seems a little nuanced, and 
worth clarifying.


http://gerrit.cloudera.org:8080/#/c/11869/4/src/kudu/tablet/rowset_info.cc
File src/kudu/tablet/rowset_info.cc:

http://gerrit.cloudera.org:8080/#/c/11869/4/src/kudu/tablet/rowset_info.cc@65
PS4, Line 65: compaction_small_rowset_tradeoff
> nit: is it worth marking this flag with the 'runtime' tag?
+1, same above?


http://gerrit.cloudera.org:8080/#/c/11869/4/src/kudu/tablet/rowset_info.cc@90
PS4, Line 90: --compaction_small_rowset_tradeoff=$0 must be less than "
            :         "--compaction_minimum_improvement=$1 in order to prevent 
pointless "
            :         "compactions; if you know what you are doing, pass "
            :         "--compaction_force_small_rowset_tradeoff to permit this",
nit: consistent -flag_dashes with the above flag?


http://gerrit.cloudera.org:8080/#/c/11869/4/src/kudu/tablet/rowset_info.cc@230
PS4, Line 230:   const auto size_score =
             :       1 - static_cast<double>(size_bytes) / target_size_bytes;
I understand the sentiment behind wanting to just point to a design doc for 
this, but I think it'd be really helpful to at least give a one-two line 
comment about this. This is an approximation for the expected reduction in 
rowset count that the given rowset will provide, and I don't think that idea is 
captured by calling it a "size_score".

Also, in the doc, `alpha` is `gamma`. Mind keeping that consistent here? It's 
kind of confusing otherwise.



--
To view, visit http://gerrit.cloudera.org:8080/11869
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7b421c6ed77d28ebab9b91a4d6fcb1e825997e6c
Gerrit-Change-Number: 11869
Gerrit-PatchSet: 4
Gerrit-Owner: Will Berkeley <[email protected]>
Gerrit-Reviewer: Adar Dembo <[email protected]>
Gerrit-Reviewer: Alexey Serbin <[email protected]>
Gerrit-Reviewer: Andrew Wong <[email protected]>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Todd Lipcon <[email protected]>
Gerrit-Reviewer: Will Berkeley <[email protected]>
Gerrit-Comment-Date: Wed, 14 Nov 2018 20:13:44 +0000
Gerrit-HasComments: Yes

Reply via email to