Will Berkeley has posted comments on this change. ( http://gerrit.cloudera.org:8080/11605 )
Change subject: [compaction] Small improvements to compaction policy tests ...................................................................... Patch Set 2: (5 comments) http://gerrit.cloudera.org:8080/#/c/11605/2/src/kudu/tablet/compaction_policy-test.cc File src/kudu/tablet/compaction_policy-test.cc: http://gerrit.cloudera.org:8080/#/c/11605/2/src/kudu/tablet/compaction_policy-test.cc@88 PS2, Line 88: so the quality : // score, which should be > nit: "so the quality score should be" Boy that comment was a mess. I rekajiggered the punctuation and wording. http://gerrit.cloudera.org:8080/#/c/11605/2/src/kudu/tablet/compaction_policy-test.cc@117 PS2, Line 117: // Similar to the above, but with some small overlap between adjacent : // rowsets. > In compaction-policy parlance, this means is testing compaction on a "short The max height is 2 in this case, and 3 in significant overlap, and is not the main difference between these tests. This test is meant to test that we don't rowset compact if there's only a small overlap, like the above test is that we don't compact when there's none, and the below test is that we do compact when there's a lot. http://gerrit.cloudera.org:8080/#/c/11605/2/src/kudu/tablet/compaction_policy-test.cc@131 PS2, Line 131: %09d", i * 10000), : StringPrintf("%09d > Was this change important? The point of it was to make the comment fit better before I decided to write the integers and not the fully-padded strings. But I left it since it's still correct, and less arbitrary because it's exactly the length needed (9999 * 10000 + 11000 = 100001000 has 9 digits). http://gerrit.cloudera.org:8080/#/c/11605/2/src/kudu/tablet/compaction_policy-test.cc@178 PS2, Line 178: the > nit: remove Done http://gerrit.cloudera.org:8080/#/c/11605/2/src/kudu/tablet/compaction_policy-test.cc@185 PS2, Line 185: // compacting {A, B, C} results in the same quality score as {A, B}, 0.67, but : // uses more budget. By penalizing the wider solution {A, B, C}, we ensure we : // don't waste I/O. > Not really related to this change, but thought I'd ask. Conceptually, I get Check out docs/design-docs/compaction-policy.md. I don't think it uses the word 'quality' like the code does but it describes the ideas. The formula for the quality of a rowset compaction of a set of rowsets X is quality(X) = (sum of widths of rowsets in X) - (width of the union of the rowsets in X) and the width of a rowset (or any set of keys) is the probability that a key falls between the set's min key and max key. The probability distribution we use is one that weights the keyspace by how much data there is. So if [k_0, k_1] and [k_2, k_3] are key intervals, and [k_0, k_1] has 2x as much data in it, the width of [k_0, k_1] is 2x that of [k_2, k_3]. I think Todd and a guy who left a long, long time ago were the experts on this, so I'm hoping to bootcamp a few of us into familiarity using these smaller patches, before I try to make some larger changes to the compaction policy. -- To view, visit http://gerrit.cloudera.org:8080/11605 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I8f7a0e2c69f301d7fc7ca2fac657569ea240d4e3 Gerrit-Change-Number: 11605 Gerrit-PatchSet: 2 Gerrit-Owner: Will Berkeley <[email protected]> Gerrit-Reviewer: Adar Dembo <[email protected]> Gerrit-Reviewer: Andrew Wong <[email protected]> Gerrit-Reviewer: Attila Bukor <[email protected]> Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy <[email protected]> Gerrit-Reviewer: Todd Lipcon <[email protected]> Gerrit-Reviewer: Will Berkeley <[email protected]> Gerrit-Comment-Date: Tue, 09 Oct 2018 15:29:02 +0000 Gerrit-HasComments: Yes
