Will Berkeley has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11605 )

Change subject: [compaction] Small improvements to compaction policy tests
......................................................................


Patch Set 2:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/11605/2/src/kudu/tablet/compaction_policy-test.cc
File src/kudu/tablet/compaction_policy-test.cc:

http://gerrit.cloudera.org:8080/#/c/11605/2/src/kudu/tablet/compaction_policy-test.cc@88
PS2, Line 88: so the quality
            :   // score, which should be
> nit: "so the quality score should be"
Boy that comment was a mess. I rekajiggered the punctuation and wording.


http://gerrit.cloudera.org:8080/#/c/11605/2/src/kudu/tablet/compaction_policy-test.cc@117
PS2, Line 117: // Similar to the above, but with some small overlap between 
adjacent
             : // rowsets.
> In compaction-policy parlance, this means is testing compaction on a "short
The max height is 2 in this case, and 3 in significant overlap, and is not the 
main difference between these tests. This test is meant to test that we don't 
rowset compact if there's only a small overlap, like the above test is that we 
don't compact when there's none, and the below test is that we do compact when 
there's a lot.


http://gerrit.cloudera.org:8080/#/c/11605/2/src/kudu/tablet/compaction_policy-test.cc@131
PS2, Line 131: %09d", i * 10000),
             :         StringPrintf("%09d
> Was this change important?
The point of it was to make the comment fit better before I decided to write 
the integers and not the fully-padded strings. But I left it since it's still 
correct, and less arbitrary because it's exactly the length needed (9999 * 
10000 + 11000 = 100001000 has 9 digits).


http://gerrit.cloudera.org:8080/#/c/11605/2/src/kudu/tablet/compaction_policy-test.cc@178
PS2, Line 178: the
> nit: remove
Done


http://gerrit.cloudera.org:8080/#/c/11605/2/src/kudu/tablet/compaction_policy-test.cc@185
PS2, Line 185: // compacting {A, B, C} results in the same quality score as {A, 
B}, 0.67, but
             : // uses more budget. By penalizing the wider solution {A, B, C}, 
we ensure we
             : // don't waste I/O.
> Not really related to this change, but thought I'd ask. Conceptually, I get
Check out docs/design-docs/compaction-policy.md. I don't think it uses the word 
'quality' like the code does but it describes the ideas. The formula for the 
quality of a rowset compaction of a set of rowsets X is

    quality(X) = (sum of widths of rowsets in X) - (width of the union of the 
rowsets in X)

and the width of a rowset (or any set of keys) is the probability that a key 
falls between the set's min key and max key.

The probability distribution we use is one that weights the keyspace by how 
much data there is. So if [k_0, k_1] and [k_2, k_3] are key intervals, and 
[k_0, k_1] has 2x as much data in it, the width of [k_0, k_1] is 2x that of 
[k_2, k_3].

I think Todd and a guy who left a long, long time ago were the experts on this, 
so I'm hoping to bootcamp a few of us into familiarity using these smaller 
patches, before I try to make some larger changes to the compaction policy.



--
To view, visit http://gerrit.cloudera.org:8080/11605
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8f7a0e2c69f301d7fc7ca2fac657569ea240d4e3
Gerrit-Change-Number: 11605
Gerrit-PatchSet: 2
Gerrit-Owner: Will Berkeley <[email protected]>
Gerrit-Reviewer: Adar Dembo <[email protected]>
Gerrit-Reviewer: Andrew Wong <[email protected]>
Gerrit-Reviewer: Attila Bukor <[email protected]>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <[email protected]>
Gerrit-Reviewer: Todd Lipcon <[email protected]>
Gerrit-Reviewer: Will Berkeley <[email protected]>
Gerrit-Comment-Date: Tue, 09 Oct 2018 15:29:02 +0000
Gerrit-HasComments: Yes

Reply via email to