Hello Kudu Jenkins, Andrew Wong,
I'd like you to reexamine a change. Please visit
http://gerrit.cloudera.org:8080/15995
to look at the new patch set (#7).
Change subject: [maintenance] use workload statistics to scale perf_improvement
......................................................................
[maintenance] use workload statistics to scale perf_improvement
When we consider the performance improvement brought by maintenance
operations, we can use workload statistics to find how 'hot' a tablet
has been in the last few minutes and prioritize maintenance ops for
'hot' tablets. This patch uses the recent read/write rate of a tablet
as a workload score and calculates a final perf score based on an op's
raw perf_improvement, the tablet's workload score and the table's
priority, so maintenance ops for a 'hotter' tablet are more likely to
launch.
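A minimal sketch of how the three inputs named above might be combined into one score. The function name, the additive combination, the multiplier and the priority clamp are all assumptions made for illustration; the commit message does not state the exact formula, so see the diff in maintenance_manager.cc for the real one.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical tuning knobs, not taken from the patch.
constexpr double kPriorityMultiplier = 2.0;  // assumed scale factor per priority level
constexpr int32_t kMaxPriorityRange = 5;     // assumed clamp on the table priority

// Combines an op's raw perf_improvement, the tablet's workload score and the
// table's priority into a single score used to rank maintenance ops.
double FinalPerfScore(double raw_perf_improvement,
                      double workload_score,
                      int32_t table_priority) {
  assert(workload_score >= 0.0);
  // An op with nothing to gain is never boosted just because its tablet is hot.
  if (raw_perf_improvement <= 0.0) return 0.0;
  // Assumption: the workload score is an additive bonus on the raw score.
  const double score = raw_perf_improvement + workload_score;
  // Clamp the priority so one table cannot scale its score without bound.
  const int32_t p = std::max(-kMaxPriorityRange,
                             std::min(table_priority, kMaxPriorityRange));
  return score * std::pow(kPriorityMultiplier, p);
}
```

Under these assumptions, two ops with the same raw perf_improvement on tables of equal priority are ordered by how hot their tablets are, which is the behavior the commit message describes.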
In our use cases, there is insert/update/delete traffic all the time,
but some tables may have more read traffic at certain times, so we want
to dynamically adjust the priorities of compaction/flush ops for
different tables.
We tested this on a 6-node cluster with maintenance_manager_num_threads=1
set on the tservers, running three YCSB workloads in order on two tables
with 64 tablets. We ran almost the same workload on both tables at the
same time, differing only in the threadcount/recordcount/operationcount
settings, so that the two tables saw different read/write rates and
runtimes.
workload_a: insert-only workload.
threadcount=1(table-A)/16(table-B)
recordcount=50,000,000(table-A)/1,000,000,000(table-B)
result:
measurement                                 Before change   After change
[table-A:INSERT]AverageLatency(us)          62.48352        61.45316
[table-A:INSERT]95thPercentileLatency(us)   4               4
[table-B:INSERT]AverageLatency(us)          58.05314        56.55963
[table-B:INSERT]95thPercentileLatency(us)   6               6
workload_b: scan-mostly workload, scan/update ratio is 80/20.
threadcount=1(table-A)/16(table-B)
operationcount=500,000(table-A)/10,000,000(table-B)
requestdistribution=zipfian
scanlengthdistribution=zipfian
maxscanlength=100
readallfields=false
result:
measurement                                 Before change   After change
[table-A:UPDATE]AverageLatency(us)          6.73773         5.58511
[table-A:UPDATE]95thPercentileLatency(us)   15              11
[table-A:SCAN]AverageLatency(us)            834.47093       481.40569(-42%)
[table-A:SCAN]95thPercentileLatency(us)     639             479(-25%)
[table-B:UPDATE]AverageLatency(us)          4.61783         4.58399
[table-B:UPDATE]95thPercentileLatency(us)   7               7
[table-B:SCAN]AverageLatency(us)            2168.55291      1979.14102(-8%)
[table-B:SCAN]95thPercentileLatency(us)     7727            4671(-39%)
workload_c: insert-heavy workload, scan/insert ratio is 20/80.
threadcount=1(table-A)/16(table-B)
operationcount=5,000,000(table-A)/100,000,000(table-B)
insertorder=hashed
requestdistribution=zipfian
scanlengthdistribution=zipfian
maxscanlength=100
readallfields=false
result:
measurement                                 Before change   After change
[table-A:INSERT]AverageLatency(us)          7.89913         7.90852
[table-A:INSERT]95thPercentileLatency(us)   9               7
[table-A:SCAN]AverageLatency(us)            1617.92456      960.75466(-40%)
[table-A:SCAN]95thPercentileLatency(us)     9871            2073(-78%)
[table-B:INSERT]AverageLatency(us)          9.11269         9.56165
[table-B:INSERT]95thPercentileLatency(us)   8               9
[table-B:SCAN]AverageLatency(us)            1392.94679      1316.02913(-5.4%)
[table-B:SCAN]95thPercentileLatency(us)     3665            3059(-16.5%)
We can see that this change improves scan performance in the above test
cases. This patch will not lead to compaction/flush starvation of
tablets with low read/write rates, because workload_score is at most 1.
We can check this on the tablet servers' dashboard/maintenance-manager
page.
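The bound on workload_score can be illustrated with a short sketch. The threshold constants, the function name and the choice to take the larger of the read and write components are hypothetical; only the upper bound of 1 comes from the commit message.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical rates at which a tablet counts as fully 'hot'.
constexpr double kHotReadsPerSec = 100.0;
constexpr double kHotWritesPerSec = 1000.0;
// Upper bound stated in the commit message: workload_score is at most 1.
constexpr double kWorkloadScoreUpperBound = 1.0;

// Maps a tablet's recent read/write rates to a score in [0, 1].
double WorkloadScore(double reads_per_sec, double writes_per_sec) {
  assert(reads_per_sec >= 0.0 && writes_per_sec >= 0.0);
  // Each component saturates at 1, so the score stays bounded however
  // high the traffic gets.
  const double read_score = std::min(1.0, reads_per_sec / kHotReadsPerSec);
  const double write_score = std::min(1.0, writes_per_sec / kHotWritesPerSec);
  // Assumption: the hotter of the two components decides the score.
  return kWorkloadScoreUpperBound * std::max(read_score, write_score);
}
```

Because the score saturates, a hot tablet can gain only a limited advantage. If the score is applied as an additive bonus (an assumption here), an op on an idle tablet still ranks first once its raw perf_improvement exceeds a hot tablet's by more than 1.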
Change-Id: Ie3afcc359002d1392164ba2fda885f8930ef8696
---
M src/kudu/tablet/tablet.cc
M src/kudu/tablet/tablet.h
M src/kudu/tablet/tablet_mm_ops.cc
M src/kudu/tablet/tablet_mm_ops.h
M src/kudu/tablet/tablet_replica_mm_ops.cc
M src/kudu/tablet/tablet_replica_mm_ops.h
M src/kudu/util/maintenance_manager-test.cc
M src/kudu/util/maintenance_manager.cc
M src/kudu/util/maintenance_manager.h
9 files changed, 165 insertions(+), 21 deletions(-)
git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/95/15995/7
--
To view, visit http://gerrit.cloudera.org:8080/15995
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ie3afcc359002d1392164ba2fda885f8930ef8696
Gerrit-Change-Number: 15995
Gerrit-PatchSet: 7
Gerrit-Owner: Yifan Zhang <[email protected]>
Gerrit-Reviewer: Andrew Wong <[email protected]>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Yifan Zhang <[email protected]>