Andrew Wong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15995 )

Change subject: [maintenance] use workload statistics to scale perf_improvement
......................................................................


Patch Set 1:

(3 comments)

I haven't done a full review; just leaving my initial concern.

http://gerrit.cloudera.org:8080/#/c/15995/1/src/kudu/tablet/tablet_mm_ops.h
File src/kudu/tablet/tablet_mm_ops.h:

http://gerrit.cloudera.org:8080/#/c/15995/1/src/kudu/tablet/tablet_mm_ops.h@76
PS1, Line 76: Stopwatch
Perhaps use MonoDelta for this instead? Or do we need the CPU and everything 
else included in the Stopwatch?


http://gerrit.cloudera.org:8080/#/c/15995/1/src/kudu/tablet/tablet_mm_ops.cc
File src/kudu/tablet/tablet_mm_ops.cc:

http://gerrit.cloudera.org:8080/#/c/15995/1/src/kudu/tablet/tablet_mm_ops.cc@156
PS1, Line 156:     uint64_t rows_scanned = 
metrics->scanner_rows_scanned->value() - last_rows_scanned_;
It seems odd to be mixing the units "rows scanned" and "key lookups per op". 
Similar feedback for the other compaction types.

While "rows scanned" does measure activity, it also scales as the dataset 
grows, meaning larger tablets will tend to look more active. We should also 
probably think about what scan rate warrants us calling a tablet "active" for 
scans, and make sure the below values agree with that assessment. Ignoring 
writes for a second, here we are saying that any tablet that scans more than 
1000 rows per second is considered very active. I'm not sure whether rows per 
second is better than, e.g. number of scanners started, or number of scanners 
started per second.

"key lookups per op" also does measure activity, but it is more of a measure of 
rowset overlap (e.g. if you have 10 overlapping rowsets, an insert op to the 
overlapped keyspace may check 10 key indexes for that op). This scales as the 
batch size grows. That said, I'm not sure why write activity should result in 
more compaction. Can you further describe the desired performance 
characteristics of your workloads that you're trying to improve?


http://gerrit.cloudera.org:8080/#/c/15995/1/src/kudu/tablet/tablet_replica_mm_ops.cc
File src/kudu/tablet/tablet_replica_mm_ops.cc:

http://gerrit.cloudera.org:8080/#/c/15995/1/src/kudu/tablet/tablet_replica_mm_ops.cc@132
PS1, Line 132:     // Use read/write rate that has been since the last flush op 
performs
             :     // to update perf_improvement. The perf score will be 
improved depend on
             :     // how 'hot' the tablet was, and will be improved by at most 
1.0 because
             :     // we want all tablets that have very large read/write 
traffic have equal
             :     // opportunity to perfom flush.
I'm not sure I understand the rationale behind flushing "hot" MRSs. If they're 
hot, isn't it better to keep them in memory? Or is the idea that the MRS isn't 
as efficient as reading the DRS from the block cache? Also, if there's non-zero 
bytes in the MRS, doesn't that indicate that it's hot?

I'm also somewhat concerned about flush starvation. Does this make it possible 
that cold MRSs will never flush, and will anchor RAM and WALs indefinitely?



--
To view, visit http://gerrit.cloudera.org:8080/15995
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie3afcc359002d1392164ba2fda885f8930ef8696
Gerrit-Change-Number: 15995
Gerrit-PatchSet: 1
Gerrit-Owner: Yifan Zhang <[email protected]>
Gerrit-Reviewer: Andrew Wong <[email protected]>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Comment-Date: Fri, 29 May 2020 05:59:41 +0000
Gerrit-HasComments: Yes

Reply via email to