Andrew Wong has posted comments on this change. ( http://gerrit.cloudera.org:8080/15995 )
Change subject: [maintenance] use workload statistics to scale perf_improvement ...................................................................... Patch Set 1: (3 comments) I haven't done a full review; just leaving my initial concern. http://gerrit.cloudera.org:8080/#/c/15995/1/src/kudu/tablet/tablet_mm_ops.h File src/kudu/tablet/tablet_mm_ops.h: http://gerrit.cloudera.org:8080/#/c/15995/1/src/kudu/tablet/tablet_mm_ops.h@76 PS1, Line 76: Stopwatch Perhaps use MonoDelta for this instead? Or do we need the CPU and everything else included in the Stopwatch? http://gerrit.cloudera.org:8080/#/c/15995/1/src/kudu/tablet/tablet_mm_ops.cc File src/kudu/tablet/tablet_mm_ops.cc: http://gerrit.cloudera.org:8080/#/c/15995/1/src/kudu/tablet/tablet_mm_ops.cc@156 PS1, Line 156: uint64_t rows_scanned = metrics->scanner_rows_scanned->value() - last_rows_scanned_; It seems odd to be mixing the units "rows scanned" and "key lookups per op". Similar feedback for the other compaction types. While "rows scanned" does measure activity, it also scales as the dataset grows, meaning larger tablets will tend to look more active. We should also probably think about what scan rate warrants us calling a tablet "active" for scans, and make sure the below values agree with that assessment. Ignoring writes for a second, here we are saying that any tablet that scans more than 1000 rows per second is considered very active. I'm not sure whether rows per second is better than, e.g. number of scanners started, or number of scanners started per second. "key lookups per op" also does measure activity, but it is more of a measure of rowset overlap (e.g. if you have 10 overlapping rowsets, an insert op to the overlapped keyspace may check 10 key indexes for that op). This scales as the batch size grows. That said, I'm not sure why write activity should result in more compaction. Can you further describe the desired performance characteristics of your workloads that you're trying to improve? http://gerrit.cloudera.org:8080/#/c/15995/1/src/kudu/tablet/tablet_replica_mm_ops.cc File src/kudu/tablet/tablet_replica_mm_ops.cc: http://gerrit.cloudera.org:8080/#/c/15995/1/src/kudu/tablet/tablet_replica_mm_ops.cc@132 PS1, Line 132: // Use read/write rate that has been since the last flush op performs : // to update perf_improvement. The perf score will be improved depend on : // how 'hot' the tablet was, and will be improved by at most 1.0 because : // we want all tablets that have very large read/write traffic have equal : // opportunity to perfom flush. I'm not sure I understand the rationale behind flushing "hot" MRSs. If they're hot, isn't it better to keep them in memory? Or is the idea that the MRS isn't as efficient as reading the DRS from the block cache? Also, if there's non-zero bytes in the MRS, doesn't that indicate that it's hot? I'm also somewhat concerned about flush starvation. Does this make it possible that cold MRSs will never flush, and will anchor RAM and WALs indefinitely? -- To view, visit http://gerrit.cloudera.org:8080/15995 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ie3afcc359002d1392164ba2fda885f8930ef8696 Gerrit-Change-Number: 15995 Gerrit-PatchSet: 1 Gerrit-Owner: Yifan Zhang <[email protected]> Gerrit-Reviewer: Andrew Wong <[email protected]> Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Comment-Date: Fri, 29 May 2020 05:59:41 +0000 Gerrit-HasComments: Yes
