Adar Dembo has posted comments on this change. ( http://gerrit.cloudera.org:8080/14912 )
Change subject: [metrics] Add a new metric 'last_consult_timestamp' for tablet ...................................................................... Patch Set 1: (3 comments) http://gerrit.cloudera.org:8080/#/c/14912/1/src/kudu/tablet/tablet_metrics.cc File src/kudu/tablet/tablet_metrics.cc: http://gerrit.cloudera.org:8080/#/c/14912/1/src/kudu/tablet/tablet_metrics.cc@136 PS1, Line 136: METRIC_DEFINE_gauge_uint64(tablet, last_consult_timestamp, "Last Consult Timestamp", > Does it make sense to separate this into two metrics? One for writes and on Agreed with Grant. We should also say something about the units. Is it milliseconds since the UNIX epoch? http://gerrit.cloudera.org:8080/#/c/14912/1/src/kudu/tablet/tablet_metrics.cc@326 PS1, Line 326: // #define MAXINIT(x) x(METRIC_##x.Instantiate(entity, 0, kMax)) So this speaks to a deeper issue: every tserver has its own clock, and those clocks may not be well-synchronized, either individually with some NTP servers, or with one another. What are your expectations w.r.t. this metric? It seems like you don't want it to "go backwards" if the system clock changes value. Why is that? How do you expect each tserver's value to correlate to other tserver values? An alternative that might be cleaner is, rather than storing a timestamp in the metric, to store, as part of the tablet, a MonoTime corresponding to the last write or last scan. Then, you could expose a FunctionGauge for "last consult time" that'd be calculated as Now - time_of_last_write (or time_of_last_scan). I guess the MonoTime part is orthogonal; you could use time() instead. But there's value in using a MonoTime because it's guaranteed not to drop, so you don't have to answer these trickier questions. http://gerrit.cloudera.org:8080/#/c/14912/1/src/kudu/tserver/tablet_service.cc File src/kudu/tserver/tablet_service.cc: http://gerrit.cloudera.org:8080/#/c/14912/1/src/kudu/tserver/tablet_service.cc@2678 PS1, Line 2678: // First, the number of rows/cells/bytes actually returned to the user. Seems like we could merge this with the code in L2702-2706. It'd be nice to avoid the duplicate checks and cognitive overhead of two separate metric updates. -- To view, visit http://gerrit.cloudera.org:8080/14912 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I90738ba90eaa8f78c8d721dde2eed2b723d5572d Gerrit-Change-Number: 14912 Gerrit-PatchSet: 1 Gerrit-Owner: Yingchun Lai <405403...@qq.com> Gerrit-Reviewer: Adar Dembo <a...@cloudera.com> Gerrit-Reviewer: Grant Henke <granthe...@apache.org> Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Tidy Bot (241) Gerrit-Comment-Date: Tue, 17 Dec 2019 00:05:02 +0000 Gerrit-HasComments: Yes