[kudu-CR] Add tablet state summary metrics and fix KUDU-2044
Will Berkeley has abandoned this change. Change subject: Add tablet state summary metrics and fix KUDU-2044 .. Abandoned Split into two patches. Tablet state metrics: http://gerrit.cloudera.org:8080/7980 KUDU-2044: http://gerrit.cloudera.org:8080/7981 -- To view, visit http://gerrit.cloudera.org:8080/7618 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: abandon Gerrit-Change-Id: I8c82987ffe4f37e8af95972bde97841e44c521d9 Gerrit-PatchSet: 4 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Will BerkeleyGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Will Berkeley
[kudu-CR] Add tablet state summary metrics and fix KUDU-2044
Mike Percy has posted comments on this change. Change subject: Add tablet state summary metrics and fix KUDU-2044 .. Patch Set 4: (1 comment) Overall the new metrics look reasonable. But it seems like this should be written as 2 separate patches, because I'm not sure about the use case for hiding the metric entity per my comment in tablet.cc http://gerrit.cloudera.org:8080/#/c/7618/4/src/kudu/tablet/tablet.cc File src/kudu/tablet/tablet.cc: Line 277: if (metric_entity_) { Instead of hiding the metric entity, if that is the effect we want, shouldn't we instead destroy and unregister the tablet's metric entity on Shutdown()? -- To view, visit http://gerrit.cloudera.org:8080/7618 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I8c82987ffe4f37e8af95972bde97841e44c521d9 Gerrit-PatchSet: 4 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Will BerkeleyGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Will Berkeley Gerrit-HasComments: Yes
[kudu-CR] Add tablet state summary metrics and fix KUDU-2044
Todd Lipcon has posted comments on this change. Change subject: Add tablet state summary metrics and fix KUDU-2044 .. Patch Set 2: (2 comments) http://gerrit.cloudera.org:8080/#/c/7618/2//COMMIT_MSG Commit Message: PS2, Line 11: The numbers are computed by the heartbeater. There's two reasons : for this: : 1. the heartbeater was already computing the number of RUNNING : and BOOTSTRAPPING tablets by holding the appropriate lock and : iterating over the tablet map : 2. the alternative to having some thread periodically iterate : over the tablet map is to increment and decrement the metrics : when tablets transition states. This is error-prone, : particularly if new states are added, and mistakes will : accumulate until the metric is worse than useless. > Done Adar asked me to weigh in. The spinlock and timestamp idea sounds good to me as well (was thinking the same thing before I read his comment) PS2, Line 23: A metric entity can now be : marked as hidden, so it will not print to /metrics, and : tablet metric entities are marked as hidden when the tablet is : tombstoned, and un-hidden if and when the tablet is revived. > Let's solicit another opinion first. Todd understands the metrics subsystem I guess I'm curious how many metrics stay alive for the tombstoned tablets, and consequently how much memory is used for them? We do have a fair number of per-tablet metrics, and in particular histograms use a fair bit of memory each. Since we can assume a substantial number of tombstones per server, I think this memory usage could add up, even if the results are "hidden" during dump time. Do you have that data handy? -- To view, visit http://gerrit.cloudera.org:8080/7618 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I8c82987ffe4f37e8af95972bde97841e44c521d9 Gerrit-PatchSet: 2 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Will BerkeleyGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Will Berkeley Gerrit-HasComments: Yes
[kudu-CR] Add tablet state summary metrics and fix KUDU-2044
Hello Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/7618 to look at the new patch set (#4). Change subject: Add tablet state summary metrics and fix KUDU-2044 .. Add tablet state summary metrics and fix KUDU-2044 This patch adds metrics for the number of tablets in each state (i.e. the number of tablets for each value of TabletStatePB). To avoid contention in the tablet manager, the frequency with which the state numbers are computed is limited. Requests that come too soon for an update will receive cached counts. The maximum frequency of updates is controlled by a new flag. It also addresses KUDU-2044: tombstoned tablets will no longer produce metrics in /metrics output. A metric entity can now be marked as hidden, so it will not print to /metrics, and tablet metric entities are marked as hidden when the tablet is tombstoned, and un-hidden if and when the tablet is revived. Change-Id: I8c82987ffe4f37e8af95972bde97841e44c521d9 --- M src/kudu/integration-tests/tablet_copy-itest.cc M src/kudu/tablet/tablet.cc M src/kudu/tserver/tablet_server-test.cc M src/kudu/tserver/ts_tablet_manager.cc M src/kudu/tserver/ts_tablet_manager.h M src/kudu/util/metrics.cc M src/kudu/util/metrics.h 7 files changed, 307 insertions(+), 3 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/18/7618/4 -- To view, visit http://gerrit.cloudera.org:8080/7618 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I8c82987ffe4f37e8af95972bde97841e44c521d9 Gerrit-PatchSet: 4 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Will BerkeleyGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Will Berkeley
[kudu-CR] Add tablet state summary metrics and fix KUDU-2044
Will Berkeley has posted comments on this change. Change subject: Add tablet state summary metrics and fix KUDU-2044 .. Patch Set 2: (2 comments) http://gerrit.cloudera.org:8080/#/c/7618/2//COMMIT_MSG Commit Message: PS2, Line 11: The numbers are computed by the heartbeater. There's two reasons : for this: : 1. the heartbeater was already computing the number of RUNNING : and BOOTSTRAPPING tablets by holding the appropriate lock and : iterating over the tablet map : 2. the alternative to having some thread periodically iterate : over the tablet map is to increment and decrement the metrics : when tablets transition states. This is error-prone, : particularly if new states are added, and mistakes will : accumulate until the metric is worse than useless. > I'm not that worried about missing some places where calls to TransitionFro Done http://gerrit.cloudera.org:8080/#/c/7618/2/src/kudu/util/metrics.h File src/kudu/util/metrics.h: Line 533: std::atomic hidden_; > The rest of the file uses util/atomic.h for atomics; please do the same her Done -- To view, visit http://gerrit.cloudera.org:8080/7618 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I8c82987ffe4f37e8af95972bde97841e44c521d9 Gerrit-PatchSet: 2 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Will BerkeleyGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Will Berkeley Gerrit-HasComments: Yes
[kudu-CR] Add tablet state summary metrics and fix KUDU-2044
Hello Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/7618 to look at the new patch set (#3). Change subject: Add tablet state summary metrics and fix KUDU-2044 .. Add tablet state summary metrics and fix KUDU-2044 This patch adds metrics for the number of tablets in each state (i.e. the number of tablets for each value of TabletStatePB). To avoid contention in the tablet manager, the frequency with which the state numbers are computed is limited. Requests that come too soon for an update will receive cached counts. The maximum frequency of updates is controlled by a new flag. It also addresses KUDU-2044: tombstoned tablets will no longer produce metrics in /metrics output. A metric entity can now be marked as hidden, so it will not print to /metrics, and tablet metric entities are marked as hidden when the tablet is tombstoned, and un-hidden if and when the tablet is revived. Change-Id: I8c82987ffe4f37e8af95972bde97841e44c521d9 --- M src/kudu/integration-tests/tablet_copy-itest.cc M src/kudu/tablet/tablet.cc M src/kudu/tserver/tablet_server-test.cc M src/kudu/tserver/ts_tablet_manager.cc M src/kudu/tserver/ts_tablet_manager.h M src/kudu/util/metrics.cc M src/kudu/util/metrics.h 7 files changed, 305 insertions(+), 3 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/18/7618/3 -- To view, visit http://gerrit.cloudera.org:8080/7618 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I8c82987ffe4f37e8af95972bde97841e44c521d9 Gerrit-PatchSet: 3 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Will BerkeleyGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Will Berkeley
[kudu-CR] Add tablet state summary metrics and fix KUDU-2044
Adar Dembo has posted comments on this change. Change subject: Add tablet state summary metrics and fix KUDU-2044 .. Patch Set 2: (2 comments) http://gerrit.cloudera.org:8080/#/c/7618/2//COMMIT_MSG Commit Message: PS2, Line 11: The numbers are computed by the heartbeater. There's two reasons : for this: : 1. the heartbeater was already computing the number of RUNNING : and BOOTSTRAPPING tablets by holding the appropriate lock and : iterating over the tablet map : 2. the alternative to having some thread periodically iterate : over the tablet map is to increment and decrement the metrics : when tablets transition states. This is error-prone, : particularly if new states are added, and mistakes will : accumulate until the metric is worse than useless. > I could use a function gauge, but a really fast metric poll could contend w I'm not that worried about missing some places where calls to TransitionFromFooToBar() should go. What I don't like, though, is the additional plumbing that would require. It'd mean a backpointer to the TsTabletManager (with some hack for the master where there is no TsTabletManager), or a more abstract "notify tablet state change observers" pattern. How about this for a middle ground: 1. TsTabletManager maintains a timestamp representing the last time that the state metrics were calculated and a set of counters, one for each state. The timestamp could have a dedicated spinlock, or reuse an existing one if appropriate. 2. There are N function guages, one for each state. 3. When a guage is invoked, the timestamp is compared with the current time. If the delta exceeds some threshold (statically defined or configurable via gflag), the tablet map is walked and all of the counters are updated. If it doesn't, the current counter values are returned as-is. PS2, Line 23: A metric entity can now be : marked as hidden, so it will not print to /metrics, and : tablet metric entities are marked as hidden when the tablet is : tombstoned, and un-hidden if and when the tablet is revived. > They take up some memory as part of the TabletReplica that remains in the t Let's solicit another opinion first. Todd understands the metrics subsystem better than I do, and Mike can comment on the tombstoning/vivification lifecycle. -- To view, visit http://gerrit.cloudera.org:8080/7618 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I8c82987ffe4f37e8af95972bde97841e44c521d9 Gerrit-PatchSet: 2 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Will BerkeleyGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Will Berkeley Gerrit-HasComments: Yes
[kudu-CR] Add tablet state summary metrics and fix KUDU-2044
Will Berkeley has posted comments on this change. Change subject: Add tablet state summary metrics and fix KUDU-2044 .. Patch Set 2: (2 comments) http://gerrit.cloudera.org:8080/#/c/7618/2//COMMIT_MSG Commit Message: PS2, Line 11: The numbers are computed by the heartbeater. There's two reasons : for this: : 1. the heartbeater was already computing the number of RUNNING : and BOOTSTRAPPING tablets by holding the appropriate lock and : iterating over the tablet map : 2. the alternative to having some thread periodically iterate : over the tablet map is to increment and decrement the metrics : when tablets transition states. This is error-prone, : particularly if new states are added, and mistakes will : accumulate until the metric is worse than useless. > I appreciate the justification, but I don't think the heartbeater is a grea I could use a function gauge, but a really fast metric poll could contend with other access to the tablet manager's tablet map: I would need n function gauges for n tablet states, which is n locks and iterations each time someone hits /metrics. That's why I liked the heartbeat approach despite the drawbacks...it just adapts the iteration over the tablet map that was already done for heartbeats (which already iterated k times for k masters). I don't think multiply-instantiating a metrics has bad effects. It doesn't create multiple metrics. Is the alternative you support to track the state transitions directly? I.e. in TransitionFromFooToBar() do num_foos->IncrementBy(-1); num_bars->Increment(); PS2, Line 23: A metric entity can now be : marked as hidden, so it will not print to /metrics, and : tablet metric entities are marked as hidden when the tablet is : tombstoned, and un-hidden if and when the tablet is revived. > Why hide them and not remove them outright? Don't "hidden" entities still a They take up some memory as part of the TabletReplica that remains in the tablet manager's map, and they need to be iterated over and skipped when metrics are gathered. I think that's small overhead. The benefit of making them hidden is that it's very simple. Actually removing an entity means 1. removing all references to it 2. removing all reference to its child metrics 3. Hitting /metrics after the retirement period has passed (default 2 minutes) I think only the Tablet actually holds a reference to its entity, so 1 is easy. 2 is harder, because a number of other objects hold references to tablet metrics, e.g. the transaction tracker and log. Some are destroyed when the TabletReplica is shut down, for example the Tablet itself, and some aren't, e.g. the log and the transaction tracker. But, if you do think it's worth it to remove and then re-init the metrics in the various places when a tablet is tombstoned and then revived, I can do that. -- To view, visit http://gerrit.cloudera.org:8080/7618 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I8c82987ffe4f37e8af95972bde97841e44c521d9 Gerrit-PatchSet: 2 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Will BerkeleyGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Will Berkeley Gerrit-HasComments: Yes
[kudu-CR] Add tablet state summary metrics and fix KUDU-2044
Adar Dembo has posted comments on this change. Change subject: Add tablet state summary metrics and fix KUDU-2044 .. Patch Set 2: (3 comments) http://gerrit.cloudera.org:8080/#/c/7618/2//COMMIT_MSG Commit Message: PS2, Line 11: The numbers are computed by the heartbeater. There's two reasons : for this: : 1. the heartbeater was already computing the number of RUNNING : and BOOTSTRAPPING tablets by holding the appropriate lock and : iterating over the tablet map : 2. the alternative to having some thread periodically iterate : over the tablet map is to increment and decrement the metrics : when tablets transition states. This is error-prone, : particularly if new states are added, and mistakes will : accumulate until the metric is worse than useless. I appreciate the justification, but I don't think the heartbeater is a great place either, since there's one heartbeating thread per master. So your current approach instantiates each metric multiple times (not to mention the unnecessary updates incurred from having more than one heartbeating thread). PS2, Line 23: A metric entity can now be : marked as hidden, so it will not print to /metrics, and : tablet metric entities are marked as hidden when the tablet is : tombstoned, and un-hidden if and when the tablet is revived. Why hide them and not remove them outright? Don't "hidden" entities still add ever-growing load given how we do tombstoning? http://gerrit.cloudera.org:8080/#/c/7618/2/src/kudu/util/metrics.h File src/kudu/util/metrics.h: Line 533: std::atomic hidden_; The rest of the file uses util/atomic.h for atomics; please do the same here. Also, are hidden/set_hidden accessed often? If not, could also make this a simple bool and protect with lock_. -- To view, visit http://gerrit.cloudera.org:8080/7618 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I8c82987ffe4f37e8af95972bde97841e44c521d9 Gerrit-PatchSet: 2 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Will BerkeleyGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Will Berkeley Gerrit-HasComments: Yes
[kudu-CR] Add tablet state summary metrics and fix KUDU-2044
Will Berkeley has posted comments on this change. Change subject: Add tablet state summary metrics and fix KUDU-2044 .. Patch Set 2: Second test failure was a legit TSAN failure (fixed). First and third failures appear to be flakes-- they don't look related to this patch: 1st was one I've seen be flaky, third was two tests, one failing due to unsynch'd clock and one due to a timeout in a Java token reauth test. -- To view, visit http://gerrit.cloudera.org:8080/7618 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I8c82987ffe4f37e8af95972bde97841e44c521d9 Gerrit-PatchSet: 2 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Will BerkeleyGerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Will Berkeley Gerrit-HasComments: No
[kudu-CR] Add tablet state summary metrics and fix KUDU-2044
Hello Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/7618 to look at the new patch set (#2). Change subject: Add tablet state summary metrics and fix KUDU-2044 .. Add tablet state summary metrics and fix KUDU-2044 This patch adds metrics for the number of tablets in each state (i.e. the number of tablets for each value of TabletStatePB). The numbers are computed by the heartbeater. There's two reasons for this: 1. the heartbeater was already computing the number of RUNNING and BOOTSTRAPPING tablets by holding the appropriate lock and iterating over the tablet map 2. the alternative to having some thread periodically iterate over the tablet map is to increment and decrement the metrics when tablets transition states. This is error-prone, particularly if new states are added, and mistakes will accumulate until the metric is worse than useless. It also addresses KUDU-2044: tombstoned tablets will no longer produce metrics in /metrics output. A metric entity can now be marked as hidden, so it will not print to /metrics, and tablet metric entities are marked as hidden when the tablet is tombstoned, and un-hidden if and when the tablet is revived. Change-Id: I8c82987ffe4f37e8af95972bde97841e44c521d9 --- M src/kudu/integration-tests/tablet_copy-itest.cc M src/kudu/tablet/tablet.cc M src/kudu/tserver/heartbeater.cc M src/kudu/tserver/tablet_server-test.cc M src/kudu/tserver/ts_tablet_manager.cc M src/kudu/tserver/ts_tablet_manager.h M src/kudu/util/metrics.cc M src/kudu/util/metrics.h 8 files changed, 250 insertions(+), 12 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/18/7618/2 -- To view, visit http://gerrit.cloudera.org:8080/7618 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I8c82987ffe4f37e8af95972bde97841e44c521d9 Gerrit-PatchSet: 2 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Will BerkeleyGerrit-Reviewer: Kudu Jenkins
[kudu-CR] Add tablet state summary metrics and fix KUDU-2044
Will Berkeley has uploaded a new change for review. http://gerrit.cloudera.org:8080/7618 Change subject: Add tablet state summary metrics and fix KUDU-2044 .. Add tablet state summary metrics and fix KUDU-2044 This patch adds metrics for the number of tablets in each state (i.e. the number of tablets for each value of TabletStatePB). The numbers are computed by the heartbeater. There's two reasons for this: 1. the heartbeater was already computing the number of RUNNING and BOOTSTRAPPING tablets by holding the appropriate lock and iterating over the tablet map 2. the alternative to having some thread periodically iterate over the tablet map is to increment and decrement the metrics when tablets transition states. This is error-prone, particularly if new states are added, and mistakes will accumulate until the metric is worse than useless. It also addresses KUDU-2044: tombstoned tablets will no longer produce metrics in /metrics output. A metric entity can now be marked as hidden, so it will not print to /metrics, and tablet metric entities are marked as hidden when the tablet is tombstoned, and un-hidden if and when the tablet is revived. Change-Id: I8c82987ffe4f37e8af95972bde97841e44c521d9 --- M src/kudu/integration-tests/tablet_copy-itest.cc M src/kudu/tablet/tablet.cc M src/kudu/tserver/heartbeater.cc M src/kudu/tserver/tablet_server-test.cc M src/kudu/tserver/ts_tablet_manager.cc M src/kudu/tserver/ts_tablet_manager.h M src/kudu/util/metrics.cc M src/kudu/util/metrics.h 8 files changed, 249 insertions(+), 12 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/18/7618/1 -- To view, visit http://gerrit.cloudera.org:8080/7618 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newchange Gerrit-Change-Id: I8c82987ffe4f37e8af95972bde97841e44c521d9 Gerrit-PatchSet: 1 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Will Berkeley