Will Berkeley has posted comments on this change. ( http://gerrit.cloudera.org:8080/10588 )
Change subject: [tools] KUDU-2461 Add election metrics to ksck ...................................................................... Patch Set 4: (2 comments) http://gerrit.cloudera.org:8080/#/c/10588/4//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/10588/4//COMMIT_MSG@24 PS4, Line 24: Config source | Replicas | Current term | Config index | Failed elections | Millis since heartbeat | Committed? > didn't look at the patch yet, but this is starting to get really wide. Mayb I agree. It's great to have so much useful info but let's only show the election stuff when it's noteworthy, and in that case I think it's worth adding extra callouts below the consensus matrix. e.g. something like Config source | Replicas ---------------+-------------- master | A B* C A | A B* C B | A B* C C | A B C WARNING: Replica C has not heard from the leader in 7483ms WARNING: Replica C has called 2 failed election(s) since it last heard from a leader. Maybe they should be combine in one message as well, since it's kind of redundant to say it hasn't heard from a leader and it's called elections, one a few seconds have passed. http://gerrit.cloudera.org:8080/#/c/10588/4/src/kudu/consensus/metadata.proto File src/kudu/consensus/metadata.proto: http://gerrit.cloudera.org:8080/#/c/10588/4/src/kudu/consensus/metadata.proto@161 PS4, Line 161: // Number of failed elections and pre-elections since the last successful election : optional int64 failed_elections = 5; : : // Time in milliseconds since the last leader heartbeat : optional int64 millis_since_heartbeat = 6; I don't think I feel good about shoving in these metrics with the cstate. This PB is used a lot of other places e.g. TS -> master heartbeats and in consensus itself, and it contains just core consensus info right now. I don't think it's wise to set a precedent of putting things in it just to communicate them to ksck. While it expands the scope of the patch, I think instead we should introduce a basic mechanism for ksck to grab metrics from the cluster. AFAIK the way to do this is through the /metrics http endpoint but I think I'd rather have an RPC, since it helps fix what's on the wire and it avoids making parts of ksck contingent on the web interface being available. -- To view, visit http://gerrit.cloudera.org:8080/10588 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I16fa6f32e6df7698365e908c7b48e63b0e52c745 Gerrit-Change-Number: 10588 Gerrit-PatchSet: 4 Gerrit-Owner: Attila Bukor <[email protected]> Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy <[email protected]> Gerrit-Reviewer: Todd Lipcon <[email protected]> Gerrit-Reviewer: Will Berkeley <[email protected]> Gerrit-Comment-Date: Thu, 14 Jun 2018 17:49:52 +0000 Gerrit-HasComments: Yes
