Will Berkeley has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10588 )

Change subject: [tools] KUDU-2461 Add election metrics to ksck
......................................................................


Patch Set 4:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/10588/4//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/10588/4//COMMIT_MSG@24
PS4, Line 24:  Config source |        Replicas        | Current term | Config 
index | Failed elections | Millis since heartbeat | Committed?
> didn't look at the patch yet, but this is starting to get really wide. Mayb
I agree. It's great to have so much useful info but let's only show the 
election stuff when it's noteworthy, and in that case I think it's worth adding 
extra callouts below the consensus matrix. e.g. something like

     Config source |   Replicas
    ---------------+--------------
     master        | A   B*  C
     A             | A   B*  C
     B             | A   B*  C
     C             | A   B  C
     WARNING: Replica C has not heard from the leader in 7483ms
     WARNING: Replica C has called 2 failed election(s) since it last heard 
from a leader.

Maybe they should be combine in one message as well, since it's kind of 
redundant to say it hasn't heard from a leader and it's called elections, one a 
few seconds have passed.


http://gerrit.cloudera.org:8080/#/c/10588/4/src/kudu/consensus/metadata.proto
File src/kudu/consensus/metadata.proto:

http://gerrit.cloudera.org:8080/#/c/10588/4/src/kudu/consensus/metadata.proto@161
PS4, Line 161: // Number of failed elections and pre-elections since the last 
successful election
             :   optional int64 failed_elections = 5;
             :
             :   // Time in milliseconds since the last leader heartbeat
             :   optional int64 millis_since_heartbeat = 6;
I don't think I feel good about shoving in these metrics with the cstate. This 
PB is used a lot of other places e.g. TS -> master heartbeats and in consensus 
itself, and it contains just core consensus info right now. I don't think it's 
wise to set a precedent of putting things in it just to communicate them to 
ksck.

While it expands the scope of the patch, I think instead we should introduce a 
basic mechanism for ksck to grab metrics from the cluster. AFAIK the way to do 
this is through the /metrics http endpoint but I think I'd rather have an RPC, 
since it helps fix what's on the wire and it avoids making parts of ksck 
contingent on the web interface being available.



--
To view, visit http://gerrit.cloudera.org:8080/10588
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I16fa6f32e6df7698365e908c7b48e63b0e52c745
Gerrit-Change-Number: 10588
Gerrit-PatchSet: 4
Gerrit-Owner: Attila Bukor <[email protected]>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <[email protected]>
Gerrit-Reviewer: Todd Lipcon <[email protected]>
Gerrit-Reviewer: Will Berkeley <[email protected]>
Gerrit-Comment-Date: Thu, 14 Jun 2018 17:49:52 +0000
Gerrit-HasComments: Yes

Reply via email to