Will Berkeley has posted comments on this change. ( http://gerrit.cloudera.org:8080/10076 )
Change subject: KUDU-2287 Expose election failures as metrics
......................................................................


Patch Set 15:

(1 comment)

Also needs some tests, maybe in raft-consensus-itest, and possibly partly as additions to existing tests. Probably also needs some tests specifically for pre-election vs. election behavior.

http://gerrit.cloudera.org:8080/#/c/10076/15/src/kudu/consensus/raft_consensus.cc
File src/kudu/consensus/raft_consensus.cc:

http://gerrit.cloudera.org:8080/#/c/10076/15/src/kudu/consensus/raft_consensus.cc@439
PS15, Line 439:   time_since_leader_lost_ = MonoTime::Now();
Suppose a node loses contact with the leader for, say, 10 seconds and calls a pre-election, which it loses. This code will then set time_since_leader_lost_. If the node later re-establishes contact with the leader, there is no term change and no call to SetLeaderUuidUnlocked (see DoElectionCallback), so nothing resets the value. The metric will erroneously imply that the node doesn't see a leader until the tablet changes leadership.

Pre-elections are tricky to handle. We could have a node that is partitioned from the leader but not from the other followers, and it will constantly call pre-elections and lose them. In that case we'd like the time_since_leader_lost metric to keep increasing, not reset on every lost pre-election. But if there's never a real election and the node re-establishes contact with the leader, then we want the metric to reset.

So I think maybe we should be tracking this metric through Update() (the heartbeat on the follower).
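
To make the suggestion concrete, here is a rough standalone sketch of tracking the value from the heartbeat path instead of from election callbacks. It is not Kudu code: LeaderContactTracker, OnLeaderHeartbeat, OnPreElectionLost, and TimeSinceLeaderContact are hypothetical names, and it uses std::chrono rather than MonoTime so it compiles on its own. It also assumes the metric can simply be "time since the last heartbeat accepted from a leader", which may not be exactly what the patch wants to expose.

```cpp
#include <cassert>
#include <chrono>

// Hypothetical helper (not part of Kudu): tracks how long a follower has gone
// without hearing from a leader, driven by heartbeats rather than elections.
class LeaderContactTracker {
 public:
  using Clock = std::chrono::steady_clock;
  using TimePoint = Clock::time_point;

  explicit LeaderContactTracker(TimePoint start) : last_contact_(start) {}

  // Assumed to be called from the follower's heartbeat path (Update() in the
  // comment above) each time a request from the current leader is accepted.
  void OnLeaderHeartbeat(TimePoint now) {
    assert(now >= last_contact_);  // Callers must pass monotonic timestamps.
    last_contact_ = now;
  }

  // A lost pre-election deliberately changes nothing: the node still has not
  // heard from a leader, so the reported value should keep growing.
  void OnPreElectionLost(TimePoint /*now*/) {}

  // Value the metric would report: elapsed time since the last leader contact.
  std::chrono::milliseconds TimeSinceLeaderContact(TimePoint now) const {
    return std::chrono::duration_cast<std::chrono::milliseconds>(
        now - last_contact_);
  }

 private:
  TimePoint last_contact_;
};
```

With this shape, a node that is partitioned from the leader and keeps losing pre-elections sees the value grow without interruption, and the first heartbeat after the partition heals brings it back to zero without needing a term change.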
-- 
To view, visit http://gerrit.cloudera.org:8080/10076
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I1b25df258cdba7bdae7bb2d7b4eb3d73b53425c3
Gerrit-Change-Number: 10076
Gerrit-PatchSet: 15
Gerrit-Owner: Attila Bukor <[email protected]>
Gerrit-Reviewer: Attila Bukor <[email protected]>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <[email protected]>
Gerrit-Reviewer: Will Berkeley <[email protected]>
Gerrit-Comment-Date: Wed, 16 May 2018 15:43:13 +0000
Gerrit-HasComments: Yes
