[kudu-CR] KUDU-2287 Expose election failures as a metric
Attila Bukor has posted comments on this change. ( http://gerrit.cloudera.org:8080/10076 ) Change subject: KUDU-2287 Expose election failures as a metric .. Patch Set 5: > Patch Set 5: > > Build Started http://jenkins.kudu.apache.org/job/kudu-gerrit/13010/ looking at the build, it's going to fail, but it doesn't seem related. someone pls rebuild it -- To view, visit http://gerrit.cloudera.org:8080/10076 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I1b25df258cdba7bdae7bb2d7b4eb3d73b53425c3 Gerrit-Change-Number: 10076 Gerrit-PatchSet: 5 Gerrit-Owner: Attila BukorGerrit-Reviewer: Attila Bukor Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Will Berkeley Gerrit-Comment-Date: Wed, 18 Apr 2018 17:01:44 + Gerrit-HasComments: No
[kudu-CR] KUDU-2287 Expose election failures as a metric
Hello Will Berkeley, Tidy Bot, Kudu Jenkins, Todd Lipcon, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/10076 to look at the new patch set (#5). Change subject: KUDU-2287 Expose election failures as a metric .. KUDU-2287 Expose election failures as a metric This patch exposes as a metric the number of election failures since the last successful election attempt Change-Id: I1b25df258cdba7bdae7bb2d7b4eb3d73b53425c3 --- M src/kudu/consensus/raft_consensus.cc M src/kudu/consensus/raft_consensus.h 2 files changed, 45 insertions(+), 4 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/76/10076/5 -- To view, visit http://gerrit.cloudera.org:8080/10076 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I1b25df258cdba7bdae7bb2d7b4eb3d73b53425c3 Gerrit-Change-Number: 10076 Gerrit-PatchSet: 5 Gerrit-Owner: Attila BukorGerrit-Reviewer: Attila Bukor Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Will Berkeley
[kudu-CR] KUDU-2287 Expose election failures as a metric
Hello Will Berkeley, Kudu Jenkins, Todd Lipcon, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/10076 to look at the new patch set (#4). Change subject: KUDU-2287 Expose election failures as a metric .. KUDU-2287 Expose election failures as a metric This patch exposes as a metric the number of election failures since the last successful election attempt Change-Id: I1b25df258cdba7bdae7bb2d7b4eb3d73b53425c3 --- M src/kudu/consensus/raft_consensus.cc M src/kudu/consensus/raft_consensus.h 2 files changed, 45 insertions(+), 4 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/76/10076/4 -- To view, visit http://gerrit.cloudera.org:8080/10076 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I1b25df258cdba7bdae7bb2d7b4eb3d73b53425c3 Gerrit-Change-Number: 10076 Gerrit-PatchSet: 4 Gerrit-Owner: Attila BukorGerrit-Reviewer: Attila Bukor Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Will Berkeley
[kudu-CR] KUDU-2287 Expose election failures as a metric
Attila Bukor has posted comments on this change. ( http://gerrit.cloudera.org:8080/10076 ) Change subject: KUDU-2287 Expose election failures as a metric .. Patch Set 3: (2 comments) > Patch Set 2: > > (1 comment) http://gerrit.cloudera.org:8080/#/c/10076/1//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/10076/1//COMMIT_MSG@7 PS1, Line 7: KUDU-2287 Expose election failures as a metric > huh cool, didn't know about the function gauge. I'll look into this. Done http://gerrit.cloudera.org:8080/#/c/10076/3/src/kudu/consensus/raft_consensus.cc File src/kudu/consensus/raft_consensus.cc: http://gerrit.cloudera.org:8080/#/c/10076/3/src/kudu/consensus/raft_consensus.cc@161 PS3, Line 161: kudu::MetricUnit::kNanoseconds, unit might be a bit overkill, let me know if I should change it to something else. -- To view, visit http://gerrit.cloudera.org:8080/10076 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I1b25df258cdba7bdae7bb2d7b4eb3d73b53425c3 Gerrit-Change-Number: 10076 Gerrit-PatchSet: 3 Gerrit-Owner: Attila BukorGerrit-Reviewer: Attila Bukor Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Will Berkeley Gerrit-Comment-Date: Wed, 18 Apr 2018 15:22:07 + Gerrit-HasComments: Yes
[kudu-CR] KUDU-2287 Expose election failures as a metric
Hello Will Berkeley, Kudu Jenkins, Todd Lipcon, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/10076 to look at the new patch set (#3). Change subject: KUDU-2287 Expose election failures as a metric .. KUDU-2287 Expose election failures as a metric This patch exposes as a metric the number of election failures since the last successful election attempt Change-Id: I1b25df258cdba7bdae7bb2d7b4eb3d73b53425c3 --- M src/kudu/consensus/raft_consensus.cc M src/kudu/consensus/raft_consensus.h 2 files changed, 44 insertions(+), 3 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/76/10076/3 -- To view, visit http://gerrit.cloudera.org:8080/10076 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I1b25df258cdba7bdae7bb2d7b4eb3d73b53425c3 Gerrit-Change-Number: 10076 Gerrit-PatchSet: 3 Gerrit-Owner: Attila BukorGerrit-Reviewer: Attila Bukor Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Will Berkeley
[kudu-CR] KUDU-2287 Expose election failures as a metric
Attila Bukor has posted comments on this change. ( http://gerrit.cloudera.org:8080/10076 ) Change subject: KUDU-2287 Expose election failures as a metric .. Patch Set 2: (8 comments) http://gerrit.cloudera.org:8080/#/c/10076/1//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/10076/1//COMMIT_MSG@7 PS1, Line 7: KUDU-2287 Expose election failures as a metric > The JIRA suggests either this or an actual wall clock time since there was I opted for the counter as it was more precise as we don't have "time since" metrics, so we could update the metric at each election failure, but not in between. Or is there a better approach to this that I didn't think of? I guess we could expose both numbers though. http://gerrit.cloudera.org:8080/#/c/10076/1//COMMIT_MSG@9 PS1, Line 9: This patch exposes as a metric the number of election failures since the : last su > nit: Could you rewrite this as complete sentence: Done http://gerrit.cloudera.org:8080/#/c/10076/1/src/kudu/consensus/raft_consensus.h File src/kudu/consensus/raft_consensus.h: http://gerrit.cloudera.org:8080/#/c/10076/1/src/kudu/consensus/raft_consensus.h@839 PS1, Line 839: The number of times this node has called and lost a leader election > This is different than what the patch is claiming the metric measures. fail yep, the description of the metric wasn't clear I guess, but I believe this is what this metric should keep track of. I agree that each node will have a different number here. These numbers should be summarized on the client side if necessary (e.g. in ksck). http://gerrit.cloudera.org:8080/#/c/10076/1/src/kudu/consensus/raft_consensus.h@855 PS1, Line 855: >> > nit: While you're here, the space between the >'s isn't necessary in C++11. fixed, wasn't sure if I should touch this line in this patch and didn't want to have two consecutive lines inconsistently formatted http://gerrit.cloudera.org:8080/#/c/10076/1/src/kudu/consensus/raft_consensus.cc File src/kudu/consensus/raft_consensus.cc: http://gerrit.cloudera.org:8080/#/c/10076/1/src/kudu/consensus/raft_consensus.cc@156 PS1, Line 156: "Number of failed elections on this node since there was a s > Not exactly correct according to my prev comment. Done http://gerrit.cloudera.org:8080/#/c/10076/1/src/kudu/consensus/raft_consensus.cc@157 PS1, Line 157: ion a > resets Done http://gerrit.cloudera.org:8080/#/c/10076/1/src/kudu/consensus/raft_consensus.cc@2290 PS1, Line 2290: ed_elections_metric_->set_value(faile > Do you mean failed_elections_metric_? It should be non-null because of the yep, I meant failed_elections_metric_, removed it if unnecessary (raft_term_ was used similarly) http://gerrit.cloudera.org:8080/#/c/10076/1/src/kudu/consensus/raft_consensus.cc@2453 PS1, Line 2453: f we called an election and one of th > Same. Done -- To view, visit http://gerrit.cloudera.org:8080/10076 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I1b25df258cdba7bdae7bb2d7b4eb3d73b53425c3 Gerrit-Change-Number: 10076 Gerrit-PatchSet: 2 Gerrit-Owner: Attila BukorGerrit-Reviewer: Attila Bukor Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Will Berkeley Gerrit-Comment-Date: Tue, 17 Apr 2018 17:33:25 + Gerrit-HasComments: Yes
[kudu-CR] KUDU-2287 Expose election failures as a metric
Hello Will Berkeley, Kudu Jenkins, Todd Lipcon, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/10076 to look at the new patch set (#2). Change subject: KUDU-2287 Expose election failures as a metric .. KUDU-2287 Expose election failures as a metric This patch exposes as a metric the number of election failures since the last successful election attempt Change-Id: I1b25df258cdba7bdae7bb2d7b4eb3d73b53425c3 --- M src/kudu/consensus/raft_consensus.cc M src/kudu/consensus/raft_consensus.h 2 files changed, 15 insertions(+), 2 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/76/10076/2 -- To view, visit http://gerrit.cloudera.org:8080/10076 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I1b25df258cdba7bdae7bb2d7b4eb3d73b53425c3 Gerrit-Change-Number: 10076 Gerrit-PatchSet: 2 Gerrit-Owner: Attila BukorGerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Will Berkeley
[kudu-CR] KUDU-2287 Expose election failures as a metric
Todd Lipcon has posted comments on this change. ( http://gerrit.cloudera.org:8080/10076 ) Change subject: KUDU-2287 Expose election failures as a metric .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/10076/1//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/10076/1//COMMIT_MSG@7 PS1, Line 7: KUDU-2287 Expose election failures as a metric The JIRA suggests either this or an actual wall clock time since there was a stable leader. I'm leaning slightly towards the time-based option vs a counter, since the time is more user-interpretable. eg if a replica hasn't had a leader for 60 seconds, I understand what that means, but if it hasn't had a leader for 10 elections I'm not really sure how long that is, particularly given randomization and backoff. Would it be too tricky to use a time based metric here? Or do you think there is some value to a count instead of time? -- To view, visit http://gerrit.cloudera.org:8080/10076 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I1b25df258cdba7bdae7bb2d7b4eb3d73b53425c3 Gerrit-Change-Number: 10076 Gerrit-PatchSet: 1 Gerrit-Owner: Attila BukorGerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Will Berkeley Gerrit-Comment-Date: Mon, 16 Apr 2018 22:36:45 + Gerrit-HasComments: Yes
[kudu-CR] KUDU-2287 Expose election failures as a metric
Will Berkeley has posted comments on this change. ( http://gerrit.cloudera.org:8080/10076 ) Change subject: KUDU-2287 Expose election failures as a metric .. Patch Set 1: (7 comments) Needs a test as well. http://gerrit.cloudera.org:8080/#/c/10076/1//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/10076/1//COMMIT_MSG@9 PS1, Line 9: Exposing the number of election failures since there was a stable : leader. nit: Could you rewrite this as complete sentence: "This patch exposes as a metric the number of election failures since the last successful election attempt." http://gerrit.cloudera.org:8080/#/c/10076/1/src/kudu/consensus/raft_consensus.h File src/kudu/consensus/raft_consensus.h: http://gerrit.cloudera.org:8080/#/c/10076/1/src/kudu/consensus/raft_consensus.h@839 PS1, Line 839: The number of times this node has called and lost a leader election This is different than what the patch is claiming the metric measures. failed_elections_since_stable_leader_ counts the number of times *this node* has called an election and not become leader since the last time it heard from a leader. Other nodes could have called elections and not won, too, and this metric doesn't count those. As a practical matter, the randomization of election timeouts more-or-less guarantees this node will call elections if it isn't hearing from a leader, so the difference might not matter too much depending on the application of the metric. http://gerrit.cloudera.org:8080/#/c/10076/1/src/kudu/consensus/raft_consensus.h@855 PS1, Line 855: > > nit: While you're here, the space between the >'s isn't necessary in C++11. http://gerrit.cloudera.org:8080/#/c/10076/1/src/kudu/consensus/raft_consensus.cc File src/kudu/consensus/raft_consensus.cc: http://gerrit.cloudera.org:8080/#/c/10076/1/src/kudu/consensus/raft_consensus.cc@156 PS1, Line 156: "Number of failed elections since there was a stable leader. Not exactly correct according to my prev comment. http://gerrit.cloudera.org:8080/#/c/10076/1/src/kudu/consensus/raft_consensus.cc@157 PS1, Line 157: reset resets http://gerrit.cloudera.org:8080/#/c/10076/1/src/kudu/consensus/raft_consensus.cc@2290 PS1, Line 2290: failed_elections_since_stable_leader_ Do you mean failed_elections_metric_? It should be non-null because of the FindOrCreate call, so I don't think the check here is necessary. http://gerrit.cloudera.org:8080/#/c/10076/1/src/kudu/consensus/raft_consensus.cc@2453 PS1, Line 2453: failed_elections_since_stable_leader_ Same. -- To view, visit http://gerrit.cloudera.org:8080/10076 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I1b25df258cdba7bdae7bb2d7b4eb3d73b53425c3 Gerrit-Change-Number: 10076 Gerrit-PatchSet: 1 Gerrit-Owner: Attila BukorGerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Will Berkeley Gerrit-Comment-Date: Mon, 16 Apr 2018 20:10:39 + Gerrit-HasComments: Yes
[kudu-CR] KUDU-2287 Expose election failures as a metric
Hello Todd Lipcon, I'd like you to do a code review. Please visit http://gerrit.cloudera.org:8080/10076 to review the following change. Change subject: KUDU-2287 Expose election failures as a metric .. KUDU-2287 Expose election failures as a metric Exposing the number of election failures since there was a stable leader. Change-Id: I1b25df258cdba7bdae7bb2d7b4eb3d73b53425c3 --- M src/kudu/consensus/raft_consensus.cc M src/kudu/consensus/raft_consensus.h 2 files changed, 18 insertions(+), 1 deletion(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/76/10076/1 -- To view, visit http://gerrit.cloudera.org:8080/10076 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: I1b25df258cdba7bdae7bb2d7b4eb3d73b53425c3 Gerrit-Change-Number: 10076 Gerrit-PatchSet: 1 Gerrit-Owner: Attila BukorGerrit-Reviewer: Todd Lipcon