[kudu-CR] KUDU-2287 Expose election failures as a metric

2018-04-18 Thread Attila Bukor (Code Review)
Attila Bukor has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10076 )

Change subject: KUDU-2287 Expose election failures as a metric
..


Patch Set 5:

> Patch Set 5:
>
> Build Started http://jenkins.kudu.apache.org/job/kudu-gerrit/13010/

looking at the build, it's going to fail, but it doesn't seem related. someone 
pls rebuild it


--
To view, visit http://gerrit.cloudera.org:8080/10076
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I1b25df258cdba7bdae7bb2d7b4eb3d73b53425c3
Gerrit-Change-Number: 10076
Gerrit-PatchSet: 5
Gerrit-Owner: Attila Bukor 
Gerrit-Reviewer: Attila Bukor 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Reviewer: Will Berkeley 
Gerrit-Comment-Date: Wed, 18 Apr 2018 17:01:44 +
Gerrit-HasComments: No


[kudu-CR] KUDU-2287 Expose election failures as a metric

2018-04-18 Thread Attila Bukor (Code Review)
Hello Will Berkeley, Tidy Bot, Kudu Jenkins, Todd Lipcon,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/10076

to look at the new patch set (#5).

Change subject: KUDU-2287 Expose election failures as a metric
..

KUDU-2287 Expose election failures as a metric

This patch exposes as a metric the number of election failures since the
last successful election attempt

Change-Id: I1b25df258cdba7bdae7bb2d7b4eb3d73b53425c3
---
M src/kudu/consensus/raft_consensus.cc
M src/kudu/consensus/raft_consensus.h
2 files changed, 45 insertions(+), 4 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/76/10076/5
--
To view, visit http://gerrit.cloudera.org:8080/10076
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I1b25df258cdba7bdae7bb2d7b4eb3d73b53425c3
Gerrit-Change-Number: 10076
Gerrit-PatchSet: 5
Gerrit-Owner: Attila Bukor 
Gerrit-Reviewer: Attila Bukor 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Reviewer: Will Berkeley 


[kudu-CR] KUDU-2287 Expose election failures as a metric

2018-04-18 Thread Attila Bukor (Code Review)
Hello Will Berkeley, Kudu Jenkins, Todd Lipcon,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/10076

to look at the new patch set (#4).

Change subject: KUDU-2287 Expose election failures as a metric
..

KUDU-2287 Expose election failures as a metric

This patch exposes as a metric the number of election failures since the
last successful election attempt

Change-Id: I1b25df258cdba7bdae7bb2d7b4eb3d73b53425c3
---
M src/kudu/consensus/raft_consensus.cc
M src/kudu/consensus/raft_consensus.h
2 files changed, 45 insertions(+), 4 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/76/10076/4
--
To view, visit http://gerrit.cloudera.org:8080/10076
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I1b25df258cdba7bdae7bb2d7b4eb3d73b53425c3
Gerrit-Change-Number: 10076
Gerrit-PatchSet: 4
Gerrit-Owner: Attila Bukor 
Gerrit-Reviewer: Attila Bukor 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Reviewer: Will Berkeley 


[kudu-CR] KUDU-2287 Expose election failures as a metric

2018-04-18 Thread Attila Bukor (Code Review)
Attila Bukor has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10076 )

Change subject: KUDU-2287 Expose election failures as a metric
..


Patch Set 3:

(2 comments)

> Patch Set 2:
>
> (1 comment)

http://gerrit.cloudera.org:8080/#/c/10076/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/10076/1//COMMIT_MSG@7
PS1, Line 7: KUDU-2287 Expose election failures as a metric
> huh cool, didn't know about the function gauge. I'll look into this.
Done


http://gerrit.cloudera.org:8080/#/c/10076/3/src/kudu/consensus/raft_consensus.cc
File src/kudu/consensus/raft_consensus.cc:

http://gerrit.cloudera.org:8080/#/c/10076/3/src/kudu/consensus/raft_consensus.cc@161
PS3, Line 161:   kudu::MetricUnit::kNanoseconds,
unit might be a bit overkill, let me know if I should change it to something 
else.



--
To view, visit http://gerrit.cloudera.org:8080/10076
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I1b25df258cdba7bdae7bb2d7b4eb3d73b53425c3
Gerrit-Change-Number: 10076
Gerrit-PatchSet: 3
Gerrit-Owner: Attila Bukor 
Gerrit-Reviewer: Attila Bukor 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Reviewer: Will Berkeley 
Gerrit-Comment-Date: Wed, 18 Apr 2018 15:22:07 +
Gerrit-HasComments: Yes


[kudu-CR] KUDU-2287 Expose election failures as a metric

2018-04-18 Thread Attila Bukor (Code Review)
Hello Will Berkeley, Kudu Jenkins, Todd Lipcon,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/10076

to look at the new patch set (#3).

Change subject: KUDU-2287 Expose election failures as a metric
..

KUDU-2287 Expose election failures as a metric

This patch exposes as a metric the number of election failures since the
last successful election attempt

Change-Id: I1b25df258cdba7bdae7bb2d7b4eb3d73b53425c3
---
M src/kudu/consensus/raft_consensus.cc
M src/kudu/consensus/raft_consensus.h
2 files changed, 44 insertions(+), 3 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/76/10076/3
--
To view, visit http://gerrit.cloudera.org:8080/10076
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I1b25df258cdba7bdae7bb2d7b4eb3d73b53425c3
Gerrit-Change-Number: 10076
Gerrit-PatchSet: 3
Gerrit-Owner: Attila Bukor 
Gerrit-Reviewer: Attila Bukor 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Reviewer: Will Berkeley 


[kudu-CR] KUDU-2287 Expose election failures as a metric

2018-04-17 Thread Attila Bukor (Code Review)
Attila Bukor has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10076 )

Change subject: KUDU-2287 Expose election failures as a metric
..


Patch Set 2:

(8 comments)

http://gerrit.cloudera.org:8080/#/c/10076/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/10076/1//COMMIT_MSG@7
PS1, Line 7: KUDU-2287 Expose election failures as a metric
> The JIRA suggests either this or an actual wall clock time since there was
I opted for the counter as it was more precise as we don't have "time since" 
metrics, so we could update the metric at each election failure, but not in 
between. Or is there a better approach to this that I didn't think of? I guess 
we could expose both numbers though.


http://gerrit.cloudera.org:8080/#/c/10076/1//COMMIT_MSG@9
PS1, Line 9: This patch exposes as a metric the number of election failures 
since the
   : last su
> nit: Could you rewrite this as complete sentence:
Done


http://gerrit.cloudera.org:8080/#/c/10076/1/src/kudu/consensus/raft_consensus.h
File src/kudu/consensus/raft_consensus.h:

http://gerrit.cloudera.org:8080/#/c/10076/1/src/kudu/consensus/raft_consensus.h@839
PS1, Line 839: The number of times this node has called and lost a leader 
election
> This is different than what the patch is claiming the metric measures. fail
yep, the description of the metric wasn't clear I guess, but I believe this is 
what this metric should keep track of. I agree that each node will have a 
different number here. These numbers should be summarized on the client side if 
necessary (e.g. in ksck).


http://gerrit.cloudera.org:8080/#/c/10076/1/src/kudu/consensus/raft_consensus.h@855
PS1, Line 855: >>
> nit: While you're here, the space between the >'s isn't necessary in C++11.
fixed, wasn't sure if I should touch this line in this patch and didn't want to 
have two consecutive lines inconsistently formatted


http://gerrit.cloudera.org:8080/#/c/10076/1/src/kudu/consensus/raft_consensus.cc
File src/kudu/consensus/raft_consensus.cc:

http://gerrit.cloudera.org:8080/#/c/10076/1/src/kudu/consensus/raft_consensus.cc@156
PS1, Line 156: "Number of failed elections on this node since there was a s
> Not exactly correct according to my prev comment.
Done


http://gerrit.cloudera.org:8080/#/c/10076/1/src/kudu/consensus/raft_consensus.cc@157
PS1, Line 157: ion a
> resets
Done


http://gerrit.cloudera.org:8080/#/c/10076/1/src/kudu/consensus/raft_consensus.cc@2290
PS1, Line 2290: ed_elections_metric_->set_value(faile
> Do you mean failed_elections_metric_? It should be non-null because of the
yep, I meant failed_elections_metric_, removed it if unnecessary (raft_term_ 
was used similarly)


http://gerrit.cloudera.org:8080/#/c/10076/1/src/kudu/consensus/raft_consensus.cc@2453
PS1, Line 2453: f we called an election and one of th
> Same.
Done



--
To view, visit http://gerrit.cloudera.org:8080/10076
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I1b25df258cdba7bdae7bb2d7b4eb3d73b53425c3
Gerrit-Change-Number: 10076
Gerrit-PatchSet: 2
Gerrit-Owner: Attila Bukor 
Gerrit-Reviewer: Attila Bukor 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Reviewer: Will Berkeley 
Gerrit-Comment-Date: Tue, 17 Apr 2018 17:33:25 +
Gerrit-HasComments: Yes


[kudu-CR] KUDU-2287 Expose election failures as a metric

2018-04-17 Thread Attila Bukor (Code Review)
Hello Will Berkeley, Kudu Jenkins, Todd Lipcon,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/10076

to look at the new patch set (#2).

Change subject: KUDU-2287 Expose election failures as a metric
..

KUDU-2287 Expose election failures as a metric

This patch exposes as a metric the number of election failures since the
last successful election attempt

Change-Id: I1b25df258cdba7bdae7bb2d7b4eb3d73b53425c3
---
M src/kudu/consensus/raft_consensus.cc
M src/kudu/consensus/raft_consensus.h
2 files changed, 15 insertions(+), 2 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/76/10076/2
--
To view, visit http://gerrit.cloudera.org:8080/10076
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I1b25df258cdba7bdae7bb2d7b4eb3d73b53425c3
Gerrit-Change-Number: 10076
Gerrit-PatchSet: 2
Gerrit-Owner: Attila Bukor 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Reviewer: Will Berkeley 


[kudu-CR] KUDU-2287 Expose election failures as a metric

2018-04-16 Thread Todd Lipcon (Code Review)
Todd Lipcon has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10076 )

Change subject: KUDU-2287 Expose election failures as a metric
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/10076/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/10076/1//COMMIT_MSG@7
PS1, Line 7: KUDU-2287 Expose election failures as a metric
The JIRA suggests either this or an actual wall clock time since there was a 
stable leader. I'm leaning slightly towards the time-based option vs a counter, 
since the time is more user-interpretable. eg if a replica hasn't had a leader 
for 60 seconds, I understand what that means, but if it hasn't had a leader for 
10 elections I'm not really sure how long that is, particularly given 
randomization and backoff.

Would it be too tricky to use a time based metric here? Or do you think there 
is some value to a count instead of time?



--
To view, visit http://gerrit.cloudera.org:8080/10076
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I1b25df258cdba7bdae7bb2d7b4eb3d73b53425c3
Gerrit-Change-Number: 10076
Gerrit-PatchSet: 1
Gerrit-Owner: Attila Bukor 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Reviewer: Will Berkeley 
Gerrit-Comment-Date: Mon, 16 Apr 2018 22:36:45 +
Gerrit-HasComments: Yes


[kudu-CR] KUDU-2287 Expose election failures as a metric

2018-04-16 Thread Will Berkeley (Code Review)
Will Berkeley has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10076 )

Change subject: KUDU-2287 Expose election failures as a metric
..


Patch Set 1:

(7 comments)

Needs a test as well.

http://gerrit.cloudera.org:8080/#/c/10076/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/10076/1//COMMIT_MSG@9
PS1, Line 9: Exposing the number of election failures since there was a stable
   : leader.
nit: Could you rewrite this as complete sentence:

"This patch exposes as a metric the number of election failures since the last 
successful election attempt."


http://gerrit.cloudera.org:8080/#/c/10076/1/src/kudu/consensus/raft_consensus.h
File src/kudu/consensus/raft_consensus.h:

http://gerrit.cloudera.org:8080/#/c/10076/1/src/kudu/consensus/raft_consensus.h@839
PS1, Line 839: The number of times this node has called and lost a leader 
election
This is different than what the patch is claiming the metric measures. 
failed_elections_since_stable_leader_ counts the number of times *this node* 
has called an election and not become leader since the last time it heard from 
a leader. Other nodes could have called elections and not won, too, and this 
metric doesn't count those.

As a practical matter, the randomization of election timeouts more-or-less 
guarantees this node will call elections if it isn't hearing from a leader, so 
the difference might not matter too much depending on the application of the 
metric.


http://gerrit.cloudera.org:8080/#/c/10076/1/src/kudu/consensus/raft_consensus.h@855
PS1, Line 855: > >
nit: While you're here, the space between the >'s isn't necessary in C++11.


http://gerrit.cloudera.org:8080/#/c/10076/1/src/kudu/consensus/raft_consensus.cc
File src/kudu/consensus/raft_consensus.cc:

http://gerrit.cloudera.org:8080/#/c/10076/1/src/kudu/consensus/raft_consensus.cc@156
PS1, Line 156: "Number of failed elections since there was a stable leader.
Not exactly correct according to my prev comment.


http://gerrit.cloudera.org:8080/#/c/10076/1/src/kudu/consensus/raft_consensus.cc@157
PS1, Line 157: reset
resets


http://gerrit.cloudera.org:8080/#/c/10076/1/src/kudu/consensus/raft_consensus.cc@2290
PS1, Line 2290: failed_elections_since_stable_leader_
Do you mean failed_elections_metric_? It should be non-null because of the 
FindOrCreate call, so I don't think the check here is necessary.


http://gerrit.cloudera.org:8080/#/c/10076/1/src/kudu/consensus/raft_consensus.cc@2453
PS1, Line 2453: failed_elections_since_stable_leader_
Same.



--
To view, visit http://gerrit.cloudera.org:8080/10076
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I1b25df258cdba7bdae7bb2d7b4eb3d73b53425c3
Gerrit-Change-Number: 10076
Gerrit-PatchSet: 1
Gerrit-Owner: Attila Bukor 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Reviewer: Will Berkeley 
Gerrit-Comment-Date: Mon, 16 Apr 2018 20:10:39 +
Gerrit-HasComments: Yes


[kudu-CR] KUDU-2287 Expose election failures as a metric

2018-04-16 Thread Attila Bukor (Code Review)
Hello Todd Lipcon,

I'd like you to do a code review. Please visit

http://gerrit.cloudera.org:8080/10076

to review the following change.


Change subject: KUDU-2287 Expose election failures as a metric
..

KUDU-2287 Expose election failures as a metric

Exposing the number of election failures since there was a stable
leader.

Change-Id: I1b25df258cdba7bdae7bb2d7b4eb3d73b53425c3
---
M src/kudu/consensus/raft_consensus.cc
M src/kudu/consensus/raft_consensus.h
2 files changed, 18 insertions(+), 1 deletion(-)



  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/76/10076/1
--
To view, visit http://gerrit.cloudera.org:8080/10076
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I1b25df258cdba7bdae7bb2d7b4eb3d73b53425c3
Gerrit-Change-Number: 10076
Gerrit-PatchSet: 1
Gerrit-Owner: Attila Bukor 
Gerrit-Reviewer: Todd Lipcon