[kudu-CR] make election timeout jitter more aggressive

2016-09-16 Thread Dan Burkert (Code Review)
Dan Burkert has submitted this change and it was merged.

Change subject: make election timeout jitter more aggressive
..


make election timeout jitter more aggressive

Random election timeout jitter is necessary in Raft in order to
guarantee that an election can be won. If the jitter is smaller than RTT
or the accuracy of clocks, then elections could fail indefinitely. We
frequently hit an issue during tests where timeouts tend to 'clump'
together, causing elections to retry many times in a row, ultimately
leading to test timeout. This commit increases the jitter, so that
election timeout differences between nodes will hopefully be greater
than the clock error. This issue could also manifest if the RTT between
nodes is high.

Change-Id: I2c9dad820c2b7d4bc4b9e791b78222559cdf63c8
Reviewed-on: http://gerrit.cloudera.org:8080/3828
Tested-by: Kudu Jenkins
Reviewed-by: Dan Burkert 
---
M src/kudu/consensus/raft_consensus.cc
1 file changed, 1 insertion(+), 1 deletion(-)

Approvals:
  Dan Burkert: Looks good to me, approved
  Kudu Jenkins: Verified



-- 
To view, visit http://gerrit.cloudera.org:8080/3828
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: I2c9dad820c2b7d4bc4b9e791b78222559cdf63c8
Gerrit-PatchSet: 4
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Dan Burkert 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Todd Lipcon 


[kudu-CR] make election timeout jitter more aggressive

2016-09-16 Thread Dan Burkert (Code Review)
Dan Burkert has posted comments on this change.

Change subject: make election timeout jitter more aggressive
..


Patch Set 3: Code-Review+2


Gerrit-MessageType: comment
Gerrit-Change-Id: I2c9dad820c2b7d4bc4b9e791b78222559cdf63c8
Gerrit-PatchSet: 3
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Dan Burkert 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: No


[kudu-CR] make election timeout jitter more aggressive

2016-09-16 Thread Dan Burkert (Code Review)
Hello Todd Lipcon, Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/3828

to look at the new patch set (#3).

Change subject: make election timeout jitter more aggressive
..

make election timeout jitter more aggressive

Random election timeout jitter is necessary in Raft in order to
guarantee that an election can be won. If the jitter is smaller than RTT
or the accuracy of clocks, then elections could fail indefinitely. We
frequently hit an issue during tests where timeouts tend to 'clump'
together, causing elections to retry many times in a row, ultimately
leading to test timeout. This commit increases the jitter, so that
election timeout differences between nodes will hopefully be greater
than the clock error. This issue could also manifest if the RTT between
nodes is high.

Change-Id: I2c9dad820c2b7d4bc4b9e791b78222559cdf63c8
---
M src/kudu/consensus/raft_consensus.cc
1 file changed, 1 insertion(+), 1 deletion(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/28/3828/3

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I2c9dad820c2b7d4bc4b9e791b78222559cdf63c8
Gerrit-PatchSet: 3
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Dan Burkert 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Todd Lipcon 


[kudu-CR] make election timeout jitter more aggressive

2016-09-16 Thread Todd Lipcon (Code Review)
Todd Lipcon has posted comments on this change.

Change subject: make election timeout jitter more aggressive
..


Patch Set 2: Code-Review+2


Gerrit-MessageType: comment
Gerrit-Change-Id: I2c9dad820c2b7d4bc4b9e791b78222559cdf63c8
Gerrit-PatchSet: 2
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Dan Burkert 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: No


[kudu-CR] make election timeout jitter more aggressive

2016-09-16 Thread Dan Burkert (Code Review)
Dan Burkert has posted comments on this change.

Change subject: make election timeout jitter more aggressive
..


Patch Set 1:

I've changed this so it's just making the jitter more aggressive.  The clamp is 
still a theoretical issue, but it would require an unrealistic 20s RTT to 
manifest.


Gerrit-MessageType: comment
Gerrit-Change-Id: I2c9dad820c2b7d4bc4b9e791b78222559cdf63c8
Gerrit-PatchSet: 1
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Dan Burkert 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: No


[kudu-CR] make election timeout jitter more aggressive

2016-09-16 Thread Dan Burkert (Code Review)
Hello Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/3828

to look at the new patch set (#2).

Change subject: make election timeout jitter more aggressive
..

make election timeout jitter more aggressive

Random election timeout jitter is necessary in Raft in order to
guarantee that an election can be won. If the jitter is smaller than RTT
or the accuracy of clocks, then elections could fail indefinitely. We
frequently hit an issue during tests where timeouts tend to 'clump'
together, causing elections to retry many times in a row, ultimately
leading to test timeout. This commit increases the jitter, so that
election timeout differences between nodes will hopefully be greater
than the clock error. This issue could also manifest if the RTT between
nodes is high.

Change-Id: I2c9dad820c2b7d4bc4b9e791b78222559cdf63c8
---
M src/kudu/consensus/raft_consensus.cc
1 file changed, 1 insertion(+), 1 deletion(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/28/3828/2

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I2c9dad820c2b7d4bc4b9e791b78222559cdf63c8
Gerrit-PatchSet: 2
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Dan Burkert 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Todd Lipcon 


[kudu-CR] make election timeout jitter more aggressive

2016-08-30 Thread Todd Lipcon (Code Review)
Todd Lipcon has posted comments on this change.

Change subject: make election timeout jitter more aggressive
..


Patch Set 1:

Let's try and close this one out soon? Not sure where the conversation got left.


Gerrit-MessageType: comment
Gerrit-Change-Id: I2c9dad820c2b7d4bc4b9e791b78222559cdf63c8
Gerrit-PatchSet: 1
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Dan Burkert 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: No


[kudu-CR] make election timeout jitter more aggressive

2016-08-02 Thread Dan Burkert (Code Review)
Dan Burkert has posted comments on this change.

Change subject: make election timeout jitter more aggressive
..


Patch Set 1:

Thinking about this more, the 20s clamp is probably OK.  It means we could 
theoretically not make progress if RTTs are > 20s, but we already can't make 
progress in that situation anyway since the election timeout is 1.5s.


Gerrit-MessageType: comment
Gerrit-Change-Id: I2c9dad820c2b7d4bc4b9e791b78222559cdf63c8
Gerrit-PatchSet: 1
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Dan Burkert 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: No


[kudu-CR] make election timeout jitter more aggressive

2016-08-02 Thread Dan Burkert (Code Review)
Dan Burkert has posted comments on this change.

Change subject: make election timeout jitter more aggressive
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/3828/1//COMMIT_MSG
Commit Message:

Line 7: make election timeout jitter more aggressive
> I agree with Todd, backoff cap avoids insane exponential backoffs. It seems
I don't think I'm following.  How can you add more jitter without increasing 
the average timeout?  We are, after all, bounded by a minimum timeout of 0.  
The jitter *must* increase as a function of the number of retries, otherwise 
you risk a situation where the cluster can't make progress due to RTT being 
greater than the jitter.
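
One way to make the RTT argument concrete is a simplified two-candidate model. 
Assume each candidate draws its jitter independently and uniformly from 
[0, spread), and assume an election attempt ends in a split vote whenever the 
two timeouts fire within one RTT of each other. Under those assumptions (a 
modeling simplification, not something stated in this thread) the split 
probability is 1 - (1 - rtt/spread)^2 when rtt < spread, and 1 otherwise. 
SplitVoteProbability is a hypothetical name, not a Kudu function:

```cpp
#include <cassert>

// Hypothetical helper, not part of Kudu. Simplified model: two candidates with
// independent jitter drawn uniformly from [0, spread_s) split the vote whenever
// their timeouts land within rtt_s of each other.
double SplitVoteProbability(double rtt_s, double spread_s) {
  assert(rtt_s >= 0.0 && spread_s > 0.0);
  // Once the RTT covers the whole jitter window, every attempt collides.
  if (rtt_s >= spread_s) return 1.0;
  // P(|X - Y| >= rtt) for two uniforms on [0, spread) is (1 - rtt/spread)^2.
  const double miss = 1.0 - rtt_s / spread_s;
  return 1.0 - miss * miss;
}
```

In this model a spread that stays below the RTT gives a probability of 1 on 
every retry, while a spread that grows with the retry count eventually pushes 
it below 1.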



Gerrit-MessageType: comment
Gerrit-Change-Id: I2c9dad820c2b7d4bc4b9e791b78222559cdf63c8
Gerrit-PatchSet: 1
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Dan Burkert 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: Yes


[kudu-CR] make election timeout jitter more aggressive

2016-08-02 Thread Mike Percy (Code Review)
Mike Percy has posted comments on this change.

Change subject: make election timeout jitter more aggressive
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/3828/1//COMMIT_MSG
Commit Message:

Line 7: make election timeout jitter more aggressive
> yea, but the jitter is only aggressive due to the backoff being more aggres
I agree with Todd, backoff cap avoids insane exponential backoffs. It seems 
like the jitter is what we are really worried about here. And TBH I'm not 
convinced this is the problem. Although I'd support wider jitter variance.

On a sort of side note, I tried to add a generic exponential backoff helper a 
long time ago in https://gerrit.cloudera.org/#/c/979/ ... maybe we should 
partially resurrect that patch and work on ensuring that we have a single 
exponential backoff function that is parameterized and flexible enough to 
handle all situations.



Gerrit-MessageType: comment
Gerrit-Change-Id: I2c9dad820c2b7d4bc4b9e791b78222559cdf63c8
Gerrit-PatchSet: 1
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Dan Burkert 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: Yes


[kudu-CR] make election timeout jitter more aggressive

2016-08-02 Thread Todd Lipcon (Code Review)
Todd Lipcon has posted comments on this change.

Change subject: make election timeout jitter more aggressive
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/3828/1//COMMIT_MSG
Commit Message:

Line 7: make election timeout jitter more aggressive
> The lower bound timeout isn't changed, only the upper bound.  So the range 
yea, but the jitter is only aggressive due to the backoff being more aggressive



Gerrit-MessageType: comment
Gerrit-Change-Id: I2c9dad820c2b7d4bc4b9e791b78222559cdf63c8
Gerrit-PatchSet: 1
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Dan Burkert 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: Yes


[kudu-CR] make election timeout jitter more aggressive

2016-08-02 Thread Dan Burkert (Code Review)
Dan Burkert has posted comments on this change.

Change subject: make election timeout jitter more aggressive
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/3828/1//COMMIT_MSG
Commit Message:

Line 7: make election timeout jitter more aggressive
> isn't it making the backoff more aggressive rather than making the jitter m
The lower bound timeout isn't changed, only the upper bound.  So the range of 
backoff times is greatly increased.  For instance, the previous algorithm had 
spreads of (0.15s, 0.315s, 0.4965s, 0.696s) after 0, 1, 2, 3 failed elections, 
respectively.  The new spreads are (0.75s, 1.875s, 3.56s, 6.09s).  The actual 
timeout is the base (1.5s) plus a random value between 0 and the spread.
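
The spreads quoted above are consistent with spread(n) = T * (b^(n+1) - 1), 
where T is the 1.5s base timeout, n is the number of failed elections, and b 
goes from 1.1 to 1.5. A minimal sketch, assuming that formula: it is inferred 
from the numbers in this comment, not copied from raft_consensus.cc, and the 
function names and the choice of std::mt19937 are illustrative only.

```cpp
#include <cassert>
#include <cmath>
#include <random>

// Hypothetical helper: width of the random jitter window, in seconds, after
// `failed_elections` consecutive failed elections. The formula is inferred
// from the spreads quoted in this thread, not taken from the Kudu source.
double JitterSpreadSeconds(double base_timeout_s, double backoff_base,
                           int failed_elections) {
  assert(base_timeout_s > 0.0 && backoff_base > 1.0 && failed_elections >= 0);
  return base_timeout_s * (std::pow(backoff_base, failed_elections + 1) - 1.0);
}

// Hypothetical helper: one election timeout, i.e. the base timeout plus a
// uniform random value between 0 and the spread.
double SampleElectionTimeoutSeconds(double base_timeout_s, double backoff_base,
                                    int failed_elections, std::mt19937& rng) {
  const double spread =
      JitterSpreadSeconds(base_timeout_s, backoff_base, failed_elections);
  std::uniform_real_distribution<double> jitter(0.0, spread);
  return base_timeout_s + jitter(rng);
}
```

With T = 1.5 and b = 1.1 the first helper yields 0.15, 0.315, 0.4965 and about 
0.696 for n = 0..3; with b = 1.5 it yields 0.75, 1.875, about 3.56 and about 
6.09, matching the figures above.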



Gerrit-MessageType: comment
Gerrit-Change-Id: I2c9dad820c2b7d4bc4b9e791b78222559cdf63c8
Gerrit-PatchSet: 1
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Dan Burkert 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: Yes


[kudu-CR] make election timeout jitter more aggressive

2016-08-02 Thread Todd Lipcon (Code Review)
Todd Lipcon has posted comments on this change.

Change subject: make election timeout jitter more aggressive
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/3828/1//COMMIT_MSG
Commit Message:

Line 7: make election timeout jitter more aggressive
isn't it making the backoff more aggressive rather than making the jitter more 
aggressive? ie the biggest change is going from 1.1 base to 1.5?



Gerrit-MessageType: comment
Gerrit-Change-Id: I2c9dad820c2b7d4bc4b9e791b78222559cdf63c8
Gerrit-PatchSet: 1
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Dan Burkert 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: Yes


[kudu-CR] make election timeout jitter more aggressive

2016-08-01 Thread Kudu Jenkins (Code Review)
Kudu Jenkins has posted comments on this change.

Change subject: make election timeout jitter more aggressive
..


Patch Set 1:

Build Started http://104.196.14.100/job/kudu-gerrit/2697/


Gerrit-MessageType: comment
Gerrit-Change-Id: I2c9dad820c2b7d4bc4b9e791b78222559cdf63c8
Gerrit-PatchSet: 1
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Dan Burkert 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: No


[kudu-CR] make election timeout jitter more aggressive

2016-08-01 Thread Dan Burkert (Code Review)
Hello Adar Dembo, Todd Lipcon,

I'd like you to do a code review.  Please visit

http://gerrit.cloudera.org:8080/3828

to review the following change.

Change subject: make election timeout jitter more aggressive
..

make election timeout jitter more aggressive

Existing election timeouts have very low variance, and are capped at a maximum
value. Having a variance cap is problematic because it could cause Raft to not
make progress when the RTT between nodes is greater than the cap.
Counter-intuitively, having a low variance in timeouts causes elections to take
longer since it leads to more frequent election retries. This commit removes the
cap and increases the variance.

Change-Id: I2c9dad820c2b7d4bc4b9e791b78222559cdf63c8
---
M src/kudu/consensus/raft_consensus.cc
1 file changed, 2 insertions(+), 10 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/28/3828/1

Gerrit-MessageType: newchange
Gerrit-Change-Id: I2c9dad820c2b7d4bc4b9e791b78222559cdf63c8
Gerrit-PatchSet: 1
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Dan Burkert 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Todd Lipcon