[kudu-CR] Fix flakiness in RaftConsensusITest.TestReplaceChangeConfigOperation

2016-07-30 Thread Mike Percy (Code Review)
Mike Percy has posted comments on this change.

Change subject: Fix flakiness in 
RaftConsensusITest.TestReplaceChangeConfigOperation
..


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/3819/2//COMMIT_MSG
Commit Message:

Line 47: 
> Also a nit: You may want to link this commit to KUDU-1548 I triaged where I
Notice that the rest of the commit message is wrapped at 80 chars.

I'm not going to try to condense this part to 80 chars -- it will be totally 
unreadable. Generally speaking, it's not good style to go over 80 chars, but I 
think one reasonable exception is pasting in log messages like this one. It's 
at the bottom of the commit message, so by the time you get to this part in 
"git log" you've already read the important part.

Regarding KUDU-1548 thanks for filing it. It looks like that JIRA potentially 
has a bunch of different issues listed. It might be better to have separate 
JIRAs about each issue though. I'll add a link to it in the commit message.


-- 
To view, visit http://gerrit.cloudera.org:8080/3819
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ib91b5cc974656e82f670d6a938f537b63338d036
Gerrit-PatchSet: 2
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Mike Percy 
Gerrit-Reviewer: Dinesh Bhat 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-HasComments: Yes


[kudu-CR] Fix flakiness in RaftConsensusITest.TestReplaceChangeConfigOperation

2016-07-29 Thread Mike Percy (Code Review)
Mike Percy has posted comments on this change.

Change subject: Fix flakiness in 
RaftConsensusITest.TestReplaceChangeConfigOperation
..


Patch Set 2:

(1 comment)

> RaftConsensusITest(line 594)

That's fine without waiting because the client will automatically retry until 
the election is done.

> RemoteBootStrapITest and DeleteTabletTest

Yeah, same as above. This case was problematic because of the nature of the 
test, which is fairly unique.

http://gerrit.cloudera.org:8080/#/c/3819/2/src/kudu/integration-tests/cluster_itest_util.cc
File src/kudu/integration-tests/cluster_itest_util.cc:

Line 141:   RETURN_NOT_OK(GetLastOpIdForEachReplica(tablet_id, { replica }, 
opid_type, timeout, &op_ids));
> Was this styling issue or any impact in the way vector is passed in ?
Just a style issue (see commit message) ... less lines == easier to read.


-- 
To view, visit http://gerrit.cloudera.org:8080/3819
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ib91b5cc974656e82f670d6a938f537b63338d036
Gerrit-PatchSet: 2
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Mike Percy 
Gerrit-Reviewer: Dinesh Bhat 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-HasComments: Yes


[kudu-CR] Fix flakiness in RaftConsensusITest.TestReplaceChangeConfigOperation

2016-07-29 Thread Dinesh Bhat (Code Review)
Dinesh Bhat has posted comments on this change.

Change subject: Fix flakiness in 
RaftConsensusITest.TestReplaceChangeConfigOperation
..


Patch Set 2: -Code-Review

(1 comment)

http://gerrit.cloudera.org:8080/#/c/3819/2//COMMIT_MSG
Commit Message:

Line 47: 
Also a nit: You may want to link this commit to KUDU-1548 I triaged where I 
have listed the error logs, or you can try to condense this to 80 columns 
somehow ? Latter would be ugly I guess.


-- 
To view, visit http://gerrit.cloudera.org:8080/3819
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ib91b5cc974656e82f670d6a938f537b63338d036
Gerrit-PatchSet: 2
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Mike Percy 
Gerrit-Reviewer: Dinesh Bhat 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-HasComments: Yes


[kudu-CR] Fix flakiness in RaftConsensusITest.TestReplaceChangeConfigOperation

2016-07-29 Thread Dinesh Bhat (Code Review)
Dinesh Bhat has posted comments on this change.

Change subject: Fix flakiness in 
RaftConsensusITest.TestReplaceChangeConfigOperation
..


Patch Set 2: Code-Review+1

(1 comment)

Thanks for quickly patching this Mike. Not your change, but can you confirm if 
this StartElection doesn't need to wait at few other places ? 
RaftConsensusITest(line 594), RemoteBootStrapITest and DleteTabletTest as there 
are few tests which aren't waiting for leader election before pumping the 
workload.

http://gerrit.cloudera.org:8080/#/c/3819/2/src/kudu/integration-tests/cluster_itest_util.cc
File src/kudu/integration-tests/cluster_itest_util.cc:

Line 141:   RETURN_NOT_OK(GetLastOpIdForEachReplica(tablet_id, { replica }, 
opid_type, timeout, &op_ids));
Was this styling issue or any impact in the way vector is passed in ?


-- 
To view, visit http://gerrit.cloudera.org:8080/3819
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ib91b5cc974656e82f670d6a938f537b63338d036
Gerrit-PatchSet: 2
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Mike Percy 
Gerrit-Reviewer: Dinesh Bhat 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-HasComments: Yes


[kudu-CR] Fix flakiness in RaftConsensusITest.TestReplaceChangeConfigOperation

2016-07-29 Thread Kudu Jenkins (Code Review)
Kudu Jenkins has posted comments on this change.

Change subject: Fix flakiness in 
RaftConsensusITest.TestReplaceChangeConfigOperation
..


Patch Set 2:

Build Started http://104.196.14.100/job/kudu-gerrit/2686/

-- 
To view, visit http://gerrit.cloudera.org:8080/3819
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ib91b5cc974656e82f670d6a938f537b63338d036
Gerrit-PatchSet: 2
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Mike Percy 
Gerrit-Reviewer: Dinesh Bhat 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-HasComments: No


[kudu-CR] Fix flakiness in RaftConsensusITest.TestReplaceChangeConfigOperation

2016-07-29 Thread Mike Percy (Code Review)
Hello Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/3819

to look at the new patch set (#2).

Change subject: Fix flakiness in 
RaftConsensusITest.TestReplaceChangeConfigOperation
..

Fix flakiness in RaftConsensusITest.TestReplaceChangeConfigOperation

This test was occasionally failing its 2nd election in TSAN mode because
we were resuming the previous leader before the new leader could be
elected. Sometimes the previous leader was fast enough to replicate its
pending config change to a majority of nodes before the new candidate
could send out its election RPC, thus violating the underlying
assumptions of this test.

I also added a minor C++11 syntax-only change in the cluster itest utils
class as a part of this patch (doesn't change any behavior).

Before this fix, this test failed 15/800 times on the dist-test cluster.
After this change, it passed 100% of the time.

The test log looked something like this:

I0729 18:59:47.834403 11544 raft_consensus.cc:370] T 
e3503c47a21649ca931234999cd0bb45 P d4f64819170a4cf78fe4c9e9a72ec4b9 [term 1 
FOLLOWER]: No leader contacted us within the election timeout. Triggering 
leader election
I0729 18:59:47.834686 11544 raft_consensus.cc:2019] T 
e3503c47a21649ca931234999cd0bb45 P d4f64819170a4cf78fe4c9e9a72ec4b9 [term 1 
FOLLOWER]: Advancing to term 2
I0729 18:59:47.840427 11544 leader_election.cc:223] T 
e3503c47a21649ca931234999cd0bb45 P d4f64819170a4cf78fe4c9e9a72ec4b9 
[CANDIDATE]: Term 2 election: Requesting vote from peer 
54197053abab4b6cb1b1632c9d1062dc
I0729 18:59:47.840860 11544 leader_election.cc:223] T 
e3503c47a21649ca931234999cd0bb45 P d4f64819170a4cf78fe4c9e9a72ec4b9 
[CANDIDATE]: Term 2 election: Requesting vote from peer 
3522a8de8170476dba0beb58cb2150d4
I0729 18:59:47.872720 11669 raft_consensus.cc:869] T 
e3503c47a21649ca931234999cd0bb45 P 3522a8de8170476dba0beb58cb2150d4 [term 1 
FOLLOWER]: Refusing update from remote peer 54197053abab4b6cb1b1632c9d1062dc: 
Log matching property violated. Preceding OpId in replica: term: 1 index: 1. 
Preceding OpId from leader: term: 1 index: 2. (index mismatch)
I0729 18:59:47.874522 11454 consensus_queue.cc:578] T 
e3503c47a21649ca931234999cd0bb45 P 54197053abab4b6cb1b1632c9d1062dc [LEADER]: 
Connected to new peer: Peer: 3522a8de8170476dba0beb58cb2150d4, Is new: false, 
Last received: 1.1, Next index: 2, Last known committed idx: 1, Last exchange 
result: ERROR, Needs remote bootstrap: false
I0729 18:59:47.878105 11150 raft_consensus.cc:1324] T 
e3503c47a21649ca931234999cd0bb45 P 54197053abab4b6cb1b1632c9d1062dc [term 1 
LEADER]: Handling vote request from an unknown peer 
d4f64819170a4cf78fe4c9e9a72ec4b9
I0729 18:59:47.878290 11150 raft_consensus.cc:2014] T 
e3503c47a21649ca931234999cd0bb45 P 54197053abab4b6cb1b1632c9d1062dc [term 1 
LEADER]: Stepping down as leader of term 1
I0729 18:59:47.878451 11150 raft_consensus.cc:499] T 
e3503c47a21649ca931234999cd0bb45 P 54197053abab4b6cb1b1632c9d1062dc [term 1 
LEADER]: Becoming Follower/Learner. State: Replica: 
54197053abab4b6cb1b1632c9d1062dc, State: 1, Role: LEADER
Watermarks: {Received: term: 1 index: 2 Committed: term: 1 index: 1}
I0729 18:59:47.878968 11150 consensus_queue.cc:162] T 
e3503c47a21649ca931234999cd0bb45 P 54197053abab4b6cb1b1632c9d1062dc 
[NON_LEADER]: Queue going to NON_LEADER mode. State: All replicated op: 0.0, 
Majority replicated op: 1.1, Committed index: 1.1, Last appended: 1.2, Current 
term: 1, Majority size: -1, State: 1, Mode: NON_LEADER
I0729 18:59:47.879871 11150 consensus_peers.cc:358] T 
e3503c47a21649ca931234999cd0bb45 P 54197053abab4b6cb1b1632c9d1062dc -> Peer 
3522a8de8170476dba0beb58cb2150d4 (127.37.56.2:53243): Closing peer: 
3522a8de8170476dba0beb58cb2150d4
I0729 18:59:47.882057 11150 raft_consensus.cc:2019] T 
e3503c47a21649ca931234999cd0bb45 P 54197053abab4b6cb1b1632c9d1062dc [term 1 
FOLLOWER]: Advancing to term 2
I0729 18:59:47.885711 11150 raft_consensus.cc:1626] T 
e3503c47a21649ca931234999cd0bb45 P 54197053abab4b6cb1b1632c9d1062dc [term 2 
FOLLOWER]: Leader election vote request: Denying vote to candidate 
d4f64819170a4cf78fe4c9e9a72ec4b9 for term 2 because replica has last-logged 
OpId of term: 1 index: 2, which is greater than that of the candidate, which 
has last-logged OpId of term: 1 index: 1.
I0729 18:59:47.892060 11477 leader_election.cc:361] T 
e3503c47a21649ca931234999cd0bb45 P d4f64819170a4cf78fe4c9e9a72ec4b9 
[CANDIDATE]: Term 2 election: Vote denied by peer 
54197053abab4b6cb1b1632c9d1062dc. Message: Invalid argument: T 
e3503c47a21649ca931234999cd0bb45 P 54197053abab4b6cb1b1632c9d1062dc [term 2 
FOLLOWER]: Leader election vote request: Denying vote to candidate 
d4f64819170a4cf78fe4c9e9a72ec4b9 for term 2 because replica has last-logged 
OpId of term: 1 index: 2, which is greater than that of the candidate, which 
has last-logged OpId of term: 1 index: 1.
I0729 18:59:47.894548 11669 raft_c

[kudu-CR] Fix flakiness in RaftConsensusITest.TestReplaceChangeConfigOperation

2016-07-29 Thread Mike Percy (Code Review)
Hello Dinesh Bhat,

I'd like you to do a code review.  Please visit

http://gerrit.cloudera.org:8080/3819

to review the following change.

Change subject: Fix flakiness in 
RaftConsensusITest.TestReplaceChangeConfigOperation
..

Fix flakiness in RaftConsensusITest.TestReplaceChangeConfigOperation

This test was occasionally failing its 2nd election in TSAN mode because
we were resuming the previous leader before the new leader could be
elected. Sometimes the previous leader was fast enough to replicate its
pending config change to a majority of nodes before the new candidate
could send out its election RPC, thus violating the underlying
assumptions of this test.

I also added a minor C++11 syntax-only change in the cluster itest utils
class as a part of this patch (doesn't change any behavior).

Before this fix, this test failed 15/800 times on the dist-test cluster.
After this change, it passed 100% of the time.

The test log looked something like this:

I0729 18:59:47.834403 11544 raft_consensus.cc:370] T 
e3503c47a21649ca931234999cd0bb45 P d4f64819170a4cf78fe4c9e9a72ec4b9 [term 1 
FOLLOWER]: No leader contacted us within the election timeout. Triggering 
leader election
I0729 18:59:47.834686 11544 raft_consensus.cc:2019] T 
e3503c47a21649ca931234999cd0bb45 P d4f64819170a4cf78fe4c9e9a72ec4b9 [term 1 
FOLLOWER]: Advancing to term 2
I0729 18:59:47.840427 11544 leader_election.cc:223] T 
e3503c47a21649ca931234999cd0bb45 P d4f64819170a4cf78fe4c9e9a72ec4b9 
[CANDIDATE]: Term 2 election: Requesting vote from peer 
54197053abab4b6cb1b1632c9d1062dc
I0729 18:59:47.840860 11544 leader_election.cc:223] T 
e3503c47a21649ca931234999cd0bb45 P d4f64819170a4cf78fe4c9e9a72ec4b9 
[CANDIDATE]: Term 2 election: Requesting vote from peer 
3522a8de8170476dba0beb58cb2150d4
I0729 18:59:47.872720 11669 raft_consensus.cc:869] T 
e3503c47a21649ca931234999cd0bb45 P 3522a8de8170476dba0beb58cb2150d4 [term 1 
FOLLOWER]: Refusing update from remote peer 54197053abab4b6cb1b1632c9d1062dc: 
Log matching property violated. Preceding OpId in replica: term: 1 index: 1. 
Preceding OpId from leader: term: 1 index: 2. (index mismatch)
I0729 18:59:47.874522 11454 consensus_queue.cc:578] T 
e3503c47a21649ca931234999cd0bb45 P 54197053abab4b6cb1b1632c9d1062dc [LEADER]: 
Connected to new peer: Peer: 3522a8de8170476dba0beb58cb2150d4, Is new: false, 
Last received: 1.1, Next index: 2, Last known committed idx: 1, Last exchange 
result: ERROR, Needs remote bootstrap: false
I0729 18:59:47.878105 11150 raft_consensus.cc:1324] T 
e3503c47a21649ca931234999cd0bb45 P 54197053abab4b6cb1b1632c9d1062dc [term 1 
LEADER]: Handling vote request from an unknown peer 
d4f64819170a4cf78fe4c9e9a72ec4b9
I0729 18:59:47.878290 11150 raft_consensus.cc:2014] T 
e3503c47a21649ca931234999cd0bb45 P 54197053abab4b6cb1b1632c9d1062dc [term 1 
LEADER]: Stepping down as leader of term 1
I0729 18:59:47.878451 11150 raft_consensus.cc:499] T 
e3503c47a21649ca931234999cd0bb45 P 54197053abab4b6cb1b1632c9d1062dc [term 1 
LEADER]: Becoming Follower/Learner. State: Replica: 
54197053abab4b6cb1b1632c9d1062dc, State: 1, Role: LEADER
Watermarks: {Received: term: 1 index: 2 Committed: term: 1 index: 1}
I0729 18:59:47.878968 11150 consensus_queue.cc:162] T 
e3503c47a21649ca931234999cd0bb45 P 54197053abab4b6cb1b1632c9d1062dc 
[NON_LEADER]: Queue going to NON_LEADER mode. State: All replicated op: 0.0, 
Majority replicated op: 1.1, Committed index: 1.1, Last appended: 1.2, Current 
term: 1, Majority size: -1, State: 1, Mode: NON_LEADER
I0729 18:59:47.879871 11150 consensus_peers.cc:358] T 
e3503c47a21649ca931234999cd0bb45 P 54197053abab4b6cb1b1632c9d1062dc -> Peer 
3522a8de8170476dba0beb58cb2150d4 (127.37.56.2:53243): Closing peer: 
3522a8de8170476dba0beb58cb2150d4
I0729 18:59:47.882057 11150 raft_consensus.cc:2019] T 
e3503c47a21649ca931234999cd0bb45 P 54197053abab4b6cb1b1632c9d1062dc [term 1 
FOLLOWER]: Advancing to term 2
I0729 18:59:47.885711 11150 raft_consensus.cc:1626] T 
e3503c47a21649ca931234999cd0bb45 P 54197053abab4b6cb1b1632c9d1062dc [term 2 
FOLLOWER]: Leader election vote request: Denying vote to candidate 
d4f64819170a4cf78fe4c9e9a72ec4b9 for term 2 because replica has last-logged 
OpId of term: 1 index: 2, which is greater than that of the candidate, which 
has last-logged OpId of term: 1 index: 1.
I0729 18:59:47.892060 11477 leader_election.cc:361] T 
e3503c47a21649ca931234999cd0bb45 P d4f64819170a4cf78fe4c9e9a72ec4b9 
[CANDIDATE]: Term 2 election: Vote denied by peer 
54197053abab4b6cb1b1632c9d1062dc. Message: Invalid argument: T 
e3503c47a21649ca931234999cd0bb45 P 54197053abab4b6cb1b1632c9d1062dc [term 2 
FOLLOWER]: Leader election vote request: Denying vote to candidate 
d4f64819170a4cf78fe4c9e9a72ec4b9 for term 2 because replica has last-logged 
OpId of term: 1 index: 2, which is greater than that of the candidate, which 
has last-logged OpId of term: 1 index: 1.
I0729 18:59:47.894548 11669 raft_consens

[kudu-CR] Fix flakiness in RaftConsensusITest.TestReplaceChangeConfigOperation

2016-07-29 Thread Kudu Jenkins (Code Review)
Kudu Jenkins has posted comments on this change.

Change subject: Fix flakiness in 
RaftConsensusITest.TestReplaceChangeConfigOperation
..


Patch Set 1:

Build Started http://104.196.14.100/job/kudu-gerrit/2685/

-- 
To view, visit http://gerrit.cloudera.org:8080/3819
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ib91b5cc974656e82f670d6a938f537b63338d036
Gerrit-PatchSet: 1
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Mike Percy 
Gerrit-Reviewer: Dinesh Bhat 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-HasComments: No