[kudu-CR] Fix flakiness in RaftConsensusITest.TestReplaceChangeConfigOperation
Mike Percy has posted comments on this change. Change subject: Fix flakiness in RaftConsensusITest.TestReplaceChangeConfigOperation .. Patch Set 2: (1 comment) http://gerrit.cloudera.org:8080/#/c/3819/2//COMMIT_MSG Commit Message: Line 47: > Also a nit: You may want to link this commit to KUDU-1548 I triaged where I Notice that the rest of the commit message is wrapped at 80 chars. I'm not going to try to condense this part to 80 chars -- it will be totally unreadable. Generally speaking, it's not good style to go over 80 chars, but I think one reasonable exception is pasting in log messages like this one. It's at the bottom of the commit message, so by the time you get to this part in "git log" you've already read the important part. Regarding KUDU-1548 thanks for filing it. It looks like that JIRA potentially has a bunch of different issues listed. It might be better to have separate JIRAs about each issue though. I'll add a link to it in the commit message. -- To view, visit http://gerrit.cloudera.org:8080/3819 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ib91b5cc974656e82f670d6a938f537b63338d036 Gerrit-PatchSet: 2 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Mike Percy Gerrit-Reviewer: Dinesh Bhat Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-HasComments: Yes
[kudu-CR] Fix flakiness in RaftConsensusITest.TestReplaceChangeConfigOperation
Mike Percy has posted comments on this change. Change subject: Fix flakiness in RaftConsensusITest.TestReplaceChangeConfigOperation .. Patch Set 2: (1 comment) > RaftConsensusITest(line 594) That's fine without waiting because the client will automatically retry until the election is done. > RemoteBootStrapITest and DeleteTabletTest Yeah, same as above. This case was problematic because of the nature of the test, which is fairly unique. http://gerrit.cloudera.org:8080/#/c/3819/2/src/kudu/integration-tests/cluster_itest_util.cc File src/kudu/integration-tests/cluster_itest_util.cc: Line 141: RETURN_NOT_OK(GetLastOpIdForEachReplica(tablet_id, { replica }, opid_type, timeout, &op_ids)); > Was this styling issue or any impact in the way vector is passed in ? Just a style issue (see commit message) ... less lines == easier to read. -- To view, visit http://gerrit.cloudera.org:8080/3819 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ib91b5cc974656e82f670d6a938f537b63338d036 Gerrit-PatchSet: 2 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Mike Percy Gerrit-Reviewer: Dinesh Bhat Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-HasComments: Yes
[kudu-CR] Fix flakiness in RaftConsensusITest.TestReplaceChangeConfigOperation
Dinesh Bhat has posted comments on this change. Change subject: Fix flakiness in RaftConsensusITest.TestReplaceChangeConfigOperation .. Patch Set 2: -Code-Review (1 comment) http://gerrit.cloudera.org:8080/#/c/3819/2//COMMIT_MSG Commit Message: Line 47: Also a nit: You may want to link this commit to KUDU-1548 I triaged where I have listed the error logs, or you can try to condense this to 80 columns somehow ? Latter would be ugly I guess. -- To view, visit http://gerrit.cloudera.org:8080/3819 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ib91b5cc974656e82f670d6a938f537b63338d036 Gerrit-PatchSet: 2 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Mike Percy Gerrit-Reviewer: Dinesh Bhat Gerrit-Reviewer: Kudu Jenkins Gerrit-HasComments: Yes
[kudu-CR] Fix flakiness in RaftConsensusITest.TestReplaceChangeConfigOperation
Dinesh Bhat has posted comments on this change. Change subject: Fix flakiness in RaftConsensusITest.TestReplaceChangeConfigOperation .. Patch Set 2: Code-Review+1 (1 comment) Thanks for quickly patching this Mike. Not your change, but can you confirm if this StartElection doesn't need to wait at few other places ? RaftConsensusITest(line 594), RemoteBootStrapITest and DleteTabletTest as there are few tests which aren't waiting for leader election before pumping the workload. http://gerrit.cloudera.org:8080/#/c/3819/2/src/kudu/integration-tests/cluster_itest_util.cc File src/kudu/integration-tests/cluster_itest_util.cc: Line 141: RETURN_NOT_OK(GetLastOpIdForEachReplica(tablet_id, { replica }, opid_type, timeout, &op_ids)); Was this styling issue or any impact in the way vector is passed in ? -- To view, visit http://gerrit.cloudera.org:8080/3819 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ib91b5cc974656e82f670d6a938f537b63338d036 Gerrit-PatchSet: 2 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Mike Percy Gerrit-Reviewer: Dinesh Bhat Gerrit-Reviewer: Kudu Jenkins Gerrit-HasComments: Yes
[kudu-CR] Fix flakiness in RaftConsensusITest.TestReplaceChangeConfigOperation
Kudu Jenkins has posted comments on this change. Change subject: Fix flakiness in RaftConsensusITest.TestReplaceChangeConfigOperation .. Patch Set 2: Build Started http://104.196.14.100/job/kudu-gerrit/2686/ -- To view, visit http://gerrit.cloudera.org:8080/3819 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ib91b5cc974656e82f670d6a938f537b63338d036 Gerrit-PatchSet: 2 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Mike Percy Gerrit-Reviewer: Dinesh Bhat Gerrit-Reviewer: Kudu Jenkins Gerrit-HasComments: No
[kudu-CR] Fix flakiness in RaftConsensusITest.TestReplaceChangeConfigOperation
Hello Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/3819 to look at the new patch set (#2). Change subject: Fix flakiness in RaftConsensusITest.TestReplaceChangeConfigOperation .. Fix flakiness in RaftConsensusITest.TestReplaceChangeConfigOperation This test was occasionally failing its 2nd election in TSAN mode because we were resuming the previous leader before the new leader could be elected. Sometimes the previous leader was fast enough to replicate its pending config change to a majority of nodes before the new candidate could send out its election RPC, thus violating the underlying assumptions of this test. I also added a minor C++11 syntax-only change in the cluster itest utils class as a part of this patch (doesn't change any behavior). Before this fix, this test failed 15/800 times on the dist-test cluster. After this change, it passed 100% of the time. The test log looked something like this: I0729 18:59:47.834403 11544 raft_consensus.cc:370] T e3503c47a21649ca931234999cd0bb45 P d4f64819170a4cf78fe4c9e9a72ec4b9 [term 1 FOLLOWER]: No leader contacted us within the election timeout. Triggering leader election I0729 18:59:47.834686 11544 raft_consensus.cc:2019] T e3503c47a21649ca931234999cd0bb45 P d4f64819170a4cf78fe4c9e9a72ec4b9 [term 1 FOLLOWER]: Advancing to term 2 I0729 18:59:47.840427 11544 leader_election.cc:223] T e3503c47a21649ca931234999cd0bb45 P d4f64819170a4cf78fe4c9e9a72ec4b9 [CANDIDATE]: Term 2 election: Requesting vote from peer 54197053abab4b6cb1b1632c9d1062dc I0729 18:59:47.840860 11544 leader_election.cc:223] T e3503c47a21649ca931234999cd0bb45 P d4f64819170a4cf78fe4c9e9a72ec4b9 [CANDIDATE]: Term 2 election: Requesting vote from peer 3522a8de8170476dba0beb58cb2150d4 I0729 18:59:47.872720 11669 raft_consensus.cc:869] T e3503c47a21649ca931234999cd0bb45 P 3522a8de8170476dba0beb58cb2150d4 [term 1 FOLLOWER]: Refusing update from remote peer 54197053abab4b6cb1b1632c9d1062dc: Log matching property violated. Preceding OpId in replica: term: 1 index: 1. Preceding OpId from leader: term: 1 index: 2. (index mismatch) I0729 18:59:47.874522 11454 consensus_queue.cc:578] T e3503c47a21649ca931234999cd0bb45 P 54197053abab4b6cb1b1632c9d1062dc [LEADER]: Connected to new peer: Peer: 3522a8de8170476dba0beb58cb2150d4, Is new: false, Last received: 1.1, Next index: 2, Last known committed idx: 1, Last exchange result: ERROR, Needs remote bootstrap: false I0729 18:59:47.878105 11150 raft_consensus.cc:1324] T e3503c47a21649ca931234999cd0bb45 P 54197053abab4b6cb1b1632c9d1062dc [term 1 LEADER]: Handling vote request from an unknown peer d4f64819170a4cf78fe4c9e9a72ec4b9 I0729 18:59:47.878290 11150 raft_consensus.cc:2014] T e3503c47a21649ca931234999cd0bb45 P 54197053abab4b6cb1b1632c9d1062dc [term 1 LEADER]: Stepping down as leader of term 1 I0729 18:59:47.878451 11150 raft_consensus.cc:499] T e3503c47a21649ca931234999cd0bb45 P 54197053abab4b6cb1b1632c9d1062dc [term 1 LEADER]: Becoming Follower/Learner. State: Replica: 54197053abab4b6cb1b1632c9d1062dc, State: 1, Role: LEADER Watermarks: {Received: term: 1 index: 2 Committed: term: 1 index: 1} I0729 18:59:47.878968 11150 consensus_queue.cc:162] T e3503c47a21649ca931234999cd0bb45 P 54197053abab4b6cb1b1632c9d1062dc [NON_LEADER]: Queue going to NON_LEADER mode. State: All replicated op: 0.0, Majority replicated op: 1.1, Committed index: 1.1, Last appended: 1.2, Current term: 1, Majority size: -1, State: 1, Mode: NON_LEADER I0729 18:59:47.879871 11150 consensus_peers.cc:358] T e3503c47a21649ca931234999cd0bb45 P 54197053abab4b6cb1b1632c9d1062dc -> Peer 3522a8de8170476dba0beb58cb2150d4 (127.37.56.2:53243): Closing peer: 3522a8de8170476dba0beb58cb2150d4 I0729 18:59:47.882057 11150 raft_consensus.cc:2019] T e3503c47a21649ca931234999cd0bb45 P 54197053abab4b6cb1b1632c9d1062dc [term 1 FOLLOWER]: Advancing to term 2 I0729 18:59:47.885711 11150 raft_consensus.cc:1626] T e3503c47a21649ca931234999cd0bb45 P 54197053abab4b6cb1b1632c9d1062dc [term 2 FOLLOWER]: Leader election vote request: Denying vote to candidate d4f64819170a4cf78fe4c9e9a72ec4b9 for term 2 because replica has last-logged OpId of term: 1 index: 2, which is greater than that of the candidate, which has last-logged OpId of term: 1 index: 1. I0729 18:59:47.892060 11477 leader_election.cc:361] T e3503c47a21649ca931234999cd0bb45 P d4f64819170a4cf78fe4c9e9a72ec4b9 [CANDIDATE]: Term 2 election: Vote denied by peer 54197053abab4b6cb1b1632c9d1062dc. Message: Invalid argument: T e3503c47a21649ca931234999cd0bb45 P 54197053abab4b6cb1b1632c9d1062dc [term 2 FOLLOWER]: Leader election vote request: Denying vote to candidate d4f64819170a4cf78fe4c9e9a72ec4b9 for term 2 because replica has last-logged OpId of term: 1 index: 2, which is greater than that of the candidate, which has last-logged OpId of term: 1 index: 1. I0729 18:59:47.894548 11669 raft_c
[kudu-CR] Fix flakiness in RaftConsensusITest.TestReplaceChangeConfigOperation
Hello Dinesh Bhat, I'd like you to do a code review. Please visit http://gerrit.cloudera.org:8080/3819 to review the following change. Change subject: Fix flakiness in RaftConsensusITest.TestReplaceChangeConfigOperation .. Fix flakiness in RaftConsensusITest.TestReplaceChangeConfigOperation This test was occasionally failing its 2nd election in TSAN mode because we were resuming the previous leader before the new leader could be elected. Sometimes the previous leader was fast enough to replicate its pending config change to a majority of nodes before the new candidate could send out its election RPC, thus violating the underlying assumptions of this test. I also added a minor C++11 syntax-only change in the cluster itest utils class as a part of this patch (doesn't change any behavior). Before this fix, this test failed 15/800 times on the dist-test cluster. After this change, it passed 100% of the time. The test log looked something like this: I0729 18:59:47.834403 11544 raft_consensus.cc:370] T e3503c47a21649ca931234999cd0bb45 P d4f64819170a4cf78fe4c9e9a72ec4b9 [term 1 FOLLOWER]: No leader contacted us within the election timeout. Triggering leader election I0729 18:59:47.834686 11544 raft_consensus.cc:2019] T e3503c47a21649ca931234999cd0bb45 P d4f64819170a4cf78fe4c9e9a72ec4b9 [term 1 FOLLOWER]: Advancing to term 2 I0729 18:59:47.840427 11544 leader_election.cc:223] T e3503c47a21649ca931234999cd0bb45 P d4f64819170a4cf78fe4c9e9a72ec4b9 [CANDIDATE]: Term 2 election: Requesting vote from peer 54197053abab4b6cb1b1632c9d1062dc I0729 18:59:47.840860 11544 leader_election.cc:223] T e3503c47a21649ca931234999cd0bb45 P d4f64819170a4cf78fe4c9e9a72ec4b9 [CANDIDATE]: Term 2 election: Requesting vote from peer 3522a8de8170476dba0beb58cb2150d4 I0729 18:59:47.872720 11669 raft_consensus.cc:869] T e3503c47a21649ca931234999cd0bb45 P 3522a8de8170476dba0beb58cb2150d4 [term 1 FOLLOWER]: Refusing update from remote peer 54197053abab4b6cb1b1632c9d1062dc: Log matching property violated. Preceding OpId in replica: term: 1 index: 1. Preceding OpId from leader: term: 1 index: 2. (index mismatch) I0729 18:59:47.874522 11454 consensus_queue.cc:578] T e3503c47a21649ca931234999cd0bb45 P 54197053abab4b6cb1b1632c9d1062dc [LEADER]: Connected to new peer: Peer: 3522a8de8170476dba0beb58cb2150d4, Is new: false, Last received: 1.1, Next index: 2, Last known committed idx: 1, Last exchange result: ERROR, Needs remote bootstrap: false I0729 18:59:47.878105 11150 raft_consensus.cc:1324] T e3503c47a21649ca931234999cd0bb45 P 54197053abab4b6cb1b1632c9d1062dc [term 1 LEADER]: Handling vote request from an unknown peer d4f64819170a4cf78fe4c9e9a72ec4b9 I0729 18:59:47.878290 11150 raft_consensus.cc:2014] T e3503c47a21649ca931234999cd0bb45 P 54197053abab4b6cb1b1632c9d1062dc [term 1 LEADER]: Stepping down as leader of term 1 I0729 18:59:47.878451 11150 raft_consensus.cc:499] T e3503c47a21649ca931234999cd0bb45 P 54197053abab4b6cb1b1632c9d1062dc [term 1 LEADER]: Becoming Follower/Learner. State: Replica: 54197053abab4b6cb1b1632c9d1062dc, State: 1, Role: LEADER Watermarks: {Received: term: 1 index: 2 Committed: term: 1 index: 1} I0729 18:59:47.878968 11150 consensus_queue.cc:162] T e3503c47a21649ca931234999cd0bb45 P 54197053abab4b6cb1b1632c9d1062dc [NON_LEADER]: Queue going to NON_LEADER mode. State: All replicated op: 0.0, Majority replicated op: 1.1, Committed index: 1.1, Last appended: 1.2, Current term: 1, Majority size: -1, State: 1, Mode: NON_LEADER I0729 18:59:47.879871 11150 consensus_peers.cc:358] T e3503c47a21649ca931234999cd0bb45 P 54197053abab4b6cb1b1632c9d1062dc -> Peer 3522a8de8170476dba0beb58cb2150d4 (127.37.56.2:53243): Closing peer: 3522a8de8170476dba0beb58cb2150d4 I0729 18:59:47.882057 11150 raft_consensus.cc:2019] T e3503c47a21649ca931234999cd0bb45 P 54197053abab4b6cb1b1632c9d1062dc [term 1 FOLLOWER]: Advancing to term 2 I0729 18:59:47.885711 11150 raft_consensus.cc:1626] T e3503c47a21649ca931234999cd0bb45 P 54197053abab4b6cb1b1632c9d1062dc [term 2 FOLLOWER]: Leader election vote request: Denying vote to candidate d4f64819170a4cf78fe4c9e9a72ec4b9 for term 2 because replica has last-logged OpId of term: 1 index: 2, which is greater than that of the candidate, which has last-logged OpId of term: 1 index: 1. I0729 18:59:47.892060 11477 leader_election.cc:361] T e3503c47a21649ca931234999cd0bb45 P d4f64819170a4cf78fe4c9e9a72ec4b9 [CANDIDATE]: Term 2 election: Vote denied by peer 54197053abab4b6cb1b1632c9d1062dc. Message: Invalid argument: T e3503c47a21649ca931234999cd0bb45 P 54197053abab4b6cb1b1632c9d1062dc [term 2 FOLLOWER]: Leader election vote request: Denying vote to candidate d4f64819170a4cf78fe4c9e9a72ec4b9 for term 2 because replica has last-logged OpId of term: 1 index: 2, which is greater than that of the candidate, which has last-logged OpId of term: 1 index: 1. I0729 18:59:47.894548 11669 raft_consens
[kudu-CR] Fix flakiness in RaftConsensusITest.TestReplaceChangeConfigOperation
Kudu Jenkins has posted comments on this change. Change subject: Fix flakiness in RaftConsensusITest.TestReplaceChangeConfigOperation .. Patch Set 1: Build Started http://104.196.14.100/job/kudu-gerrit/2685/ -- To view, visit http://gerrit.cloudera.org:8080/3819 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ib91b5cc974656e82f670d6a938f537b63338d036 Gerrit-PatchSet: 1 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Mike Percy Gerrit-Reviewer: Dinesh Bhat Gerrit-Reviewer: Kudu Jenkins Gerrit-HasComments: No