Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/11981 )
Change subject: [raft_consensus-itest] fix flake in Test_KUDU_1735 ...................................................................... Patch Set 2: (9 comments) Thank you for the review! http://gerrit.cloudera.org:8080/#/c/11981/2/src/kudu/integration-tests/cluster_itest_util.h File src/kudu/integration-tests/cluster_itest_util.h: http://gerrit.cloudera.org:8080/#/c/11981/2/src/kudu/integration-tests/cluster_itest_util.h@138 PS2, Line 138: consensus::OpIdType op_id_type = consensus::RECEIVED_OPID > nit: doc for this? Done http://gerrit.cloudera.org:8080/#/c/11981/2/src/kudu/integration-tests/raft_consensus-itest.cc File src/kudu/integration-tests/raft_consensus-itest.cc: http://gerrit.cloudera.org:8080/#/c/11981/2/src/kudu/integration-tests/raft_consensus-itest.cc@2468 PS2, Line 2468: ASSERT_EQ(3, tablet_servers_.size()); > Not a question about this patch specifically, but what's the point of testi The point is that the 3-4-3 scheme have to work with 3 servers as well: there isn't a way currently to enforce users to have more than 3 tablet servers. Also, in case of 3 servers the 3-4-3 scheme successfully replaces replicas that fail with FAILED_UNRECOVERABLE state. http://gerrit.cloudera.org:8080/#/c/11981/2/src/kudu/integration-tests/raft_consensus-itest.cc@2477 PS2, Line 2477: // Wait for at least one operation committed in current term by the leader. : // Otherwise, the RemoveServer() below would end up with IllegalState error: : // 'Leader has not yet committed an operation in its own term'. > nit: this is helpful insofar as it describes what was failing before, but I Done http://gerrit.cloudera.org:8080/#/c/11981/2/src/kudu/integration-tests/raft_consensus-itest.cc@2500 PS2, Line 2500: affected_servers > nit: "affected" is pretty vague, so it's not as obvious as it could be in d Done http://gerrit.cloudera.org:8080/#/c/11981/2/src/kudu/integration-tests/raft_consensus-itest.cc@2512 PS2, Line 2512: > nit: not the fault of this patch, but could you take this extra space out? Done http://gerrit.cloudera.org:8080/#/c/11981/2/src/kudu/integration-tests/raft_consensus-itest.cc@2517 PS2, Line 2517: In case of the 3-4-3 scheme, the catalog manager adds the evicted replica : // back as a non-voter, and there might be two possible outcomes. : // Particularly, after the tablet's Raft configuration has been updated, : // the catalog manager sends DeleteTablet() request to the corresponding : // tablet server upon receiving updated tablet report from leader replica. > I think the ordering of these sentences is a bit awkward. I was expecting t Indeed -- it seems after multiple updates it became messy. I updated this piece as you suggested and added a bit more information which I think brings even more clarity. http://gerrit.cloudera.org:8080/#/c/11981/2/src/kudu/integration-tests/raft_consensus-itest.cc@2522 PS2, Line 2522: 1. If the DeleteTablet() request succeeds before the leader replica : // receives ADD_PEER configuration follow-up update initiated by the : // catalog manager and re-discovers the evicted replica as a new peer : // with the LMP_MISMATCH status, the configuration update to add : // a new non-voter replica will stay pending since only one voter : // replica (the leader replica itself) is alive. > Is this LMP_MISMATCH information relevant to the outcome itself? Maybe this Right, the LMP_MISMATCH is not relevant in this case. The comment has been updated, thanks! http://gerrit.cloudera.org:8080/#/c/11981/2/src/kudu/integration-tests/raft_consensus-itest.cc@2536 PS2, Line 2536: MonoDelta::FromSeconds(5) > Why not kTimeout? Because it's not necessary to wait for that long: the operations are being committed pretty soon, and in case of the 'outcome 2' there would be 5 extra seconds if using 10 seconds instead of 5 here. http://gerrit.cloudera.org:8080/#/c/11981/2/src/kudu/integration-tests/raft_consensus-itest.cc@2549 PS2, Line 2549: decltype(leader_tserver) srvs[] = > Is it possible to declare this as vector<decltype(leader_tserver)>? If so, Done -- To view, visit http://gerrit.cloudera.org:8080/11981 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: If44ad0e8363a3aead409484cff68843f1e5d6b6d Gerrit-Change-Number: 11981 Gerrit-PatchSet: 2 Gerrit-Owner: Alexey Serbin <aser...@cloudera.com> Gerrit-Reviewer: Alexey Serbin <aser...@cloudera.com> Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com> Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Mike Percy <mpe...@apache.org> Gerrit-Comment-Date: Mon, 26 Nov 2018 05:02:34 +0000 Gerrit-HasComments: Yes