[kudu-CR] KUDU-871. Support tombstoned voting
Todd Lipcon has submitted this change and it was merged. Change subject: KUDU-871. Support tombstoned voting .. KUDU-871. Support tombstoned voting This patch makes it possible for tombstoned tablet replicas to vote in Raft elections. Changes: * Add Stop() method to TabletReplica + Consensus lifecycle. * Includes new STOPPED state. * Tombstoning a replica should call Stop(). * Deleting a replica should call Shutdown(). * Persist ConsensusMetadata before returning from TabletCopyClient::Start() because we need cmeta to Init() RaftConsensus, which happens when registering the replica in TSTabletManager. * TSTabletManager::DeleteTablet() should not consider FAILED == deleted, since we no longer destroy RaftConsensus when tombstoning a replica. * Add positive and negative tests for tombstoned voting. * Add a stress test that induces lots of tombstoned voting while running TabletCopy, TabletBootstrap, and DeleteTablet. * Fix DeleteTableITest.TestMergeConsensusMetadata after tombstoned voting changed its assumption that tombstoned tablets would not vote. * Fix several tests that expected tombstoned tablets to be SHUTDOWN when now they are STOPPED. Change-Id: Ia19d75b185299443b27f41e468bbae20065e7570 Reviewed-on: http://gerrit.cloudera.org:8080/6960 Tested-by: Kudu Jenkins Reviewed-by: Alexey SerbinReviewed-by: Todd Lipcon --- M src/kudu/consensus/consensus-test-util.h M src/kudu/consensus/raft_consensus.cc M src/kudu/consensus/raft_consensus.h M src/kudu/consensus/raft_consensus_quorum-test.cc M src/kudu/integration-tests/CMakeLists.txt M src/kudu/integration-tests/cluster_itest_util.cc M src/kudu/integration-tests/cluster_itest_util.h M src/kudu/integration-tests/delete_table-itest.cc M src/kudu/integration-tests/external_mini_cluster-itest-base.cc A src/kudu/integration-tests/tombstoned_voting-itest.cc A src/kudu/integration-tests/tombstoned_voting-stress-test.cc M src/kudu/master/sys_catalog.cc M src/kudu/tablet/metadata.proto M src/kudu/tablet/tablet_metadata.h M src/kudu/tablet/tablet_replica-test.cc M src/kudu/tablet/tablet_replica.cc M src/kudu/tablet/tablet_replica.h M src/kudu/tools/kudu-admin-test.cc M src/kudu/tools/kudu-ts-cli-test.cc M src/kudu/tserver/tablet_copy_client.cc M src/kudu/tserver/tablet_copy_client.h M src/kudu/tserver/tablet_copy_source_session-test.cc M src/kudu/tserver/tablet_service.cc M src/kudu/tserver/ts_tablet_manager.cc M src/kudu/tserver/ts_tablet_manager.h M src/kudu/tserver/tserver-path-handlers.cc M src/kudu/util/make_shared.h 27 files changed, 1,439 insertions(+), 296 deletions(-) Approvals: Todd Lipcon: Looks good to me, approved Alexey Serbin: Looks good to me, but someone else must approve Kudu Jenkins: Verified -- To view, visit http://gerrit.cloudera.org:8080/6960 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: merged Gerrit-Change-Id: Ia19d75b185299443b27f41e468bbae20065e7570 Gerrit-PatchSet: 14 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Mike Percy Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Edward Fancher Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon
[kudu-CR] KUDU-871. Support tombstoned voting
Mike Percy has posted comments on this change. Change subject: KUDU-871. Support tombstoned voting .. Patch Set 13: I did an analysis of failed raft_consensus-itest failures before and after this patch. There was no change in the trend, except that RaftConsensusITest.TestAutoCreateReplica stopped being flaky. :) I looked into it, and I think it was just coincidence. That test doesn't even have failure detection enabled, plus it doesn't tombstone any tablets. I think this patch is ready to ship. -- To view, visit http://gerrit.cloudera.org:8080/6960 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ia19d75b185299443b27f41e468bbae20065e7570 Gerrit-PatchSet: 13 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Mike PercyGerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Edward Fancher Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: No
[kudu-CR] KUDU-871. Support tombstoned voting
Todd Lipcon has posted comments on this change. Change subject: KUDU-871. Support tombstoned voting .. Patch Set 13: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/6960 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ia19d75b185299443b27f41e468bbae20065e7570 Gerrit-PatchSet: 13 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Mike PercyGerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Edward Fancher Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: No
[kudu-CR] KUDU-871. Support tombstoned voting
Mike Percy has posted comments on this change. Change subject: KUDU-871. Support tombstoned voting .. Patch Set 13: Well, actually, let me hit raft_consensus-itest a little harder. It's possible that I made it flaky and hadn't previously noticed. -- To view, visit http://gerrit.cloudera.org:8080/6960 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ia19d75b185299443b27f41e468bbae20065e7570 Gerrit-PatchSet: 13 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Mike PercyGerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Edward Fancher Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: No
[kudu-CR] KUDU-871. Support tombstoned voting
Mike Percy has posted comments on this change. Change subject: KUDU-871. Support tombstoned voting .. Patch Set 13: The new tests still look good. 100% reliable after 100 loops w/ TSAN + 8x stress: tombstoned_voting-itest: http://dist-test.cloudera.org/job?job_id=mpercy.1503540656.2157 tombstoned_voting-stress-test: http://dist-test.cloudera.org/job?job_id=mpercy.1503540557.1392 If other tests get flaky, they'll show up on the flaky test dashboard and I'll fix them. -- To view, visit http://gerrit.cloudera.org:8080/6960 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ia19d75b185299443b27f41e468bbae20065e7570 Gerrit-PatchSet: 13 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Mike PercyGerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Edward Fancher Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: No
[kudu-CR] KUDU-871. Support tombstoned voting
Mike Percy has posted comments on this change. Change subject: KUDU-871. Support tombstoned voting .. Patch Set 13: > lgtm. Did you do any last test loop of the relevant tests after the > latest round of changes? Thanks for the review. I'll double-check that now and will post results. I did see some unrelated failures on some unrelated tests, but we've have a lot of code go in this week. -- To view, visit http://gerrit.cloudera.org:8080/6960 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ia19d75b185299443b27f41e468bbae20065e7570 Gerrit-PatchSet: 13 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Mike PercyGerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Edward Fancher Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: No
[kudu-CR] KUDU-871. Support tombstoned voting
Alexey Serbin has posted comments on this change. Change subject: KUDU-871. Support tombstoned voting .. Patch Set 13: Code-Review+1 -- To view, visit http://gerrit.cloudera.org:8080/6960 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ia19d75b185299443b27f41e468bbae20065e7570 Gerrit-PatchSet: 13 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Mike PercyGerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Edward Fancher Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: No
[kudu-CR] KUDU-871. Support tombstoned voting
Todd Lipcon has posted comments on this change. Change subject: KUDU-871. Support tombstoned voting .. Patch Set 13: Code-Review+1 lgtm. Did you do any last test loop of the relevant tests after the latest round of changes? -- To view, visit http://gerrit.cloudera.org:8080/6960 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ia19d75b185299443b27f41e468bbae20065e7570 Gerrit-PatchSet: 13 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Mike PercyGerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Edward Fancher Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: No
[kudu-CR] KUDU-871. Support tombstoned voting
Mike Percy has posted comments on this change. Change subject: KUDU-871. Support tombstoned voting .. Patch Set 12: (2 comments) http://gerrit.cloudera.org:8080/#/c/6960/12/src/kudu/integration-tests/tombstoned_voting-itest.cc File src/kudu/integration-tests/tombstoned_voting-itest.cc: Line 45: #include "kudu/tserver/tserver.pb.h" > warning: #includes are not sorted properly [llvm-include-order] Done http://gerrit.cloudera.org:8080/#/c/6960/12/src/kudu/integration-tests/tombstoned_voting-stress-test.cc File src/kudu/integration-tests/tombstoned_voting-stress-test.cc: Line 39: #include "kudu/integration-tests/external_mini_cluster.h" > warning: #includes are not sorted properly [llvm-include-order] Done -- To view, visit http://gerrit.cloudera.org:8080/6960 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ia19d75b185299443b27f41e468bbae20065e7570 Gerrit-PatchSet: 12 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Mike PercyGerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Edward Fancher Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[kudu-CR] KUDU-871. Support tombstoned voting
Hello Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/6960 to look at the new patch set (#13). Change subject: KUDU-871. Support tombstoned voting .. KUDU-871. Support tombstoned voting This patch makes it possible for tombstoned tablet replicas to vote in Raft elections. Changes: * Add Stop() method to TabletReplica + Consensus lifecycle. * Includes new STOPPED state. * Tombstoning a replica should call Stop(). * Deleting a replica should call Shutdown(). * Persist ConsensusMetadata before returning from TabletCopyClient::Start() because we need cmeta to Init() RaftConsensus, which happens when registering the replica in TSTabletManager. * TSTabletManager::DeleteTablet() should not consider FAILED == deleted, since we no longer destroy RaftConsensus when tombstoning a replica. * Add positive and negative tests for tombstoned voting. * Add a stress test that induces lots of tombstoned voting while running TabletCopy, TabletBootstrap, and DeleteTablet. * Fix DeleteTableITest.TestMergeConsensusMetadata after tombstoned voting changed its assumption that tombstoned tablets would not vote. * Fix several tests that expected tombstoned tablets to be SHUTDOWN when now they are STOPPED. Change-Id: Ia19d75b185299443b27f41e468bbae20065e7570 --- M src/kudu/consensus/consensus-test-util.h M src/kudu/consensus/raft_consensus.cc M src/kudu/consensus/raft_consensus.h M src/kudu/consensus/raft_consensus_quorum-test.cc M src/kudu/integration-tests/CMakeLists.txt M src/kudu/integration-tests/cluster_itest_util.cc M src/kudu/integration-tests/cluster_itest_util.h M src/kudu/integration-tests/delete_table-itest.cc M src/kudu/integration-tests/external_mini_cluster-itest-base.cc A src/kudu/integration-tests/tombstoned_voting-itest.cc A src/kudu/integration-tests/tombstoned_voting-stress-test.cc M src/kudu/master/sys_catalog.cc M src/kudu/tablet/metadata.proto M src/kudu/tablet/tablet_metadata.h M src/kudu/tablet/tablet_replica-test.cc M src/kudu/tablet/tablet_replica.cc M src/kudu/tablet/tablet_replica.h M src/kudu/tools/kudu-admin-test.cc M src/kudu/tools/kudu-ts-cli-test.cc M src/kudu/tserver/tablet_copy_client.cc M src/kudu/tserver/tablet_copy_client.h M src/kudu/tserver/tablet_copy_source_session-test.cc M src/kudu/tserver/tablet_service.cc M src/kudu/tserver/ts_tablet_manager.cc M src/kudu/tserver/ts_tablet_manager.h M src/kudu/tserver/tserver-path-handlers.cc M src/kudu/util/make_shared.h 27 files changed, 1,439 insertions(+), 296 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/60/6960/13 -- To view, visit http://gerrit.cloudera.org:8080/6960 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ia19d75b185299443b27f41e468bbae20065e7570 Gerrit-PatchSet: 13 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Mike PercyGerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Edward Fancher Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon
[kudu-CR] KUDU-871. Support tombstoned voting
Hello Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/6960 to look at the new patch set (#12). Change subject: KUDU-871. Support tombstoned voting .. KUDU-871. Support tombstoned voting This patch makes it possible for tombstoned tablet replicas to vote in Raft elections. Changes: * Add Stop() method to TabletReplica + Consensus lifecycle. * Includes new STOPPED state. * Tombstoning a replica should call Stop(). * Deleting a replica should call Shutdown(). * Persist ConsensusMetadata before returning from TabletCopyClient::Start() because we need cmeta to Init() RaftConsensus, which happens when registering the replica in TSTabletManager. * TSTabletManager::DeleteTablet() should not consider FAILED == deleted, since we no longer destroy RaftConsensus when tombstoning a replica. * Add positive and negative tests for tombstoned voting. * Add a stress test that induces lots of tombstoned voting while running TabletCopy, TabletBootstrap, and DeleteTablet. * Fix DeleteTableITest.TestMergeConsensusMetadata after tombstoned voting changed its assumption that tombstoned tablets would not vote. * Fix several tests that expected tombstoned tablets to be SHUTDOWN when now they are STOPPED. Change-Id: Ia19d75b185299443b27f41e468bbae20065e7570 --- M src/kudu/consensus/consensus-test-util.h M src/kudu/consensus/raft_consensus.cc M src/kudu/consensus/raft_consensus.h M src/kudu/consensus/raft_consensus_quorum-test.cc M src/kudu/integration-tests/CMakeLists.txt M src/kudu/integration-tests/cluster_itest_util.cc M src/kudu/integration-tests/cluster_itest_util.h M src/kudu/integration-tests/delete_table-itest.cc M src/kudu/integration-tests/external_mini_cluster-itest-base.cc A src/kudu/integration-tests/tombstoned_voting-itest.cc A src/kudu/integration-tests/tombstoned_voting-stress-test.cc M src/kudu/master/sys_catalog.cc M src/kudu/tablet/metadata.proto M src/kudu/tablet/tablet_metadata.h M src/kudu/tablet/tablet_replica-test.cc M src/kudu/tablet/tablet_replica.cc M src/kudu/tablet/tablet_replica.h M src/kudu/tools/kudu-admin-test.cc M src/kudu/tools/kudu-ts-cli-test.cc M src/kudu/tserver/tablet_copy_client.cc M src/kudu/tserver/tablet_copy_client.h M src/kudu/tserver/tablet_copy_source_session-test.cc M src/kudu/tserver/tablet_service.cc M src/kudu/tserver/ts_tablet_manager.cc M src/kudu/tserver/ts_tablet_manager.h M src/kudu/tserver/tserver-path-handlers.cc M src/kudu/util/make_shared.h 27 files changed, 1,439 insertions(+), 296 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/60/6960/12 -- To view, visit http://gerrit.cloudera.org:8080/6960 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ia19d75b185299443b27f41e468bbae20065e7570 Gerrit-PatchSet: 12 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Mike PercyGerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Edward Fancher Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon
[kudu-CR] KUDU-871. Support tombstoned voting
Hello Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/6960 to look at the new patch set (#11). Change subject: KUDU-871. Support tombstoned voting .. KUDU-871. Support tombstoned voting This patch makes it possible for tombstoned tablet replicas to vote in Raft elections. Changes: * Add Stop() method to TabletReplica + Consensus lifecycle. * Includes new STOPPED state. * Tombstoning a replica should call Stop(). * Deleting a replica should call Shutdown(). * Persist ConsensusMetadata before returning from TabletCopyClient::Start() because we need cmeta to Init() RaftConsensus, which happens when registering the replica in TSTabletManager. * Add positive and negative tests for tombstoned voting. * Add a stress test that induces lots of tombstoned voting while running TabletCopy, TabletBootstrap, and DeleteTablet. * Fix DeleteTableITest.TestMergeConsensusMetadata after tombstoned voting changed its assumption that tombstoned tablets would not vote. * Fix several tests that expected tombstoned tablets to be SHUTDOWN when now they are STOPPED. Change-Id: Ia19d75b185299443b27f41e468bbae20065e7570 --- M src/kudu/consensus/consensus-test-util.h M src/kudu/consensus/raft_consensus.cc M src/kudu/consensus/raft_consensus.h M src/kudu/consensus/raft_consensus_quorum-test.cc M src/kudu/integration-tests/CMakeLists.txt M src/kudu/integration-tests/cluster_itest_util.cc M src/kudu/integration-tests/cluster_itest_util.h M src/kudu/integration-tests/delete_table-itest.cc M src/kudu/integration-tests/external_mini_cluster-itest-base.cc A src/kudu/integration-tests/tombstoned_voting-itest.cc A src/kudu/integration-tests/tombstoned_voting-stress-test.cc M src/kudu/master/sys_catalog.cc M src/kudu/tablet/metadata.proto M src/kudu/tablet/tablet_metadata.h M src/kudu/tablet/tablet_replica-test.cc M src/kudu/tablet/tablet_replica.cc M src/kudu/tablet/tablet_replica.h M src/kudu/tools/kudu-admin-test.cc M src/kudu/tools/kudu-ts-cli-test.cc M src/kudu/tserver/tablet_copy_client.cc M src/kudu/tserver/tablet_copy_client.h M src/kudu/tserver/tablet_copy_source_session-test.cc M src/kudu/tserver/tablet_service.cc M src/kudu/tserver/ts_tablet_manager.cc M src/kudu/tserver/ts_tablet_manager.h M src/kudu/tserver/tserver-path-handlers.cc M src/kudu/util/make_shared.h 27 files changed, 1,437 insertions(+), 294 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/60/6960/11 -- To view, visit http://gerrit.cloudera.org:8080/6960 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ia19d75b185299443b27f41e468bbae20065e7570 Gerrit-PatchSet: 11 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Mike PercyGerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Edward Fancher Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon
[kudu-CR] KUDU-871. Support tombstoned voting
Mike Percy has posted comments on this change. Change subject: KUDU-871. Support tombstoned voting .. Patch Set 10: (29 comments) http://gerrit.cloudera.org:8080/#/c/6960/10/src/kudu/consensus/raft_consensus.cc File src/kudu/consensus/raft_consensus.cc: PS10, Line 1452: boost:: > nit: "using boost::optional" is already specified Done PS10, Line 1494: local_last_logged_opid = queue_->GetLastOpIdInLog(); > What if tombstone_last_logged_opid was provided as well? Is it some some o It's OK. I added a comment to explain. Line 1856: CHECK_EQ(kStopping, state_) << State_Name(state_); > this is already CHECKed in SetStateUnlocked, no? True, removed Line 1863: cmeta_->set_leader_uuid(""); > ClearLeaderUnlocked() ? Done http://gerrit.cloudera.org:8080/#/c/6960/10/src/kudu/consensus/raft_consensus.h File src/kudu/consensus/raft_consensus.h: PS10, Line 351:|^ : //|| : //++ > nit: prefer the slimmer version at L424 Done PS10, Line 372: // Raft is in the process of stopping and will not accept writes. : kStopping, > nit: is it possible to request a vote if in this state? It seems it is, bu yes; done http://gerrit.cloudera.org:8080/#/c/6960/10/src/kudu/integration-tests/cluster_itest_util.h File src/kudu/integration-tests/cluster_itest_util.h: PS10, Line 236: // Request the given replica to vote. > nit: could you document at least first 5 parameters of this method? It's just a thin wrapper around an RPC call. I added a comment referring to its definition. http://gerrit.cloudera.org:8080/#/c/6960/10/src/kudu/integration-tests/delete_tablet-itest.cc File src/kudu/integration-tests/delete_tablet-itest.cc: Line 95: // Tablet bootstrap failure should result in a SHUTDOWN tablet with an error > per discussion on slack, let's revert this part of this patch since it isn' Done http://gerrit.cloudera.org:8080/#/c/6960/10/src/kudu/integration-tests/raft_consensus-itest.cc File src/kudu/integration-tests/raft_consensus-itest.cc: PS10, Line 3044: back come > nit: come back removed this change per convo w/ Todd on slack and here in the comments http://gerrit.cloudera.org:8080/#/c/6960/10/src/kudu/integration-tests/tombstoned_voting-itest.cc File src/kudu/integration-tests/tombstoned_voting-itest.cc: Line 18: #include "kudu/consensus/metadata.pb.h" > nit: from IWYU Done, thank you PS10, Line 46: using std::string; : using std::vector; > nit: include these? Done PS10, Line 167: TS1 > nit: the tablet? Or it's possible to tombstone a tablet server now? added a comment to explain PS10, Line 172: TServerDetails* ts1_ets > nit: consider moving this closer to the call site where it's used. Done Line 173: scoped_refptr replica; > nit: does it make sense to verify the scenario below works without errors e Makes sense. Done. PS10, Line 181: "A" > Could you add a comment explaining that the actual candidate UUID is not cr Done PS10, Line 249: ts0_replica->consensus()->StepDown() > nit: should it be some selective error handling based on the code set in th I'm not sure why else it would fail. This is a pretty controlled environment. If a disk fails, all the tests will fail on that Jenkins run so we don't really care about that. PS10, Line 253: ts0 > nit: 'TS0' since it has been referred as that already. Is your comment parser case-sensitive? :) Done PS10, Line 256: FOLLOWER > Could it change to something like 'LEARNER' in the middle and bring some fl Nope. PS10, Line 263: error (nee failed > per other comment, back out this change Done PS10, Line 294: Shut each TS > down Done http://gerrit.cloudera.org:8080/#/c/6960/10/src/kudu/integration-tests/tombstoned_voting-stress-test.cc File src/kudu/integration-tests/tombstoned_voting-stress-test.cc: Line 18: #include > nit: from IWYU report: thanks! PS10, Line 44: using std::atomic; : using std::string; : using std::thread; : using std::unique_lock; : using std::vector; > nit: includes? Done PS10, Line 57: : num_workers_(1), > this test seems to be implemented for multiple workers but only has one wor I initially wanted >1 worker but had trouble making it deterministic enough to assert on things I wanted to assert on, so I dropped it to 1 worker. When I tried to simplify the code for a single worker it didn't really seem better, so I kept the more general implementation. http://gerrit.cloudera.org:8080/#/c/6960/10/src/kudu/master/catalog_manager.cc File src/kudu/master/catalog_manager.cc: Line 2489: LOG(WARNING) << "Tablet " << tablet->ToString() << " has failed on TS " > wouldn't it be natural to have tablet::SHUTDOWN check here instead? Or it' I thought it was a little unwise to have a crash based on remote state reporting
[kudu-CR] KUDU-871. Support tombstoned voting
Alexey Serbin has posted comments on this change. Change subject: KUDU-871. Support tombstoned voting .. Patch Set 10: (12 comments) http://gerrit.cloudera.org:8080/#/c/6960/10/src/kudu/integration-tests/cluster_itest_util.h File src/kudu/integration-tests/cluster_itest_util.h: PS10, Line 236: // Request the given replica to vote. nit: could you document at least first 5 parameters of this method? http://gerrit.cloudera.org:8080/#/c/6960/10/src/kudu/integration-tests/tombstoned_voting-itest.cc File src/kudu/integration-tests/tombstoned_voting-itest.cc: Line 18: #include "kudu/consensus/metadata.pb.h" nit: from IWYU src/kudu/integration-tests/tombstoned_voting-itest.cc should add these lines: #include #include #include #include #include #include #include #include #include #include #include "kudu/consensus/consensus.pb.h" #include "kudu/consensus/opid.pb.h" #include "kudu/consensus/opid_util.h" #include "kudu/fs/fs_manager.h" #include "kudu/gutil/ref_counted.h" #include "kudu/integration-tests/internal_mini_cluster.h" #include "kudu/tablet/metadata.pb.h" #include "kudu/tserver/tserver.pb.h" #include "kudu/util/env.h" #include "kudu/util/monotime.h" #include "kudu/util/status.h" #include "kudu/util/test_macros.h" #include "kudu/util/test_util.h" PS10, Line 167: TS1 nit: the tablet? Or it's possible to tombstone a tablet server now? PS10, Line 172: TServerDetails* ts1_ets nit: consider moving this closer to the call site where it's used. Line 173: scoped_refptr replica; nit: does it make sense to verify the scenario below works without errors even after restarting the tablet server TS1? PS10, Line 181: "A" Could you add a comment explaining that the actual candidate UUID is not crucial in this test? PS10, Line 249: ts0_replica->consensus()->StepDown() nit: should it be some selective error handling based on the code set in the 'resp'? What if it failed due to some other reason -- would the test handle that properly? PS10, Line 253: ts0 nit: 'TS0' since it has been referred as that already. PS10, Line 256: FOLLOWER Could it change to something like 'LEARNER' in the middle and bring some flakiness? http://gerrit.cloudera.org:8080/#/c/6960/10/src/kudu/integration-tests/tombstoned_voting-stress-test.cc File src/kudu/integration-tests/tombstoned_voting-stress-test.cc: Line 18: #include nit: from IWYU report: src/kudu/integration-tests/tombstoned_voting-stress-test.cc should add these lines: #include #include #include #include #include #include #include #include #include #include "kudu/common/wire_protocol.pb.h" #include "kudu/consensus/consensus.pb.h" #include "kudu/consensus/opid.pb.h" #include "kudu/gutil/macros.h" #include "kudu/integration-tests/external_mini_cluster.h" #include "kudu/integration-tests/external_mini_cluster_fs_inspector.h" #include "kudu/tablet/metadata.pb.h" #include "kudu/util/monotime.h" #include "kudu/util/net/net_util.h" #include "kudu/util/status.h" #include "kudu/util/test_macros.h" #include "kudu/util/test_util.h" src/kudu/integration-tests/tombstoned_voting-stress-test.cc should remove these lines: - #include "kudu/consensus/metadata.pb.h" // lines 25-25 - #include "kudu/consensus/raft_consensus.h" // lines 26-26 - #include "kudu/util/random.h" // lines 32-32 http://gerrit.cloudera.org:8080/#/c/6960/10/src/kudu/master/catalog_manager.cc File src/kudu/master/catalog_manager.cc: Line 2489: LOG(WARNING) << "Tablet " << tablet->ToString() << " has failed on TS " wouldn't it be natural to have tablet::SHUTDOWN check here instead? Or it's not guaranteed to be SHUTDOWN at this point? http://gerrit.cloudera.org:8080/#/c/6960/10/src/kudu/tablet/tablet_replica.cc File src/kudu/tablet/tablet_replica.cc: PS10, Line 304: LOG(DFATAL) The rest of the CHECKs would work for non-debug builds as well. What is the reason of having LOG(DFATAL), not LOG(FATAL) here? If there is any, please add a comment. -- To view, visit http://gerrit.cloudera.org:8080/6960 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ia19d75b185299443b27f41e468bbae20065e7570 Gerrit-PatchSet: 10 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Mike PercyGerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Edward Fancher Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[kudu-CR] KUDU-871. Support tombstoned voting
Todd Lipcon has posted comments on this change. Change subject: KUDU-871. Support tombstoned voting .. Patch Set 10: (6 comments) http://gerrit.cloudera.org:8080/#/c/6960/10/src/kudu/consensus/raft_consensus.cc File src/kudu/consensus/raft_consensus.cc: Line 1856: CHECK_EQ(kStopping, state_) << State_Name(state_); this is already CHECKed in SetStateUnlocked, no? Line 1863: cmeta_->set_leader_uuid(""); ClearLeaderUnlocked() ? http://gerrit.cloudera.org:8080/#/c/6960/10/src/kudu/integration-tests/delete_tablet-itest.cc File src/kudu/integration-tests/delete_tablet-itest.cc: Line 95: // Tablet bootstrap failure should result in a SHUTDOWN tablet with an error per discussion on slack, let's revert this part of this patch since it isn't directly related and has external effects http://gerrit.cloudera.org:8080/#/c/6960/10/src/kudu/integration-tests/tombstoned_voting-itest.cc File src/kudu/integration-tests/tombstoned_voting-itest.cc: PS10, Line 263: error (nee failed per other comment, back out this change PS10, Line 294: Shut each TS down http://gerrit.cloudera.org:8080/#/c/6960/10/src/kudu/integration-tests/tombstoned_voting-stress-test.cc File src/kudu/integration-tests/tombstoned_voting-stress-test.cc: PS10, Line 57: : num_workers_(1), this test seems to be implemented for multiple workers but only has one worker. Should this be >1? or should the code be simplified? -- To view, visit http://gerrit.cloudera.org:8080/6960 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ia19d75b185299443b27f41e468bbae20065e7570 Gerrit-PatchSet: 10 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Mike PercyGerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Edward Fancher Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[kudu-CR] KUDU-871. Support tombstoned voting
Alexey Serbin has posted comments on this change. Change subject: KUDU-871. Support tombstoned voting .. Patch Set 10: (2 comments) http://gerrit.cloudera.org:8080/#/c/6960/10/src/kudu/consensus/raft_consensus.cc File src/kudu/consensus/raft_consensus.cc: PS10, Line 1494: local_last_logged_opid = queue_->GetLastOpIdInLog(); What if tombstone_last_logged_opid was provided as well? Is it some some of programming error? Or it's OK? If the former, consider returning IllegalArgument or adding DCHECK() http://gerrit.cloudera.org:8080/#/c/6960/10/src/kudu/consensus/raft_consensus.h File src/kudu/consensus/raft_consensus.h: PS10, Line 372: // Raft is in the process of stopping and will not accept writes. : kStopping, nit: is it possible to request a vote if in this state? It seems it is, but it would be nice to explicitly specify that. -- To view, visit http://gerrit.cloudera.org:8080/6960 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ia19d75b185299443b27f41e468bbae20065e7570 Gerrit-PatchSet: 10 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Mike PercyGerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Edward Fancher Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[kudu-CR] KUDU-871. Support tombstoned voting
Andrew Wong has posted comments on this change. Change subject: KUDU-871. Support tombstoned voting .. Patch Set 10: (9 comments) http://gerrit.cloudera.org:8080/#/c/6960/10/src/kudu/consensus/raft_consensus.cc File src/kudu/consensus/raft_consensus.cc: PS10, Line 1452: boost:: nit: "using boost::optional" is already specified http://gerrit.cloudera.org:8080/#/c/6960/10/src/kudu/consensus/raft_consensus.h File src/kudu/consensus/raft_consensus.h: PS10, Line 351:|^ : //|| : //++ nit: prefer the slimmer version at L424 Or have this in one place and have SetStateUnlocked() refer here? http://gerrit.cloudera.org:8080/#/c/6960/10/src/kudu/integration-tests/raft_consensus-itest.cc File src/kudu/integration-tests/raft_consensus-itest.cc: PS10, Line 3044: back come nit: come back http://gerrit.cloudera.org:8080/#/c/6960/10/src/kudu/integration-tests/tombstoned_voting-itest.cc File src/kudu/integration-tests/tombstoned_voting-itest.cc: PS10, Line 46: using std::string; : using std::vector; nit: include these? http://gerrit.cloudera.org:8080/#/c/6960/10/src/kudu/integration-tests/tombstoned_voting-stress-test.cc File src/kudu/integration-tests/tombstoned_voting-stress-test.cc: PS10, Line 44: using std::atomic; : using std::string; : using std::thread; : using std::unique_lock; : using std::vector; nit: includes? http://gerrit.cloudera.org:8080/#/c/6960/10/src/kudu/master/sys_catalog.cc File src/kudu/master/sys_catalog.cc: PS10, Line 347: // TODO(mpercy): Set to failed if Init() fails? Does this still apply? http://gerrit.cloudera.org:8080/#/c/6960/10/src/kudu/tablet/tablet_metadata.h File src/kudu/tablet/tablet_metadata.h: PS10, Line 357: Has no meaning for non-tombstoned tablets Is this to say that this will be boost::none for non-tombstoned tablets? http://gerrit.cloudera.org:8080/#/c/6960/10/src/kudu/tablet/tablet_replica.h File src/kudu/tablet/tablet_replica.h: PS10, Line 199: SetError nit: sort of brought this up on Slack, I think the naming for this could be a bit clearer. SetError() makes it seem like a simple setter, but it's not. Maybe SetErrorAndShutdown()? Or SetFailed() and keep the FAILED state as we discussed. The latter has the added benefit of being an external indicator of failure, rather than having each external accessor (e.g. web UI, ksck, etc.) get the error_ member. http://gerrit.cloudera.org:8080/#/c/6960/10/src/kudu/tserver/tablet_copy_client.cc File src/kudu/tserver/tablet_copy_client.cc: PS10, Line 298: RETURN_NOT_OK(WriteConsensusMetadata()); This change isn't noted in the commit message. Mind documenting it, or at least adding a comment explaining? -- To view, visit http://gerrit.cloudera.org:8080/6960 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ia19d75b185299443b27f41e468bbae20065e7570 Gerrit-PatchSet: 10 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Mike PercyGerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Edward Fancher Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[kudu-CR] KUDU-871. Support tombstoned voting
Mike Percy has posted comments on this change. Change subject: KUDU-871. Support tombstoned voting .. Patch Set 10: I think the failure was caused by already-flaky java tests but I am investigating. I'd like to figure out how to run the java tests on dist-test to do a quick comparison on failure rate w/ master. -- To view, visit http://gerrit.cloudera.org:8080/6960 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ia19d75b185299443b27f41e468bbae20065e7570 Gerrit-PatchSet: 10 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Mike PercyGerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Edward Fancher Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: No
[kudu-CR] KUDU-871. Support tombstoned voting
Hello Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/6960 to look at the new patch set (#10). Change subject: KUDU-871. Support tombstoned voting .. KUDU-871. Support tombstoned voting This patch makes it possible for tombstoned tablet replicas to vote in Raft elections. Changes: * Add Stop() method to TabletReplica + Consensus lifecycle. * Includes new STOPPED state. * Tombstoning a replica should call Stop(). * Deleting a replica should call Shutdown(). * Add positive and negative tests for tombstoned voting. * Add a stress test that induces lots of tombstoned voting while running TabletCopy, TabletBootstrap, and DeleteTablet. * Fix DeleteTableITest.TestMergeConsensusMetadata after tombstoned voting changed its assumption that tombstoned tablets would not vote. * Fix several tests that expected tombstoned tablets to be SHUTDOWN when now they are STOPPED. * Get rid of tablet::FAILED state, instead go to SHUTDOWN with an error. Change-Id: Ia19d75b185299443b27f41e468bbae20065e7570 --- M src/kudu/consensus/consensus-test-util.h M src/kudu/consensus/raft_consensus.cc M src/kudu/consensus/raft_consensus.h M src/kudu/consensus/raft_consensus_quorum-test.cc M src/kudu/integration-tests/CMakeLists.txt M src/kudu/integration-tests/cluster_itest_util.cc M src/kudu/integration-tests/cluster_itest_util.h M src/kudu/integration-tests/delete_table-itest.cc M src/kudu/integration-tests/delete_tablet-itest.cc M src/kudu/integration-tests/external_mini_cluster-itest-base.cc M src/kudu/integration-tests/raft_consensus-itest.cc A src/kudu/integration-tests/tombstoned_voting-itest.cc A src/kudu/integration-tests/tombstoned_voting-stress-test.cc M src/kudu/integration-tests/ts_recovery-itest.cc M src/kudu/master/catalog_manager.cc M src/kudu/master/sys_catalog.cc M src/kudu/tablet/metadata.proto M src/kudu/tablet/tablet_metadata.h M src/kudu/tablet/tablet_replica-test.cc M src/kudu/tablet/tablet_replica.cc M src/kudu/tablet/tablet_replica.h M src/kudu/tools/ksck-test.cc M src/kudu/tools/kudu-admin-test.cc M src/kudu/tools/kudu-ts-cli-test.cc M src/kudu/tserver/tablet_copy_client.cc M src/kudu/tserver/tablet_copy_client.h M src/kudu/tserver/tablet_copy_source_session-test.cc M src/kudu/tserver/tablet_server-test.cc M src/kudu/tserver/tablet_service.cc M src/kudu/tserver/ts_tablet_manager.cc M src/kudu/tserver/ts_tablet_manager.h M src/kudu/tserver/tserver-path-handlers.cc M src/kudu/util/make_shared.h 33 files changed, 1,439 insertions(+), 340 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/60/6960/10 -- To view, visit http://gerrit.cloudera.org:8080/6960 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ia19d75b185299443b27f41e468bbae20065e7570 Gerrit-PatchSet: 10 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Mike PercyGerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Edward Fancher Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon
[kudu-CR] KUDU-871. Support tombstoned voting
Hello Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/6960 to look at the new patch set (#9). Change subject: KUDU-871. Support tombstoned voting .. KUDU-871. Support tombstoned voting This patch makes it possible for tombstoned tablet replicas to vote in Raft elections. Changes: * Add Stop() method to TabletReplica + Consensus lifecycle. * Includes new STOPPED state. * Tombstoning a replica should call Stop(). * Deleting a replica should call Shutdown(). * Add positive and negative tests for tombstoned voting. * Add a stress test that induces lots of tombstoned voting while running TabletCopy, TabletBootstrap, and DeleteTablet. * Fix DeleteTableITest.TestMergeConsensusMetadata after tombstoned voting changed its assumption that tombstoned tablets would not vote. * Fix several tests that expected tombstoned tablets to be SHUTDOWN when now they are STOPPED. * Get rid of tablet::FAILED state, instead go to SHUTDOWN with an error. Change-Id: Ia19d75b185299443b27f41e468bbae20065e7570 --- M src/kudu/consensus/consensus-test-util.h M src/kudu/consensus/raft_consensus.cc M src/kudu/consensus/raft_consensus.h M src/kudu/consensus/raft_consensus_quorum-test.cc M src/kudu/integration-tests/CMakeLists.txt M src/kudu/integration-tests/cluster_itest_util.cc M src/kudu/integration-tests/cluster_itest_util.h M src/kudu/integration-tests/delete_table-itest.cc M src/kudu/integration-tests/delete_tablet-itest.cc M src/kudu/integration-tests/external_mini_cluster-itest-base.cc M src/kudu/integration-tests/raft_consensus-itest.cc A src/kudu/integration-tests/tombstoned_voting-itest.cc A src/kudu/integration-tests/tombstoned_voting-stress-test.cc M src/kudu/integration-tests/ts_recovery-itest.cc M src/kudu/master/catalog_manager.cc M src/kudu/master/sys_catalog.cc M src/kudu/tablet/metadata.proto M src/kudu/tablet/tablet_metadata.h M src/kudu/tablet/tablet_replica-test.cc M src/kudu/tablet/tablet_replica.cc M src/kudu/tablet/tablet_replica.h M src/kudu/tools/ksck-test.cc M src/kudu/tools/kudu-admin-test.cc M src/kudu/tools/kudu-ts-cli-test.cc M src/kudu/tserver/tablet_copy_client.cc M src/kudu/tserver/tablet_copy_client.h M src/kudu/tserver/tablet_copy_source_session-test.cc M src/kudu/tserver/tablet_server-test.cc M src/kudu/tserver/tablet_service.cc M src/kudu/tserver/ts_tablet_manager.cc M src/kudu/tserver/ts_tablet_manager.h M src/kudu/tserver/tserver-path-handlers.cc M src/kudu/util/make_shared.h 33 files changed, 1,438 insertions(+), 341 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/60/6960/9 -- To view, visit http://gerrit.cloudera.org:8080/6960 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ia19d75b185299443b27f41e468bbae20065e7570 Gerrit-PatchSet: 9 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Mike PercyGerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Edward Fancher Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon
[kudu-CR] KUDU-871. Support tombstoned voting
Hello Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/6960 to look at the new patch set (#8). Change subject: KUDU-871. Support tombstoned voting .. KUDU-871. Support tombstoned voting This patch makes it possible for tombstoned tablet replicas to vote in Raft elections. Changes: * Add Stop() method to TabletReplica + Consensus lifecycle. * Includes new STOPPED state. * Tombstoning a replica should call Stop(). * Deleting a replica should call Shutdown(). * Add positive and negative tests for tombstoned voting. * Add a stress test that induces lots of tombstoned voting while running TabletCopy, TabletBootstrap, and DeleteTablet. * Fix DeleteTableITest.TestMergeConsensusMetadata after tombstoned voting changed its assumption that tombstoned tablets would not vote. * Fix several tests that expected tombstoned tablets to be SHUTDOWN when now they are STOPPED. * Get rid of tablet::FAILED state, instead go to SHUTDOWN with an error. Change-Id: Ia19d75b185299443b27f41e468bbae20065e7570 --- M src/kudu/consensus/consensus-test-util.h M src/kudu/consensus/raft_consensus.cc M src/kudu/consensus/raft_consensus.h M src/kudu/consensus/raft_consensus_quorum-test.cc M src/kudu/integration-tests/CMakeLists.txt M src/kudu/integration-tests/cluster_itest_util.cc M src/kudu/integration-tests/cluster_itest_util.h M src/kudu/integration-tests/delete_table-itest.cc M src/kudu/integration-tests/delete_tablet-itest.cc M src/kudu/integration-tests/external_mini_cluster-itest-base.cc M src/kudu/integration-tests/raft_consensus-itest.cc A src/kudu/integration-tests/tombstoned_voting-itest.cc A src/kudu/integration-tests/tombstoned_voting-stress-test.cc M src/kudu/integration-tests/ts_recovery-itest.cc M src/kudu/master/catalog_manager.cc M src/kudu/master/sys_catalog.cc M src/kudu/tablet/metadata.proto M src/kudu/tablet/tablet_metadata.h M src/kudu/tablet/tablet_replica-test.cc M src/kudu/tablet/tablet_replica.cc M src/kudu/tablet/tablet_replica.h M src/kudu/tools/ksck-test.cc M src/kudu/tools/kudu-admin-test.cc M src/kudu/tools/kudu-ts-cli-test.cc M src/kudu/tserver/tablet_copy_client.cc M src/kudu/tserver/tablet_copy_client.h M src/kudu/tserver/tablet_copy_source_session-test.cc M src/kudu/tserver/tablet_server-test.cc M src/kudu/tserver/tablet_service.cc M src/kudu/tserver/ts_tablet_manager.cc M src/kudu/tserver/ts_tablet_manager.h M src/kudu/tserver/tserver-path-handlers.cc M src/kudu/util/make_shared.h 33 files changed, 1,393 insertions(+), 341 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/60/6960/8 -- To view, visit http://gerrit.cloudera.org:8080/6960 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ia19d75b185299443b27f41e468bbae20065e7570 Gerrit-PatchSet: 8 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Mike PercyGerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Edward Fancher Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon
[kudu-CR] KUDU-871. Support tombstoned voting
Mike Percy has posted comments on this change. Change subject: KUDU-871. Support tombstoned voting .. Patch Set 7: (2 comments) http://gerrit.cloudera.org:8080/#/c/6960/7/src/kudu/tserver/tablet_service.cc File src/kudu/tserver/tablet_service.cc: Line 115: using kudu::consensus::MinimumOpId; > warning: using decl 'MinimumOpId' is unused [misc-unused-using-decls] Done Line 117: using kudu::consensus::OpIdBiggerThan; > warning: using decl 'OpIdBiggerThan' is unused [misc-unused-using-decls] Done -- To view, visit http://gerrit.cloudera.org:8080/6960 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ia19d75b185299443b27f41e468bbae20065e7570 Gerrit-PatchSet: 7 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Mike PercyGerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Edward Fancher Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[kudu-CR] KUDU-871. Support tombstoned voting
David Ribeiro Alves has posted comments on this change. Change subject: KUDU-871. Support tombstoned voting .. Patch Set 6: (1 comment) http://gerrit.cloudera.org:8080/#/c/6960/6/src/kudu/integration-tests/tombstoned_voting-itest.cc File src/kudu/integration-tests/tombstoned_voting-itest.cc: PS6, Line 195: // Ask TS1 for a vote that should be denied (old last-logged opid). : s = itest::RequestVote(ts1_ets, tablet_id, "B", current_term, old_opid, : /*ignore_live_leader=*/ true, /*is_pre_election=*/ false, kTimeout); > Maybe I'm misunderstanding. It sounds like you're asking for a test of re-v my request was to have a directed test where the last logged op id of the requestor is greater than the tombstoned replica's and not just precisely its last logged opid, but that should still be granted, assuming this will be the common case since the replica is tombstoned and doesn't actually increase it's last logged op id. moreover I was also wondering if there would be a problem if the last logged op id of the tombstoned replica turns back due to its log being deleted (and it being restarted, for instance) -- To view, visit http://gerrit.cloudera.org:8080/6960 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ia19d75b185299443b27f41e468bbae20065e7570 Gerrit-PatchSet: 6 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Mike PercyGerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Edward Fancher Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[kudu-CR] KUDU-871. Support tombstoned voting
Hello David Ribeiro Alves, Todd Lipcon, Alexey Serbin, Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/6960 to look at the new patch set (#7). Change subject: KUDU-871. Support tombstoned voting .. KUDU-871. Support tombstoned voting This patch makes it possible for tombstoned tablet replicas to vote in Raft elections. Changes: * Add Stop() method to TabletReplica + Consensus lifecycle. * Includes new STOPPED state. * Tombstoning a replica should call Stop(). * Deleting a replica should call Shutdown(). * Add positive and negative tests for tombstoned voting. * Add a stress test that induces lots of tombstoned voting while running TabletCopy, TabletBootstrap, and DeleteTablet. * Fix DeleteTableITest.TestMergeConsensusMetadata after tombstoned voting changed its assumption that tombstoned tablets would not vote. * Fix several tests that expected tombstoned tablets to be SHUTDOWN when now they are STOPPED. * Get rid of tablet::FAILED state, instead go to SHUTDOWN with an error. Change-Id: Ia19d75b185299443b27f41e468bbae20065e7570 --- M src/kudu/consensus/consensus-test-util.h M src/kudu/consensus/raft_consensus.cc M src/kudu/consensus/raft_consensus.h M src/kudu/consensus/raft_consensus_quorum-test.cc M src/kudu/integration-tests/CMakeLists.txt M src/kudu/integration-tests/cluster_itest_util.cc M src/kudu/integration-tests/cluster_itest_util.h M src/kudu/integration-tests/delete_table-itest.cc M src/kudu/integration-tests/delete_tablet-itest.cc M src/kudu/integration-tests/external_mini_cluster-itest-base.cc M src/kudu/integration-tests/raft_consensus-itest.cc A src/kudu/integration-tests/tombstoned_voting-itest.cc A src/kudu/integration-tests/tombstoned_voting-stress-test.cc M src/kudu/integration-tests/ts_recovery-itest.cc M src/kudu/master/catalog_manager.cc M src/kudu/master/sys_catalog.cc M src/kudu/tablet/metadata.proto M src/kudu/tablet/tablet_metadata.h M src/kudu/tablet/tablet_replica-test.cc M src/kudu/tablet/tablet_replica.cc M src/kudu/tablet/tablet_replica.h M src/kudu/tools/ksck-test.cc M src/kudu/tools/kudu-admin-test.cc M src/kudu/tools/kudu-ts-cli-test.cc M src/kudu/tserver/tablet_copy_client.cc M src/kudu/tserver/tablet_copy_client.h M src/kudu/tserver/tablet_copy_source_session-test.cc M src/kudu/tserver/tablet_server-test.cc M src/kudu/tserver/tablet_service.cc M src/kudu/tserver/ts_tablet_manager.cc M src/kudu/tserver/ts_tablet_manager.h M src/kudu/tserver/tserver-path-handlers.cc M src/kudu/util/make_shared.h 33 files changed, 1,397 insertions(+), 345 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/60/6960/7 -- To view, visit http://gerrit.cloudera.org:8080/6960 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ia19d75b185299443b27f41e468bbae20065e7570 Gerrit-PatchSet: 7 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Mike PercyGerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Edward Fancher Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon
[kudu-CR] KUDU-871. Support tombstoned voting
Edward Fancher has posted comments on this change. Change subject: KUDU-871. Support tombstoned voting .. Patch Set 6: (1 comment) http://gerrit.cloudera.org:8080/#/c/6960/6/src/kudu/integration-tests/tombstoned_voting-itest.cc File src/kudu/integration-tests/tombstoned_voting-itest.cc: Line 48: Replicated masters can have tombstoned replicas too, is that right? Does the tombstone voting apply to them? If so, would it be worthwhile to also have a test that includes tombstoned voting for a master tablet replica? -- To view, visit http://gerrit.cloudera.org:8080/6960 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ia19d75b185299443b27f41e468bbae20065e7570 Gerrit-PatchSet: 6 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Mike PercyGerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Edward Fancher Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[kudu-CR] KUDU-871. Support tombstoned voting
Alexey Serbin has posted comments on this change. Change subject: KUDU-871. Support tombstoned voting .. Patch Set 6: (7 comments) Glanced through one of the tests. Will take a closer look today. http://gerrit.cloudera.org:8080/#/c/6960/6/src/kudu/integration-tests/tombstoned_voting-stress-test.cc File src/kudu/integration-tests/tombstoned_voting-stress-test.cc: PS6, Line 146: LOG(WARNING) Why warning? Is it something non-regular or non-expected? Would INFO be more appropriate here? Also, is it crucial to have these INFO logs output from the test? PS6, Line 148: LOG(FATAL) nit: would FAIL() be more idiomatic in this case (given it's a test)? PS6, Line 159: WARNING INFO? But I might be missing something here. PS6, Line 197: 2 nit: maybe cluster_->num_tablet_servers() ? PS6, Line 228: voter_thread What happens to the thread if the test fails in one of the ASSERT_OK() below? Consider joining the thread in that case as well (it might be SIGSEGV or something like that otherwise). PS6, Line 232: SleepFor(MonoDelta::FromMilliseconds(500)); Could you add some comments explaining why this delay is necessary? What would happen if this delay were 10 times less? PS6, Line 240: SleepFor(MonoDelta::FromMilliseconds(500)); ditto -- To view, visit http://gerrit.cloudera.org:8080/6960 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ia19d75b185299443b27f41e468bbae20065e7570 Gerrit-PatchSet: 6 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Mike PercyGerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[kudu-CR] KUDU-871. Support tombstoned voting
Todd Lipcon has posted comments on this change. Change subject: KUDU-871. Support tombstoned voting .. Patch Set 6: (25 comments) http://gerrit.cloudera.org:8080/#/c/6960/6/src/kudu/consensus/raft_consensus.cc File src/kudu/consensus/raft_consensus.cc: PS6, Line 1485: && state_ == kRunning why is this condition necesary? if we've just been tombstoned it still makes sense that we should withhold votes for a period of time, right? PS6, Line 1509: CheckSafeToVoteUnlocked the duplicate logic here with CheckSafeToVoteUnlocked is a little unnerving. What do you think about essentially inlining its logic here and merging with this if condition? eg something of this nature: switch (state_) { case kShutdown: return IllegalState... case kRunning: local_last_logged = GetLatestFromLog() break; default: if (FLAGS_allow_tombstone_voting && tombstone_last_logged_opid) { local_last_logged = tombstone_last_logged; } else { return IllegalState } break; } I think that might be easier to follow and net less lines of code PS6, Line 1514: DCHECK(tombstone_last_logged_opid) << LogPrefixUnlocked() << "expected last-logged opid"; : local_last_logged_opid = *tombstone_last_logged_opid; if this DCHECK fails, would we crash anyway trying to dereference the boost::none? if so, I think we should either use CHECK (so we get a nice crash log) or we should try to handle this case by just denying the vote (if we aren't 100% sure that this would always be the case, we can always vote conservatively) http://gerrit.cloudera.org:8080/#/c/6960/6/src/kudu/consensus/raft_consensus.h File src/kudu/consensus/raft_consensus.h: Line 130: RaftConsensus(ConsensusOptions options, can the ctor be made private? (may need ALLOW_MAKE_SHARED) I don't see any external usage of it but maybe my grepping was bad. PS6, Line 271: term > nit: current_term() would be more explicit I think CurrentTerm or Term would be better since this takes lock_ and that lock is often held for long periods of time (thus the call may be quite slow compared to your typical getter) PS6, Line 292: Tombstoned voting > not super clear what "tombstoned voting" is maybe explain it somewhere? agreed. Maybe better here to just say that it synchronously transitions to kStopped state, and refer to the kStopped definition in the state enum for what behavior that entails? Line 297: void Shutdown(); same Line 339: enum State { it would be nice to have a state transition diagram of sorts here. Or alternatively, for each state, list out the potential subsequent states and what triggers the transition? eg I can't remember now whether we can transition back from kStopped to kRunning or not. PS6, Line 340: // RaftConsensus has been freshly constructed. : kNew, is this state ever externally exposed, considering we now have the factory method that returns a post-init object? if not, worth noting as much PS6, Line 343: // RaftConsensus has been initialized. : kInitialized, in terms of behavior, how is this state different than Stopped? eg let's say I start up a server which has some tablets in tombstoned state, do their RaftConsensus instances end up in kInitialized state or kStopped state? Is it important to make a distinction? PS6, Line 391: // Initializes the RaftConsensus object. This should be called before : // publishing this object to any thread other than the thread that invoked : // the constructor. : Status Init(); since the constructor is only invoked by the factory method, maybe not necessary to be so explicit? PS6, Line 647: // existence of a known last logged opid. : Status CheckSafeToVoteUnlocked(const boost::optional& tombstone_last_logged_opid) : similar to David's comment on the implementation: I think it would make more sense to pass a 'bool tombstoned_op_id_known' or somesuch, and in the implementation add a comment explaining the scenario(s) in which we would _not_ know the tombstoned op id and why it's not safe to vote in those scenarios. Use of a bool instead of the opid makes it more clear that we just care about whether we know a valid opid, and aren't deciding here based on its value. Line 783: AtomicBool stopped_; it seems it was already like this, but what's the distinction between this 'stopped_' member and just checking 'state_ == kStopped'? mind adding a comment explaining and perhaps a TODO to get rid of it if possible? (not going to make you get rid of it in this patch since it's pre-existing and I dont think you made it worse) http://gerrit.cloudera.org:8080/#/c/6960/6/src/kudu/integration-tests/tombstoned_voting-itest.cc File src/kudu/integration-tests/tombstoned_voting-itest.cc: PS6, Line 80: call Stop() on you mean 'manually tombstone' right?
[kudu-CR] KUDU-871. Support tombstoned voting
David Ribeiro Alves has posted comments on this change. Change subject: KUDU-871. Support tombstoned voting .. Patch Set 6: (2 comments) http://gerrit.cloudera.org:8080/#/c/6960/6/src/kudu/tablet/tablet_replica.h File src/kudu/tablet/tablet_replica.h: PS6, Line 272: // Wait until the TabletReplica is fully in STOPPED or SHUTDOWN state. does this have some requirements (does the replica need to be in "stopping" state or somesuch?) http://gerrit.cloudera.org:8080/#/c/6960/6/src/kudu/tserver/ts_tablet_manager.cc File src/kudu/tserver/ts_tablet_manager.cc: PS6, Line 525: tombtone not from this patch, but typo -- To view, visit http://gerrit.cloudera.org:8080/6960 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ia19d75b185299443b27f41e468bbae20065e7570 Gerrit-PatchSet: 6 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Mike PercyGerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[kudu-CR] KUDU-871. Support tombstoned voting
David Ribeiro Alves has posted comments on this change. Change subject: KUDU-871. Support tombstoned voting .. Patch Set 6: (7 comments) leaving for luch, more comments forthcoming http://gerrit.cloudera.org:8080/#/c/6960/6/src/kudu/consensus/raft_consensus.cc File src/kudu/consensus/raft_consensus.cc: PS6, Line 1519: LOG_WITH_PREFIX_UNLOCKED(INFO) << "voting while tombstoned based on last-logged opid " :<< local_last_logged_opid; move this to the else block above and thus avoid the extra if? PS6, Line 2538: const optional& tombstone_last_logged_opid whats the point of passing an opid here if you're only checking for its presence? also you mention on LOC 1508 "in which case we are guaranteed that // 'tombstone_last_logged_opid' is set by CheckSafeToVoteUnlocked()" but that is not happening or am I missing something? http://gerrit.cloudera.org:8080/#/c/6960/6/src/kudu/consensus/raft_consensus.h File src/kudu/consensus/raft_consensus.h: PS6, Line 124: static Status Create(ConsensusOptions options, :RaftPeerPB local_peer_pb, :scoped_refptr cmeta_manager, :ThreadPool* raft_pool, :std::shared_ptr* consensus_out); didn't you delete this a while back? PS6, Line 271: term nit: current_term() would be more explicit PS6, Line 292: Tombstoned voting not super clear what "tombstoned voting" is maybe explain it somewhere? PS6, Line 354: RaftConsensus has been stopped. maybe explain what this means in the context of the other states? something like: "RaftConsensus has been stopped. No more transactions or commits are accepted but will still reply to election requests if tombstoned." http://gerrit.cloudera.org:8080/#/c/6960/6/src/kudu/integration-tests/tombstoned_voting-itest.cc File src/kudu/integration-tests/tombstoned_voting-itest.cc: PS6, Line 195: // Ask TS1 for a vote that should be denied (old last-logged opid). : s = itest::RequestVote(ts1_ets, tablet_id, "B", current_term, old_opid, : /*ignore_live_leader=*/ true, /*is_pre_election=*/ false, kTimeout); can you add a test for same candidate same term but with an opid that is greater that the tombstoned replica's op_id? -- To view, visit http://gerrit.cloudera.org:8080/6960 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ia19d75b185299443b27f41e468bbae20065e7570 Gerrit-PatchSet: 6 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Mike PercyGerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[kudu-CR] KUDU-871. Support tombstoned voting
Hello Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/6960 to look at the new patch set (#6). Change subject: KUDU-871. Support tombstoned voting .. KUDU-871. Support tombstoned voting This patch makes it possible for tombstoned tablet replicas to vote in Raft elections. Changes: * Add Stop() method to TabletReplica + Consensus lifecycle. * Includes new STOPPED state. * Tombstoning a replica should call Stop(). * Deleting a replica should call Shutdown(). * Add positive and negative tests for tombstoned voting. * Add a stress test that induces lots of tombstoned voting while running TabletCopy, TabletBootstrap, and DeleteTablet. * Protect TabletMetadata::tombstone_last_logged_opid_ with a lock. * Fix DeleteTableITest.TestMergeConsensusMetadata after tombstoned voting changed its assumption that tombstoned tablets would not vote. * Fix several tests that expected tombstoned tablets to be SHUTDOWN when now they are STOPPED. Change-Id: Ia19d75b185299443b27f41e468bbae20065e7570 --- M src/kudu/consensus/consensus-test-util.h M src/kudu/consensus/raft_consensus.cc M src/kudu/consensus/raft_consensus.h M src/kudu/consensus/raft_consensus_quorum-test.cc M src/kudu/integration-tests/CMakeLists.txt M src/kudu/integration-tests/cluster_itest_util.cc M src/kudu/integration-tests/cluster_itest_util.h M src/kudu/integration-tests/delete_table-itest.cc M src/kudu/integration-tests/external_mini_cluster-itest-base.cc A src/kudu/integration-tests/tombstoned_voting-itest.cc A src/kudu/integration-tests/tombstoned_voting-stress-test.cc M src/kudu/master/sys_catalog.cc M src/kudu/tablet/metadata.proto M src/kudu/tablet/tablet_metadata.cc M src/kudu/tablet/tablet_metadata.h M src/kudu/tablet/tablet_replica-test.cc M src/kudu/tablet/tablet_replica.cc M src/kudu/tablet/tablet_replica.h M src/kudu/tools/kudu-admin-test.cc M src/kudu/tools/kudu-ts-cli-test.cc M src/kudu/tserver/tablet_copy_client.cc M src/kudu/tserver/tablet_copy_client.h M src/kudu/tserver/tablet_copy_source_session-test.cc M src/kudu/tserver/tablet_service.cc M src/kudu/tserver/ts_tablet_manager.cc M src/kudu/tserver/ts_tablet_manager.h M src/kudu/tserver/tserver-path-handlers.cc 27 files changed, 1,145 insertions(+), 239 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/60/6960/6 -- To view, visit http://gerrit.cloudera.org:8080/6960 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ia19d75b185299443b27f41e468bbae20065e7570 Gerrit-PatchSet: 6 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Mike PercyGerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon
[kudu-CR] KUDU-871. Support tombstoned voting
Hello Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/6960 to look at the new patch set (#5). Change subject: KUDU-871. Support tombstoned voting .. KUDU-871. Support tombstoned voting This patch makes it possible for tombstoned tablet replicas to vote in Raft elections. Changes: * Add Stop() method to TabletReplica + Consensus lifecycle. * Includes new STOPPED state. * Tombstoning a replica should call Stop(). * Deleting a replica should call Shutdown(). * Add positive and negative tests for tombstoned voting. * Add a stress test that induces lots of tombstoned voting while running TabletCopy, TabletBootstrap, and DeleteTablet. * Protect TabletMetadata::tombstone_last_logged_opid_ with a lock. * Fix DeleteTableITest.TestMergeConsensusMetadata after tombstoned voting changed its assumption that tombstoned tablets would not vote. * Fix several tests that expected tombstoned tablets to be SHUTDOWN when now they are STOPPED. Change-Id: Ia19d75b185299443b27f41e468bbae20065e7570 --- M src/kudu/consensus/consensus-test-util.h M src/kudu/consensus/raft_consensus.cc M src/kudu/consensus/raft_consensus.h M src/kudu/consensus/raft_consensus_quorum-test.cc M src/kudu/integration-tests/CMakeLists.txt M src/kudu/integration-tests/cluster_itest_util.cc M src/kudu/integration-tests/cluster_itest_util.h M src/kudu/integration-tests/delete_table-itest.cc M src/kudu/integration-tests/external_mini_cluster-itest-base.cc A src/kudu/integration-tests/tombstoned_voting-itest.cc A src/kudu/integration-tests/tombstoned_voting-stress-test.cc M src/kudu/master/sys_catalog.cc M src/kudu/tablet/metadata.proto M src/kudu/tablet/tablet_metadata.cc M src/kudu/tablet/tablet_metadata.h M src/kudu/tablet/tablet_replica-test.cc M src/kudu/tablet/tablet_replica.cc M src/kudu/tablet/tablet_replica.h M src/kudu/tools/kudu-admin-test.cc M src/kudu/tools/kudu-ts-cli-test.cc M src/kudu/tserver/tablet_copy_client.cc M src/kudu/tserver/tablet_copy_client.h M src/kudu/tserver/tablet_copy_source_session-test.cc M src/kudu/tserver/tablet_service.cc M src/kudu/tserver/ts_tablet_manager.cc M src/kudu/tserver/ts_tablet_manager.h M src/kudu/tserver/tserver-path-handlers.cc 27 files changed, 1,149 insertions(+), 242 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/60/6960/5 -- To view, visit http://gerrit.cloudera.org:8080/6960 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ia19d75b185299443b27f41e468bbae20065e7570 Gerrit-PatchSet: 5 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Mike PercyGerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon
[kudu-CR] KUDU-871. Support tombstoned voting
Mike Percy has posted comments on this change. Change subject: KUDU-871. Support tombstoned voting .. Patch Set 4: (32 comments) http://gerrit.cloudera.org:8080/#/c/6960/4/src/kudu/consensus/raft_consensus.cc File src/kudu/consensus/raft_consensus.cc: Line 188: Status RaftConsensus::Init() { > I think the DCHECK on state here was useful, even if it needs to be broaden Done PS4, Line 1493: tombstone_last_logged_opid ? *tombstone_last_logged_opid : : GetLatestOpIdFromLog(); > could we simplify by always using 'max()' of the entry from the metadata an I don't think so; see my response to the same question in tablet_service.cc However, I do think we should use the latest opid in the log if Consensus is currently running, since it is guaranteed to be correct in that case. Line 1497: LOG_WITH_PREFIX_UNLOCKED(INFO) << "voting while tombstoned"; > any way to improve this with a bit more info? I'll add the tombstone_last_logged_opid here but I can't think of a reason to log more than that because we already log the details whenever we vote. Line 2004: state_ = new_state; > given this is in every case, move it down below the switch? Done Line 2513: Status RaftConsensus::CheckSafeToVoteUnlocked(optional tombstone_last_logged_opid) const { > warning: the parameter 'tombstone_last_logged_opid' is copied for each invo Done PS4, Line 2515: (!tombstone_last_logged_opid > not 100% following why this portion of the condition is necessary if we are voting while tombstoned then we don't have to be running PS4, Line 2520: state_ == kShutdown > should we instead be checking the expected state (ie kStopped)? aren't ther We can tombstoned vote in kInit and kStopped state, even in kRunning actually due to a race between the check in TabletService and execution in RaftConsensus. It's not possible to expose RaftConsensus in kNew state due to code in TabletReplica however it would be better to encapsulate that guarantee within this class so I'll add a Create() factory method. Line 2634: return kMinimumTerm; > under what circumstances do we hit this case? It seems like cmeta_ should b Good catch; this was left over from a previous attempt on this patch and is not possible with the current rev. However I agree with you that a factory method for RaftConsensus would be preferable here. Line 2685: if (cmeta_) { > same removed http://gerrit.cloudera.org:8080/#/c/6960/4/src/kudu/integration-tests/cluster_itest_util.h File src/kudu/integration-tests/cluster_itest_util.h: Line 237:const int64_t candidate_term, > warning: parameter 'candidate_term' is const-qualified in the function decl Done PS4, Line 239:boost::optional ignore_live_leader, :boost::optional is_pre_election, > what's the point of making these optional? don't they have well-defined def I thought it was nice to pass them as optional so that if you don't specify them you get the default behavior as specified in the proto file. http://gerrit.cloudera.org:8080/#/c/6960/4/src/kudu/integration-tests/tombstoned_voting-stress-test.cc File src/kudu/integration-tests/tombstoned_voting-stress-test.cc: Line 38: using kudu::consensus::MakeOpId; > warning: using decl 'MakeOpId' is unused [misc-unused-using-decls] Done Line 39: using kudu::consensus::LeaderStepDownResponsePB; > warning: using decl 'LeaderStepDownResponsePB' is unused [misc-unused-using Done Line 41: using kudu::consensus::RaftConsensus; > warning: using decl 'RaftConsensus' is unused [misc-unused-using-decls] Done Line 42: using kudu::consensus::RaftPeerPB; > warning: using decl 'RaftPeerPB' is unused [misc-unused-using-decls] Done Line 47: using kudu::tablet::TabletStatePB; > warning: using decl 'TabletStatePB' is unused [misc-unused-using-decls] Done Line 48: using kudu::tserver::TabletServerErrorPB; > warning: using decl 'TabletServerErrorPB' is unused [misc-unused-using-decl Done http://gerrit.cloudera.org:8080/#/c/6960/4/src/kudu/tablet/metadata.proto File src/kudu/tablet/metadata.proto: Line 135: // Tablet states are sent in TabletReports and kept in TabletReplica. > what effect will these changes have if there is a rolling upgrade with an o Based on a grep through the code, the master only knows about tablet::RUNNING and tablet::FAILED so it shouldn't even notice. http://gerrit.cloudera.org:8080/#/c/6960/4/src/kudu/tablet/tablet_metadata.h File src/kudu/tablet/tablet_metadata.h: Line 216: void SetPreFlushCallback(StatusClosure callback) { pre_flush_callback_ = callback; } > warning: parameter 'callback' is passed by value and only copied once; cons Done http://gerrit.cloudera.org:8080/#/c/6960/4/src/kudu/tablet/tablet_replica.cc File src/kudu/tablet/tablet_replica.cc: Line 133: set_state(INITIALIZED); > maybe moot based
[kudu-CR] KUDU-871. Support tombstoned voting
Todd Lipcon has posted comments on this change. Change subject: KUDU-871. Support tombstoned voting .. Patch Set 4: (21 comments) did a first pass, probably will have more comments after a second pass but figured I'd send these along http://gerrit.cloudera.org:8080/#/c/6960/4/src/kudu/consensus/raft_consensus.cc File src/kudu/consensus/raft_consensus.cc: Line 188: Status RaftConsensus::Init() { I think the DCHECK on state here was useful, even if it needs to be broadened a bit now PS4, Line 1493: tombstone_last_logged_opid ? *tombstone_last_logged_opid : : GetLatestOpIdFromLog(); could we simplify by always using 'max()' of the entry from the metadata and the entry from our log (if we have a log) Line 1497: LOG_WITH_PREFIX_UNLOCKED(INFO) << "voting while tombstoned"; any way to improve this with a bit more info? Line 2004: state_ = new_state; given this is in every case, move it down below the switch? PS4, Line 2515: (!tombstone_last_logged_opid not 100% following why this portion of the condition is necessary PS4, Line 2520: state_ == kShutdown should we instead be checking the expected state (ie kStopped)? aren't there some other states we might be in where we don't want to vote? Line 2634: return kMinimumTerm; under what circumstances do we hit this case? It seems like cmeta_ should be set in Init() and we shouldn't ever be calling functions on RaftConsensus until after a successful Init() (see other comment about changing to a factory function so it's more clear that we never expose a constructed-by-not-initialized instance) Line 2685: if (cmeta_) { same http://gerrit.cloudera.org:8080/#/c/6960/4/src/kudu/integration-tests/cluster_itest_util.h File src/kudu/integration-tests/cluster_itest_util.h: PS4, Line 239:boost::optional ignore_live_leader, :boost::optional is_pre_election, what's the point of making these optional? don't they have well-defined defaults (false) anyway? http://gerrit.cloudera.org:8080/#/c/6960/4/src/kudu/tablet/metadata.proto File src/kudu/tablet/metadata.proto: Line 135: // Tablet states are sent in TabletReports and kept in TabletReplica. what effect will these changes have if there is a rolling upgrade with an old master and new tablet servers? do we need to force an incompatibility so that people upgrade masters first? or is it safe to have these show up as 'UNKNOWN' on an old-version master? http://gerrit.cloudera.org:8080/#/c/6960/4/src/kudu/tablet/tablet_replica.cc File src/kudu/tablet/tablet_replica.cc: Line 133: set_state(INITIALIZED); maybe moot based on the suggestion of a factory method, but if a method like this remains, the state change should wait until the method is successful Line 272: // Release mem tracker resources. is this comment stale? PS4, Line 407: LOG_WITH_PREFIX(INFO) << "Fetched status for addr " << _ : << ": " << SecureDebugString(*status_pb_out); leftover debug print? Line 427: CHECK_EQ(INITIALIZED, state_); redundant with the DCHECK in set_state http://gerrit.cloudera.org:8080/#/c/6960/4/src/kudu/tablet/tablet_replica.h File src/kudu/tablet/tablet_replica.h: PS4, Line 75: TabletReplica(scoped_refptr meta, : scoped_refptr cmeta_manager, : consensus::RaftPeerPB local_peer_pb, : ThreadPool* apply_pool, : Callbackmark_dirty_clbk); : : // Initializes RaftConsensus. : Status Init(ThreadPool* raft_pool); instead of having this split "construct and then init" how about a factory method which would have a similar signature to the original constructor? eg: static Status Create(scoped_refptr meta, ..., scoped_refptr* replica); which does the construction and Init()? That way there is less externally-visible lifecycle for callers to think about (and maybe could reduce one of the enum values too)? PS4, Line 96: tombstoned voting in general, right? ie what would happen if you called Stop() but the data wasn't tombstoned, and you asked to vote? PS4, Line 169: // If any peers in the consensus configuration lack permanent uuids, get them via an : // RPC call and update. : // TODO: move this to raft_consensus.h. : Status UpdatePermanentUuids(); can you remove this while you're here? seems to be dead Line 277: // Wait until the TabletReplica is fully in STOPPED state. or SHUTDOWN http://gerrit.cloudera.org:8080/#/c/6960/4/src/kudu/tserver/tablet_service.cc File src/kudu/tserver/tablet_service.cc: Line 947: tablet::TabletDataState data_state =
[kudu-CR] KUDU-871. Support tombstoned voting
Hello Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/6960 to look at the new patch set (#4). Change subject: KUDU-871. Support tombstoned voting .. KUDU-871. Support tombstoned voting This patch makes it possible for tombstoned tablet replicas to vote in Raft elections. Changes: * Add Stop() method to TabletReplica + Consensus lifecycle. * Includes new STOPPED state. * Tombstoning a replica should call Stop(). * Deleting a replica should call Shutdown(). * Add positive and negative tests for tombstoned voting. * Add a stress test that induces lots of tombstoned voting while running TabletCopy, TabletBootstrap, and DeleteTablet. * Protect TabletMetadata::tombstone_last_logged_opid_ with a lock. * Fix DeleteTableITest.TestMergeConsensusMetadata after tombstoned voting changed its assumption that tombstoned tablets would not vote. * Fix several tests that expected tombstoned tablets to be SHUTDOWN when now they are STOPPED. Change-Id: Ia19d75b185299443b27f41e468bbae20065e7570 --- M src/kudu/consensus/consensus-test-util.h M src/kudu/consensus/raft_consensus.cc M src/kudu/consensus/raft_consensus.h M src/kudu/consensus/raft_consensus_quorum-test.cc M src/kudu/integration-tests/CMakeLists.txt M src/kudu/integration-tests/cluster_itest_util.cc M src/kudu/integration-tests/cluster_itest_util.h M src/kudu/integration-tests/delete_table-itest.cc M src/kudu/integration-tests/external_mini_cluster-itest-base.cc A src/kudu/integration-tests/tombstoned_voting-itest.cc A src/kudu/integration-tests/tombstoned_voting-stress-test.cc M src/kudu/master/sys_catalog.cc M src/kudu/tablet/metadata.proto M src/kudu/tablet/tablet_metadata.cc M src/kudu/tablet/tablet_metadata.h M src/kudu/tablet/tablet_replica-test.cc M src/kudu/tablet/tablet_replica.cc M src/kudu/tablet/tablet_replica.h M src/kudu/tools/kudu-admin-test.cc M src/kudu/tools/kudu-ts-cli-test.cc M src/kudu/tserver/tablet_copy_client.cc M src/kudu/tserver/tablet_copy_client.h M src/kudu/tserver/tablet_copy_source_session-test.cc M src/kudu/tserver/tablet_service.cc M src/kudu/tserver/ts_tablet_manager.cc M src/kudu/tserver/ts_tablet_manager.h M src/kudu/tserver/tserver-path-handlers.cc 27 files changed, 1,091 insertions(+), 216 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/60/6960/4 -- To view, visit http://gerrit.cloudera.org:8080/6960 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ia19d75b185299443b27f41e468bbae20065e7570 Gerrit-PatchSet: 4 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Mike PercyGerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon