Volodymyr Verovkin has posted comments on this change. ( http://gerrit.cloudera.org:8080/14300 )
Change subject: KUDU-2800 draft 2 ...................................................................... Patch Set 7: (16 comments) http://gerrit.cloudera.org:8080/#/c/14300/7//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/14300/7//COMMIT_MSG@7 PS7, Line 7: KUDU-2800 draft 2 > Add proper description Done http://gerrit.cloudera.org:8080/#/c/14300/6/src/kudu/integration-tests/raft_config_change-itest.cc File src/kudu/integration-tests/raft_config_change-itest.cc: http://gerrit.cloudera.org:8080/#/c/14300/6/src/kudu/integration-tests/raft_config_change-itest.cc@460 PS6, Line 460: // Check re-replication during slow tablet replica bootstrap. > nit: add a period in the end of the sentence. Done http://gerrit.cloudera.org:8080/#/c/14300/6/src/kudu/integration-tests/raft_config_change-itest.cc@464 PS6, Line 464: const string& evicted_replica_uuid, : const MonoDelta& kTimeout) > I think it's better to explicitly say that replica is not evicted. Done http://gerrit.cloudera.org:8080/#/c/14300/6/src/kudu/integration-tests/raft_config_change-itest.cc@467 PS6, Line 467: ASSERT_OK(GetLeaderReplicaWithRetries(tablet_id_, &leader)); > Here and in other new test scenarios: add SKIP_IF_SLOW_NOT_ALLOWED() since Done http://gerrit.cloudera.org:8080/#/c/14300/6/src/kudu/integration-tests/raft_config_change-itest.cc@483 PS6, Line 483: > nit: the tablet's replica Done http://gerrit.cloudera.org:8080/#/c/14300/6/src/kudu/integration-tests/raft_config_change-itest.cc@492 PS6, Line 492: // Extra sanity checks. > nit: if it's used only once, maybe put the lambda's code inside ASSERT_EVEN Done http://gerrit.cloudera.org:8080/#/c/14300/6/src/kudu/integration-tests/raft_config_change-itest.cc@500 PS6, Line 500: > server Done http://gerrit.cloudera.org:8080/#/c/14300/6/src/kudu/integration-tests/raft_config_change-itest.cc@585 PS6, Line 585: _servers.front(); > Drop this: I think it's too vague to say that the replica removes its data Done http://gerrit.cloudera.org:8080/#/c/14300/6/src/kudu/integration-tests/raft_config_change-itest.cc@590 PS6, Line 590: // Inject delay in > I think it's quite obvious what happens in two lines below; maybe drop this Done http://gerrit.cloudera.org:8080/#/c/14300/6/src/kudu/integration-tests/raft_config_change-itest.cc@597 PS6, Line 597: replica_ets->Shutdown(); : ASSERT_OK(ts->Restart()); : : // Find a leader. In case we paused the leader above, this will wait until : // we have elected a new one. : TServerDetails* leader = nullptr; : w > I think GetLeaderReplicaWithRetries() takes care of retries already. Why i I think if we shutdown leader, it may not quickly propagate across the cluster and we can see it during GetLeaderReplicaWithRetries() till consensus detect loss of leader and elect new one. http://gerrit.cloudera.org:8080/#/c/14300/6/src/kudu/integration-tests/raft_config_change-itest.cc@628 PS6, Line 628: // Some WAL segments must exist, but wal segment 1 must not exist. : ASSERT_OK(inspect_->WaitForFilePatte > Given that there is ASSERT_EVENTUALLY() below, is it necessary at all? We give that replica time equal to tablet_open_bootstrap_inject_latency_ms to finish bootstrap in order to make sure that, once replica boot up and try to re-join consensus, it will fail doing that. http://gerrit.cloudera.org:8080/#/c/14300/6/src/kudu/integration-tests/raft_config_change-itest.cc@632 PS6, Line 632: : //Let the replica finish slow bootstrap : SleepFor(MonoDelta::FromSeconds(3)); : : ASSERT_EVENTUALLY([&]() { : ValidateConsensusState(no_replica_servers.front(), follower_uuid, kTimeout); : }); : } : > This is a repeating pattern. Maybe, factor it out into a class method? Done http://gerrit.cloudera.org:8080/#/c/14300/7/src/kudu/integration-tests/raft_config_change-itest.cc File src/kudu/integration-tests/raft_config_change-itest.cc: http://gerrit.cloudera.org:8080/#/c/14300/7/src/kudu/integration-tests/raft_config_change-itest.cc@461 PS7, Line 461: SlowBootstrapTest > nit: maybe, SlowTabletBootstrapTest would be more precise? Done http://gerrit.cloudera.org:8080/#/c/14300/7/src/kudu/integration-tests/raft_config_change-itest.cc@508 PS7, Line 508: TServerDetails* leader = nullptr; : ASSERT_OK(GetLeaderReplicaWithRetries(tablet_id_, &leader)); : consensus::ConsensusStatePB cstate; : ASSERT_OK(GetConsensusState(leader, tablet_id_, kTimeout, : EXCLUDE_HEALTH_REPORT, &cstate)); : // Check that new replica has not been added. : ASSERT_FALSE(IsRaftConfigMember(no_replica_servers.front(), cstate.committed_config())); : // Check that the replica at the tablet servers that was restarted is not removed. : ASSERT_TRUE(IsRaftConfigMember(replica_servers.front(), cstate.committed_config())); > Does it make sense to make the newly introduced ValidateConsensusState() to This case is slightly different. In ValidateConsensusState() we test ASSERT_TRUE(IsRaftConfigMember(no_replica_servers.front(), cstate.committed_config())); but here ASSERT_FALSE(IsRaftConfigMember(no_replica_servers.front(), cstate.committed_config())); http://gerrit.cloudera.org:8080/#/c/14300/7/src/kudu/tserver/ts_tablet_manager.cc File src/kudu/tserver/ts_tablet_manager.cc: http://gerrit.cloudera.org:8080/#/c/14300/7/src/kudu/tserver/ts_tablet_manager.cc@144 PS7, Line 144: hidden > Change to 'unsafe'. That way it's automatically hidden from the documentat Done http://gerrit.cloudera.org:8080/#/c/14300/7/src/kudu/tserver/ts_tablet_manager.cc@1077 PS7, Line 1077: "Injecting " << FLAGS_tablet_open_bootstrap_inject_latency_ms : << "ms delay in tablet bootstrapping" > nit: maybe, use strings::Substitute() to format the message for easier read It's a copy-paste of FLAGS_delete_tablet_inject_latency_ms code. I found 4 code entries with exactly same code. -- To view, visit http://gerrit.cloudera.org:8080/14300 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ie1fee42053194f51d7a869ce14788095d6627ed9 Gerrit-Change-Number: 14300 Gerrit-PatchSet: 7 Gerrit-Owner: Volodymyr Verovkin <[email protected]> Gerrit-Reviewer: Alexey Serbin <[email protected]> Gerrit-Reviewer: Volodymyr Verovkin <[email protected]> Gerrit-Comment-Date: Wed, 02 Oct 2019 20:51:22 +0000 Gerrit-HasComments: Yes
