Andrew Wong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14300 )

Change subject: KUDU-2800 Test long bootstrapping tablet replicas
......................................................................


Patch Set 10:

(9 comments)

http://gerrit.cloudera.org:8080/#/c/14300/10//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/14300/10//COMMIT_MSG@11
PS10, Line 11:
             : 1) Tablet replica bootstraps long time. No data modications 
happen during these time. Consensus state does not change.
             :
             : 2) Tablet replica is shutdown for some time. During this time a 
lot of data modifications happen, log rolls, replica is replaced (with 
re-replication). Replica restarts and find out the it fell behind.
             :
             : 3) Tablet replica bootstraps long time. During this time a lot 
of data modifications happen, log rolls, replica is replaced (with 
re-replication). Replica restarts and find out the it fell behind.
nit: could you wrap these lines at 72 characters? per #6 of 
https://chris.beams.io/posts/git-commit/


http://gerrit.cloudera.org:8080/#/c/14300/10/src/kudu/integration-tests/raft_config_change-itest.cc
File src/kudu/integration-tests/raft_config_change-itest.cc:

http://gerrit.cloudera.org:8080/#/c/14300/10/src/kudu/integration-tests/raft_config_change-itest.cc@464
PS10, Line 464: ValidateConsensusState
nit: maybe rename this so it's clearer from the callsite what it's trying to 
achieve? Something like ValidateReplicaMoved() or something?


http://gerrit.cloudera.org:8080/#/c/14300/10/src/kudu/integration-tests/raft_config_change-itest.cc@484
PS10, Line 484:
              :   FLAGS_num_tablet_servers = 4;
              :   FLAGS_num_replicas = 3;
              :   NO_FATALS(BuildAndStart(
              :       {"--follower_unavailable_considered_failed_sec=3"},
              :       {"--tserver_unresponsive_timeout_ms=3000"}));
              :
              :   ASSERT_EQ(4, tablet_servers_.size());
              :
              :   // Extra sanity checks.
              :   vector<string> replica_servers = 
GetServersWithReplica(tablet_id_);
              :   ASSERT_EQ(3, replica_servers.size());
              :   vector<string> no_replica_servers = 
GetServersWithoutReplica(tablet_id_);
              :   ASSERT_EQ(1, no_replica_servers.size());
nit: is it important for the flags here to be different than the flags in the 
cases below? If not, maybe consider encapsulating in a function along the lines 
of SetupClusterForUnavailability() and reusing it in the below tests?


http://gerrit.cloudera.org:8080/#/c/14300/10/src/kudu/integration-tests/raft_config_change-itest.cc@508
PS10, Line 508: ASSERT_EVENTUALLY([&] {
Is this ASSERT_EVENTUALLY necessary? If the replicas have been already moved in 
the first place, won't this never converge?


http://gerrit.cloudera.org:8080/#/c/14300/10/src/kudu/integration-tests/raft_config_change-itest.cc@516
PS10, Line 516: // Check that the replica at the tablet servers that was 
restarted is not removed.
              :     ASSERT_TRUE(IsRaftConfigMember(replica_servers.front(), 
cstate.committed_config()));
Should this check be done on all 3 of the replica_servers?


http://gerrit.cloudera.org:8080/#/c/14300/10/src/kudu/integration-tests/raft_config_change-itest.cc@597
PS10, Line 597: replica->uuid()
Isn't this just follower_uuid?


http://gerrit.cloudera.org:8080/#/c/14300/10/src/kudu/integration-tests/raft_config_change-itest.cc@627
PS10, Line 627:
              :   LOG(INFO) << "Waiting for log GC on " << leader->uuid();
              :   // Some WAL segments must exist, but wal segment 1 must not 
exist.
              :   ASSERT_OK(inspect_->WaitForFilePatternInTabletWalDirOnTs(
              :       leader_index, tablet_id_, { "wal-" }, { "wal-000000001" 
}));
              :   LOG(INFO) << "Log GC complete on " << leader->uuid();
How long does it take for this test to finish? Would it make sense to flush 
more frequently via --flush_threshold_secs so the WALs are not anchored for as 
long?


http://gerrit.cloudera.org:8080/#/c/14300/10/src/kudu/integration-tests/raft_config_change-itest.cc@635
PS10, Line 635:   SleepFor(MonoDelta::FromSeconds(3));
nit: this probably isn't needed, right? Since we're going to retry the below 
check for up to 30 seconds


http://gerrit.cloudera.org:8080/#/c/14300/10/src/kudu/tserver/ts_tablet_manager.cc
File src/kudu/tserver/ts_tablet_manager.cc:

http://gerrit.cloudera.org:8080/#/c/14300/10/src/kudu/tserver/ts_tablet_manager.cc@1077
PS10, Line 1077: << "Injecting " << 
FLAGS_tablet_open_bootstrap_inject_latency_ms
               :                      << "ms delay in tablet bootstrapping";
nit: just FYI Substitute could be used here, e.g.

 LOG(WARNING) << Substitute("Injecting $0 ms delay in in tablet bootstrapping", 
FLAGS_tablet_open_bootstrap_inject_latency_ms);



--
To view, visit http://gerrit.cloudera.org:8080/14300
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie1fee42053194f51d7a869ce14788095d6627ed9
Gerrit-Change-Number: 14300
Gerrit-PatchSet: 10
Gerrit-Owner: Volodymyr Verovkin <[email protected]>
Gerrit-Reviewer: Adar Dembo <[email protected]>
Gerrit-Reviewer: Alexey Serbin <[email protected]>
Gerrit-Reviewer: Andrew Wong <[email protected]>
Gerrit-Reviewer: Greg Solovyev <[email protected]>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Volodymyr Verovkin <[email protected]>
Gerrit-Comment-Date: Fri, 04 Oct 2019 00:54:50 +0000
Gerrit-HasComments: Yes

Reply via email to