Adar Dembo has posted comments on this change.

Change subject: [tools] Add a 'kudu tablet relocate' tool
......................................................................


Patch Set 4:

(22 comments)

http://gerrit.cloudera.org:8080/#/c/7444/4//COMMIT_MSG
Commit Message:

Line 18: tablet bootstraps. As a result of this extra fragility, the tool
What sort of guidance should the tool provide w.r.t. the state of the world if 
relocate_replica is interrupted midway through?


PS4, Line 33: Change-Id:
            : I8684e4826bfeaf36b31d297ec1e49897705867f1
Didn't mean to include this?


http://gerrit.cloudera.org:8080/#/c/7444/4/src/kudu/tools/ksck-test.cc
File src/kudu/tools/ksck-test.cc:

Line 18: #include <iosfwd>
Still need this?


Line 131:   ~KsckTest() {}
Can remove this now?


http://gerrit.cloudera.org:8080/#/c/7444/4/src/kudu/tools/ksck.cc
File src/kudu/tools/ksck.cc:

Line 268:   // Blocks until either the number of results plus errors reported 
equals
Update with mention of what 'out' does.


http://gerrit.cloudera.org:8080/#/c/7444/4/src/kudu/tools/ksck.h
File src/kudu/tools/ksck.h:

Line 386:   Ksck(std::shared_ptr<KsckCluster> cluster, std::ostream& out)
Maybe just add this to the existing constructor with a default value of 
std::cout?

Also, shouldn't we store a pointer to the ostream to adhere to our style 
guidelines?


http://gerrit.cloudera.org:8080/#/c/7444/4/src/kudu/tools/ksck_remote-test.cc
File src/kudu/tools/ksck_remote-test.cc:

Line 19: #include <sstream>
Don't need this?


Line 61:   ~RemoteKsckTest() {}
Can omit?


http://gerrit.cloudera.org:8080/#/c/7444/4/src/kudu/tools/kudu-admin-test.cc
File src/kudu/tools/kudu-admin-test.cc:

PS4, Line 189: TabletServerMap::const_iterator
Can use auto type here?


PS4, Line 206:   workload.set_timeout_allowed(true);
             :   workload.set_write_timeout_millis(10000);
             :   workload.set_num_replicas(FLAGS_num_replicas);
             :   workload.set_num_write_threads(1);
             :   workload.set_write_batch_size(1);
For the non-obvious non-default values (basically for everything except 
num_replicas) please either remove them or add a comment explaining why they're 
necessary/useful for this test.


Line 215:   int num_moves = AllowSlowTests() ? 3 : 1;
How long does the test take to run when num_moves=1? When num_moves=3?


PS4, Line 236:   ClusterVerifier v(cluster_.get());
             :   NO_FATALS(v.CheckCluster());
Would it be useful to run this after every move rather than just at the end? Or 
does it not matter?


http://gerrit.cloudera.org:8080/#/c/7444/4/src/kudu/tools/tool_action_tablet.cc
File src/kudu/tools/tool_action_tablet.cc:

Line 45: 
Do you think it'd be more useful to express these timeouts in terms of seconds, 
since both operations are expected to take a non-trivial amount of time?


Line 46: DEFINE_int64(relocate_copy_timeout_ms, 30000,
30s doesn't seem like enough time for a tablet copy when tablets may be 
gigabytes in size. Any reason for this not to be a very high value, in the 
multiple minutes range?


Line 123:   std::shared_ptr<KsckMaster> master;
Let's add using std::shared_ptr.


Line 130:   std::shared_ptr<Ksck> ksck(new Ksck(cluster, null_stream));
Why do we need to allocate Ksck on the heap and store it in a shared_ptr? Can't 
we just stack allocate it?


Line 334:   while (MonoTime::Now() < deadline) {
I think it'd be nice to reuse proxies in these loops, to avoid the overhead of 
repeated DNS lookups. Maybe in WaitForCleanKsck too.


Line 356:     SleepFor(MonoDelta::FromMilliseconds(500));
Why not 1000 ms here?


Line 366:   const string& rem_replica_uuid = FindOrDie(context.required_args, 
kRemReplicaUuidArg);
See below: "replica UUID" is a misnomer since this UUID uniquely identifies the 
tserver, not a particular replica on the tserver.


PS4, Line 387:     RETURN_NOT_OK(ChangeLeader(client, tablet_id,
             :                                leader_uuid, leader_hp,
             :                                
MonoDelta::FromMilliseconds(FLAGS_relocate_leader_timeout_ms)));
Why is this step necessary? Won't a new leader automatically be elected if the 
replica removed is the old leader?


Line 434:       .AddRequiredParameter({ kRemReplicaUuidArg, "UUID of the 
replica to be removed" })
These are server UUIDs though, not tablet UUIDs, right?


Line 451:       .AddAction(std::move(relocate_replica))
Nit: should come before remove_replica (alphabetical sort).


-- 
To view, visit http://gerrit.cloudera.org:8080/7444
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I3b7a7243333ba6e6a3d6fce96b220224d6e38a84
Gerrit-PatchSet: 4
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Will Berkeley <[email protected]>
Gerrit-Reviewer: Adar Dembo <[email protected]>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <[email protected]>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <[email protected]>
Gerrit-Reviewer: Will Berkeley <[email protected]>
Gerrit-HasComments: Yes

Reply via email to