Mike Percy has posted comments on this change.

Change subject: [tools] Manual recovery tools (part 1)
......................................................................


Patch Set 9:

(14 comments)

http://gerrit.cloudera.org:8080/#/c/4834/6/src/kudu/integration-tests/cluster_itest_util.h
File src/kudu/integration-tests/cluster_itest_util.h:

Line 268:     int count,
> done, unfortunately these are not caught while compiling on OS X :(.
yeah it's just good hygiene, std::string isn't always caught on linux either 
due to some header files in gutil using std::string :(


http://gerrit.cloudera.org:8080/#/c/4834/6/src/kudu/tools/kudu-tool-test.cc
File src/kudu/tools/kudu-tool-test.cc:

> I think having all the tests related to ./bin/kudu toolset in one file is b
> I think having all the tests related to ./bin/kudu toolset in one file is 
> better.

Why do you think it is better? The tests may do many different types of things 
and they can't run in parallel if they are all clumped together in the same 
files. Also, source files that are many thousands of lines are difficult to 
navigate - at least that is my subjective opinion. I don't think it's reached 
catastrophic proportions yet in this case, though.


Line 950:   const int kNumTservers = 3;
> Can you explain bit more what you meant by test becoming unstable ? The int
Under extreme load, such as on an AWS / GCE Jenkins test box, it's possible 
that a tablet could miss a heartbeat for 3 seconds because of some IO or thread 
creation latency. If that happens, then the node will get evicted. I am not 
sure if that will cause test flakiness in these particular tests, I just 
thought the timeout was quite short.


Line 1149
> Done, although I am trying to see why I am not able to run ts->Start() agai
Not sure about macOS, but on the ExternalMiniCluster the proper thing to do 
after Shutdown() is Restart() if you want to reuse the same ports. Typically, 
Start() should only be called once by design.

It's possible that the MiniTabletServer::Restart() semantics are buggy and 
untested.


http://gerrit.cloudera.org:8080/#/c/4834/9/src/kudu/tools/kudu-tool-test.cc
File src/kudu/tools/kudu-tool-test.cc:

Line 944: TEST_F(ToolTest, TestRemoteReplicaCopy) {
If we're testing copying, should we write some data in this test? You could 
need to initiate the leader election manually, ideally with the helper 
functions in itest::


Line 994:   // by attempting to copy tablets while we tombstone the tablet on 
destination server.
I don't think you ever elected a leader, so I don't know who would interfere


Line 1042:   // tombstoned_tablet_id is copied from source to destination.
passive voice nit: Copy tombstoned_tablet_id from source to destination.


Line 1043:   NO_FATALS(RunActionStdoutNone(Substitute("remote_replica copy $0 
$1 $2 --force_copy",
Should --force be needed if it's tombstoned?


Line 1048:   // deleted_tablet_id is copied from source to destination.
nit: use active, not passive voice, i.e. 

Copy deleted_tablet_id from source to destination.


Line 1050:   ASSERT_OK(WaitUntilTabletInState(src_ts, deleted_tablet_id,
Why is this needed? We just did this on line 1039


Line 1135:   ASSERT_OK(ts->Start());
Should be Restart() I think? But looking at the impl it seems that it just 
defers to Start(). Weird. What even is the point of such an implementation? It 
seems broken to me...


http://gerrit.cloudera.org:8080/#/c/4834/6/src/kudu/tools/tool_action_remote_replica.cc
File src/kudu/tools/tool_action_remote_replica.cc:

Line 299:   RETURN_NOT_OK(GetServerStatus(dst_address, 
TabletServer::kDefaultPort,
> That's correct:
Cool, thanks for the explanation.


Line 347:       .Description("Copy a replica from one Kudu Tablet Server to 
another")
any thoughts on using "tablet replica" throughout this file instead of just 
"replica"?


http://gerrit.cloudera.org:8080/#/c/4834/9/src/kudu/tools/tool_action_remote_replica.cc
File src/kudu/tools/tool_action_remote_replica.cc:

Line 331:                             
tserver::TabletServerErrorPB_Code(resp.error().code())));
oops, typo in my original suggestion, should be TabletServerErrorPB::Code_Name()


-- 
To view, visit http://gerrit.cloudera.org:8080/4834
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I113a25e9b6c14f7c3814140917b61e35030b58d0
Gerrit-PatchSet: 9
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Dinesh Bhat <[email protected]>
Gerrit-Reviewer: Adar Dembo <[email protected]>
Gerrit-Reviewer: David Ribeiro Alves <[email protected]>
Gerrit-Reviewer: Dinesh Bhat <[email protected]>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <[email protected]>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <[email protected]>
Gerrit-HasComments: Yes

Reply via email to