Mike Percy has posted comments on this change. Change subject: [tools] Manual recovery tools (part 1) ......................................................................
Patch Set 9: (14 comments) http://gerrit.cloudera.org:8080/#/c/4834/6/src/kudu/integration-tests/cluster_itest_util.h File src/kudu/integration-tests/cluster_itest_util.h: Line 268: int count, > done, unfortunately these are not caught while compiling on OS X :(. yeah it's just good hygiene, std::string isn't always caught on linux either due to some header files in gutil using std::string :( http://gerrit.cloudera.org:8080/#/c/4834/6/src/kudu/tools/kudu-tool-test.cc File src/kudu/tools/kudu-tool-test.cc: > I think having all the tests related to ./bin/kudu toolset in one file is b > I think having all the tests related to ./bin/kudu toolset in one file is > better. Why do you think it is better? The tests may do many different types of things and they can't run in parallel if they are all clumped together in the same files. Also, source files that are many thousands of lines are difficult to navigate - at least that is my subjective opinion. I don't think it's reached catastrophic proportions yet in this case, though. Line 950: const int kNumTservers = 3; > Can you explain bit more what you meant by test becoming unstable ? The int Under extreme load, such as on an AWS / GCE Jenkins test box, it's possible that a tablet could miss a heartbeat for 3 seconds because of some IO or thread creation latency. If that happens, then the node will get evicted. I am not sure if that will cause test flakiness in these particular tests, I just thought the timeout was quite short. Line 1149 > Done, although I am trying to see why I am not able to run ts->Start() agai Not sure about macOS, but on the ExternalMiniCluster the proper thing to do after Shutdown() is Restart() if you want to reuse the same ports. Typically, Start() should only be called once by design. It's possible that the MiniTabletServer::Restart() semantics are buggy and untested. http://gerrit.cloudera.org:8080/#/c/4834/9/src/kudu/tools/kudu-tool-test.cc File src/kudu/tools/kudu-tool-test.cc: Line 944: TEST_F(ToolTest, TestRemoteReplicaCopy) { If we're testing copying, should we write some data in this test? You could need to initiate the leader election manually, ideally with the helper functions in itest:: Line 994: // by attempting to copy tablets while we tombstone the tablet on destination server. I don't think you ever elected a leader, so I don't know who would interfere Line 1042: // tombstoned_tablet_id is copied from source to destination. passive voice nit: Copy tombstoned_tablet_id from source to destination. Line 1043: NO_FATALS(RunActionStdoutNone(Substitute("remote_replica copy $0 $1 $2 --force_copy", Should --force be needed if it's tombstoned? Line 1048: // deleted_tablet_id is copied from source to destination. nit: use active, not passive voice, i.e. Copy deleted_tablet_id from source to destination. Line 1050: ASSERT_OK(WaitUntilTabletInState(src_ts, deleted_tablet_id, Why is this needed? We just did this on line 1039 Line 1135: ASSERT_OK(ts->Start()); Should be Restart() I think? But looking at the impl it seems that it just defers to Start(). Weird. What even is the point of such an implementation? It seems broken to me... http://gerrit.cloudera.org:8080/#/c/4834/6/src/kudu/tools/tool_action_remote_replica.cc File src/kudu/tools/tool_action_remote_replica.cc: Line 299: RETURN_NOT_OK(GetServerStatus(dst_address, TabletServer::kDefaultPort, > That's correct: Cool, thanks for the explanation. Line 347: .Description("Copy a replica from one Kudu Tablet Server to another") any thoughts on using "tablet replica" throughout this file instead of just "replica"? http://gerrit.cloudera.org:8080/#/c/4834/9/src/kudu/tools/tool_action_remote_replica.cc File src/kudu/tools/tool_action_remote_replica.cc: Line 331: tserver::TabletServerErrorPB_Code(resp.error().code()))); oops, typo in my original suggestion, should be TabletServerErrorPB::Code_Name() -- To view, visit http://gerrit.cloudera.org:8080/4834 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I113a25e9b6c14f7c3814140917b61e35030b58d0 Gerrit-PatchSet: 9 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Dinesh Bhat <[email protected]> Gerrit-Reviewer: Adar Dembo <[email protected]> Gerrit-Reviewer: David Ribeiro Alves <[email protected]> Gerrit-Reviewer: Dinesh Bhat <[email protected]> Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy <[email protected]> Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon <[email protected]> Gerrit-HasComments: Yes
