Bankim Bhavsar has posted comments on this change. ( http://gerrit.cloudera.org:8080/16530 )
Change subject: [tool] KUDU-2181 CLI to add master ...................................................................... Patch Set 4: (5 comments) http://gerrit.cloudera.org:8080/#/c/16530/2/src/kudu/tools/tool_action_master.cc File src/kudu/tools/tool_action_master.cc: PS2: > Cool -- I guess dynamic_multi_master-test.cc is a good place to test functi I updated dynamic-multi-master_test.cc to use this CLI tool exclusively since we expect users to only use CLI and not invoke RPC proxy directly. This tool is more advanced and needs bunch of setup and verification steps, so it's better to keep all multi-master addition/removal tests in one place. http://gerrit.cloudera.org:8080/#/c/16530/2/src/kudu/tools/tool_action_master.cc@196 PS2, Line 196: d master $0 to the cluster. Please fol > Alternatively, consider the option of the tool printing the instructions. That's an interesting idea but I'm afraid things like formatting can be tricky with printing instructions on console. So markdown/HTML is still better IMO for such documentation that involves multiple steps. http://gerrit.cloudera.org:8080/#/c/16530/3/src/kudu/tools/tool_action_master.cc File src/kudu/tools/tool_action_master.cc: http://gerrit.cloudera.org:8080/#/c/16530/3/src/kudu/tools/tool_action_master.cc@62 PS3, Line 62: 4 > I'm curious: how did you come up with 4 seconds as the threshold? Does it I basically looked at default settings with Raft heartbeat interval at 500ms, system catalog WAL segment size of 8MB beyond which it'll be GC'ed and 1 MB batch size used between Raft peers. This gave 4 sec number. This doesn't give enough buffer to the actual Raft change config which is instantaneous but could take little bit of time or whether network will be able to transfer 1MB in 500 msecs i.e. 2 MB/sec. So I'll bump this up to 8 secs to provide extra margin. http://gerrit.cloudera.org:8080/#/c/16530/3/src/kudu/tools/tool_action_master.cc@159 PS3, Line 159: RETURN_NOT_OK((proxy.SyncRpc<ListMastersRequestPB, ListMastersResponsePB>( : req, &resp, "ListMasters", &MasterServiceProxy::ListMastersAsync))); : : if (resp.has_error()) { : return StatusFromPB(resp.error().status()); : } Transient error like leadership change will be handled by LeaderMasterProxy, like Andrew mentioned. > Will it work as expected if re-running the tool? Will calling AddMaster for > the second time with the same host:port succeed? This is an interesting one. With the current code, AddMaster RPC will return "master already present" error. So instead of returning that error in CLI directly, I've updated code to now treat that error as OK and proceed with checking whether the master has been promoted. This'll help in other retry scenarios as well. http://gerrit.cloudera.org:8080/#/c/16530/3/src/kudu/tools/tool_action_master.cc@187 PS3, Line 187: return Status::NotFound(Substitute("New master $0 not found. Retry adding the master", : hp.ToString())); > Can it ever happen if we are past the line 139 where AddMaster returned suc One case I can think of is if user happens to remove the newly added master in parallel before ListMasters call. Also if the master is not found further checking is not possible. -- To view, visit http://gerrit.cloudera.org:8080/16530 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I507f301d1aba17327eb35728eed0d765e86ef4cc Gerrit-Change-Number: 16530 Gerrit-PatchSet: 4 Gerrit-Owner: Bankim Bhavsar <[email protected]> Gerrit-Reviewer: Alexey Serbin <[email protected]> Gerrit-Reviewer: Andrew Wong <[email protected]> Gerrit-Reviewer: Bankim Bhavsar <[email protected]> Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Mahesh Reddy <[email protected]> Gerrit-Reviewer: Tidy Bot (241) Gerrit-Comment-Date: Fri, 05 Feb 2021 22:54:50 +0000 Gerrit-HasComments: Yes
