Bankim Bhavsar has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16530 )

Change subject: [tool] KUDU-2181 CLI to add master
......................................................................


Patch Set 4:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/16530/2/src/kudu/tools/tool_action_master.cc
File src/kudu/tools/tool_action_master.cc:

PS2:
> Cool -- I guess dynamic_multi_master-test.cc is a good place to test functi
I updated dynamic-multi-master_test.cc to use this CLI tool exclusively since 
we expect users to only use CLI and not invoke RPC proxy directly. This tool is 
more advanced and needs bunch of setup and verification steps, so it's better 
to keep all multi-master addition/removal tests in one place.


http://gerrit.cloudera.org:8080/#/c/16530/2/src/kudu/tools/tool_action_master.cc@196
PS2, Line 196: d master $0 to the cluster. Please fol
> Alternatively, consider the option of the tool printing the instructions.
That's an interesting idea but I'm afraid things like formatting can be tricky 
with printing instructions on console. So markdown/HTML is still better IMO for 
such documentation that involves multiple steps.


http://gerrit.cloudera.org:8080/#/c/16530/3/src/kudu/tools/tool_action_master.cc
File src/kudu/tools/tool_action_master.cc:

http://gerrit.cloudera.org:8080/#/c/16530/3/src/kudu/tools/tool_action_master.cc@62
PS3, Line 62: 4
> I'm curious: how did you come up with 4 seconds as the threshold?  Does it
I basically looked at default settings with Raft heartbeat interval at 500ms, 
system catalog WAL segment size of 8MB beyond which it'll be GC'ed and 1 MB 
batch size used between Raft peers. This gave 4 sec number.

This doesn't give enough buffer to the actual Raft change config which is 
instantaneous but could take little bit of time or whether network will be able 
to transfer 1MB in 500 msecs i.e. 2 MB/sec.

So I'll bump this up to 8 secs to provide extra margin.


http://gerrit.cloudera.org:8080/#/c/16530/3/src/kudu/tools/tool_action_master.cc@159
PS3, Line 159:     RETURN_NOT_OK((proxy.SyncRpc<ListMastersRequestPB, 
ListMastersResponsePB>(
             :                    req, &resp, "ListMasters", 
&MasterServiceProxy::ListMastersAsync)));
             :
             :     if (resp.has_error()) {
             :       return StatusFromPB(resp.error().status());
             :     }
Transient error like leadership change will be handled by LeaderMasterProxy, 
like Andrew mentioned.

> Will it work as expected if re-running the tool?  Will calling AddMaster for 
> the second time with the same host:port succeed?

This is an interesting one. With the current code, AddMaster RPC will return 
"master already present" error. So instead of returning that error in CLI 
directly, I've updated code to now treat that error as OK and proceed with 
checking whether the master has been promoted. This'll help in other retry 
scenarios as well.


http://gerrit.cloudera.org:8080/#/c/16530/3/src/kudu/tools/tool_action_master.cc@187
PS3, Line 187:       return Status::NotFound(Substitute("New master $0 not 
found. Retry adding the master",
             :                                          hp.ToString()));
> Can it ever happen if we are past the line 139 where AddMaster returned suc
One case I can think of is if user happens to remove the newly added master in 
parallel before ListMasters call.
Also if the master is not found further checking is not possible.



--
To view, visit http://gerrit.cloudera.org:8080/16530
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I507f301d1aba17327eb35728eed0d765e86ef4cc
Gerrit-Change-Number: 16530
Gerrit-PatchSet: 4
Gerrit-Owner: Bankim Bhavsar <[email protected]>
Gerrit-Reviewer: Alexey Serbin <[email protected]>
Gerrit-Reviewer: Andrew Wong <[email protected]>
Gerrit-Reviewer: Bankim Bhavsar <[email protected]>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Mahesh Reddy <[email protected]>
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Comment-Date: Fri, 05 Feb 2021 22:54:50 +0000
Gerrit-HasComments: Yes

Reply via email to