Adar Dembo has posted comments on this change.

Change subject: c++ client: use operation timeout as deadline for finding new leader master
......................................................................


Patch Set 2:

 > Seems reasonable enough. The only potential concern is that I sort
 > of recall picking the 'default RPC timeout' rather than the
 > 'operation timeout' so that, if the master was actually down, the
 > user would get an error quicker than their timeout, rather than a
 > bunch of retries. If you try to contact a cluster which is just
 > down, does it now hang for the full timeout? Or if we get
 > 'connection refused' from all masters, does it bail out relatively
 > quickly?
 
For more background, see commit 2c8aa9e.

I expect it'll hang for the full operation timeout. That's 30s by default, vs. 
10s for the RPC timeout.
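
To make the trade-off concrete, here is a rough sketch of a leader lookup that 
caps each attempt at the RPC timeout, bounds the whole lookup by the operation 
timeout, and still fails fast when every master refuses the connection. This 
is not the Kudu client code: FindLeaderMaster, Probe, the result enums, and 
the fixed backoff are all hypothetical names and choices made up for 
illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <functional>
#include <thread>

using Clock = std::chrono::steady_clock;
using Ms = std::chrono::milliseconds;

// Hypothetical outcome of a single attempt against a single master.
enum class AttemptResult { kLeaderFound, kNoLeaderYet, kConnectionRefused, kTimedOut };

// Hypothetical outcome of the whole lookup.
enum class LookupStatus { kOk, kTimedOut, kAllMastersUnreachable };

// Hypothetical probe: (master index, per-attempt timeout) -> result.
using Probe = std::function<AttemptResult(int, Ms)>;

LookupStatus FindLeaderMaster(int num_masters, Ms rpc_timeout,
                              Ms operation_timeout, const Probe& probe,
                              Ms backoff = Ms(10)) {
  assert(num_masters > 0);
  // The operation timeout bounds the whole lookup, retries included.
  const auto deadline = Clock::now() + operation_timeout;
  while (true) {
    int refused = 0;
    for (int i = 0; i < num_masters; ++i) {
      const auto remaining =
          std::chrono::duration_cast<Ms>(deadline - Clock::now());
      if (remaining <= Ms::zero()) return LookupStatus::kTimedOut;
      // Each attempt gets the RPC timeout, capped by the time left.
      const AttemptResult r = probe(i, std::min(rpc_timeout, remaining));
      if (r == AttemptResult::kLeaderFound) return LookupStatus::kOk;
      if (r == AttemptResult::kConnectionRefused) ++refused;
    }
    // Fast fail: every master refused, so the cluster looks down rather
    // than mid-election. (Assumed policy, not existing client behavior.)
    if (refused == num_masters) return LookupStatus::kAllMastersUnreachable;
    // Otherwise an election may be in progress; back off and retry.
    const auto remaining =
        std::chrono::duration_cast<Ms>(deadline - Clock::now());
    std::this_thread::sleep_for(std::min(backoff, remaining));
  }
}
```

With the defaults mentioned above, a caller would pass 10s as rpc_timeout and 
30s as operation_timeout: a cluster that is merely electing a leader gets up 
to 30s, while one that refuses every connection returns after a single round.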

 > I suppose an argument could be made either way, but 'fast fail'
 > seems to make sense at least for command-line interactive things if
 > the cluster is actually down (not just a transient restart/failure)

The root problem I'm trying to address is that the very short RPC timeouts in 
ClientStressTest_MultiMaster_TestLeaderResolutionTimeout mean the client 
doesn't wait long enough for a master leader election to finish when building 
the client. Can you think of a way to address that without hacking up the 
client itself?

-- 
To view, visit http://gerrit.cloudera.org:8080/3718
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I0d770875bbf4703444abac11dbc232d7e382165e
Gerrit-PatchSet: 2
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Adar Dembo <a...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <a...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <d...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <t...@apache.org>
Gerrit-HasComments: No
