Adar Dembo has posted comments on this change.

Change subject: c++ client: use operation timeout as deadline for finding new 
leader master
......................................................................


Patch Set 2:

> to work around in that particular test, why not explicitly wait for
 > a leader election on the masters?

We could, but that's not 100% robust. An election can happen at any time. We'll 
tolerate one later on in the test, when we're writing and timed-out master RPCs 
are non-fatal, but not at the beginning, when we're trying to build the client. 
Unfortunately (for me, I guess), I would like to eliminate all sources of 
flakiness that I can.

 > I think the user experience is
 > not so great to say that if your cluster is down you have to wait
 > 60 seconds to get an error (even though you may be willing to wait
 > 60 seconds if you seem to be making some progress).
 > 
 > Perhaps we can "early out" in the case that you get NetworkError
 > from _all_ of the potential masters?

There's definitely an argument to be made for considering all of the responses 
in aggregate when making decisions (right now decisions are made based on the 
last response's status), but I don't think it's this. How do we differentiate 
between "the cluster is down for good" and "the cluster is down momentarily"? I 
think the only way to be faithful to the user's wishes is to adhere to the 
operation's deadline.
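
To make "adhere to the operation's deadline" concrete, here is a minimal 
sketch of such a retry loop. It is not the Kudu client code: ProbeResult, 
FindLeaderUntilDeadline, and the injected probe/now/backoff callbacks are 
hypothetical names made up for illustration. The point is only that a round in 
which every master returns a network error leads to another round, not an 
early failure, for as long as the deadline has not passed.

```cpp
#include <cassert>
#include <chrono>
#include <functional>

// Hypothetical per-master result; this is NOT Kudu's Status class.
enum class ProbeResult { kLeader, kNotLeader, kNetworkError };

using Clock = std::chrono::steady_clock;

struct FindLeaderOutcome {
  bool found = false;     // true if some master reported itself as leader
  int leader_index = -1;  // index of that master, or -1 if none was found
  int rounds = 0;         // number of full passes over the master list
};

// Probes every master, round after round, until one claims leadership or the
// operation deadline passes. The clock and the backoff are injected so that
// the loop can be exercised without real sleeping (an assumption made for
// this sketch only).
FindLeaderOutcome FindLeaderUntilDeadline(
    int num_masters,
    Clock::time_point deadline,
    const std::function<ProbeResult(int)>& probe,
    const std::function<Clock::time_point()>& now,
    const std::function<void()>& backoff) {
  assert(num_masters > 0);
  FindLeaderOutcome out;
  while (now() < deadline) {
    ++out.rounds;
    for (int i = 0; i < num_masters; ++i) {
      if (probe(i) == ProbeResult::kLeader) {
        out.found = true;
        out.leader_index = i;
        return out;
      }
    }
    // No leader this round, even if every master returned kNetworkError.
    // "Down for good" and "down momentarily" look identical from here, so
    // wait a little and retry; only the deadline ends the loop.
    backoff();
  }
  return out;  // deadline reached without finding a leader
}
```

With a fake clock that advances one second per backoff, a cluster that is 
unreachable for two rounds and then elects a leader is still found, while a 
cluster that never recovers fails only once the deadline is reached.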

Another option is to introduce a third client-level timeout (alongside "default 
operation" and "default RPC") to be used solely for discovering the leader 
master. For this test, it'd be enough to keep it at its default value. But 
it's more cognitive load for everyone else, so I've been reluctant to go down 
that path.
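
For illustration only, the three-timeout alternative might look roughly like 
the sketch below. ClientTimeouts, Purpose, TimeoutFor, and all of the default 
values are hypothetical; none of them are taken from the Kudu client. It also 
shows where the extra cognitive load comes from: every user now has a third 
knob to understand.

```cpp
#include <cassert>
#include <chrono>

using std::chrono::milliseconds;

// Hypothetical client-level settings. The field names and the default
// values are illustrative assumptions, not Kudu's real configuration.
struct ClientTimeouts {
  milliseconds default_operation{60000};
  milliseconds default_rpc{10000};
  milliseconds leader_master_discovery{30000};  // the proposed third timeout
};

// What a timeout is being requested for.
enum class Purpose { kOperation, kRpc, kLeaderMasterDiscovery };

// Selects the timeout that applies to the given purpose. Leader master
// discovery gets its own budget instead of borrowing the operation timeout.
inline milliseconds TimeoutFor(const ClientTimeouts& t, Purpose p) {
  switch (p) {
    case Purpose::kOperation:
      return t.default_operation;
    case Purpose::kRpc:
      return t.default_rpc;
    case Purpose::kLeaderMasterDiscovery:
      return t.leader_master_discovery;
  }
  assert(false && "unknown Purpose");
  return milliseconds(0);
}
```

A test could then shorten default_operation and default_rpc while leaving 
leader_master_discovery at its default.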

-- 
To view, visit http://gerrit.cloudera.org:8080/3718
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I0d770875bbf4703444abac11dbc232d7e382165e
Gerrit-PatchSet: 2
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Adar Dembo <a...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <a...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <d...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <t...@apache.org>
Gerrit-HasComments: No
