Adar Dembo has posted comments on this change.
Change subject: c++ client: use operation timeout as deadline for finding new
Patch Set 2:
> to work around in that particular test, why not explicitly wait for
> a leader election on the masters?
We could, but that's not 100% robust. An election can happen at any time. We'll
tolerate one later in the test, when we're writing and timed-out master RPCs
are non-fatal, but not at the beginning, when we're trying to build the client.
Unfortunately (for me, I guess), I would like to eliminate all sources of
flakiness that I can.
> I think the user experience is
> not so great to say that if your cluster is down you have to wait
> 60 seconds to get an error (even though you may be willing to wait
> 60 seconds if you seem to be making some progress).
> Perhaps we can "early out" in the case that you get NetworkError
> from _all_ of the potential masters?
There's definitely an argument to be made for considering all of the responses
in aggregate when making decisions (right now decisions are made based on the
last response's status), but I don't think it's this. How do we differentiate
between "the cluster is down for good" and "the cluster is down momentarily"? I
think the only way to be faithful to the user's wishes is to adhere to the
Another option is to introduce a third client-level timeout (alongside "default
operation" and "default RPC") to be used solely for discovering the leader
master. For this test, it'd be enough to keep it at its default value. But
it's more cognitive load for everyone else, so I've been reluctant to go down
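
For reference, a rough sketch of the behavior under discussion: leader master discovery keeps retrying until a deadline derived from the operation timeout, and a round in which every master returns a network error is retried rather than treated as an early out. Everything here (ProbeResult, DiscoveryOutcome, FindLeaderMaster, the probe callback, the backoff parameter) is hypothetical and exists only for illustration. It is not the Kudu client code from this patch.

```cpp
#include <chrono>
#include <functional>
#include <string>
#include <thread>
#include <vector>

// Hypothetical types for illustration only; not the Kudu client API.
enum class ProbeResult { kLeader, kNotLeader, kNetworkError };

struct DiscoveryOutcome {
  bool found = false;
  std::string leader;  // Set only when found is true.
  int rounds = 0;      // Number of passes started over the master list.
};

using Clock = std::chrono::steady_clock;

// Probes each master in turn and retries until a leader is found or the
// deadline (now + operation_timeout) passes. A round where all masters
// return kNetworkError is handled like any other failed round: we cannot
// tell "down for good" from "down momentarily", so we keep retrying.
DiscoveryOutcome FindLeaderMaster(
    const std::vector<std::string>& masters,
    const std::function<ProbeResult(const std::string&)>& probe,
    std::chrono::milliseconds operation_timeout,
    std::chrono::milliseconds backoff = std::chrono::milliseconds(1)) {
  const Clock::time_point deadline = Clock::now() + operation_timeout;
  DiscoveryOutcome out;
  do {
    ++out.rounds;
    for (const std::string& master : masters) {
      if (probe(master) == ProbeResult::kLeader) {
        out.found = true;
        out.leader = master;
        return out;
      }
    }
    // Assumed fixed backoff between rounds, to keep the sketch short.
    std::this_thread::sleep_for(backoff);
  } while (Clock::now() < deadline);
  return out;  // Deadline exceeded without finding a leader.
}
```

With this shape, a cluster that is briefly unreachable (say, mid-election) still yields a leader as long as it recovers before the deadline, at the cost of making a truly dead cluster wait out the full timeout before reporting an error.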
To view, visit http://gerrit.cloudera.org:8080/3718
Gerrit-Owner: Adar Dembo <a...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <a...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <d...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <t...@apache.org>