Adar Dembo has submitted this change and it was merged.
Change subject: c++ client: use operation timeout as deadline for finding new
c++ client: use operation timeout as deadline for finding new leader master
We had been using the default RPC timeout, which may be set to a very low
value as in ClientStressTest_MultiMaster_TestLeaderResolutionTimeout.
Now we'll use the operation timeout as the overall deadline while still
preserving the semantics of using the default RPC timeout for the
GetMasterRegistration() RPCs themselves.
As my patch series removes the guarantee that a leader master is elected at
the time that cluster tests run, it's important that the logic for finding
the leader master provide ample time for an election to finish.
Also, I think I've addressed the root cause behind KUDU-573 by fixing a race
in GetLeaderMasterRpc's SendRpcCb() and GetMasterRegistrationRpcCbForNode()
methods. The race manifests when the last two RPC responses are "I am the
leader" and "I am not the leader" respectively. In one interleaving, both
responses enter SendRpcCb(), and the second calls DelayedRetryCb(). If that
were a call to DelayedRetry() instead, the GetLeaderMasterRpc would be
destroyed by the time the reactor thread reran the RPC.
Tested-by: Kudu Jenkins
Reviewed-by: Todd Lipcon <t...@apache.org>
5 files changed, 63 insertions(+), 38 deletions(-)
Todd Lipcon: Looks good to me, approved
Kudu Jenkins: Verified
To view, visit http://gerrit.cloudera.org:8080/3718
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
Gerrit-Owner: Adar Dembo <a...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <a...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <d...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <t...@apache.org>