[kudu-CR] c++ client: use operation timeout as deadline for finding new leader master
Todd Lipcon has posted comments on this change. Change subject: c++ client: use operation timeout as deadline for finding new leader master .. Patch Set 2: Seems reasonable enough. The only potential concern is that I sort of recall picking the 'default RPC timeout' rather than the 'operation timeout' so that, if the master was actually down, the user would get an error quicker than their timeout, rather than a bunch of retries. If you try to contact a cluster which is just down, does it now hang for the full timeout? Or if we get 'connection refused' from all masters, does it bail out relatively quickly? I suppose an argument could be made either way, but 'fast fail' seems to make sense at least for command-line interactive things if the cluster is actually down (not just a transient restart/failure) -- To view, visit http://gerrit.cloudera.org:8080/3718 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I0d770875bbf4703444abac11dbc232d7e382165e Gerrit-PatchSet: 2 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Adar DemboGerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: No
[kudu-CR] c++ client: use operation timeout as deadline for finding new leader master
Kudu Jenkins has posted comments on this change. Change subject: c++ client: use operation timeout as deadline for finding new leader master .. Patch Set 2: -Verified Build Started http://104.196.14.100/job/kudu-gerrit/2609/ -- To view, visit http://gerrit.cloudera.org:8080/3718 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I0d770875bbf4703444abac11dbc232d7e382165e Gerrit-PatchSet: 2 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Adar DemboGerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: No
[kudu-CR] c++ client: use operation timeout as deadline for finding new leader master
Kudu Jenkins has posted comments on this change. Change subject: c++ client: use operation timeout as deadline for finding new leader master .. Patch Set 2: Build Started http://104.196.14.100/job/kudu-gerrit/2603/ -- To view, visit http://gerrit.cloudera.org:8080/3718 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I0d770875bbf4703444abac11dbc232d7e382165e Gerrit-PatchSet: 2 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Adar DemboGerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: No
[kudu-CR] c++ client: use operation timeout as deadline for finding new leader master
Hello Dan Burkert, Todd Lipcon, I'd like you to do a code review. Please visit http://gerrit.cloudera.org:8080/3718 to review the following change. Change subject: c++ client: use operation timeout as deadline for finding new leader master .. c++ client: use operation timeout as deadline for finding new leader master We had been using the default RPC timeout, which may be set to a very low value as in ClientStressTest_MultiMaster_TestLeaderResolutionTimeout. Now we'll use the operation timeout as the overall deadline while still preserving the semantics of using the default RPC timeout for the GetMasterRegistration() RPCs themselves. As my patch series removes the guarantee that a leader master is elected at the time that cluster tests run, it's important that the logic for finding the leader master provide ample time for an election to finish. Also, I think I've addressed the root cause behind KUDU-573 by fixing a race in GetLeaderMasterRpc's SendRpcCb() and GetMasterRegistrationRpcCbForNode() methods. The race manifests when the last two RPC responses are "I am the leader" and "I am not the leader" respectively. In one interleaving, both responses enter SendRpcCb(), and the second calls DelayedRetryCb(). If that were a call to DelayedRetry() instead, the GetLeaderMasterRpc would be destroyed by the time the reactor thread reran the RPC. Change-Id: I0d770875bbf4703444abac11dbc232d7e382165e --- M src/kudu/client/client-internal.cc M src/kudu/integration-tests/external_mini_cluster.cc M src/kudu/master/master_rpc.cc M src/kudu/master/master_rpc.h M src/kudu/tserver/heartbeater.cc 5 files changed, 52 insertions(+), 38 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/18/3718/1 -- To view, visit http://gerrit.cloudera.org:8080/3718 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newchange Gerrit-Change-Id: I0d770875bbf4703444abac11dbc232d7e382165e Gerrit-PatchSet: 1 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Adar DemboGerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Todd Lipcon