Todd Lipcon has submitted this change and it was merged. Change subject: KUDU-699. consensus: Peer::Close() should not block on outstanding requests ......................................................................
KUDU-699. consensus: Peer::Close() should not block on outstanding requests Prior to this patch, Peer::Close() would wait on a semaphore which was held by outbound RPC requests. This meant that, in the case of the default 1 second consensus RPC timeout, Close() would commonly block for a fairly long time. This blocking wasn't necessary for semantic correctness: even if the response came back with some kind of successful result, the result would be ignored because the leader was already in the process of stepping down. Instead, the blocking was an implementation detail: the outstanding RPC requests needed to keep alive the RPC request/response protobufs until they finished, and those protobufs are owned by the Peer object. In addition to being unnecessary, the blocking actually causes a couple serious issues: 1) in an overloaded cluster, we often see a lot of leader election churn and slow UpdateConsensus() calls. When UpdateConsensus() calls are slow, this would cause leader step-down to also be slow because of the PeerManager::Close() call taking a long time. This slow step-down process would hold an RPC thread, increasing the possibility of other RPCs getting rejected, retried, etc, contributing to the overall problems on the cluster. This is often visible in 'pstack' as 5-10 threads stuck in 'PeerManager::Close()' 2) KUDU-1788 describes an issue in which the short timeout we're currently using for consensus RPCs ends up resulting in those RPCs never succeeding, and wasting a lot of network bandwidth with repeated retries. Part of the solution to this issue is likely to involve boosting the timeout. With a longer RPC timeout, the blocking behavior on Close() described above is even more problematic. This patch fixes the issue as follows: 1) Peers now have shared ownership and inherit from std::enable_shared_from_this. When we make an RPC, we bind a shared_ptr to the Peer in the RPC's callback. This ensures that the Peer will not be destructed while an RPC is still in-flight. 2) We no longer need to use the 'sem_' variable to track whether an RPC is in flight. The purpose of this semaphore was two-fold: (a) to cause Close() to block, and (b) to prevent a second RPC from being sent while one was already in flight. The former purpose is no longer a goal. The latter purpose is attained just as easily using a simple boolean. So, this patch removes the semaphore and instead just uses a 'request_pending_' boolean. 3) While making these changes, I removed the 'state_' member and enum. The state was used to flag that Close() had been called, and to flag whether the first request had been sent yet. I replaced the state with two booleans, which I found simpler to reason about. A new test is included which sets the consensus RPC timeout to be long, stops, a follower, and then asks the leader to step down. Prior to this patch, the step-down would take as long as the consensus RPC timeout and cause the test to fail. With this patch, the step-down occurs immediately. Change-Id: I4e1bc80f536defad28f4d7b51fb95aa32dc9fca0 Reviewed-on: http://gerrit.cloudera.org:8080/5490 Tested-by: Kudu Jenkins Reviewed-by: Todd Lipcon <[email protected]> --- M src/kudu/consensus/consensus_peers-test.cc M src/kudu/consensus/consensus_peers.cc M src/kudu/consensus/consensus_peers.h M src/kudu/consensus/peer_manager.cc M src/kudu/consensus/peer_manager.h M src/kudu/integration-tests/raft_consensus-itest.cc 6 files changed, 174 insertions(+), 124 deletions(-) Approvals: Todd Lipcon: Looks good to me, approved Kudu Jenkins: Verified -- To view, visit http://gerrit.cloudera.org:8080/5490 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: merged Gerrit-Change-Id: I4e1bc80f536defad28f4d7b51fb95aa32dc9fca0 Gerrit-PatchSet: 5 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Todd Lipcon <[email protected]> Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy <[email protected]> Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon <[email protected]>
