Todd Lipcon has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/8037 )
Change subject: KUDU-1788. Increase Raft RPC timeout to 30sec to avoid fruitless retries.
......................................................................

KUDU-1788. Increase Raft RPC timeout to 30sec to avoid fruitless retries.

The Raft leader's behavior on a timeout is to simply retry the request, potentially aggregating more data into the new attempt if new data is waiting in the queue. However, as described in the JIRA, this behavior is counterproductive when the network pipe or associated reactor thread is saturated. The original request may already be in the middle of transmission, so the retry ends up re-sending bytes which have already been sent, increasing "throughput" but not increasing "goodput".

The original Raft timeout was set to 1 second mainly due to KUDU-699, an old bug in which the leader would block waiting on outstanding requests to followers before it would step down. That was fixed quite a long time ago, though, so there is no longer any good reason to have such a short timeout on a Raft request. This patch bumps the default timeout to 30 seconds.

I tested this on an 8-node cluster by using iptables to inject 1% packet loss on all nodes and running an insertion workload as described in the JIRA. Without the patch, if I did a 'kill -STOP' of a node and waited a couple of seconds before allowing it to continue, I would see that node log "deduplicated request" messages for 30-60 seconds before it eventually caught up. During that time, the tablet was effectively using only two replicas, causing increased latency, etc. With the higher timeout, I didn't see these messages, and the unpaused replica caught up much more quickly.
Change-Id: I5f47dc006dc3dfb1659a224172e1905b6bf3d2a4
Reviewed-on: http://gerrit.cloudera.org:8080/8037
Reviewed-by: David Ribeiro Alves <[email protected]>
Reviewed-by: Mike Percy <[email protected]>
Tested-by: Kudu Jenkins
---
M src/kudu/consensus/consensus_peers.cc
1 file changed, 1 insertion(+), 1 deletion(-)

Approvals:
  David Ribeiro Alves: Looks good to me, approved
  Mike Percy: Looks good to me, approved
  Kudu Jenkins: Verified

--
To view, visit http://gerrit.cloudera.org:8080/8037
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I5f47dc006dc3dfb1659a224172e1905b6bf3d2a4
Gerrit-Change-Number: 8037
Gerrit-PatchSet: 2
Gerrit-Owner: Todd Lipcon <[email protected]>
Gerrit-Reviewer: Adar Dembo <[email protected]>
Gerrit-Reviewer: David Ribeiro Alves <[email protected]>
Gerrit-Reviewer: Jean-Daniel Cryans <[email protected]>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <[email protected]>
Gerrit-Reviewer: Todd Lipcon <[email protected]>
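
The "throughput" versus "goodput" argument in the commit message can be illustrated with a rough toy model. This is only a sketch and not Kudu code: the function name, the request size, and the link bandwidth are hypothetical, and the model assumes that every timeout which fires before the original transfer completes causes one full re-send of the same bytes (it ignores latency, batching of new data into the retry, and packet loss).

```python
import math


def simulate_retries(request_bytes, bandwidth_bytes_per_sec, timeout_sec):
    """Toy model of a sender that re-sends a request each time a timeout fires.

    Hypothetical illustration only; this does not reflect Kudu's actual
    consensus queue or RPC layer.
    """
    # Time needed to push the original request through the saturated pipe.
    transfer_time = request_bytes / bandwidth_bytes_per_sec
    # One retry is issued for each timeout that expires strictly before the
    # original transfer has finished.
    retries = max(0, math.ceil(transfer_time / timeout_sec) - 1)
    bytes_sent = request_bytes * (1 + retries)
    return {
        "retries": retries,
        "bytes_sent": bytes_sent,          # what counts toward "throughput"
        "useful_bytes": request_bytes,     # what counts toward "goodput"
        "goodput_fraction": request_bytes / bytes_sent,
    }


if __name__ == "__main__":
    MIB = 1024 * 1024
    # Assumed numbers: a 4 MiB request over a pipe that only delivers 1 MiB/s.
    for timeout in (1.0, 30.0):
        result = simulate_retries(4 * MIB, 1 * MIB, timeout)
        print(timeout, result["retries"], result["goodput_fraction"])
```

Under these assumed numbers, the 1-second timeout triggers three redundant re-sends, so only a quarter of the transmitted bytes are useful, while the 30-second timeout lets the original request complete without any retry.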
