Hello David Ribeiro Alves, Mike Percy, Adar Dembo,
I'd like you to do a code review. Please visit
http://gerrit.cloudera.org:8080/8037
to review the following change.
Change subject: KUDU-1788. Increase Raft RPC timeout to 30sec to avoid
fruitless retries.
......................................................................
KUDU-1788. Increase Raft RPC timeout to 30sec to avoid fruitless retries.

The Raft leader's behavior on a timeout is to simply retry the request,
potentially aggregating more data into the new attempt if new data is
waiting in the queue.

However, as described in the JIRA, this behavior is counterproductive
when the network pipe or associated reactor thread is
saturated. The original request may be in the middle of transmission
already, and so the retry ends up re-sending bytes which have already
been sent, increasing "throughput" but not increasing "goodput".
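
As a rough illustration of the throughput-versus-goodput point above, here is a toy model, not Kudu code. The names `LinkStats` and `SimulateRetries` are hypothetical, and the model assumes a single FIFO link with a fixed drain rate, zero latency, and a leader that enqueues one full copy of the request at every timeout until the original finishes transmitting.

```cpp
#include <cassert>
#include <cstdint>

// Toy model (an assumption for illustration, not Kudu's implementation).
struct LinkStats {
  int64_t retries;       // duplicate requests enqueued after timeouts
  int64_t bytes_sent;    // everything handed to the link ("throughput")
  int64_t bytes_useful;  // bytes the follower actually needed ("goodput")
};

// Counts the duplicates a leader would enqueue while the original request of
// `request_bytes` is still draining through a link of `bytes_per_sec`.
LinkStats SimulateRetries(int64_t request_bytes,
                          int64_t bytes_per_sec,
                          int64_t timeout_ms) {
  assert(request_bytes >= 0);
  assert(bytes_per_sec > 0);
  assert(timeout_ms > 0);

  // Time for the original request to finish transmitting, rounded up to ms.
  const int64_t transfer_ms =
      (request_bytes * 1000 + bytes_per_sec - 1) / bytes_per_sec;

  // A timeout fires at timeout_ms, 2 * timeout_ms, ... and each one that
  // fires strictly before the original completes enqueues one duplicate.
  const int64_t retries = (transfer_ms - 1) / timeout_ms;

  return LinkStats{retries, request_bytes * (1 + retries), request_bytes};
}
```

In this model, `SimulateRetries(8000000, 1000000, 1000)` reports 7 retries and 64,000,000 bytes sent for 8,000,000 useful bytes, while the same request with a 30000 ms timeout produces no retries at all.
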
The original Raft timeout was set to 1 second mainly due to KUDU-699, an
old bug in which the leader would block waiting on outstanding requests
to followers before it would step down. That was fixed quite a long time
back, though, so there is no longer any good reason to have such a short
timeout on a Raft request.

This patch bumps the default timeout to 30 seconds. I tested this on an
8-node cluster by using iptables to inject 1% packet loss on all nodes
and running an insertion workload as described in the JIRA. Without the
patch, if I did a 'kill -STOP' of a node and waited a couple of seconds
before allowing it to continue, I would see that node log "deduplicated
request" messages for 30-60 seconds before it eventually caught up.
During that time, the tablet was effectively using only two replicas,
causing increased latency, etc.

With the higher timeout, I didn't see these messages, and the unpaused
replica caught up much more quickly.

Change-Id: I5f47dc006dc3dfb1659a224172e1905b6bf3d2a4
---
M src/kudu/consensus/consensus_peers.cc
1 file changed, 1 insertion(+), 1 deletion(-)
git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/37/8037/1
--
Gerrit-MessageType: newchange
Gerrit-Change-Id: I5f47dc006dc3dfb1659a224172e1905b6bf3d2a4
Gerrit-PatchSet: 1
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon <[email protected]>
Gerrit-Reviewer: Adar Dembo <[email protected]>
Gerrit-Reviewer: David Ribeiro Alves <[email protected]>
Gerrit-Reviewer: Mike Percy <[email protected]>