Todd Lipcon has posted comments on this change.

Change subject: KUDU-1788. Increase Raft RPC timeout to 30sec to avoid fruitless retries.
......................................................................
Patch Set 1:

(2 comments)

> does this have any nasty interaction with failure detection for other
> situations than the one described in the jira. It seems like it might
> potentially have since the previous timeout was less than 3x heartbeats and
> the new one isn't.

I don't think so. Let's look at the case where the network is slow enough that each batch takes 1.1sec to transmit. In the old code, we'd have:

  t     leader              follower
  ------------------------------------
  0.0   send 1
  1.0   timeout
  1.01  send 2 <queued>
  1.1   send 2 <on wire>    received 1
  2.01  timeout
  2.02  send 3 <queued>
  2.21  send 3 <on wire>    received 2
  ...

i.e. even though the leader was sending a new request once every second due to the timeout, the follower received requests only at a rate of one every 1.1sec because of the transmission bottleneck.

With the new code, the leader will just wait the full 1.1sec before sending the new request. So the follower will still receive requests once every 1.1sec, but now all of the wire bandwidth will be used for "goodput" instead of each request growing successively longer until reaching the max batch size.

It's true that if the network is slow enough that transmitting a single batch takes longer than the failure-detection time (3x heartbeat), then the follower will trigger pre-elections in between requests. But I think that was already the case before, and if anything this change should reduce how often it occurs, again by reducing wasted network traffic. To avoid it entirely, I suppose we could have a dynamic transmission size which shrinks on timeouts, or some out-of-band "liveness only" heartbeat sent via UDP, or somesuch. But all of those seem like incremental work that we could do on top of this fix.

http://gerrit.cloudera.org:8080/#/c/8037/1//COMMIT_MSG
Commit Message:

PS1, Line 32: causing increased latency
> That part is less intuitive to me, where's the slowdown coming from?
No, it's just that when you only have two replicas, you are always waiting for max(latency) across the two. When you have three replicas, you are waiting for median(latency) across the three. So the latency with two replicas is almost always going to be worse than the latency with three replicas, assuming the replicas' individual latencies are all drawn from the same distribution (i.e. it would not work this way if one of the replicas is consistently slower or farther away).

http://gerrit.cloudera.org:8080/#/c/8037/1/src/kudu/consensus/consensus_peers.cc
File src/kudu/consensus/consensus_peers.cc:

Line 51: DEFINE_int32(consensus_rpc_timeout_ms, 30000,
> This doesn't appear to prevent stacking, does it? I don't see a mechanism f

Peer::SignalRequest has this code:

  // Only allow one request at a time.
  if (request_pending_) {
    return;
  }

--
To view, visit http://gerrit.cloudera.org:8080/8037
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I5f47dc006dc3dfb1659a224172e1905b6bf3d2a4
Gerrit-PatchSet: 1
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon <[email protected]>
Gerrit-Reviewer: Adar Dembo <[email protected]>
Gerrit-Reviewer: David Ribeiro Alves <[email protected]>
Gerrit-Reviewer: Jean-Daniel Cryans <[email protected]>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <[email protected]>
Gerrit-Reviewer: Todd Lipcon <[email protected]>
Gerrit-HasComments: Yes
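The slow-network timeline discussed in the first comment can be sketched as a toy simulation. This is not the Kudu code, just an illustrative model under the assumptions in the comment: the wire serializes batches, each taking 1.1sec to transmit, and the leader issues its next request either on RPC timeout or on response, whichever comes first. All names and numbers below are made up for illustration.

```python
TRANSMIT_S = 1.1    # time to push one batch over the slow link
HORIZON_S = 100.5   # how long to simulate

def simulate(timeout_s):
    """Return (requests sent by the leader, requests received by the follower)."""
    sends = 0
    received = 0
    wire_free_at = 0.0   # when the wire finishes its current batch
    t = 0.0              # current time at the leader
    while t < HORIZON_S:
        sends += 1
        # A retried request queues behind the batches already on the wire.
        start = max(t, wire_free_at)
        done = start + TRANSMIT_S
        wire_free_at = done
        if done <= HORIZON_S:
            received += 1
        # The leader moves on either when this request times out or when
        # its response arrives, whichever happens first.
        t += min(timeout_s, done - t)
    return sends, received

old_sends, old_recv = simulate(timeout_s=1.0)    # old 1sec timeout
new_sends, new_recv = simulate(timeout_s=30.0)   # new 30sec timeout

# Both configurations deliver batches at the same ~1/1.1sec rate, but the
# short timeout keeps piling duplicate requests onto the wire.
print("old:", old_sends, "sent,", old_recv, "received")
print("new:", new_sends, "sent,", new_recv, "received")
```

Running this, the follower receives the same number of batches either way, but the 1sec timeout leaves a growing backlog of in-flight duplicates while the 30sec timeout keeps at most one request outstanding, matching the "goodput" argument above.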
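The max-of-two vs. median-of-three latency point can also be checked numerically. The sketch below draws per-replica ack latencies from a made-up lognormal distribution (purely illustrative, not measured from Kudu) and compares the commit latency of a 2-replica config (wait for both, i.e. the max) against a 3-replica config (wait for a majority, i.e. the median).

```python
import random
import statistics

random.seed(42)

def sample_latency():
    # Hypothetical per-replica ack latency in ms, with a heavy-ish tail.
    return random.lognormvariate(1.0, 0.5)

N = 10_000
# 2 replicas: commit waits for both acks -> max of 2 samples.
two_replica = [max(sample_latency() for _ in range(2)) for _ in range(N)]
# 3 replicas: commit waits for a majority (2 of 3) -> median of 3 samples.
three_replica = [statistics.median([sample_latency() for _ in range(3)])
                 for _ in range(N)]

print(f"2 replicas, mean commit latency: {statistics.mean(two_replica):.2f} ms")
print(f"3 replicas, mean commit latency: {statistics.mean(three_replica):.2f} ms")
```

For i.i.d. latencies the median of three is stochastically smaller than the max of two, so the 3-replica mean comes out lower, as the comment claims; the conclusion flips if one replica is consistently slower.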
