Todd Lipcon has posted comments on this change.

Change subject: KUDU-1788. Increase Raft RPC timeout to 30sec to avoid 
fruitless retries.
......................................................................


Patch Set 1:

(2 comments)

> does this have any nasty interaction with failure detection for other 
> situations than the one described in the jira. It seems like it might 
> potentially have since the previous timeout was less than 3x heartbeats and 
> the new one isn't.

I don't think so.

Let's look at the case where the network is slow such that each batch takes 
1.1sec to transmit. In the old code, we'd have:

t     leader            follower
------------------------------------
0.0   send 1
1.0   timeout
1.01  send 2 <queued>
1.1   send 2 <on wire>  received 1 
2.01  timeout
2.02  send 3 <queued>
2.2   send 3 <on wire>  received 2
...

i.e. even though the leader was sending a new request once every second due to 
the timeout, the follower receives requests only at a rate of one every 1.1sec 
because of the transmission bottleneck.

With the new code, the leader will just wait the full 1.1sec before sending the 
next request. So the follower will still receive requests once every 1.1sec, 
but now all of the wire bandwidth is used for "goodput" instead of each request 
growing successively larger until it reaches the max-batch-size.

It's true that if the network is slow enough that transmitting a single batch 
takes longer than the failure-detection time (3x heartbeat), then the follower 
will trigger pre-elections between requests. But I think that was already the 
case before, and if anything this change should reduce how often it occurs, 
again by cutting wasted network traffic. To avoid it entirely, I suppose we 
could have a dynamic transmission size that shrinks on timeouts, or some 
out-of-band "liveness only" heartbeat sent via UDP, or some such. But all of 
those seem like incremental work that we could do on top of this fix.

http://gerrit.cloudera.org:8080/#/c/8037/1//COMMIT_MSG
Commit Message:

PS1, Line 32: causing increased latency
> That part is less intuitive to me, where's the slowdown coming from? The no
Just that when you only have two replicas, you are always waiting on 
max(latency) across the two. When you have three replicas, you are taking 
median(latency) across the three. So naturally the latency with two replicas is 
almost always going to be worse than with three, if the replicas' individual 
latencies are all drawn from the same distribution (i.e. it would not work this 
way if one of the replicas is consistently slower or farther away).


http://gerrit.cloudera.org:8080/#/c/8037/1/src/kudu/consensus/consensus_peers.cc
File src/kudu/consensus/consensus_peers.cc:

Line 51: DEFINE_int32(consensus_rpc_timeout_ms, 30000,
> This doesn't appear to prevent stacking, does it? I don't see a mechanism f
Peer::SignalRequest has this code:

  // Only allow one request at a time.
  if (request_pending_) {
    return;
  }


-- 
To view, visit http://gerrit.cloudera.org:8080/8037
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I5f47dc006dc3dfb1659a224172e1905b6bf3d2a4
Gerrit-PatchSet: 1
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon <[email protected]>
Gerrit-Reviewer: Adar Dembo <[email protected]>
Gerrit-Reviewer: David Ribeiro Alves <[email protected]>
Gerrit-Reviewer: Jean-Daniel Cryans <[email protected]>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <[email protected]>
Gerrit-Reviewer: Todd Lipcon <[email protected]>
Gerrit-HasComments: Yes
