On Tue, May 26, 2020 at 10:50 AM Ilya Maximets <[email protected]> wrote:
>
> On 5/26/20 1:54 AM, Han Zhou wrote:
> >
> >
> > On Mon, May 25, 2020 at 6:11 AM Ilya Maximets <[email protected]> wrote:
> >>
> >> On 5/23/20 8:36 PM, Han Zhou wrote:
> >> >
> >> >
> >> > On Sat, May 23, 2020 at 10:34 AM Ilya Maximets <[email protected]> wrote:
> >> >>
> >> >> Snapshots are huge.  In some cases we could receive several outdated
> >> >> append replies from the remote server.  This could happen in
> >> >> high-scale cases if the remote server is overloaded and not able to
> >> >> process all the raft requests in time.  As an action to each outdated
> >> >> append reply we're sending a full database snapshot.  While the
> >> >> remote server is already overloaded, those snapshots will get stuck
> >> >> in the jsonrpc backlog for a long time, making it grow up to a few
> >> >> GB.  Since the remote server wasn't able to process incoming messages
> >> >> in time, it will likely not be able to process the snapshots either,
> >> >> leading to the same situation with low chances to recover.  The
> >> >> remote server will likely get stuck in 'candidate' state, and other
> >> >> servers will grow their memory consumption due to growing jsonrpc
> >> >> backlogs:
> >> >
> >> > Hi Ilya, this patch LGTM.  I'm just not clear about the last part of
> >> > the commit message.  Why would the remote server get stuck in
> >> > 'candidate' state if there are pending messages from the leader for
> >> > it to handle?  If the follower was busy processing older messages, it
> >> > wouldn't have had a chance to see the election timer time out without
> >> > receiving a heartbeat from the leader, so it shouldn't try to start
> >> > voting, right?
> >>
> >> I'm not sure what exactly happens, but that is what I see in my setup.
> >> The overloaded server sends vote requests almost every second with the
> >> term increased by 1 each time.  I think it doesn't see heartbeats since
> >> it processes only a few messages at a time, and processing a single
> >> message, like applying the snapshot, could lead to election timer
> >> expiration.
> >>
> > It processes at most 50 messages at a time for each connection in
> > raft_conn_run(), which should guarantee that append_request (heartbeat)
> > is seen.  However, it is possible that the connection is lost due to the
> > inactivity probe, and then append_request could be missed, causing
> > re-election.  Did you see such a case after applying the patch that
> > disables the inactivity probe for raft connections?
>
> Yes, I tested with the inactivity probe disabled.
>
> An excessive send backlog doesn't mean that we always have something to
> receive on the other side.  The jsonrpc backlog is stored on the sender
> side, and each time the sender calls jsonrpc_run(), one message from that
> backlog is pushed to stream_send().  In our case stream-ssl buffers this
> one message, or even just part of it, for sending.  And only that one
> chunk of data can be received on the other side without additional action
> from the sender.  To receive more data on the receiver side, the sender
> has to call jsonrpc_run() --> stream_ssl_run() again.  So, we're not
> always receiving 50 messages during a single raft_conn_run().  In
> practice, we're receiving only a few of them, i.e. we might easily miss
> some appends or heartbeats and start voting.
>
> I'm not 100% sure that this is what really happens, but it seems possible.
>
> Also, old messages with a stale term don't reset the election timeout.
> This might contribute to the issue as well.
>
> What do you think?
>
Thanks for the explanation.  I wasn't aware of that stream_send() behavior.
It makes sense!

>
>
> >>
> >> > Otherwise:
> >> >
> >> > Acked-by: Han Zhou <[email protected]>
> >> >
> >> >>
> >> >> jsonrpc|INFO|excessive sending backlog, jsonrpc: ssl:192.16.0.3:6644,
> >> >> num of msgs: 3795, backlog: 8838994624.
> >> >>
> >> >> This patch is trying to avoid that situation by avoiding sending
> >> >> equal snapshot install requests.  This helps maintain reasonable
> >> >> memory consumption and allows the cluster to recover on a larger
> >> >> scale.
> >> >>
> >> >> Signed-off-by: Ilya Maximets <[email protected]>
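
A minimal standalone model of the send-path behavior described above, where
the sender's backlog is drained at most one message per run, so the receiver
sees only a trickle of messages per poll no matter how many it is willing to
process.  The names and numbers below are hypothetical simplifications, not
the real lib/jsonrpc.c or stream-ssl code:

    /* Hypothetical standalone model, not the real OVS code: the sender's
     * backlog is drained at most one message per "run", mirroring how each
     * jsonrpc_run() pushes a single backlogged message to stream_send(). */
    #include <stdio.h>

    int
    main(void)
    {
        int backlog = 3795;   /* messages queued on the sender (from the log) */
        int received = 0;     /* messages the receiver has actually seen */

        /* The receiver is willing to process up to 50 messages per poll,
         * but only one message per sender run ever reaches the stream, so
         * it picks up just one message each iteration. */
        for (int poll = 0; poll < 10 && backlog > 0; poll++) {
            backlog--;        /* one message handed to the stream ... */
            received++;       /* ... is all the receiver can pick up */
        }

        printf("received %d messages, %d still backlogged\n",
               received, backlog);
        /* Heartbeats buried deep in the backlog arrive too late, so the
         * follower's election timer can expire and it starts voting. */
        return 0;
    }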

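For reference, a rough standalone sketch of the idea the commit message
describes: skip an install request for a snapshot identical to the one
already sent to that server.  The structures and function below are
hypothetical illustrations, not the actual ovsdb/raft.c code:

    /* Hypothetical sketch (not the actual ovsdb/raft.c code): remember the
     * last snapshot (term + index) sent to each server and skip re-sending
     * an identical install_snapshot request. */
    #include <stdbool.h>
    #include <stdint.h>
    #include <stdio.h>

    struct peer {
        uint64_t last_install_term;    /* 0 means nothing sent yet */
        uint64_t last_install_index;
    };

    static bool
    should_send_snapshot(struct peer *p, uint64_t term, uint64_t index)
    {
        if (p->last_install_term == term && p->last_install_index == index) {
            return false;              /* same snapshot already in flight */
        }
        p->last_install_term = term;
        p->last_install_index = index;
        return true;
    }

    int
    main(void)
    {
        struct peer p = { 0, 0 };

        /* Several outdated append replies arrive in a row; only the first
         * one triggers a (huge) snapshot install request. */
        for (int i = 0; i < 5; i++) {
            if (should_send_snapshot(&p, 4, 1000)) {
                printf("sending snapshot (term 4, index 1000)\n");
            } else {
                printf("skipping duplicate snapshot install request\n");
            }
        }
        return 0;
    }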