Han, thanks for the answer. My comments inline. Regards, Vladislav Odintsov
> On 12 Aug 2021, at 19:15, Han Zhou <[email protected]> wrote: > > > > On Thu, Aug 12, 2021 at 4:54 AM Vladislav Odintsov <[email protected] > <mailto:[email protected]>> wrote: > > > > Hi, > > > > I’ve got a 3-node RAFT cluster, serving ovn northbound DB and trying to > > understand what triggered ovsdb-server leadership change. > > Can somebody help explain that? > > > > This cluster runs ovs 2.13.4 and has ~25 active clients: 3 x ovn-northd, 3 > > x ovn-ic, 11 python-ovsdbapp clients (CMS). > > > > Server logs (per-node) listed above. > > > > Node 1: > > > > 2021-08-12T07:45:55.189Z|03131|raft|INFO|server be61 is leader for term 357 > > 2021-08-12T07:46:04.187Z|03132|raft|INFO|term 358: 1454 ms timeout expired, > > starting election > > 2021-08-12T07:46:04.191Z|03133|raft|INFO|term 358: elected leader by 2+ of > > 3 servers > > 2021-08-12T07:46:05.558Z|03134|raft|INFO|rejecting term 357 < current term > > 358 received in vote_reply message from server be61 > > 2021-08-12T07:46:29.181Z|03135|timeval|WARN|Unreasonably long 2122ms poll > > interval (2120ms user, 0ms system) > > 2021-08-12T07:46:29.181Z|03136|timeval|WARN|context switches: 0 voluntary, > > 7 involuntary > > 2021-08-12T07:46:29.181Z|03137|coverage|INFO|Event coverage, avg rate over > > last: 5 seconds, last minute, last hour, hash=3b99bba2: > > 2021-08-12T07:46:29.181Z|03138|coverage|INFO|hmap_pathological > > 94.0/sec 9.917/sec 0.1819/sec total: 85714 > > 2021-08-12T07:46:29.181Z|03139|coverage|INFO|hmap_expand > > 39281.4/sec 4318.283/sec 112.9786/sec total: 193444648 > > 2021-08-12T07:46:29.181Z|03140|coverage|INFO|lockfile_lock > > 0.0/sec 0.000/sec 0.0000/sec total: 1 > > 2021-08-12T07:46:29.181Z|03141|coverage|INFO|poll_create_node > > 201.6/sec 109.617/sec 49.4042/sec total: 425254534 > > 2021-08-12T07:46:29.181Z|03142|coverage|INFO|poll_zero_timeout > > 2.0/sec 1.567/sec 0.9350/sec total: 2938208 > > 2021-08-12T07:46:29.181Z|03143|coverage|INFO|seq_change > > 13.0/sec 10.433/sec 5.5928/sec total: 22320268 > > 2021-08-12T07:46:29.181Z|03144|coverage|INFO|pstream_open > > 0.0/sec 0.000/sec 0.0000/sec total: 4 > > 2021-08-12T07:46:29.181Z|03145|coverage|INFO|stream_open > > 0.0/sec 0.000/sec 0.0000/sec total: 20 > > 2021-08-12T07:46:29.181Z|03146|coverage|INFO|unixctl_received > > 0.0/sec 0.017/sec 0.0164/sec total: 54512 > > 2021-08-12T07:46:29.181Z|03147|coverage|INFO|unixctl_replied > > 0.0/sec 0.017/sec 0.0164/sec total: 54512 > > 2021-08-12T07:46:29.181Z|03148|coverage|INFO|util_xalloc > > 1693863.2/sec 181037.517/sec 3963.9467/sec total: 4830747405 > > 2021-08-12T07:46:29.181Z|03149|coverage|INFO|97 events never hit > > 2021-08-12T07:46:29.186Z|03150|raft|WARN|ignoring vote request received as > > leader > > 2021-08-12T07:46:29.186Z|03151|raft|INFO|server be61 is leader for term 359 > > 2021-08-12T07:46:29.187Z|03152|raft|INFO|1076 truncating 1 entries from end > > of log > > 2021-08-12T07:46:29.187Z|03153|raft|INFO|rejected append_reply (not leader) > > 2021-08-12T07:46:29.187Z|03154|raft|INFO|rejected append_reply (not leader) > > 2021-08-12T07:46:29.187Z|03155|raft|INFO|rejected append_reply (not leader) > > 2021-08-12T07:46:29.191Z|03156|raft|INFO|rejected append_reply (not leader) > > 2021-08-12T07:46:29.221Z|03157|jsonrpc|WARN|Dropped 4 log messages in last > > 14866 seconds (most recently, 14865 seconds ago) due to excessive rate > > 2021-08-12T07:46:29.221Z|03158|jsonrpc|WARN|tcp:client-1:42402: receive > > error: Connection reset by peer > > 2021-08-12T07:46:29.222Z|03159|reconnect|WARN|tcp:client-1:42402: > > connection dropped (Connection reset by peer) > > 2021-08-12T07:46:29.222Z|03160|jsonrpc|WARN|tcp:client-2:45746: receive > > error: Connection reset by peer > > 2021-08-12T07:46:29.222Z|03161|reconnect|WARN|tcp:client-2:45746: > > connection dropped (Connection reset by peer) > > 2021-08-12T07:46:29.225Z|03162|jsonrpc|WARN|tcp:client-3:54218: receive > > error: Connection reset by peer > > 2021-08-12T07:46:29.225Z|03163|reconnect|WARN|tcp:client-3:54218: > > connection dropped (Connection reset by peer) > > 2021-08-12T07:46:29.232Z|03164|jsonrpc|WARN|tcp:client-4:48064: receive > > error: Connection reset by peer > > 2021-08-12T07:46:29.232Z|03165|reconnect|WARN|tcp:client-4:48064: > > connection dropped (Connection reset by peer) > > 2021-08-12T07:46:29.232Z|03166|jsonrpc|WARN|tcp:client-5:49022: receive > > error: Connection reset by peer > > 2021-08-12T07:46:29.232Z|03167|reconnect|WARN|tcp:client-5:49022: > > connection dropped (Connection reset by peer) > > 2021-08-12T07:46:29.566Z|03168|reconnect|WARN|tcp:node-3:40902: connection > > dropped (Connection reset by peer) > > 2021-08-12T07:46:30.047Z|03169|poll_loop|INFO|Dropped 64 log messages in > > last 14879 seconds (most recently, 14876 seconds ago) due to excessive rate > > 2021-08-12T07:46:30.047Z|03170|poll_loop|INFO|wakeup due to [POLLOUT] on fd > > 49 (node-1:6641<->ip7:48658) at lib/stream-fd.c:153 (72% CPU usage) > > 2021-08-12T07:46:30.195Z|03171|poll_loop|INFO|wakeup due to [POLLIN] on fd > > 20 (node-1:6643<->node-3:47488) at lib/stream-fd.c:157 (72% CPU usage) > > 2021-08-12T07:46:30.351Z|03172|poll_loop|INFO|wakeup due to [POLLOUT] on fd > > 43 (node-1:6641<->ip8:48656) at lib/stream-fd.c:153 (72% CPU usage) > > 2021-08-12T07:46:30.529Z|03173|poll_loop|INFO|wakeup due to [POLLIN] on fd > > 20 (node-1:6643<->node-3:47488) at lib/stream-fd.c:157 (72% CPU usage) > > 2021-08-12T07:46:30.753Z|03174|poll_loop|INFO|wakeup due to [POLLOUT] on fd > > 39 (node-1:6641<->ip9:48062) at lib/stream-fd.c:153 (72% CPU usage) > > 2021-08-12T07:46:30.864Z|03175|poll_loop|INFO|wakeup due to [POLLIN] on fd > > 20 (node-1:6643<->node-3:47488) at lib/stream-fd.c:157 (72% CPU usage) > > 2021-08-12T07:46:30.980Z|03176|poll_loop|INFO|wakeup due to [POLLOUT] on fd > > 49 (node-1:6641<->ip10:48658) at lib/stream-fd.c:153 (72% CPU usage) > > 2021-08-12T07:46:31.156Z|03177|poll_loop|INFO|wakeup due to [POLLOUT] on fd > > 45 (node-1:6641<->ip11:54216) at lib/stream-fd.c:153 (72% CPU usage) > > 2021-08-12T07:46:31.197Z|03178|poll_loop|INFO|wakeup due to [POLLIN] on fd > > 20 (node-1:6643<->node-3:47488) at lib/stream-fd.c:157 (72% CPU usage) > > 2021-08-12T07:46:31.198Z|03179|poll_loop|INFO|wakeup due to 0-ms timeout at > > lib/reconnect.c:664 (72% CPU usage) > > 2021-08-12T07:46:46.654Z|03180|raft|INFO|server 9f6e is leader for term 360 > > 2021-08-12T07:46:46.954Z|03181|raft|INFO|rejecting term 359 < current term > > 360 received in append_request message from server be61 > > > > Node 2: > > > > 2021-08-12T07:45:54.748Z|02385|raft|INFO|Transferring leadership to write a > > snapshot. > > 2021-08-12T07:45:55.058Z|02386|raft|INFO|server be61 is leader for term 357 > > 2021-08-12T07:45:55.185Z|02387|raft|INFO|rejecting term 356 < current term > > 357 received in append_reply message from server 1076 > > 2021-08-12T07:46:04.192Z|02388|raft|INFO|server 1076 is leader for term 358 > > 2021-08-12T07:46:28.339Z|02389|raft|INFO|server be61 is leader for term 359 > > 2021-08-12T07:46:29.177Z|02390|raft|INFO|rejecting term 358 < current term > > 359 received in append_request message from server 1076 > > 2021-08-12T07:46:29.183Z|02391|raft|INFO|rejecting term 358 < current term > > 359 received in append_request message from server 1076 > > 2021-08-12T07:46:46.650Z|02392|raft|INFO|term 360: 1425 ms timeout expired, > > starting election > > 2021-08-12T07:46:46.654Z|02393|raft|INFO|term 360: elected leader by 2+ of > > 3 servers > > 2021-08-12T07:46:46.954Z|02394|raft|INFO|rejecting term 359 < current term > > 360 received in append_request message from server be61 > > > > Node 3: > > > > 2021-08-12T07:45:54.750Z|03526|raft|INFO|received leadership transfer from > > 9f6e in term 356 > > 2021-08-12T07:45:54.750Z|03527|raft|INFO|term 357: starting election > > 2021-08-12T07:45:55.057Z|03528|raft|INFO|term 357: elected leader by 2+ of > > 3 servers > > 2021-08-12T07:46:05.558Z|03529|raft|WARN|ignoring vote request received as > > leader > > 2021-08-12T07:46:05.558Z|03530|raft|INFO|server 1076 is leader for term 358 > > 2021-08-12T07:46:05.558Z|03531|timeval|WARN|Unreasonably long 2735ms poll > > interval (2733ms user, 0ms system) > > 2021-08-12T07:46:05.558Z|03532|timeval|WARN|faults: 5583 minor, 0 major > > 2021-08-12T07:46:05.558Z|03533|timeval|WARN|disk: 0 reads, 8 writes > > 2021-08-12T07:46:05.558Z|03534|timeval|WARN|context switches: 0 voluntary, > > 19 involuntary > > 2021-08-12T07:46:05.558Z|03535|coverage|INFO|Skipping details of duplicate > > event coverage for hash=3b99bba2 > > 2021-08-12T07:46:05.559Z|03536|poll_loop|INFO|Dropped 59 log messages in > > last 8160 seconds (most recently, 8157 seconds ago) due to excessive rate > > 2021-08-12T07:46:05.559Z|03537|poll_loop|INFO|wakeup due to [POLLOUT] on fd > > 37 (node-3:6641<->client-1:49952) at lib/stream-fd.c:153 (57% CPU usage) > > 2021-08-12T07:46:05.559Z|03538|poll_loop|INFO|wakeup due to [POLLOUT] on fd > > 46 (node-3:6641<->client-2:39512) at lib/stream-fd.c:153 (57% CPU usage) > > 2021-08-12T07:46:05.559Z|03539|poll_loop|INFO|wakeup due to [POLLOUT] on fd > > 35 (node-3:6641<->client-3:35686) at lib/stream-fd.c:153 (57% CPU usage) > > 2021-08-12T07:46:05.559Z|03540|poll_loop|INFO|wakeup due to [POLLOUT] on fd > > 47 (node-3:6641<->client-4:49954) at lib/stream-fd.c:153 (57% CPU usage) > > 2021-08-12T07:46:05.559Z|03541|poll_loop|INFO|wakeup due to [POLLOUT] on fd > > 39 (node-3:6641<->client-5:33490) at lib/stream-fd.c:153 (57% CPU usage) > > 2021-08-12T07:46:05.559Z|03542|poll_loop|INFO|wakeup due to [POLLOUT] on fd > > 45 (node-3:6641<->client-6:41156) at lib/stream-fd.c:153 (57% CPU usage) > > 2021-08-12T07:46:05.559Z|03543|poll_loop|INFO|wakeup due to [POLLOUT] on fd > > 40 (node-3:6641<->client-7:40098) at lib/stream-fd.c:153 (57% CPU usage) > > 2021-08-12T07:46:05.559Z|03544|poll_loop|INFO|wakeup due to [POLLOUT] on fd > > 43 (node-3:6641<->client-8:38632) at lib/stream-fd.c:153 (57% CPU usage) > > 2021-08-12T07:46:05.559Z|03545|poll_loop|INFO|wakeup due to [POLLOUT] on fd > > 36 (node-3:6641<->client-9:34912) at lib/stream-fd.c:153 (57% CPU usage) > > 2021-08-12T07:46:05.559Z|03546|poll_loop|INFO|wakeup due to [POLLOUT] on fd > > 41 (node-3:6641<->client-10:55638) at lib/stream-fd.c:153 (57% CPU usage) > > 2021-08-12T07:46:05.896Z|03547|jsonrpc|WARN|tcp:client-11:43772: send > > error: Broken pipe > > 2021-08-12T07:46:05.896Z|03548|reconnect|WARN|tcp:client-11:43772: > > connection dropped (Broken pipe) > > 2021-08-12T07:46:05.928Z|03549|jsonrpc|WARN|tcp:client-12:55640: receive > > error: Connection reset by peer > > 2021-08-12T07:46:05.929Z|03550|reconnect|WARN|tcp:client-12:55640: > > connection dropped (Connection reset by peer) > > 2021-08-12T07:46:19.068Z|03551|jsonrpc|WARN|tcp:client-13:33490: receive > > error: Connection reset by peer > > 2021-08-12T07:46:19.069Z|03552|reconnect|WARN|tcp:client-13:33490: > > connection dropped (Connection reset by peer) > > 2021-08-12T07:46:19.176Z|03553|jsonrpc|WARN|tcp:client-14:38632: receive > > error: Connection reset by peer > > 2021-08-12T07:46:19.177Z|03554|reconnect|WARN|tcp:client-14:38632: > > connection dropped (Connection reset by peer) > > 2021-08-12T07:46:20.712Z|03555|jsonrpc|WARN|tcp:client-15:34912: receive > > error: Connection reset by peer > > 2021-08-12T07:46:20.713Z|03556|reconnect|WARN|tcp:client-15:34912: > > connection dropped (Connection reset by peer) > > 2021-08-12T07:46:23.453Z|03557|jsonrpc|WARN|tcp:client-16:41158: receive > > error: Connection reset by peer > > 2021-08-12T07:46:23.453Z|03558|reconnect|WARN|tcp:client-16:41158: > > connection dropped (Connection reset by peer) > > 2021-08-12T07:46:28.336Z|03559|raft|INFO|term 359: 1318 ms timeout expired, > > starting election > > 2021-08-12T07:46:28.339Z|03560|raft|INFO|term 359: elected leader by 2+ of > > 3 servers > > 2021-08-12T07:46:29.178Z|03561|raft|INFO|rejecting term 358 < current term > > 359 received in append_request message from server 1076 > > 2021-08-12T07:46:29.182Z|03562|raft|INFO|rejecting term 358 < current term > > 359 received in append_request message from server 1076 > > 2021-08-12T07:46:29.187Z|03563|raft|INFO|rejecting term 358 < current term > > 359 received in vote_reply message from server 1076 > > 2021-08-12T07:46:46.957Z|03564|timeval|WARN|Unreasonably long 1645ms poll > > interval (1642ms user, 0ms system) > > 2021-08-12T07:46:46.957Z|03565|timeval|WARN|faults: 163 minor, 0 major > > 2021-08-12T07:46:46.957Z|03566|timeval|WARN|context switches: 0 voluntary, > > 22 involuntary > > 2021-08-12T07:46:46.957Z|03567|coverage|INFO|Skipping details of duplicate > > event coverage for hash=3b99bba2 > > 2021-08-12T07:46:47.233Z|03568|raft|INFO|rejected append_reply (not leader) > > 2021-08-12T07:46:47.233Z|03569|raft|WARN|ignoring vote request received > > after only 3 ms (minimum election time is 1000 ms) > > 2021-08-12T07:46:47.233Z|03570|raft|INFO|server 9f6e is leader for term 360 > > 2021-08-12T07:46:47.234Z|03571|raft|INFO|rejected append_reply (not leader) > > 2021-08-12T07:46:48.290Z|03572|poll_loop|INFO|Dropped 49 log messages in > > last 43 seconds (most recently, 40 seconds ago) due to excessive rate > > 2021-08-12T07:46:48.290Z|03573|poll_loop|INFO|wakeup due to [POLLOUT] on fd > > 43 (node-3:6641<->client-17:34970) at lib/stream-fd.c:153 (64% CPU usage) > > 2021-08-12T07:46:48.446Z|03574|poll_loop|INFO|wakeup due to [POLLIN] on fd > > 30 (node-3:6643<->node-2:54684) at lib/stream-fd.c:157 (64% CPU usage) > > 2021-08-12T07:46:48.647Z|03575|poll_loop|INFO|wakeup due to [POLLOUT] on fd > > 41 (node-3:6641<->client-18:38648) at lib/stream-fd.c:153 (64% CPU usage) > > 2021-08-12T07:46:48.780Z|03576|poll_loop|INFO|wakeup due to [POLLIN] on fd > > 30 (node-3:6643<->node-2:54684) at lib/stream-fd.c:157 (64% CPU usage) > > 2021-08-12T07:46:48.919Z|03577|poll_loop|INFO|wakeup due to [POLLOUT] on fd > > 41 (node-3:6641<->client-19:38648) at lib/stream-fd.c:153 (64% CPU usage) > > 2021-08-12T07:46:49.113Z|03578|poll_loop|INFO|wakeup due to [POLLIN] on fd > > 30 (1node-32.20.1.125:6643<->node-2:54684) at lib/stream-fd.c:157 (64% CPU > > usage) > > 2021-08-12T07:46:49.296Z|03579|poll_loop|INFO|wakeup due to [POLLOUT] on fd > > 41 (node-3:6641<->client-20:38648) at lib/stream-fd.c:153 (64% CPU usage) > > 2021-08-12T07:46:50.837Z|03580|jsonrpc|WARN|tcp:client-21:35700: receive > > error: Connection reset by peer > > 2021-08-12T07:46:50.837Z|03581|reconnect|WARN|tcp:client-21:35700: > > connection dropped (Connection reset by peer) > > 2021-08-12T07:46:55.042Z|03582|jsonrpc|WARN|tcp:client-22:39840: receive > > error: Connection reset by peer > > 2021-08-12T07:46:55.043Z|03583|reconnect|WARN|tcp:client-22:39840: > > connection dropped (Connection reset by peer) > > 2021-08-12T07:47:01.015Z|03584|jsonrpc|WARN|tcp:client-23:55826: receive > > error: Connection reset by peer > > 2021-08-12T07:47:01.015Z|03585|reconnect|WARN|tcp:client-23:55826: > > connection dropped (Connection reset by peer) > > > > > > First leader ownership transfer (at 07:45:54) is understandable. > > node-3 received leadership transfer from node-1 because node-1 was > > prepearing to take a DB snapshot/compaction. > > > > But then at 07:46:04.187Z node-1 wrote: > > 2021-08-12T07:46:04.187Z|03132|raft|INFO|term 358: 1454 ms timeout expired, > > starting election > > 2021-08-12T07:46:04.191Z|03133|raft|INFO|term 358: elected leader by 2+ of > > 3 servers > > > > I’m not sure, but I suppose node-1 didn’t get heartbet from leader > > (node-3), started leader election and won it. Right? > > > > My question is: how to understand the reason why node-3 couldn’t send raft > > heartbeat in time? > > > > Can this long poll be the indicator that server was not answering > > heartbeats? > > Yes. But not sure what caused the long poll on node-3. > Are these logs from NB-DB servers? (I guessed from the port 6641). Are the > NB-DB clients (CMS) using the "leader_only" option - meaning connecting to > RAFT leaders only? If so, when node-1 transfers leadship (because of doing > snapshot), all the clients to node-1 would reconnect to the new leader > (node-3), which could cause some extra load if node-3 just did snapshot right > before the leadership transfer and misses the last transaction ID requested > by the clients. Not sure if this is what happened, but it is just one > possibility. > > I'd recommend: > 1) Try disable "leader_only" from the clients, so that when leadership > changes, they don't reconnect. The cost is that for each write operation the > cost is higher on the RAFT cluster. If the write is very heavy/frequent in > your cluster it is probably not good to do so. Thanks for the advice - we’ll try this options. Traffic shouldn’t be very heavy. > 2) Try increasing the election-timer (heart-beat). It looks your are using > the default timer. It can be increase by ovs-appctl -t ... > cluster/change-election-timer OVN_Northbound <new timer in ms>. The long poll > is 2 - 3 seconds, so at least setting the timer to 3 - 5 seconds. It can be > specified while creating the cluster as well. Am I right that if we switch clients to no_leader_only mode, the reason for these long polls we be eliminated and we can leave default timer settings (which you correctly pointed - 1 second). > 3) If occasional long latency is acceptable when the leader is doing > snapshot, it is better not transferring the leadership before snapshot, if > clients are connected using leader-only mode. However, it seems nothing can > be done from your side. I think we could add this option to OVSDB so that > leadership is not transferred due to snapshot. @Ilya Maximets > <mailto:[email protected]> what do you think? Yeah, that could be a good point to have a choice, whether do snaphot with transferring leadership or not. But from my side an additional question here: I’ve never seen follower ovsdb-servers taking snapshot/compaction. Moreover, on a Southbound DB, which is much more write-intensive, than NBDB, I’ve seen many times that leader preparing to take a snapshot, transfers leadership, does a snap. And after some seconds, or maybe in that particular second, when follower node became a leader, this new leader also wants to take a snap! And it starts transferring a leader. Please correct me, if follower also takes snap and it was just a coincidence. On not? PS: I’m talking about OVS 2.13. > > Thanks, > Han > > > > > 2021-08-12T07:46:05.558Z|03531|timeval|WARN|Unreasonably long 2735ms poll > > interval (2733ms user, 0ms system) > > 2021-08-12T07:46:05.558Z|03532|timeval|WARN|faults: 5583 minor, 0 major > > 2021-08-12T07:46:05.558Z|03533|timeval|WARN|disk: 0 reads, 8 writes > > 2021-08-12T07:46:05.558Z|03534|timeval|WARN|context switches: 0 voluntary, > > 19 involuntary > > > > Thanks in advance for your help. > > > > Regards, > > Vladislav Odintsov > > > > _______________________________________________ > > discuss mailing list > > [email protected] <mailto:[email protected]> > > https://mail.openvswitch.org/mailman/listinfo/ovs-discuss > > <https://mail.openvswitch.org/mailman/listinfo/ovs-discuss>_______________________________________________ > discuss mailing list > [email protected] > https://mail.openvswitch.org/mailman/listinfo/ovs-discuss
_______________________________________________ discuss mailing list [email protected] https://mail.openvswitch.org/mailman/listinfo/ovs-discuss
