On Sun, Aug 15, 2021 at 5:37 AM Vladislav Odintsov <odiv...@gmail.com>
wrote:
>
> Han, thanks for the answer.
> My comments inline.
>
> Regards,
> Vladislav Odintsov
>
> On 12 Aug 2021, at 19:15, Han Zhou <hz...@ovn.org> wrote:
>
>
>
> On Thu, Aug 12, 2021 at 4:54 AM Vladislav Odintsov <odiv...@gmail.com>
wrote:
> >
> > Hi,
> >
> > I’ve got a 3-node RAFT cluster, serving ovn northbound DB and trying to
understand what triggered ovsdb-server leadership change.
> > Can somebody help explain that?
> >
> > This cluster runs ovs 2.13.4 and has ~25 active clients: 3 x
ovn-northd, 3 x ovn-ic, 11 python-ovsdbapp clients (CMS).
> >
> > Server logs (per-node) listed above.
> >
> > Node 1:
> >
> > 2021-08-12T07:45:55.189Z|03131|raft|INFO|server be61 is leader for term
357
> > 2021-08-12T07:46:04.187Z|03132|raft|INFO|term 358: 1454 ms timeout
expired, starting election
> > 2021-08-12T07:46:04.191Z|03133|raft|INFO|term 358: elected leader by 2+
of 3 servers
> > 2021-08-12T07:46:05.558Z|03134|raft|INFO|rejecting term 357 < current
term 358 received in vote_reply message from server be61
> > 2021-08-12T07:46:29.181Z|03135|timeval|WARN|Unreasonably long 2122ms
poll interval (2120ms user, 0ms system)
> > 2021-08-12T07:46:29.181Z|03136|timeval|WARN|context switches: 0
voluntary, 7 involuntary
> > 2021-08-12T07:46:29.181Z|03137|coverage|INFO|Event coverage, avg rate
over last: 5 seconds, last minute, last hour,  hash=3b99bba2:
> > 2021-08-12T07:46:29.181Z|03138|coverage|INFO|hmap_pathological
94.0/sec     9.917/sec        0.1819/sec   total: 85714
> > 2021-08-12T07:46:29.181Z|03139|coverage|INFO|hmap_expand
 39281.4/sec  4318.283/sec      112.9786/sec   total: 193444648
> > 2021-08-12T07:46:29.181Z|03140|coverage|INFO|lockfile_lock
 0.0/sec     0.000/sec        0.0000/sec   total: 1
> > 2021-08-12T07:46:29.181Z|03141|coverage|INFO|poll_create_node
201.6/sec   109.617/sec       49.4042/sec   total: 425254534
> > 2021-08-12T07:46:29.181Z|03142|coverage|INFO|poll_zero_timeout
 2.0/sec     1.567/sec        0.9350/sec   total: 2938208
> > 2021-08-12T07:46:29.181Z|03143|coverage|INFO|seq_change
 13.0/sec    10.433/sec        5.5928/sec   total: 22320268
> > 2021-08-12T07:46:29.181Z|03144|coverage|INFO|pstream_open
0.0/sec     0.000/sec        0.0000/sec   total: 4
> > 2021-08-12T07:46:29.181Z|03145|coverage|INFO|stream_open
 0.0/sec     0.000/sec        0.0000/sec   total: 20
> > 2021-08-12T07:46:29.181Z|03146|coverage|INFO|unixctl_received
0.0/sec     0.017/sec        0.0164/sec   total: 54512
> > 2021-08-12T07:46:29.181Z|03147|coverage|INFO|unixctl_replied
 0.0/sec     0.017/sec        0.0164/sec   total: 54512
> > 2021-08-12T07:46:29.181Z|03148|coverage|INFO|util_xalloc
 1693863.2/sec 181037.517/sec     3963.9467/sec   total: 4830747405
> > 2021-08-12T07:46:29.181Z|03149|coverage|INFO|97 events never hit
> > 2021-08-12T07:46:29.186Z|03150|raft|WARN|ignoring vote request received
as leader
> > 2021-08-12T07:46:29.186Z|03151|raft|INFO|server be61 is leader for term
359
> > 2021-08-12T07:46:29.187Z|03152|raft|INFO|1076 truncating 1 entries from
end of log
> > 2021-08-12T07:46:29.187Z|03153|raft|INFO|rejected append_reply (not
leader)
> > 2021-08-12T07:46:29.187Z|03154|raft|INFO|rejected append_reply (not
leader)
> > 2021-08-12T07:46:29.187Z|03155|raft|INFO|rejected append_reply (not
leader)
> > 2021-08-12T07:46:29.191Z|03156|raft|INFO|rejected append_reply (not
leader)
> > 2021-08-12T07:46:29.221Z|03157|jsonrpc|WARN|Dropped 4 log messages in
last 14866 seconds (most recently, 14865 seconds ago) due to excessive rate
> > 2021-08-12T07:46:29.221Z|03158|jsonrpc|WARN|tcp:client-1:42402: receive
error: Connection reset by peer
> > 2021-08-12T07:46:29.222Z|03159|reconnect|WARN|tcp:client-1:42402:
connection dropped (Connection reset by peer)
> > 2021-08-12T07:46:29.222Z|03160|jsonrpc|WARN|tcp:client-2:45746: receive
error: Connection reset by peer
> > 2021-08-12T07:46:29.222Z|03161|reconnect|WARN|tcp:client-2:45746:
connection dropped (Connection reset by peer)
> > 2021-08-12T07:46:29.225Z|03162|jsonrpc|WARN|tcp:client-3:54218: receive
error: Connection reset by peer
> > 2021-08-12T07:46:29.225Z|03163|reconnect|WARN|tcp:client-3:54218:
connection dropped (Connection reset by peer)
> > 2021-08-12T07:46:29.232Z|03164|jsonrpc|WARN|tcp:client-4:48064: receive
error: Connection reset by peer
> > 2021-08-12T07:46:29.232Z|03165|reconnect|WARN|tcp:client-4:48064:
connection dropped (Connection reset by peer)
> > 2021-08-12T07:46:29.232Z|03166|jsonrpc|WARN|tcp:client-5:49022: receive
error: Connection reset by peer
> > 2021-08-12T07:46:29.232Z|03167|reconnect|WARN|tcp:client-5:49022:
connection dropped (Connection reset by peer)
> > 2021-08-12T07:46:29.566Z|03168|reconnect|WARN|tcp:node-3:40902:
connection dropped (Connection reset by peer)
> > 2021-08-12T07:46:30.047Z|03169|poll_loop|INFO|Dropped 64 log messages
in last 14879 seconds (most recently, 14876 seconds ago) due to excessive
rate
> > 2021-08-12T07:46:30.047Z|03170|poll_loop|INFO|wakeup due to [POLLOUT]
on fd 49 (node-1:6641<->ip7:48658) at lib/stream-fd.c:153 (72% CPU usage)
> > 2021-08-12T07:46:30.195Z|03171|poll_loop|INFO|wakeup due to [POLLIN] on
fd 20 (node-1:6643<->node-3:47488) at lib/stream-fd.c:157 (72% CPU usage)
> > 2021-08-12T07:46:30.351Z|03172|poll_loop|INFO|wakeup due to [POLLOUT]
on fd 43 (node-1:6641<->ip8:48656) at lib/stream-fd.c:153 (72% CPU usage)
> > 2021-08-12T07:46:30.529Z|03173|poll_loop|INFO|wakeup due to [POLLIN] on
fd 20 (node-1:6643<->node-3:47488) at lib/stream-fd.c:157 (72% CPU usage)
> > 2021-08-12T07:46:30.753Z|03174|poll_loop|INFO|wakeup due to [POLLOUT]
on fd 39 (node-1:6641<->ip9:48062) at lib/stream-fd.c:153 (72% CPU usage)
> > 2021-08-12T07:46:30.864Z|03175|poll_loop|INFO|wakeup due to [POLLIN] on
fd 20 (node-1:6643<->node-3:47488) at lib/stream-fd.c:157 (72% CPU usage)
> > 2021-08-12T07:46:30.980Z|03176|poll_loop|INFO|wakeup due to [POLLOUT]
on fd 49 (node-1:6641<->ip10:48658) at lib/stream-fd.c:153 (72% CPU usage)
> > 2021-08-12T07:46:31.156Z|03177|poll_loop|INFO|wakeup due to [POLLOUT]
on fd 45 (node-1:6641<->ip11:54216) at lib/stream-fd.c:153 (72% CPU usage)
> > 2021-08-12T07:46:31.197Z|03178|poll_loop|INFO|wakeup due to [POLLIN] on
fd 20 (node-1:6643<->node-3:47488) at lib/stream-fd.c:157 (72% CPU usage)
> > 2021-08-12T07:46:31.198Z|03179|poll_loop|INFO|wakeup due to 0-ms
timeout at lib/reconnect.c:664 (72% CPU usage)
> > 2021-08-12T07:46:46.654Z|03180|raft|INFO|server 9f6e is leader for term
360
> > 2021-08-12T07:46:46.954Z|03181|raft|INFO|rejecting term 359 < current
term 360 received in append_request message from server be61
> >
> > Node 2:
> >
> > 2021-08-12T07:45:54.748Z|02385|raft|INFO|Transferring leadership to
write a snapshot.
> > 2021-08-12T07:45:55.058Z|02386|raft|INFO|server be61 is leader for term
357
> > 2021-08-12T07:45:55.185Z|02387|raft|INFO|rejecting term 356 < current
term 357 received in append_reply message from server 1076
> > 2021-08-12T07:46:04.192Z|02388|raft|INFO|server 1076 is leader for term
358
> > 2021-08-12T07:46:28.339Z|02389|raft|INFO|server be61 is leader for term
359
> > 2021-08-12T07:46:29.177Z|02390|raft|INFO|rejecting term 358 < current
term 359 received in append_request message from server 1076
> > 2021-08-12T07:46:29.183Z|02391|raft|INFO|rejecting term 358 < current
term 359 received in append_request message from server 1076
> > 2021-08-12T07:46:46.650Z|02392|raft|INFO|term 360: 1425 ms timeout
expired, starting election
> > 2021-08-12T07:46:46.654Z|02393|raft|INFO|term 360: elected leader by 2+
of 3 servers
> > 2021-08-12T07:46:46.954Z|02394|raft|INFO|rejecting term 359 < current
term 360 received in append_request message from server be61
> >
> > Node 3:
> >
> > 2021-08-12T07:45:54.750Z|03526|raft|INFO|received leadership transfer
from 9f6e in term 356
> > 2021-08-12T07:45:54.750Z|03527|raft|INFO|term 357: starting election
> > 2021-08-12T07:45:55.057Z|03528|raft|INFO|term 357: elected leader by 2+
of 3 servers
> > 2021-08-12T07:46:05.558Z|03529|raft|WARN|ignoring vote request received
as leader
> > 2021-08-12T07:46:05.558Z|03530|raft|INFO|server 1076 is leader for term
358
> > 2021-08-12T07:46:05.558Z|03531|timeval|WARN|Unreasonably long 2735ms
poll interval (2733ms user, 0ms system)
> > 2021-08-12T07:46:05.558Z|03532|timeval|WARN|faults: 5583 minor, 0 major
> > 2021-08-12T07:46:05.558Z|03533|timeval|WARN|disk: 0 reads, 8 writes
> > 2021-08-12T07:46:05.558Z|03534|timeval|WARN|context switches: 0
voluntary, 19 involuntary
> > 2021-08-12T07:46:05.558Z|03535|coverage|INFO|Skipping details of
duplicate event coverage for hash=3b99bba2
> > 2021-08-12T07:46:05.559Z|03536|poll_loop|INFO|Dropped 59 log messages
in last 8160 seconds (most recently, 8157 seconds ago) due to excessive rate
> > 2021-08-12T07:46:05.559Z|03537|poll_loop|INFO|wakeup due to [POLLOUT]
on fd 37 (node-3:6641<->client-1:49952) at lib/stream-fd.c:153 (57% CPU
usage)
> > 2021-08-12T07:46:05.559Z|03538|poll_loop|INFO|wakeup due to [POLLOUT]
on fd 46 (node-3:6641<->client-2:39512) at lib/stream-fd.c:153 (57% CPU
usage)
> > 2021-08-12T07:46:05.559Z|03539|poll_loop|INFO|wakeup due to [POLLOUT]
on fd 35 (node-3:6641<->client-3:35686) at lib/stream-fd.c:153 (57% CPU
usage)
> > 2021-08-12T07:46:05.559Z|03540|poll_loop|INFO|wakeup due to [POLLOUT]
on fd 47 (node-3:6641<->client-4:49954) at lib/stream-fd.c:153 (57% CPU
usage)
> > 2021-08-12T07:46:05.559Z|03541|poll_loop|INFO|wakeup due to [POLLOUT]
on fd 39 (node-3:6641<->client-5:33490) at lib/stream-fd.c:153 (57% CPU
usage)
> > 2021-08-12T07:46:05.559Z|03542|poll_loop|INFO|wakeup due to [POLLOUT]
on fd 45 (node-3:6641<->client-6:41156) at lib/stream-fd.c:153 (57% CPU
usage)
> > 2021-08-12T07:46:05.559Z|03543|poll_loop|INFO|wakeup due to [POLLOUT]
on fd 40 (node-3:6641<->client-7:40098) at lib/stream-fd.c:153 (57% CPU
usage)
> > 2021-08-12T07:46:05.559Z|03544|poll_loop|INFO|wakeup due to [POLLOUT]
on fd 43 (node-3:6641<->client-8:38632) at lib/stream-fd.c:153 (57% CPU
usage)
> > 2021-08-12T07:46:05.559Z|03545|poll_loop|INFO|wakeup due to [POLLOUT]
on fd 36 (node-3:6641<->client-9:34912) at lib/stream-fd.c:153 (57% CPU
usage)
> > 2021-08-12T07:46:05.559Z|03546|poll_loop|INFO|wakeup due to [POLLOUT]
on fd 41 (node-3:6641<->client-10:55638) at lib/stream-fd.c:153 (57% CPU
usage)
> > 2021-08-12T07:46:05.896Z|03547|jsonrpc|WARN|tcp:client-11:43772: send
error: Broken pipe
> > 2021-08-12T07:46:05.896Z|03548|reconnect|WARN|tcp:client-11:43772:
connection dropped (Broken pipe)
> > 2021-08-12T07:46:05.928Z|03549|jsonrpc|WARN|tcp:client-12:55640:
receive error: Connection reset by peer
> > 2021-08-12T07:46:05.929Z|03550|reconnect|WARN|tcp:client-12:55640:
connection dropped (Connection reset by peer)
> > 2021-08-12T07:46:19.068Z|03551|jsonrpc|WARN|tcp:client-13:33490:
receive error: Connection reset by peer
> > 2021-08-12T07:46:19.069Z|03552|reconnect|WARN|tcp:client-13:33490:
connection dropped (Connection reset by peer)
> > 2021-08-12T07:46:19.176Z|03553|jsonrpc|WARN|tcp:client-14:38632:
receive error: Connection reset by peer
> > 2021-08-12T07:46:19.177Z|03554|reconnect|WARN|tcp:client-14:38632:
connection dropped (Connection reset by peer)
> > 2021-08-12T07:46:20.712Z|03555|jsonrpc|WARN|tcp:client-15:34912:
receive error: Connection reset by peer
> > 2021-08-12T07:46:20.713Z|03556|reconnect|WARN|tcp:client-15:34912:
connection dropped (Connection reset by peer)
> > 2021-08-12T07:46:23.453Z|03557|jsonrpc|WARN|tcp:client-16:41158:
receive error: Connection reset by peer
> > 2021-08-12T07:46:23.453Z|03558|reconnect|WARN|tcp:client-16:41158:
connection dropped (Connection reset by peer)
> > 2021-08-12T07:46:28.336Z|03559|raft|INFO|term 359: 1318 ms timeout
expired, starting election
> > 2021-08-12T07:46:28.339Z|03560|raft|INFO|term 359: elected leader by 2+
of 3 servers
> > 2021-08-12T07:46:29.178Z|03561|raft|INFO|rejecting term 358 < current
term 359 received in append_request message from server 1076
> > 2021-08-12T07:46:29.182Z|03562|raft|INFO|rejecting term 358 < current
term 359 received in append_request message from server 1076
> > 2021-08-12T07:46:29.187Z|03563|raft|INFO|rejecting term 358 < current
term 359 received in vote_reply message from server 1076
> > 2021-08-12T07:46:46.957Z|03564|timeval|WARN|Unreasonably long 1645ms
poll interval (1642ms user, 0ms system)
> > 2021-08-12T07:46:46.957Z|03565|timeval|WARN|faults: 163 minor, 0 major
> > 2021-08-12T07:46:46.957Z|03566|timeval|WARN|context switches: 0
voluntary, 22 involuntary
> > 2021-08-12T07:46:46.957Z|03567|coverage|INFO|Skipping details of
duplicate event coverage for hash=3b99bba2
> > 2021-08-12T07:46:47.233Z|03568|raft|INFO|rejected append_reply (not
leader)
> > 2021-08-12T07:46:47.233Z|03569|raft|WARN|ignoring vote request received
after only 3 ms (minimum election time is 1000 ms)
> > 2021-08-12T07:46:47.233Z|03570|raft|INFO|server 9f6e is leader for term
360
> > 2021-08-12T07:46:47.234Z|03571|raft|INFO|rejected append_reply (not
leader)
> > 2021-08-12T07:46:48.290Z|03572|poll_loop|INFO|Dropped 49 log messages
in last 43 seconds (most recently, 40 seconds ago) due to excessive rate
> > 2021-08-12T07:46:48.290Z|03573|poll_loop|INFO|wakeup due to [POLLOUT]
on fd 43 (node-3:6641<->client-17:34970) at lib/stream-fd.c:153 (64% CPU
usage)
> > 2021-08-12T07:46:48.446Z|03574|poll_loop|INFO|wakeup due to [POLLIN] on
fd 30 (node-3:6643<->node-2:54684) at lib/stream-fd.c:157 (64% CPU usage)
> > 2021-08-12T07:46:48.647Z|03575|poll_loop|INFO|wakeup due to [POLLOUT]
on fd 41 (node-3:6641<->client-18:38648) at lib/stream-fd.c:153 (64% CPU
usage)
> > 2021-08-12T07:46:48.780Z|03576|poll_loop|INFO|wakeup due to [POLLIN] on
fd 30 (node-3:6643<->node-2:54684) at lib/stream-fd.c:157 (64% CPU usage)
> > 2021-08-12T07:46:48.919Z|03577|poll_loop|INFO|wakeup due to [POLLOUT]
on fd 41 (node-3:6641<->client-19:38648) at lib/stream-fd.c:153 (64% CPU
usage)
> > 2021-08-12T07:46:49.113Z|03578|poll_loop|INFO|wakeup due to [POLLIN] on
fd 30 (1node-32.20.1.125:6643<->node-2:54684) at lib/stream-fd.c:157 (64%
CPU usage)
> > 2021-08-12T07:46:49.296Z|03579|poll_loop|INFO|wakeup due to [POLLOUT]
on fd 41 (node-3:6641<->client-20:38648) at lib/stream-fd.c:153 (64% CPU
usage)
> > 2021-08-12T07:46:50.837Z|03580|jsonrpc|WARN|tcp:client-21:35700:
receive error: Connection reset by peer
> > 2021-08-12T07:46:50.837Z|03581|reconnect|WARN|tcp:client-21:35700:
connection dropped (Connection reset by peer)
> > 2021-08-12T07:46:55.042Z|03582|jsonrpc|WARN|tcp:client-22:39840:
receive error: Connection reset by peer
> > 2021-08-12T07:46:55.043Z|03583|reconnect|WARN|tcp:client-22:39840:
connection dropped (Connection reset by peer)
> > 2021-08-12T07:47:01.015Z|03584|jsonrpc|WARN|tcp:client-23:55826:
receive error: Connection reset by peer
> > 2021-08-12T07:47:01.015Z|03585|reconnect|WARN|tcp:client-23:55826:
connection dropped (Connection reset by peer)
> >
> >
> > First leader ownership transfer (at 07:45:54) is understandable.
> > node-3 received leadership transfer from node-1 because node-1 was
prepearing to take a DB snapshot/compaction.
> >
> > But then at 07:46:04.187Z node-1 wrote:
> > 2021-08-12T07:46:04.187Z|03132|raft|INFO|term 358: 1454 ms timeout
expired, starting election
> > 2021-08-12T07:46:04.191Z|03133|raft|INFO|term 358: elected leader by 2+
of 3 servers
> >
> > I’m not sure, but I suppose node-1 didn’t get heartbet from leader
(node-3), started leader election and won it. Right?
> >
> > My question is: how to understand the reason why node-3 couldn’t send
raft heartbeat in time?
> >
> > Can this long poll be the indicator that server was not answering
heartbeats?
>
> Yes. But not sure what caused the long poll on node-3.
> Are these logs from NB-DB servers? (I guessed from the port 6641). Are
the NB-DB clients (CMS) using the "leader_only" option - meaning connecting
to RAFT leaders only? If so, when node-1 transfers leadship (because of
doing snapshot), all the clients to node-1 would reconnect to the new
leader (node-3), which could cause some extra load if node-3 just did
snapshot right before the leadership transfer and misses the last
transaction ID requested by the clients. Not sure if this is what happened,
but it is just one possibility.
>
> I'd recommend:
> 1) Try disable "leader_only" from the clients, so that when leadership
changes, they don't reconnect. The cost is that for each write operation
the cost is higher on the RAFT cluster. If the write is very heavy/frequent
in your cluster it is probably not good to do so.
>
> Thanks for the advice - we’ll try this options. Traffic shouldn’t be very
heavy.
>
> 2) Try increasing the election-timer (heart-beat). It looks your are
using the default timer. It can be increase by ovs-appctl -t ...
cluster/change-election-timer OVN_Northbound <new timer in ms>. The long
poll is 2 - 3 seconds, so at least setting the timer to 3 - 5 seconds. It
can be specified while creating the cluster as well.
>
> Am I right that if we switch clients to no_leader_only mode, the reason
for these long polls we be eliminated and we can leave default timer
settings (which you correctly pointed - 1 second).
>
It depends. If the long poll is not caused by clients reconnecting, you
might still encounter the problem even using no_leader_only mode. Please
set the timer according to your HA requirement (how fast should it detect
and react to a real failure) provided that it doesn't trigger a false alarm.

> 3) If occasional long latency is acceptable when the leader is doing
snapshot, it is better not transferring the leadership before snapshot, if
clients are connected using leader-only mode. However, it seems nothing can
be done from your side. I think we could add this option to OVSDB so that
leadership is not transferred due to snapshot. @Ilya Maximets what do you
think?
>
> Yeah, that could be a good point to have a choice, whether do snaphot
with transferring leadership or not.
>
> But from my side an additional question here: I’ve never seen follower
ovsdb-servers taking snapshot/compaction.
> Moreover, on a Southbound DB, which is much more write-intensive, than
NBDB, I’ve seen many times that leader preparing to take a snapshot,
transfers leadership, does a snap. And after some seconds, or maybe in that
particular second, when follower node became a leader, this new leader also
wants to take a snap! And it starts transferring a leader.
>
> Please correct me, if follower also takes snap and it was just a
coincidence. On not?

Followers take snap as well, not really different from the leader in this
regard. They don't have to take snap at the same time, but it could happen
coincidentally.

> PS: I’m talking about OVS 2.13.
>
>
> Thanks,
> Han
>
> >
> > 2021-08-12T07:46:05.558Z|03531|timeval|WARN|Unreasonably long 2735ms
poll interval (2733ms user, 0ms system)
> > 2021-08-12T07:46:05.558Z|03532|timeval|WARN|faults: 5583 minor, 0 major
> > 2021-08-12T07:46:05.558Z|03533|timeval|WARN|disk: 0 reads, 8 writes
> > 2021-08-12T07:46:05.558Z|03534|timeval|WARN|context switches: 0
voluntary, 19 involuntary
> >
> > Thanks in advance for your help.
> >
> > Regards,
> > Vladislav Odintsov
> >
> > _______________________________________________
> > discuss mailing list
> > disc...@openvswitch.org
> > https://mail.openvswitch.org/mailman/listinfo/ovs-discuss
> _______________________________________________
> discuss mailing list
> disc...@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-discuss
>
>
_______________________________________________
discuss mailing list
disc...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-discuss

Reply via email to