Hi all,

I have an OVN central cluster running in Raft mode, with OpenStack as the
CMS. Over the past few days I have been noticing the following log messages
about leadership being transferred in order to write a snapshot:

grep snapshot /var/log/ovn/ovsdb-server-sb.log | tail
2022-05-05T16:37:54.957Z|19676|raft|INFO|Transferring leadership to write a
snapshot.
2022-05-05T16:55:53.488Z|20019|raft|INFO|Transferring leadership to write a
snapshot.
2022-05-05T17:11:23.965Z|20627|raft|INFO|Transferring leadership to write a
snapshot.
2022-05-05T17:25:39.876Z|21024|raft|INFO|Transferring leadership to write a
snapshot.
2022-05-05T17:36:45.732Z|21271|raft|INFO|Transferring leadership to write a
snapshot.
2022-05-05T17:56:16.830Z|21429|raft|INFO|Transferring leadership to write a
snapshot.
2022-05-05T18:09:07.239Z|21793|raft|INFO|Transferring leadership to write a
snapshot.
2022-05-05T18:24:11.689Z|22140|raft|INFO|Transferring leadership to write a
snapshot.
2022-05-05T18:42:56.325Z|22510|raft|INFO|Transferring leadership to write a
snapshot.
2022-05-05T18:55:15.134Z|22752|raft|INFO|Transferring leadership to write a
snapshot.

When this occurs, all of my ovn-controllers (on the KVM hypervisors)
reconnect to the new leader, and on the old leader I see them
disconnecting as well:

2022-05-05T19:06:06.750Z|22876|reconnect|WARN|tcp:10.2X.4X.5:55026:
connection dropped (Connection reset by peer)
2022-05-05T19:06:07.054Z|22877|jsonrpc|WARN|tcp:10.2X.4X.133:39614: receive
error: Connection reset by peer
2022-05-05T19:06:07.054Z|22878|reconnect|WARN|tcp:10.2X.4X.133:39614:
connection dropped (Connection reset by peer)
2022-05-05T19:06:07.169Z|22879|jsonrpc|WARN|tcp:10.2X.4X.133:39652: receive
error: Connection reset by peer
2022-05-05T19:06:07.169Z|22880|reconnect|WARN|tcp:10.2X.4X.133:39652:
connection dropped (Connection reset by peer)
2022-05-05T19:06:07.192Z|22881|jsonrpc|WARN|tcp:10.2X.4X.69:34654: receive
error: Connection reset by peer

On my OVN leader I have the following cluster status:

ovs-appctl -t /var/run/ovn/ovnsb_db.ctl cluster/status OVN_Southbound
4a03
Name: OVN_Southbound
Cluster ID: ca74 (ca744caf-40cd-4751-a2f2-86e35ad6541c)
Server ID: 4a03 (4a0328dc-e9a4-495e-a4f1-0a0340fc6d19)
Address: tcp:10.2X.4X.132:6644
Status: cluster member
Role: leader
Term: 1706
Leader: self
Vote: self

Election timer: 10000
Log: [413129, 413212]
Entries not yet committed: 0
Entries not yet applied: 0
Connections: ->3d6c ->4ef0 <-3d6c <-4ef0
Servers:
    4a03 (4a03 at tcp:10.2X.4X.132:6644) (self) next_index=413197
match_index=413211
    3d6c (3d6c at tcp:10.2X.4X.68:6644) next_index=413212 match_index=413211
    4ef0 (4ef0 at tcp:10.2X.4X.4:6644) next_index=413212 match_index=413211
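For reference, the election timer above was already raised from the 1000 ms
default. If needed it can be raised further on the leader; as far as I
know, each invocation may at most double the current value, so larger
values take successive calls:

```shell
# Raise the Southbound election timer on the leader
# (each call may at most double the current value).
ovs-appctl -t /var/run/ovn/ovnsb_db.ctl \
    cluster/change-election-timer OVN_Southbound 20000
```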

On both the Northbound and Southbound databases the inactivity_probe is
configured as 180 seconds, and on the chassis I have the following timers
configured:
ovn-openflow-probe-interval=0
ovn-monitor-all=true
ovn-remote-probe-interval=180000
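For completeness, these chassis values are set as external-ids on the
Open_vSwitch record, e.g.:

```shell
# "open" and "." address the single Open_vSwitch record.
ovs-vsctl set open . external-ids:ovn-monitor-all=true
ovs-vsctl set open . external-ids:ovn-remote-probe-interval=180000
ovs-vsctl set open . external-ids:ovn-openflow-probe-interval=0
```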

This setup runs Ubuntu 20.04, ovn-central 20.03.2, and OpenStack Ussuri.
As you can see above, these leadership transfers are very frequent. Do you
know what could be causing them?

Thank you.

Regards,

Tiago Pires
_______________________________________________
discuss mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-discuss