On Wed, Aug 9, 2017 at 5:01 AM, Miguel Angel Ajo Pelayo
<majop...@redhat.com> wrote:
> Nice idea, I have btw some comments/thoughts/questions regarding this:
>
> 1) Does OVSDB have any heartbeat protocol? (to detect that one northd has
> died even during inactive periods).

Yes, it does. By default, both ends of an OVSDB connection send a
keepalive message every 5 seconds to help detect a dead connection.
The interval is configurable. The default has caused extra reconnects
during performance testing, when services get too busy to keep up with
the default keepalive settings.

> Otherwise we can document the need to tweak the tcp_keepalive settings
> of the system to have some sensible settings that will make TCP detect the
> connection failure in a reasonable amount of time.

Indeed.

> 2) We need to consider that in some cases the master ovsdb server and the
> northd process will be colocated and therefore fall together. I guess that
> in that case the lock is replicated to the slave ovsdb server, we need to
> make sure that the lock will be dropped once the old slave(backup) becomes
> master.

Good question. I didn't look into how this behaves with active/passive
ovsdb-server. I assume all locks are dropped when services have to
reconnect to the new master. We should test it.

Also note that our Pacemaker config currently still manages ovn-northd,
so we're not relying exclusively on this behavior. At a minimum, if we
still let Pacemaker drive the HA (because we're still running Pacemaker
for ovsdb anyway), this addition helps ensure that only a single
ovn-northd is active if more than one was somehow started by accident.
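
To make the keepalive answer a bit more concrete, here is a rough Python
sketch. The "echo" request/reply shapes follow RFC 7047; the
InactivityProbe class is a hypothetical helper I made up for
illustration, not OVS code, and it assumes a simplified model: after one
idle interval send a probe, and after a second idle interval with no
traffic treat the connection as dead.

```python
import json

PROBE_INTERVAL = 5.0  # seconds; the 5-second default discussed in this thread


def echo_request(msg_id="echo"):
    """Build a JSON-RPC "echo" request in the shape RFC 7047 describes."""
    return {"method": "echo", "params": [], "id": msg_id}


def echo_reply(request):
    """Build the peer's reply: params come back unchanged, same id."""
    return {"result": request["params"], "error": None, "id": request["id"]}


class InactivityProbe:
    """Simplified keepalive timer (hypothetical, for illustration only)."""

    def __init__(self, now, interval=PROBE_INTERVAL):
        self.interval = interval
        self.timer_start = now   # start of the current idle window
        self.probe_sent = False  # True while waiting for any reply traffic

    def on_receive(self, now):
        # Any message from the peer (including an echo reply) is activity.
        self.timer_start = now
        self.probe_sent = False

    def poll(self, now):
        """Return "ok", "send_probe", or "dead" for the given time."""
        if now - self.timer_start < self.interval:
            return "ok"
        if self.probe_sent:
            # A probe went out a full interval ago and nothing came back.
            return "dead"
        # First idle interval expired: send a probe and start a new window.
        self.probe_sent = True
        self.timer_start = now
        return "send_probe"


if __name__ == "__main__":
    request = echo_request()
    print(json.dumps(request))
    print(json.dumps(echo_reply(request)))
```

With these assumptions a silent peer is noticed after roughly two
intervals, and a busy peer that answers late is what would show up as
the extra reconnects mentioned above.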