On Tue, Aug 1, 2017 at 8:06 AM, Numan Siddique <[email protected]> wrote:
>
> On Tue, Aug 1, 2017 at 4:25 PM, Numan Siddique <[email protected]> wrote:
>>
>> On Tue, Aug 1, 2017 at 5:00 AM, Han Zhou <[email protected]> wrote:
>>>
>>> On Mon, Jul 31, 2017 at 1:53 PM, Russell Bryant <[email protected]> wrote:
>>> >
>>> > I wanted to share the idea before I code it to see if it makes sense.
>>> > I imagine the patch would be small, though.
>>> >
>>> > We currently provide HA for ovn-northd by using Pacemaker to ensure
>>> > that ovn-northd is running only one time somewhere in a cluster.
>>> >
>>
>> In the case of Pacemaker, the pacemaker OCF resource script
>> (ovndb-servers.ocf) starts ovn-northd on the master node if
>> manage_northd is set to true, and this is the approach TripleO has taken.
>> ovn-northd uses unix sockets to communicate with the NB and SB db servers.
>> With the suggested approach, would ovn-northd use a tcp connection to
>> communicate with the db servers?
>
> Given that the SB DB ovsdb-server would assign the lock, it has to be a
> tcp connection.  Sorry for the confusion.
Right.  If we're still using Pacemaker for the databases anyway, we
could keep managing ovn-northd as well to ensure they stay local so we
can keep using the local unix socket.  On the other hand, separating
them will spread load a bit more across controllers.  It's not obvious
to me which is better.

>
>> Thanks
>> Numan
>>
>>> > What if we made ovn-northd acquire an OVSDB lock on the southbound
>>> > database before it did any real work?  That way we could start
>>> > multiple copies of ovn-northd in a cluster, but only one would be
>>> > active at a time.
>>> >
>>> > This is crude, and obviously we would want to distribute work among
>>> > ovn-northd instances eventually, but does this sound like an
>>> > improvement over requiring something like Pacemaker?
>>>
>>> Russell, this sounds very good. I think it is better than Pacemaker. I
>>> would still call it active-standby though. The standby is hot-standby,
>>> not cold-standby.

Thanks, Han.  Yes, your "hot-standby" terminology sounds better.  :-)

>>>
>>> It seems the change would be having the active one updating <timestamp,
>>> northd hostname> periodically to southbound DB, so that other northd will
>>> know if it is still alive, and otherwise taking over by updating the
>>> <timestamp, northd hostname>. Is this correct?

This wouldn't be required.  If the active ovn-northd goes down, the
OVSDB lock will automatically be released when the database detects
that the connection is gone.  We would be relying on ovsdb-server for
how quickly the failover happens.

I think this change would be an improvement.  At a minimum, it ensures
ovn-northd behaves in a sane way if you accidentally run it more than
once in a deployment.  It also allows intentionally running it more
than once as one or more hot-standby instances.

Maybe in the next release we can revisit proposals for fully
active-active ovn-northd.

-- 
Russell Bryant
_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev
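
A minimal sketch of the locking scheme discussed above, written against
the OVS Python IDL rather than ovn-northd's C code: every instance asks
the southbound ovsdb-server for the same named lock, the holder acts as
the active ovn-northd, and the others sit as hot standbys until the
server reassigns the lock.  The lock name "ovn_northd", the schema path,
and the tcp remote are illustrative assumptions, not details fixed by
this thread.

# hot_standby_sketch.py: acquire an OVSDB lock on the OVN southbound DB
# and only do "real work" while we hold it.  Requires the "ovs" Python
# package (python-ovs).

import ovs.db.idl
import ovs.poller

SB_REMOTE = "tcp:127.0.0.1:6642"               # southbound ovsdb-server
SB_SCHEMA = "/usr/share/ovn/ovn-sb.ovsschema"  # path varies by distro (assumed)
LOCK_NAME = "ovn_northd"                       # illustrative lock name (assumed)


def main():
    helper = ovs.db.idl.SchemaHelper(location=SB_SCHEMA)
    helper.register_all()

    idl = ovs.db.idl.Idl(SB_REMOTE, helper)
    # Ask ovsdb-server for the named lock.  The server grants it to at
    # most one client at a time and releases it automatically when that
    # client's connection goes away, which is what gives us failover.
    idl.set_lock(LOCK_NAME)

    while True:
        idl.run()  # process DB updates, lock notifications, reconnects

        if idl.has_lock:
            # Active instance: this is where the NB->SB translation work
            # (ovn-northd's real job) would happen.
            pass
        elif idl.is_lock_contended:
            # Hot standby: another instance holds the lock.  Do nothing;
            # ovsdb-server will hand the lock over if that instance dies.
            pass

        poller = ovs.poller.Poller()
        idl.wait(poller)
        poller.block()


if __name__ == "__main__":
    main()

In ovn-northd itself the equivalent would presumably be the C IDL's
ovsdb_idl_set_lock()/ovsdb_idl_has_lock() calls in its main loop.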
