On Tue, Aug 1, 2017 at 8:06 AM, Numan Siddique <[email protected]> wrote:
>
> On Tue, Aug 1, 2017 at 4:25 PM, Numan Siddique <[email protected]> wrote:
>>
>> On Tue, Aug 1, 2017 at 5:00 AM, Han Zhou <[email protected]> wrote:
>>>
>>> On Mon, Jul 31, 2017 at 1:53 PM, Russell Bryant <[email protected]> wrote:
>>> >
>>> > I wanted to share the idea before I code it to see if it makes sense.
>>> > I imagine the patch would be small, though.
>>> >
>>> > We currently provide HA for ovn-northd by using Pacemaker to ensure
>>> > that ovn-northd is running only one time somewhere in a cluster.
>>> >
>>
>> In the case of Pacemaker, the pacemaker OCF resource script
>> (ovndb-servers.ocf) starts ovn-northd on the master node if
>> manage_northd is set to true, and this is the approach TripleO has taken.
>> ovn-northd uses unix sockets to communicate with the NB and SB db servers.
>> With the suggested approach, would ovn-northd use a tcp connection to
>> communicate with the db servers?
>
> Given that the SB DB ovsdb-server would assign the lock, it has to be a
> tcp connection.  Sorry for the confusion.
Right.  If we're still using Pacemaker for the databases anyway, we
could keep managing ovn-northd as well to ensure they stay local so we
can keep using the local unix socket.  On the other hand, separating
them will spread load a bit more across controllers.  It's not obvious
to me which is better.

>
>> Thanks
>> Numan
>>
>>> > What if we made ovn-northd acquire an OVSDB lock on the southbound
>>> > database before it did any real work?  That way we could start
>>> > multiple copies of ovn-northd in a cluster, but only one would be
>>> > active at a time.
>>> >
>>> > This is crude, and obviously we would want to distribute work among
>>> > ovn-northd instances eventually, but does this sound like an
>>> > improvement over requiring something like Pacemaker?
>>>
>>> Russell, this sounds very good. I think it is better than Pacemaker. I
>>> would still call it active-standby though. The standby is hot-standby,
>>> not cold-standby.

Thanks, Han.  Yes, your "hot-standby" terminology sounds better.  :-)

>>>
>>> It seems the change would be having the active one updating <timestamp,
>>> northd hostname> periodically to southbound DB, so that other northd will
>>> know if it is still alive, and otherwise taking over by updating the
>>> <timestamp, northd hostname>. Is this correct?

This wouldn't be required.  If the active ovn-northd goes down, the
OVSDB lock will automatically be released when the database detects
that the connection is gone.  We would be relying on ovsdb-server for
how quickly the failover happens.

I think this change would be an improvement.  At a minimum, it ensures
ovn-northd behaves in a sane way if you accidentally run it more than
once in a deployment.  It also allows intentionally running it more
than once as one or more hot-standby instances.

Maybe in the next release we can revisit proposals for fully
active-active ovn-northd.

-- 
Russell Bryant
_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev
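
A minimal sketch of the locking scheme discussed above, written against
the OVS Python IDL rather than ovn-northd's C code: every instance asks
the southbound ovsdb-server for the same named lock, the holder acts as
the active ovn-northd, and the others sit as hot standbys until the
server reassigns the lock.  The lock name "ovn_northd", the schema path,
and the tcp remote are illustrative assumptions, not details fixed by
this thread.

# hot_standby_sketch.py: acquire an OVSDB lock on the OVN southbound DB
# and only do "real work" while we hold it.  Requires the "ovs" Python
# package (python-ovs).

import ovs.db.idl
import ovs.poller

SB_REMOTE = "tcp:127.0.0.1:6642"               # southbound ovsdb-server
SB_SCHEMA = "/usr/share/ovn/ovn-sb.ovsschema"  # path varies by distro (assumed)
LOCK_NAME = "ovn_northd"                       # illustrative lock name (assumed)


def main():
    helper = ovs.db.idl.SchemaHelper(location=SB_SCHEMA)
    helper.register_all()

    idl = ovs.db.idl.Idl(SB_REMOTE, helper)
    # Ask ovsdb-server for the named lock.  The server grants it to at
    # most one client at a time and releases it automatically when that
    # client's connection goes away, which is what gives us failover.
    idl.set_lock(LOCK_NAME)

    while True:
        idl.run()  # process DB updates, lock notifications, reconnects

        if idl.has_lock:
            # Active instance: this is where the NB->SB translation work
            # (ovn-northd's real job) would happen.
            pass
        elif idl.is_lock_contended:
            # Hot standby: another instance holds the lock.  Do nothing;
            # ovsdb-server will hand the lock over if that instance dies.
            pass

        poller = ovs.poller.Poller()
        idl.wait(poller)
        poller.block()


if __name__ == "__main__":
    main()

In ovn-northd itself the equivalent would presumably be the C IDL's
ovsdb_idl_set_lock()/ovsdb_idl_has_lock() calls in its main loop.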
