Hi all,

On Thu, Feb 13, 2020 at 8:09 AM Han Zhou <zhou...@gmail.com> wrote:

>
>
> On Wed, Feb 12, 2020 at 9:57 AM Numan Siddique <nusid...@redhat.com>
> wrote:
> >
> > Hi Ben/All,
> >
> > In an OVN deployment - with OVN dbs deployed as active/standby using
> > pacemaker, we are seeing delays in response to unixctl command -
> > ovsdb-server/sync-status.
> >
> > Pacemaker periodically calls the OVN pacemaker OCF script to get the
> > status and this script internally invokes - ovs-appctl -t
> > /var/run/openvswitch/ovnsb_db.ctl ovsdb-server/sync-status. In a large
> > deployment with lots of OVN resources we see that ovsdb-server takes a
> > lot of time (sometimes > 60 seconds) to respond to this command. This
> > causes pacemaker to stop the service in that node and move the master
> > to another node. This causes a lot of disruption.
> >
> > One approach to solving this issue is to handle unixctl commands in a
> > separate thread. Commands like sync-status, get-** etc. can be
> > easily handled in that thread. Still, there are many commands like
> > ovsdb-server/set-active-ovsdb-server, ovsdb-server/compact etc. (which
> > change the state) which need to be synchronized between the main
> > ovsdb-server thread and the newly added thread using a mutex.
> >
> > Does this approach make sense? I have started working on it, but I wanted
> > to check with the community before putting more effort into it.
> >
> > Are there better ways to solve this issue?
> >
> > Thanks
> > Numan
> >
> Hi Numan,
>
> It seems reasonable to me. Multi-threading would add a little complexity,
> but in this case it should be straightforward. It merely requires mutexes
> to synchronize between the threads for *writes*, and also for *reads* of
> non-atomic data.
> The only side effect is that *if* the thread that does the DB job really
> gets stuck because of a bug and stops handling jobs at all, the
> ovsdb-server/sync-status command served by the unixctl thread wouldn't
> detect it, so it could result in pacemaker reporting a *happy* status
> without detecting problems. First of all, this is unlikely to happen. But
> if we really think it is a problem, we can still solve it by incrementing
> a counter in the main loop and adding a new command (read-only, without
> mutex) to check whether this counter is increasing, to tell if the server
> is really working.
>

I'd be more inclined to do what Han suggests here and have every thread
contribute to the health status with a read-only counter.

Whatever gets implemented here can perhaps be reused in ovn-controller to
monitor the main & pinctrl threads.
The scenario there is similar, but the consequences may be worse because it
affects the dataplane: the "health" thread reports a good status while the
pinctrl thread is stuck, so the DHCP service is down and instances can't
fetch an IP.
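
To make the per-thread counter idea concrete, here is a rough sketch in
Python (for brevity only; a real implementation would be C inside
ovsdb-server / ovn-controller). The names Heartbeat, register, beat,
snapshot and stalled_threads are made up for illustration and are not
existing OVS or OVN APIs. The small lock here only guards the counter
table and is held for a single increment, so the health check never waits
on the main loop's work; in C an atomic counter per thread would probably
do the same job.

```python
import threading
import time


class Heartbeat:
    """Per-thread progress counters (hypothetical helper, not an OVS API)."""

    def __init__(self):
        self._lock = threading.Lock()   # protects only the counter table
        self._counters = {}

    def register(self, name):
        # Each monitored thread registers once, starting at zero.
        with self._lock:
            self._counters.setdefault(name, 0)

    def beat(self, name):
        # Called by a thread once per iteration of its own loop.
        with self._lock:
            self._counters[name] = self._counters.get(name, 0) + 1

    def snapshot(self):
        # Read-only copy for the health command.
        with self._lock:
            return dict(self._counters)


def stalled_threads(before, after):
    """Names whose counter did not advance between two snapshots."""
    return sorted(name for name, count in before.items()
                  if after.get(name, count) <= count)


def worker(hb, name, stop, stuck):
    # Example loop: beats only while it is making progress.
    hb.register(name)
    while not stop.is_set():
        if not stuck.is_set():
            hb.beat(name)
        time.sleep(0.01)
```

A health command would take two snapshots some interval apart and report
unhealthy if stalled_threads() returns anything, e.g. ["pinctrl"] when the
main thread is still looping but pinctrl is not.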


> Thanks,
> Han
> _______________________________________________
> discuss mailing list
> disc...@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-discuss
>