Re: [ovs-discuss] OVN: Delay in handling unixctl commands in ovsdb-server

2020-02-13 Thread Ben Pfaff
On Wed, Feb 12, 2020 at 11:27:18PM +0530, Numan Siddique wrote:
> Hi Ben/All,
> 
> In an OVN deployment - with OVN dbs deployed as active/standby using
> pacemaker, we are seeing delays in response to unixctl command -
> ovsdb-server/sync-status.
> 
> Pacemaker periodically calls the OVN pacemaker OCF script to get the
> status and this script internally invokes - ovs-appctl -t
> /var/run/openvswitch/ovnsb_db.ctl ovsdb-server/sync-status. In a large
> deployment with lots of OVN resources we see that ovsdb-server takes a
> lot of time (sometimes > 60 seconds) to respond to this command. This
> causes pacemaker to stop the service in that node and move the master
> to another node. This causes a lot of disruption.
> 
> One approach to solving this issue is to handle unixctl commands in a
> separate thread. Commands like sync-status, get-** etc. can be
> easily handled in that thread. Still, there are many commands like
> ovsdb-server/set-active-ovsdb-server, ovsdb-server/compact etc. (which
> change state) that need to be synchronized between the main
> ovsdb-server thread and the newly added thread using a mutex.
> 
> Does this approach make sense? I have started working on it, but I wanted
> to check with the community before putting more effort into it.

It seems reasonable to me to support unixctl commands in multiple
threads.  The details of how you implement it will determine how usable
it is.  I suggest making the current case easy and common.
___
discuss mailing list
disc...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-discuss


Re: [ovs-discuss] OVN: Delay in handling unixctl commands in ovsdb-server

2020-02-13 Thread Daniel Alvarez Sanchez
Hi all,

On Thu, Feb 13, 2020 at 8:09 AM Han Zhou wrote:

>
> On Wed, Feb 12, 2020 at 9:57 AM Numan Siddique wrote:
> >
> > Hi Ben/All,
> >
> > In an OVN deployment - with OVN dbs deployed as active/standby using
> > pacemaker, we are seeing delays in response to unixctl command -
> > ovsdb-server/sync-status.
> >
> > Pacemaker periodically calls the OVN pacemaker OCF script to get the
> > status and this script internally invokes - ovs-appctl -t
> > /var/run/openvswitch/ovnsb_db.ctl ovsdb-server/sync-status. In a large
> > deployment with lots of OVN resources we see that ovsdb-server takes a
> > lot of time (sometimes > 60 seconds) to respond to this command. This
> > causes pacemaker to stop the service in that node and move the master
> > to another node. This causes a lot of disruption.
> >
> > One approach to solving this issue is to handle unixctl commands in a
> > separate thread. Commands like sync-status, get-** etc. can be
> > easily handled in that thread. Still, there are many commands like
> > ovsdb-server/set-active-ovsdb-server, ovsdb-server/compact etc. (which
> > change state) that need to be synchronized between the main
> > ovsdb-server thread and the newly added thread using a mutex.
> >
> > Does this approach make sense? I have started working on it, but I wanted
> > to check with the community before putting more effort into it.
> >
> > Are there better ways to solve this issue?
> >
> > Thanks
> > Numan
> >
> Hi Numan,
>
> It seems reasonable to me. Multi-threading would add a little complexity,
> but in this case it should be straightforward. It merely requires mutexes
> to synchronize between the threads for *writes*, and also for *reads* of
> non-atomic data.
> The only side effect is that *if* the thread that does the DB job really
> gets stuck because of a bug and stops handling jobs at all, the
> ovsdb-server/sync-status command served by the unixctl thread wouldn't
> detect it, so pacemaker could report a *happy* status without detecting
> the problem. First of all, this is unlikely to happen. But if we really
> think it is a problem, we can still solve it by incrementing a counter in
> the main loop and adding a new command (read-only, without a mutex) that
> checks whether this counter is increasing, to tell if the server is really
> working.
>

I'd be more inclined to do what Han suggests here, with every thread
contributing to the health status through a read-only counter.

Whatever gets implemented here can perhaps be reused in ovn-controller to
monitor the main and pinctrl threads.
The scenario there is similar, but the consequences may be worse because it
affects the dataplane: the "health" thread reports a good status while the
pinctrl thread is stuck, so the DHCP service is down and instances can't
obtain an IP address.
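
To make the per-thread idea concrete, here is a minimal Python sketch (not
OVS code; the real implementation would be in C). The class name
HealthRegistry and its methods are hypothetical. Each worker thread bumps
only its own counter, and the health check reports any thread whose counter
has not moved since the previous check:

```python
class HealthRegistry:
    """Hypothetical per-thread heartbeat counters, one slot per thread."""

    def __init__(self, names):
        self._counters = {name: 0 for name in names}
        self._last_seen = dict(self._counters)

    def beat(self, name):
        # Called by thread `name` once per loop iteration.
        # Each thread writes only its own slot.
        self._counters[name] += 1

    def stalled(self):
        # Called by the health thread. Returns the names whose counter
        # has not advanced since the previous call to stalled().
        current = dict(self._counters)
        stuck = sorted(n for n, v in current.items()
                       if v == self._last_seen[n])
        self._last_seen = current
        return stuck


if __name__ == "__main__":
    reg = HealthRegistry(["main", "pinctrl"])
    reg.beat("main")
    reg.beat("pinctrl")
    print(reg.stalled())   # both threads advanced
    reg.beat("main")       # pinctrl is stuck and does not beat
    print(reg.stalled())   # pinctrl is reported
```

With this shape, a stuck pinctrl thread shows up even while the main thread
keeps making progress, which is the case described above.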


> Thanks,
> Han


Re: [ovs-discuss] OVN: Delay in handling unixctl commands in ovsdb-server

2020-02-12 Thread Han Zhou
On Wed, Feb 12, 2020 at 9:57 AM Numan Siddique wrote:
>
> Hi Ben/All,
>
> In an OVN deployment - with OVN dbs deployed as active/standby using
> pacemaker, we are seeing delays in response to unixctl command -
> ovsdb-server/sync-status.
>
> Pacemaker periodically calls the OVN pacemaker OCF script to get the
> status and this script internally invokes - ovs-appctl -t
> /var/run/openvswitch/ovnsb_db.ctl ovsdb-server/sync-status. In a large
> deployment with lots of OVN resources we see that ovsdb-server takes a
> lot of time (sometimes > 60 seconds) to respond to this command. This
> causes pacemaker to stop the service in that node and move the master
> to another node. This causes a lot of disruption.
>
> One approach to solving this issue is to handle unixctl commands in a
> separate thread. Commands like sync-status, get-** etc. can be
> easily handled in that thread. Still, there are many commands like
> ovsdb-server/set-active-ovsdb-server, ovsdb-server/compact etc. (which
> change state) that need to be synchronized between the main
> ovsdb-server thread and the newly added thread using a mutex.
>
> Does this approach make sense? I have started working on it, but I wanted
> to check with the community before putting more effort into it.
>
> Are there better ways to solve this issue?
>
> Thanks
> Numan
>
Hi Numan,

It seems reasonable to me. Multi-threading would add a little complexity,
but in this case it should be straightforward. It merely requires mutexes
to synchronize between the threads for *writes*, and also for *reads* of
non-atomic data.
The only side effect is that *if* the thread that does the DB job really
gets stuck because of a bug and stops handling jobs at all, the
ovsdb-server/sync-status command served by the unixctl thread wouldn't
detect it, so pacemaker could report a *happy* status without detecting
the problem. First of all, this is unlikely to happen. But if we really
think it is a problem, we can still solve it by incrementing a counter in
the main loop and adding a new command (read-only, without a mutex) that
checks whether this counter is increasing, to tell if the server is really
working.
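
As a rough illustration of the counter idea, here is a minimal Python
sketch (the real change would be C code inside ovsdb-server). The names
MainLoop and is_progressing are hypothetical. The main loop increments a
counter after every iteration, and the check samples it twice without
taking any lock:

```python
import threading
import time


class MainLoop:
    """Hypothetical stand-in for the main ovsdb-server loop."""

    def __init__(self):
        self.iterations = 0              # written only by the loop thread
        self._stop = threading.Event()

    def run(self, work):
        while not self._stop.is_set():
            work()                       # one round of DB work
            self.iterations += 1         # heartbeat for the checker

    def stop(self):
        self._stop.set()


def is_progressing(loop, interval=0.2):
    """Read-only check: sample the counter twice, no mutex involved."""
    before = loop.iterations
    time.sleep(interval)
    return loop.iterations != before


if __name__ == "__main__":
    loop = MainLoop()
    worker = threading.Thread(target=loop.run,
                              args=(lambda: time.sleep(0.01),))
    worker.start()
    print(is_progressing(loop))
    loop.stop()
    worker.join()
```

In C the counter would need to be an atomic variable, or at least a value
that is safe to read without the mutex. Another option is for the command
to return the raw counter and let the caller compare it across two probes.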

Thanks,
Han


[ovs-discuss] OVN: Delay in handling unixctl commands in ovsdb-server

2020-02-12 Thread Numan Siddique
Hi Ben/All,

In an OVN deployment - with OVN dbs deployed as active/standby using
pacemaker, we are seeing delays in response to unixctl command -
ovsdb-server/sync-status.

Pacemaker periodically calls the OVN pacemaker OCF script to get the
status and this script internally invokes - ovs-appctl -t
/var/run/openvswitch/ovnsb_db.ctl ovsdb-server/sync-status. In a large
deployment with lots of OVN resources we see that ovsdb-server takes a
lot of time (sometimes > 60 seconds) to respond to this command. This
causes pacemaker to stop the service in that node and move the master
to another node. This causes a lot of disruption.
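
To show how a slow reply turns into a failed monitor, here is a small
Python sketch of a status probe with a deadline. This is not the OCF
script itself; the function name probe and the 60-second value in the
usage comment are assumptions for illustration only:

```python
import subprocess
import sys


def probe(cmd, timeout_s):
    """Run a status command and return (ok, output).

    A reply slower than timeout_s counts as a failure, which is how a
    busy server ends up looking like a dead one to the caller.
    """
    try:
        done = subprocess.run(cmd, capture_output=True, text=True,
                              timeout=timeout_s)
    except subprocess.TimeoutExpired:
        return False, "timed out"
    return done.returncode == 0, done.stdout.strip()


# Usage against the command from this report (timeout value assumed):
# probe(["ovs-appctl", "-t", "/var/run/openvswitch/ovnsb_db.ctl",
#        "ovsdb-server/sync-status"], 60)

if __name__ == "__main__":
    # Stand-in command so the sketch runs without ovsdb-server.
    print(probe([sys.executable, "-c", "print('state: active')"], 10))
```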

One approach to solving this issue is to handle unixctl commands in a
separate thread. Commands like sync-status, get-** etc. can be
easily handled in that thread. Still, there are many commands like
ovsdb-server/set-active-ovsdb-server, ovsdb-server/compact etc. (which
change state) that need to be synchronized between the main
ovsdb-server thread and the newly added thread using a mutex.
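
A minimal Python sketch of the split I have in mind (the real code would
be C in ovsdb-server; the Server class, its fields and the short command
names below are hypothetical). Read-only commands hold the mutex just long
enough to copy a value, while state-changing commands hold it for the
whole update:

```python
import threading


class Server:
    """Toy stand-in for state shared by the main and unixctl threads."""

    def __init__(self):
        self._lock = threading.Lock()    # guards the fields below
        self._state = "backup"           # hypothetical replication state

    def sync_status(self):
        # Read-only command: lock only long enough to copy the value.
        with self._lock:
            return "state: " + self._state

    def set_active(self):
        # State-changing command: hold the lock for the whole update.
        with self._lock:
            self._state = "active"
        return "ok"


def handle_command(server, name):
    """Dispatch one command; meant to run in the separate unixctl thread."""
    handlers = {
        "sync-status": server.sync_status,
        "set-active": server.set_active,
    }
    handler = handlers.get(name)
    if handler is None:
        return "unknown command: " + name
    return handler()


if __name__ == "__main__":
    server = Server()
    replies = []
    ctl = threading.Thread(
        target=lambda: replies.append(handle_command(server, "sync-status")))
    ctl.start()
    ctl.join()
    print(replies[0])
```

This only helps if the main thread holds the mutex briefly. If it keeps
the lock during a long operation, the status command would still block.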

Does this approach make sense? I have started working on it, but I wanted
to check with the community before putting more effort into it.

Are there better ways to solve this issue?

Thanks
Numan
