On Wed, Feb 12, 2020 at 9:57 AM Numan Siddique <[email protected]> wrote:
>
> Hi Ben/All,
>
> In an OVN deployment - with OVN dbs deployed as active/standby using
> pacemaker, we are seeing delays in response to unixctl command -
> ovsdb-server/sync-status.
>
> Pacemaker periodically calls the OVN pacemaker OCF script to get the
> status and this script internally invokes - ovs-appctl -t
> /var/run/openvswitch/ovnsb_db.ctl ovsdb-server/sync-status. In a large
> deployment with lots of OVN resources we see that ovsdb-server takes a
> lot of time (sometimes > 60 seconds) to respond to this command. This
> causes pacemaker to stop the service in that node and move the master
> to another node. This causes a lot of disruption.
>
> One approach of solving this issue is to handle unixctl commands in a
> separate thread. The commands like sync-status, get-** etc can be
> easily handled in the thread. Still, there are many commands like
> ovsdb-server/set-active-ovsdb-server, ovsdb-server/compact etc (which
> changes the state) which needs to be synchronized between the main
> ovsdb-server thread and the newly added thread using a mutex.
>
> Does this approach makes sense ? I started working on it. But I wanted
> to check with the community before putting into more efforts.
>
> Are there better ways to solve this issue ?
>
> Thanks
> Numan
>

Hi Numan,

It seems reasonable to me. Multi-threading would add a little
complexity, but in this case it should be straightforward. It merely
requires mutexes to synchronize between the threads for *writes*, and
also for *reads* of non-atomic data.

The only side effect is that *if* the thread that does the DB work gets
stuck because of a bug and stops handling jobs altogether, the
ovsdb-server/sync-status command served by the unixctl thread wouldn't
detect it, so pacemaker could report a *happy* status without noticing
the problem. First of all, this is unlikely to happen. But if we really
think it is a problem, we can still solve it by incrementing a counter
in the main loop and adding a new command (read-only, without a mutex)
that checks whether this counter is increasing, to tell whether the
server is really working.

Thanks,
Han
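
P.S. To make the counter idea concrete, here is a minimal sketch in
Python rather than in ovsdb-server's C. Every name in it
(MainLoopHeartbeat, is_main_loop_alive, main_loop, the stop/stuck
events) is hypothetical and only illustrates the pattern; none of it is
existing OVS code. It assumes a single writer (the main loop) and a
reader that only compares two samples of the counter, which is why no
mutex is taken; a C version would probably want an atomic counter
instead.

```python
import threading
import time


class MainLoopHeartbeat:
    """Hypothetical counter bumped once per main-loop iteration."""

    def __init__(self):
        self._count = 0

    def beat(self):
        # Single writer: only the main-loop thread calls this.
        self._count += 1

    def read(self):
        # Read-only access from the command thread, no mutex taken.
        return self._count


def is_main_loop_alive(heartbeat, interval=0.1):
    """Return True if the counter advanced during `interval` seconds.

    This stands in for the proposed new read-only command.
    """
    before = heartbeat.read()
    time.sleep(interval)
    return heartbeat.read() != before


def main_loop(heartbeat, stop, stuck):
    """Stand-in for the DB thread; `stuck` simulates a hang caused by a bug."""
    while not stop.is_set():
        if not stuck.is_set():
            heartbeat.beat()  # one increment per loop iteration
        time.sleep(0.001)
```

A monitoring script could call the liveness check next to sync-status
and report failure when the counter stops moving. The sampling interval
would have to be longer than the slowest legitimate main-loop
iteration, otherwise a busy server would look stuck.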
_______________________________________________
discuss mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-discuss
