During destructive testing of ovn-dbs inside containers managed by pacemaker, we reached a situation where /var/run/openvswitch contained empty .pid files. The current code does not handle them: pidfile_is_running() returns true in that case, which confuses the OCF resource agent.
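The failure mode can be reproduced outside a cluster with a rough sketch like the one below. It assumes pid_exists is a /proc directory lookup; the stand-in defined here is an illustration and is not copied from the tree. The _old/_new function names and the scratch files are hypothetical, and the check only behaves this way on a system with /proc.

```shell
#!/bin/sh
# Stand-in for the pid_exists helper that ovn-ctl uses. ASSUMPTION: it is a
# /proc lookup. With an empty argument it ends up testing /proc/ itself.
pid_exists () {
    test -d /proc/"$1"
}

# The check without the size test (hypothetical name).
pidfile_is_running_old () {
    pidfile=$1
    test -e "$pidfile" && pid=`cat "$pidfile"` && pid_exists "$pid"
} >/dev/null 2>&1

# The check with the size test added (hypothetical name).
pidfile_is_running_new () {
    pidfile=$1
    test -e "$pidfile" && [ -s "$pidfile" ] && pid=`cat "$pidfile"` && pid_exists "$pid"
} >/dev/null 2>&1

# Hypothetical scratch files: one empty pidfile, one holding a live pid.
tmpdir=`mktemp -d`
: > "$tmpdir/empty.pid"
echo $$ > "$tmpdir/live.pid"

# Print the exit status of both variants for each file (0 means "running").
for f in "$tmpdir/empty.pid" "$tmpdir/live.pid"; do
    pidfile_is_running_old "$f"; old=$?
    pidfile_is_running_new "$f"; new=$?
    echo "`basename "$f"`: old=$old new=$new"
done
rm -rf "$tmpdir"
```

The -s test fails for a zero-length file, so the && chain stops before cat and pid_exists are ever reached.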

- Before this change:
Inside a container run:
  killall ovsdb-server;
  echo -n '' > /var/run/openvswitch/ovnnb_db.pid; echo -n '' > /var/run/openvswitch/ovnsb_db.pid

We will observe that the cluster is unable to ever recover because
it believes the ovn processes to be running when they really aren't and
eventually just fails:
 podman container set: ovn-dbs-bundle [192.168.24.1:8787/rhosp15/openstack-ovn-northd:pcmklatest]
   ovn-dbs-bundle-0 (ocf::ovn:ovndb-servers): Master controller-0
   ovn-dbs-bundle-1 (ocf::ovn:ovndb-servers): Stopped controller-1
   ovn-dbs-bundle-2 (ocf::ovn:ovndb-servers): Slave controller-2

- After this change the cluster is able to recover from this state and
correctly start the resource:
 podman container set: ovn-dbs-bundle [192.168.24.1:8787/rhosp15/openstack-ovn-northd:pcmklatest]
   ovn-dbs-bundle-0 (ocf::ovn:ovndb-servers): Master controller-0
   ovn-dbs-bundle-1 (ocf::ovn:ovndb-servers): Slave controller-1
   ovn-dbs-bundle-2 (ocf::ovn:ovndb-servers): Slave controller-2

Signed-off-by: Michele Baldessari <mich...@acksyn.org>
---
 ovn/utilities/ovn-ctl | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/ovn/utilities/ovn-ctl b/ovn/utilities/ovn-ctl
index 7e5cd469c83c..65f03e28ddba 100755
--- a/ovn/utilities/ovn-ctl
+++ b/ovn/utilities/ovn-ctl
@@ -35,7 +35,7 @@ ovn_northd_db_conf_file="$etcdir/ovn-northd-db-params.conf"
 
 pidfile_is_running () {
     pidfile=$1
-    test -e "$pidfile" && pid=`cat "$pidfile"` && pid_exists "$pid"
+    test -e "$pidfile" && [ -s "$pidfile" ] && pid=`cat "$pidfile"` && pid_exists "$pid"
 } >/dev/null 2>&1
 
 stop_nb_ovsdb() {
-- 
2.21.0