Thanks a lot, Michele. Just mentioning that this has been tested successfully in an OpenStack environment. A timeout is not needed for the while loop, since Pacemaker enforces its own timeout on the promote operation.
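For illustration only, here is a minimal standalone sketch of the same poll-until-master pattern. It assumes a hypothetical `fake_check_status` stub in place of `ovsdb_server_check_status`, which reports master only after a few polls to mimic the asynchronous promotion, and a hypothetical `PROMOTE_AFTER` variable to control when that happens. Neither exists in the agent.

```shell
#!/bin/sh
# Standalone sketch of the poll-until-master loop from the patch.
# fake_check_status and PROMOTE_AFTER are hypothetical stand-ins; the real
# agent calls ovsdb_server_check_status and gets its OCF_* codes from the
# resource-agents shell libraries.

OCF_SUCCESS=0          # OCF code for "running" (here: not yet master)
OCF_RUNNING_MASTER=8   # OCF code for "running as master"

polls=0
promote_after=${PROMOTE_AFTER:-3}   # poll on which the stub reports master

# Hypothetical stub: reports plain "running" until the promote_after-th
# call, then reports master.
fake_check_status() {
    polls=$((polls + 1))
    if [ "$polls" -ge "$promote_after" ]; then
        return "$OCF_RUNNING_MASTER"
    fi
    return "$OCF_SUCCESS"
}

# Same shape as the loop in the patch, with no internal timeout: the caller
# (Pacemaker's operation timeout in the real agent) bounds the wait.
# "|| state=$?" captures the status without tripping "set -e".
wait_for_master() {
    state=0
    fake_check_status || state=$?
    while [ "$state" != "$OCF_RUNNING_MASTER" ]; do
        sleep 1
        state=0
        fake_check_status || state=$?
    done
    echo "promotion completed after $polls polls"
}

wait_for_master
```

Saved as a hypothetical `wait_master.sh`, running `PROMOTE_AFTER=1000 timeout 5 sh wait_master.sh` never reaches master; GNU `timeout` kills the script after 5 seconds and exits with status 124, which is analogous to Pacemaker terminating a promote that exceeds its configured timeout.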
On Tue, Jul 9, 2019 at 9:20 AM Michele Baldessari <[email protected]> wrote:
>
> Currently, inside the ovsdb_server_promote() function we call 'promote_ovnnb'
> and 'promote_ovnsb' and then just record the new master state in the
> CIB.
>
> This creates a race because those two promote commands are asynchronous,
> so when we exit the ovsdb_server_promote() function the underlying DBs
> are not guaranteed to be in master state. That means that clients might
> connect to an instance that is in read-only mode.
>
> We add a simple sleep loop where we wait for the underlying DB state to
> confirm the master state. We do not need to add a timeout to the loop because,
> in case of an issue, the resource timeout set within Pacemaker will kick
> in and the resource agent script will be killed by Pacemaker.
>
> Tested this within an OpenStack environment using OVN with roughly 20
> reboots and was unable to trigger the issue (before the patch we would
> trigger the issue after a couple of reboots at most).
>
> Signed-off-by: Michele Baldessari <[email protected]>
> ---
>  ovn/utilities/ovndb-servers.ocf | 12 +++++++++++-
>  1 file changed, 11 insertions(+), 1 deletion(-)
>
> diff --git a/ovn/utilities/ovndb-servers.ocf b/ovn/utilities/ovndb-servers.ocf
> index 10313304cb7c..cd47426689ef 100755
> --- a/ovn/utilities/ovndb-servers.ocf
> +++ b/ovn/utilities/ovndb-servers.ocf
> @@ -516,6 +516,8 @@ ovsdb_server_stop() {
>  }
>
>  ovsdb_server_promote() {
> +    local state
> +
>      ovsdb_server_check_status ignore_northd
>      rc=$?
>      case $rc in
> @@ -540,7 +542,15 @@ ovsdb_server_promote() {
>          ${OVN_CTL} --ovn-manage-ovsdb=no start_northd
>      fi
>
> -    ocf_log debug "ovndb_servers: Promoting $host_name as the master"
> +    ocf_log debug "ovndb_servers: Waiting for promotion $host_name as master to complete"
> +    ovsdb_server_check_status
> +    state=$?
> +    while [ "$state" != "$OCF_RUNNING_MASTER" ]; do
> +        sleep 1
> +        ovsdb_server_check_status
> +        state=$?
> +    done
> +    ocf_log debug "ovndb_servers: Promotion of $host_name as the master completed"
>      # Record ourselves so that the agent has a better chance of doing
>      # the right thing at startup
>      ${CRM_ATTR_REPL_INFO} -v "$host_name"
> --
> 2.21.0

Acked-By: Daniel Alvarez <[email protected]>

>
> _______________________________________________
> dev mailing list
> [email protected]
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
