On Tue, Jul 9, 2019 at 1:04 PM Daniel Alvarez Sanchez <[email protected]> wrote:
> Thanks a lot Michele.
> Just mentioning that this has been tested in an OpenStack environment
> successfully. A timeout is not needed for the while loop since
> pacemaker will enforce its own.
>
> On Tue, Jul 9, 2019 at 9:20 AM Michele Baldessari <[email protected]>
> wrote:
> >
> > Currently inside the ovsdb_server_promote() function we call 'promote_ovnnb'
> > and 'promote_ovnsb' and then just record the new master state in the
> > CIB.
> >
> > This creates a race because those two promote commands are asynchronous
> > so when we exit the ovsdb_server_promote() function the underlying DBs
> > are not guaranteed to be in master state. That means that clients might
> > connect to an instance that is in read-only mode.
> >
> > We add a simple sleep loop where we wait for the underlying DB state to
> > confirm the master state. We do not need to add a timeout loop because
> > in case of an issue the resource timeout set within pacemaker will kick
> > in and the resource agent script will be killed by pacemaker.
> >
> > Tested this within an openstack environment using ovn with roughly ~20
> > reboots and was unable to trigger the issue (before the patch we would
> > trigger the issue after a couple of reboots tops).
> >
> > Signed-off-by: Michele Baldessari <[email protected]>
>

LGTM

Acked-by: Numan Siddique <[email protected]>

> > ---
> >  ovn/utilities/ovndb-servers.ocf | 12 +++++++++++-
> >  1 file changed, 11 insertions(+), 1 deletion(-)
> >
> > diff --git a/ovn/utilities/ovndb-servers.ocf b/ovn/utilities/ovndb-servers.ocf
> > index 10313304cb7c..cd47426689ef 100755
> > --- a/ovn/utilities/ovndb-servers.ocf
> > +++ b/ovn/utilities/ovndb-servers.ocf
> > @@ -516,6 +516,8 @@ ovsdb_server_stop() {
> >  }
> >
> >  ovsdb_server_promote() {
> > +    local state
> > +
> >      ovsdb_server_check_status ignore_northd
> >      rc=$?
> >      case $rc in
> > @@ -540,7 +542,15 @@ ovsdb_server_promote() {
> >          ${OVN_CTL} --ovn-manage-ovsdb=no start_northd
> >      fi
> >
> > -    ocf_log debug "ovndb_servers: Promoting $host_name as the master"
> > +    ocf_log debug "ovndb_servers: Waiting for promotion $host_name as master to complete"
> > +    ovsdb_server_check_status
> > +    state=$?
> > +    while [ "$state" != "$OCF_RUNNING_MASTER" ]; do
> > +        sleep 1
> > +        ovsdb_server_check_status
> > +        state=$?
> > +    done
> > +    ocf_log debug "ovndb_servers: Promotion of $host_name as the master completed"
> >      # Record ourselves so that the agent has a better chance of doing
> >      # the right thing at startup
> >      ${CRM_ATTR_REPL_INFO} -v "$host_name"
> > --
> > 2.21.0
>
> Acked-By: Daniel Alvarez <[email protected]>
> >
> > _______________________________________________
> > dev mailing list
> > [email protected]
> > https://mail.openvswitch.org/mailman/listinfo/ovs-dev
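
For anyone who wants to try the wait loop from the patch outside pacemaker, here is a rough standalone sketch with a stub in place of ovsdb_server_check_status. The names fake_check_status, wait_for_master and polls are hypothetical and not part of ovndb-servers.ocf. The return codes are hard-coded (8 for OCF_RUNNING_MASTER, 0 for OCF_SUCCESS) instead of being sourced from the OCF shell library, and the stub simply assumes the DB reports master on the third poll.

```shell
#!/bin/sh
# Hard-coded here for illustration; the real agent gets these values
# from the OCF shell function library.
OCF_SUCCESS=0
OCF_RUNNING_MASTER=8

polls=0

# Hypothetical stub for ovsdb_server_check_status: reports "running, not
# master" on the first two polls and "running as master" from the third on.
fake_check_status() {
    polls=$((polls + 1))
    if [ "$polls" -ge 3 ]; then
        return "$OCF_RUNNING_MASTER"
    fi
    return "$OCF_SUCCESS"
}

# Same shape as the loop added to ovsdb_server_promote(): poll once per
# second until the status check reports the master state.
wait_for_master() {
    fake_check_status
    state=$?
    while [ "$state" != "$OCF_RUNNING_MASTER" ]; do
        sleep 1
        fake_check_status
        state=$?
    done
}

wait_for_master
echo "promoted after $polls polls"
```

As in the patch, the loop has no timeout of its own. Inside the agent that is acceptable only because pacemaker's operation timeout kills the script if the master state is never reached; run standalone, a stub that never returns 8 would spin forever.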
