On Tue, Nov 10, 2020 at 09:47:56AM +0100, Dumitru Ceara wrote: > On 11/10/20 1:26 AM, Ben Pfaff wrote: > > On Mon, Nov 09, 2020 at 02:44:27PM +0100, Dumitru Ceara wrote: > >> On 11/6/20 4:36 AM, Ben Pfaff wrote: > >>> Many of these could be replaced by "ovn-nbctl sync". Some weren't > >>> really needed at all because they were adjacent to something that itself > >>> called sync or otherwise used --wait. Some were more appropriately > >>> done with explicit waits for what was really needed. > >>> > >>> I left some "sleep"s. Some were because they were "negative" sleeps: > >>> they were giving time for something to happen that should *not* happen > >>> (in other words, if you wait for it to happen, you'll wait forever, > >>> unless there's a bug). Some were because I didn't know how to properly > >>> wait for what they were waiting for, or because I didn't understand > >>> the circumstances deeply enough. > >>> > >>> Signed-off-by: Ben Pfaff <[email protected]> > >>> --- > >>> tests/ovn-northd.at | 7 ++ > >>> tests/ovn.at | 167 ++++++++++++-------------------------------- > >>> 2 files changed, 52 insertions(+), 122 deletions(-) > >>> > >>> diff --git a/tests/ovn-northd.at b/tests/ovn-northd.at > >>> index 949a8eee054e..5d670233561e 100644 > >>> --- a/tests/ovn-northd.at > >>> +++ b/tests/ovn-northd.at > >>> @@ -1120,6 +1120,7 @@ ovn-nbctl --wait=sb -- --id=@hc create \ > >>> Load_Balancer_Health_Check vip="10.0.0.10\:80" -- add Load_Balancer . \ > >>> health_check @hc > >>> wait_row_count Service_Monitor 2 > >>> +check ovn-nbctl --wait=hv sync > >> > >> This should be "--wait=sb". ovn-controller is not started for tests in > >> ovn-northd.at and "--wait=hv" would return immediately. This applies to > >> all "ovn-nbctl --wait=hv sync" in ovn-northd.at. > > > > I see what you mean. > > > > This is surprising to me. When I defined this stuff years ago, my > > intent was that --wait=hv would be stronger than --wait=sb. That is, > > --wait=sb would cause ovn-nbctl to wait for the southbound database to > > catch up with the northbound changes, and --wait=hb would cause it to > > wait for the southbound database AND the hypervisors to catch up. > > > > That's even how it's documented for ovn-nbctl: > > > > <p> > > With <code>--wait=sb</code>, before <code>ovn-nbctl</code> exits, > > it > > waits for <code>ovn-northd</code> to bring the southbound database > > up-to-date with the northbound database updates. > > </p> > > > > <p> > > With <code>--wait=hv</code>, before <code>ovn-nbctl</code> exits, > > it > > additionally waits for all OVN chassis (hypervisors and gateways) > > to > > become up-to-date with the northbound database updates. (This can > > become an indefinite wait if any chassis is malfunctioning.) > > </p> > > > > That's not what actually happens, though. As you point out, if there > > are no hypervisors, then the hypervisors are always up-to-date, even if > > the database is not. > > > > I think this is a bug in ovn-nbctl. Arguably, it's a bug in the > > database definition (perhaps hv_cfg shouldn't even be filled in but left > > empty if there are no chassis) but I think it is too late to fix it > > there. It is, however, easy enough to fix it in ovn-nbctl. > > > > (And at the same time, I'll change these in ovn-northd to just say > > --wait=sb. You have a point.) > > > > How about this as an additional patch? > > > > -8<--------------------------cut here-------------------------->8-- > > > > From 7b373c22dbda2f808f3d5f3b8ae6eb67612dbe87 Mon Sep 17 00:00:00 2001 > > From: Ben Pfaff <[email protected]> > > Date: Mon, 9 Nov 2020 16:12:50 -0800 > > Subject: [PATCH ovn] ovn-nbctl: Make --wait=hv wait for southbound database > > in > > corner case. > > > > In the corner case where there are no chassis, --wait=hv didn't wait at > > all, instead of waiting for the southbound database. > > > > Reported-by: Dumitru Ceara <[email protected]> > > Signed-off-by: Ben Pfaff <[email protected]> > > --- > > ovn-nb.xml | 27 ++++++++++++++++++++------- > > utilities/ovn-nbctl.c | 2 +- > > 2 files changed, 21 insertions(+), 8 deletions(-) > > > > diff --git a/ovn-nb.xml b/ovn-nb.xml > > index 51b186b84419..d3e51f3e5e87 100644 > > --- a/ovn-nb.xml > > +++ b/ovn-nb.xml > > @@ -84,13 +84,26 @@ > > </column> > > > > <column name="hv_cfg"> > > - Sequence number that <code>ovn-northd</code> sets to the smallest > > - sequence number of all the chassis in the system, as reported in > > the > > - <code>Chassis_Private</code> table in the southbound database. > > Thus, > > - <ref column="hv_cfg"/> equals <ref column="nb_cfg"/> if all > > chassis are > > - caught up with the northbound configuration (which may never > > happen, if > > - any chassis is down). This value can regress, if a chassis was > > removed > > - from the system and rejoins before catching up. > > + <p> > > + Sequence number that <code>ovn-northd</code> sets to the smallest > > + sequence number of all the chassis in the system, as reported in > > the > > + <code>Chassis_Private</code> table in the southbound database. > > Thus, > > + <ref column="hv_cfg"/> equals <ref column="nb_cfg"/> if all > > chassis are > > + caught up with the northbound configuration (which may never > > happen, if > > + any chassis is down). This value can regress, if a chassis was > > removed > > + from the system and rejoins before catching up. > > + </p> > > + > > + <p> > > + If there are no chassis, then <code>ovn-northd</code> copies > > + <code>nb_cfg</code> to <ref column="hv_cfg"/>. Thus, in this > > case, > > + the (nonexistent) hypervisors are always considered to be caught > > up. > > + This means that hypervisors can be "caught up" even in cases > > where > > + <ref column="sb_cfg"/> would show that the southbound database is > > + not. To detect when both the hypervisors and the southbound > > database > > + are caught up, a client should take the smaller of <ref > > + column="sb_cfg"/> and <ref column="hv_cfg"/>. > > + </p> > > </column> > > > > <column name="hv_cfg_timestamp"> > > diff --git a/utilities/ovn-nbctl.c b/utilities/ovn-nbctl.c > > index 4f28edc808ec..ee9849de53a9 100644 > > --- a/utilities/ovn-nbctl.c > > +++ b/utilities/ovn-nbctl.c > > @@ -6357,7 +6357,7 @@ do_nbctl(const char *args, struct ctl_command > > *commands, size_t n_commands, > > NBREC_NB_GLOBAL_FOR_EACH (nb, idl) { > > int64_t cur_cfg = (wait_type == NBCTL_WAIT_SB > > ? nb->sb_cfg > > - : nb->hv_cfg); > > + : MIN(nb->sb_cfg, nb->hv_cfg)); > > if (cur_cfg >= next_cfg) { > > if (print_wait_time) { > > printf("Time spent on processing nb_cfg > > %"PRId64":\n", > > > > Looks good to me, thanks!
Thanks, I applied this bug fix to OVN master. _______________________________________________ dev mailing list [email protected] https://mail.openvswitch.org/mailman/listinfo/ovs-dev
