On Wed, Nov 5, 2025 at 11:05 PM Rosemarie O'Riorden <[email protected]>
wrote:

>
>
> On 11/5/25 2:50 PM, Ilya Maximets wrote:
> > On 11/5/25 7:15 PM, Rosemarie O'Riorden wrote:
> >> Hi Ales, thanks so much for the review!
> >>
> >> I agree with all of the changes you suggest. I thought checking the
> >> transaction count was necessary because there are two cases where it
> >> could be wrong, but it seems the test consistently fails at checking the
> >> number of tunnels in both of those cases, so it should be fine without.
> >>
> >> Thanks again!
> >>
> >>
> >> On 11/5/25 6:31 AM, Ales Musil wrote:
> >>>
> >>>
> >>> On Wed, Nov 5, 2025 at 12:57 AM Rosemarie O'Riorden via dev <ovs-
> >>> [email protected] <mailto:[email protected]>> wrote:
> >>>
> >>>     If you set a chassis' encap ip to be the same as that of another
> >>>     chassis, OVN should detect it, log an error, and delete the chassis
> >>>     record.
> >>>
> >>>     But this was not the case; OVN would allow the transaction.
> However, the
> >>>     commit to the southbound database would fail, because there was
> already
> >>>     an encap record with that same ip and type (from the other
> chassis).
> >>>     The OVS db on the problematic chassis would then rapidly increase
> in
> >>>     size. Essentially, the controller would get stuck in a loop of
> deleting
> >>>     and recreating the tunnels.
> >>>
> >>>     This change makes it so when encaps are reconfigured, before being
> >>>     built, in chassis_build_encaps(), the controller detects if there
> is
> >>>     already an existing encap tunnel of the same ip and type. If so,
> and
> >>>     there are no good (unique) encaps configured to use, only bad ones
> (as
> >>>     in already used on another chassis), the controller logs an error,
> >>>     refuses to build the encaps, and deletes the associated chassis and
> >>>     chassis private record. If there are both good encaps IPs and bad
> ones
> >>>     configured, the good ones will be used to build encaps and the bad
> ones
> >>>     will be ignored.
> >>>
> >>>     In chassis_build_encaps(), only if an Encap record does not already
> >>>     exist will a new record be built. And options will only be updated
> if
> >>>     they've changed.
> >>>
> >>>     Another change this patch introduces is if there's currently a
> >>>     southbound transaction in flight, encaps_run() will not run. This
> is to
> >>>     maintain symmetry with chassis_run()'s behavior. This change
> prevents an
> >>>     inconsistency where the controller saw ports in the local OVS db
> but not
> >>>     in sb-db, and then deleted and recreated them.
> >>>
> >>>     Reported-at: https://issues.redhat.com/browse/FDP-1481 <https://
> >>>     issues.redhat.com/browse/FDP-1481>
> >>>
> >>>
> >>> nit: This should be https://issues.redhat.com/browse/FDP-2013
> <https://
> >>> issues.redhat.com/browse/FDP-2013>
> >>>
> >>>
> >>>
> >>>     Co-authored-by: Nicholas Hubbard <[email protected]
> >>>     <mailto:[email protected]>>
> >>>     Signed-off-by: Nicholas Hubbard <[email protected]
> >>>     <mailto:[email protected]>>
> >>>     Signed-off-by: Rosemarie O'Riorden <[email protected]
> >>>     <mailto:[email protected]>>
> >>>     ---
> >>>
> >>>
> >>> Hi Rosemarie,
> >>>
> >>> thank you for the patch, I have a couple of comments down below.
> >>>
> >>>
> >>>     v4:
> >>>     - Fix bug Jacob discovered where ovn-controller got stuck in a
> loop.
> >>>       -> Change chassis_build_encaps() to not always recreate all
> >>>     encaps, but
> >>>          instead, only modify records if they've changed, and create
> new
> >>>     records if
> >>>          there are not existing ones.
> >>>       -> Document above change with comments.
> >>>       -> Also add a comment to explain how good IPs will be used even
> if bad
> >>>          ones are configured.
> >>>       -> Make encap_lookup_by_ip_and_type() not return const, so encap
> >>>     recs can be
> >>>          modified
> >>>       -> Add ovnsb_idl_txn != NULL as a requirement for encaps_run()
> to run
> >>>          to have symmetry with chassis_build_encaps() so controller
> won't
> >>>          unnecessarily delete and recreate tunnels.
> >>>     - Change test in many ways
> >>>       -> Update name.
> >>>       -> Update method of verifying success, check # of transactions
> >>>     with jsonrpc
> >>>          debug logging.
> >>>       -> Add the case where there are good and bad IPs.
> >>>       -> Check that the warning I added appears.
> >>>     - Update commit message to reflect all changes above.
> >>>
> >>>     v3:
> >>>     - Update misleading comment
> >>>     - Add deletion of chassis private record
> >>>     - Move declaration function name to same line as return type
> >>>     - Delete "Transaction causes multiple rows" error from exceptions
> of
> >>>     affected
> >>>       test
> >>>     - Update my own error to be more specific in exceptions of
> affected test
> >>>     - Update test to not be racy; Check for db row counts instead of
> >>>     saving file
> >>>       size after change
> >>>     - Update commit message to reflect changes
> >>>
> >>>     v2: Fix typo in title.
> >>>     ---
> >>>      controller/chassis.c        | 133
> +++++++++++++++++++++++++-----------
> >>>      controller/chassis.h        |   3 +-
> >>>      controller/encaps.c         |   3 +-
> >>>      controller/encaps.h         |   1 +
> >>>      controller/ovn-controller.c |   7 +-
> >>>      lib/chassis-index.c         |  20 ++++++
> >>>      lib/chassis-index.h         |   3 +
> >>>      tests/ovn-controller.at <http://ovn-controller.at>     |  65
> ++++++
> >>>     ++++++++++--
> >>>      8 files changed, 186 insertions(+), 49 deletions(-)
> >>>
> >>>     diff --git a/controller/chassis.c b/controller/chassis.c
> >>>     index 07bef1c56..2b6bfc247 100644
> >>>     --- a/controller/chassis.c
> >>>     +++ b/controller/chassis.c
> >>>     @@ -77,6 +77,12 @@ struct ovs_chassis_cfg {
> >>>          bool ct_label_flush;
> >>>      };
> >>>
> >>>     +enum chassis_update_status {
> >>>     +    CHASSIS_UPDATED,
> >>>     +    CHASSIS_NOT_UPDATED,
> >>>     +    CHASSIS_NEED_DELETE
> >>>     +};
> >>>     +
> >>>      static void
> >>>      ovs_chassis_cfg_init(struct ovs_chassis_cfg *cfg)
> >>>      {
> >>>     @@ -699,7 +705,8 @@ chassis_build_encaps(struct ovsdb_idl_txn
> >>>     *ovnsb_idl_txn,
> >>>                           const char *encap_ip_default,
> >>>                           const char *chassis_id,
> >>>                           const char *encap_csum,
> >>>     -                     size_t *n_encap)
> >>>     +                     size_t *n_encap,
> >>>     +                     struct ovsdb_idl_index *sbrec_encaps)
> >>>      {
> >>>          size_t tunnel_count = 0;
> >>>
> >>>     @@ -713,20 +720,45 @@ chassis_build_encaps(struct ovsdb_idl_txn
> >>>     *ovnsb_idl_txn,
> >>>
> >>>          SSET_FOR_EACH (encap_ip, encap_ip_set) {
> >>>              SSET_FOR_EACH (encap_type, encap_type_set) {
> >>>     -            struct sbrec_encap *encap =
> >>>     sbrec_encap_insert(ovnsb_idl_txn);
> >>>     -
> >>>     -            sbrec_encap_set_type(encap, encap_type);
> >>>     -            sbrec_encap_set_ip(encap, encap_ip);
> >>>     -            if (encap_ip_default && !strcmp(encap_ip_default,
> >>>     encap_ip)) {
> >>>     -                struct smap _options;
> >>>     -                smap_clone(&_options, &options);
> >>>     -                smap_add(&_options, "is_default", "true");
> >>>     -                sbrec_encap_set_options(encap, &_options);
> >>>     -                smap_destroy(&_options);
> >>>     +            struct sbrec_encap *encap_rec =
> >>>     +                encap_lookup_by_ip_and_type(sbrec_encaps,
> >>>     +                                            encap_ip, encap_type);
> >>>     +            if (encap_rec && strcmp(encap_rec->chassis_name,
> >>>     chassis_id)) {
> >>>     +                static struct vlog_rate_limit rl =
> >>>     VLOG_RATE_LIMIT_INIT(5, 1);
> >>>     +                VLOG_WARN_RL(&rl, "'%s' already has encap ip '%s'
> and "
> >>>     +                             "type '%s', cannot duplicate on
> '%s'",
> >>>     +                             encap_rec->chassis_name,
> encap_rec->ip,
> >>>     +                             encap_rec->type, chassis_id);
> >>>     +
> >>>     +                /* Build encaps for valid IPs even if bad IPs are
> >>>     +                 * configured. */
> >>>     +                continue;
> >>>     +            }
> >>>     +
> >>>     +            /* Only if an Encap record does not already exist do
> we
> >>>     create a
> >>>     +             * new record. */
> >>>     +            struct sbrec_encap *encap;
> >>>
> >>>
> >>> No need to add a new variable, the encap_rec can be reused.
> >>>
> >>>
> >>>     +            if (!encap_rec) {
> >>>     +                encap = sbrec_encap_insert(ovnsb_idl_txn);
> >>>     +                sbrec_encap_set_type(encap, encap_type);
> >>>     +                sbrec_encap_set_ip(encap, encap_ip);
> >>>     +                sbrec_encap_set_chassis_name(encap, chassis_id);
> >>>                  } else {
> >>>     -                sbrec_encap_set_options(encap, &options);
> >>>     +                encap = encap_rec;
> >>>     +            }
> >>>     +
> >>>     +            if (!smap_is_empty(&encap->options) ||
> >>>     +                !smap_equal(&options, &encap->options)) {
> >>>     +                if (encap_ip_default && !strcmp(encap_ip_default,
> >>>     encap_ip)) {
> >>>     +                    struct smap _options;
> >>>     +                    smap_clone(&_options, &options);
> >>>     +                    smap_add(&_options, "is_default", "true");
> >>>     +                    sbrec_encap_set_options(encap, &_options);
> >>>     +                    smap_destroy(&_options);
> >>>     +                } else {
> >>>     +                    sbrec_encap_set_options(encap, &options);
> >>>     +                }
> >>>                  }
> >>>     -            sbrec_encap_set_chassis_name(encap, chassis_id);
> >>>
> >>>                  encaps[tunnel_count] = encap;
> >>>                  tunnel_count++;
> >>>     @@ -778,7 +810,7 @@ update_supported_sset(struct sset *supported)
> >>>
> >>>      static void
> >>>      remove_unsupported_options(const struct sbrec_chassis
> *chassis_rec,
> >>>     -                           bool *updated)
> >>>     +                           enum chassis_update_status
> *update_status)
> >>>      {
> >>>          struct sset supported_options =
> >>>     SSET_INITIALIZER(&supported_options);
> >>>          update_supported_sset(&supported_options);
> >>>     @@ -789,7 +821,7 @@ remove_unsupported_options(const struct
> >>>     sbrec_chassis *chassis_rec,
> >>>                  VLOG_WARN("Removing unsupported key \"%s\" from
> chassis
> >>>     record.",
> >>>                            node->key);
> >>>                  sbrec_chassis_update_other_config_delkey(chassis_rec,
> >>>     node->key);
> >>>     -            *updated = true;
> >>>     +            *update_status = CHASSIS_UPDATED;
> >>>              }
> >>>          }
> >>>
> >>>     @@ -827,25 +859,28 @@ chassis_get_record(struct ovsdb_idl_txn
> >>>     *ovnsb_idl_txn,
> >>>      }
> >>>
> >>>      /* Update a Chassis record based on the config in the ovs config.
> >>>     - * Returns true if 'chassis_rec' was updated, false otherwise.
> >>>     + * Returns CHASSIS_UPDATED if 'chassis_rec' was updated,
> >>>     CHASSIS_NEED_DELETE if
> >>>     + * another chassis exists with an encap with same IP and type as
> >>>     'chassis', and
> >>>     + * returns CHASSIS_NOT_UPDATED otherwise.
> >>>       */
> >>>     -static bool
> >>>     +static enum chassis_update_status
> >>>      chassis_update(const struct sbrec_chassis *chassis_rec,
> >>>                     struct ovsdb_idl_txn *ovnsb_idl_txn,
> >>>                     const struct ovs_chassis_cfg *ovs_cfg,
> >>>                     const char *chassis_id,
> >>>     -               const struct sset *transport_zones)
> >>>     +               const struct sset *transport_zones,
> >>>     +               struct ovsdb_idl_index *sbrec_encaps)
> >>>      {
> >>>     -    bool updated = false;
> >>>     +    enum chassis_update_status update_status =
> CHASSIS_NOT_UPDATED;
> >>>
> >>>          if (strcmp(chassis_id, chassis_rec->name)) {
> >>>              sbrec_chassis_set_name(chassis_rec, chassis_id);
> >>>     -        updated = true;
> >>>     +        update_status = CHASSIS_UPDATED;
> >>>          }
> >>>
> >>>          if (strcmp(ovs_cfg->hostname, chassis_rec->hostname)) {
> >>>              sbrec_chassis_set_hostname(chassis_rec,
> ovs_cfg->hostname);
> >>>     -        updated = true;
> >>>     +        update_status = CHASSIS_UPDATED;
> >>>          }
> >>>
> >>>          if (chassis_other_config_changed(ovs_cfg, chassis_rec)) {
> >>>     @@ -856,12 +891,12 @@ chassis_update(const struct sbrec_chassis
> >>>     *chassis_rec,
> >>>              sbrec_chassis_verify_other_config(chassis_rec);
> >>>              sbrec_chassis_set_other_config(chassis_rec,
> &other_config);
> >>>              smap_destroy(&other_config);
> >>>     -        updated = true;
> >>>     +        update_status = CHASSIS_UPDATED;
> >>>          }
> >>>
> >>>          update_chassis_transport_zones(transport_zones, chassis_rec);
> >>>
> >>>     -    remove_unsupported_options(chassis_rec, &updated);
> >>>     +    remove_unsupported_options(chassis_rec, &update_status);
> >>>
> >>>          /* If any of the encaps should change, update them. */
> >>>          bool tunnels_changed =
> >>>     @@ -871,7 +906,7 @@ chassis_update(const struct sbrec_chassis
> >>>     *chassis_rec,
> >>>                                      ovs_cfg->encap_csum,
> >>>                                      chassis_rec);
> >>>          if (!tunnels_changed) {
> >>>     -        return updated;
> >>>     +        return update_status;
> >>>          }
> >>>
> >>>          struct sbrec_encap **encaps;
> >>>     @@ -881,10 +916,16 @@ chassis_update(const struct sbrec_chassis
> >>>     *chassis_rec,
> >>>              chassis_build_encaps(ovnsb_idl_txn,
> &ovs_cfg->encap_type_set,
> >>>                                   &ovs_cfg->encap_ip_set,
> >>>                                   ovs_cfg->encap_ip_default,
> chassis_id,
> >>>     -                             ovs_cfg->encap_csum, &n_encap);
> >>>     +                             ovs_cfg->encap_csum, &n_encap,
> >>>     +                             sbrec_encaps);
> >>>          sbrec_chassis_set_encaps(chassis_rec, encaps, n_encap);
> >>>          free(encaps);
> >>>     -    return true;
> >>>     +
> >>>     +    if (!n_encap) {
> >>>     +        return CHASSIS_NEED_DELETE;
> >>>     +    }
> >>>     +
> >>>     +    return CHASSIS_UPDATED;
> >>>      }
> >>>
> >>>      /* If this is a chassis_private config update after we initialized
> >>>     the record
> >>>     @@ -935,7 +976,8 @@ chassis_run(struct ovsdb_idl_txn
> *ovnsb_idl_txn,
> >>>                  const char *chassis_id,
> >>>                  const struct ovsrec_bridge *br_int,
> >>>                  const struct sset *transport_zones,
> >>>     -            const struct sbrec_chassis_private **chassis_private)
> >>>     +            const struct sbrec_chassis_private **chassis_private,
> >>>     +            struct ovsdb_idl_index *sbrec_encaps)
> >>>      {
> >>>          struct ovs_chassis_cfg ovs_cfg;
> >>>
> >>>     @@ -956,22 +998,31 @@ chassis_run(struct ovsdb_idl_txn
> *ovnsb_idl_txn,
> >>>           * modified in the ovs table.
> >>>           */
> >>>          if (chassis_rec && ovnsb_idl_txn) {
> >>>     -        bool updated = chassis_update(chassis_rec, ovnsb_idl_txn,
> >>>     &ovs_cfg,
> >>>     -                                      chassis_id,
> transport_zones);
> >>>     -
> >>>     -        if (!existed || updated) {
> >>>     -            ovsdb_idl_txn_add_comment(ovnsb_idl_txn,
> >>>     -                                      "ovn-controller: %s chassis
> >>>     '%s'",
> >>>     -                                      !existed ? "registering" :
> >>>     "updating",
> >>>     -                                      chassis_id);
> >>>     -        }
> >>>     +        enum chassis_update_status update_status =
> >>>     +            chassis_update(chassis_rec, ovnsb_idl_txn, &ovs_cfg,
> >>>     +                           chassis_id,
> >>>     +                           transport_zones, sbrec_encaps);
> >>>     +
> >>>     +        *chassis_private =
> chassis_private_get_record(ovnsb_idl_txn,
> >>>     +                               sbrec_chassis_private_by_name,
> >>>     chassis_id);
> >>>
> >>>     -        *chassis_private =
> >>>     -            chassis_private_get_record(ovnsb_idl_txn,
> >>>     -
>  sbrec_chassis_private_by_name,
> >>>     +        if (update_status == CHASSIS_NEED_DELETE) {
> >>>     +            sbrec_chassis_delete(chassis_rec);
> >>>     +            chassis_rec = NULL;
> >>>     +            if (*chassis_private) {
> >>>     +                sbrec_chassis_private_delete(*chassis_private);
> >>>     +            }
> >>>
> >>>
> >>> nit: We can return early here instead of nesting.
> >>>
> >>>
> >>>     +        } else {
> >>>     +            if (!existed || update_status == CHASSIS_UPDATED) {
> >>>     +                ovsdb_idl_txn_add_comment(ovnsb_idl_txn,
> >>>     +                                          "ovn-controller: %s
> >>>     chassis '%s'",
> >>>     +                                          !existed ? "registering"
> >>>     +                                          : "updating",
> chassis_id);
> >>>     +            }
> >>>     +            if (*chassis_private) {
> >>>     +                chassis_private_update(*chassis_private,
> chassis_rec,
> >>>                                             chassis_id);
> >>>     -        if (*chassis_private) {
> >>>     -            chassis_private_update(*chassis_private, chassis_rec,
> >>>     chassis_id);
> >>>     +            }
> >>>              }
> >>>          }
> >>>
> >>>     diff --git a/controller/chassis.h b/controller/chassis.h
> >>>     index 03cc2f906..14251bdad 100644
> >>>     --- a/controller/chassis.h
> >>>     +++ b/controller/chassis.h
> >>>     @@ -44,7 +44,8 @@ const struct sbrec_chassis *chassis_run(
> >>>          const struct ovsrec_open_vswitch_table *,
> >>>          const char *chassis_id, const struct ovsrec_bridge *br_int,
> >>>          const struct sset *transport_zones,
> >>>     -    const struct sbrec_chassis_private **chassis_private);
> >>>     +    const struct sbrec_chassis_private **chassis_private,
> >>>     +    struct ovsdb_idl_index *sbrec_chassis_encaps);
> >>>      bool chassis_cleanup(struct ovsdb_idl_txn *ovs_idl_txn,
> >>>                           struct ovsdb_idl_txn *ovnsb_idl_txn,
> >>>                           const struct ovsrec_open_vswitch_table *,
> >>>     diff --git a/controller/encaps.c b/controller/encaps.c
> >>>     index 67f8c9c9c..fd34a8ea0 100644
> >>>     --- a/controller/encaps.c
> >>>     +++ b/controller/encaps.c
> >>>     @@ -549,6 +549,7 @@ create_evpn_tunnels(struct tunnel_ctx *tc)
> >>>
> >>>      void
> >>>      encaps_run(struct ovsdb_idl_txn *ovs_idl_txn,
> >>>     +           struct ovsdb_idl_txn *ovnsb_idl_txn,
> >>>                 const struct ovsrec_bridge *br_int,
> >>>                 const struct sbrec_chassis_table *chassis_table,
> >>>                 const struct sbrec_chassis *this_chassis,
> >>>     @@ -557,7 +558,7 @@ encaps_run(struct ovsdb_idl_txn *ovs_idl_txn,
> >>>                 const struct sset *transport_zones,
> >>>                 const struct ovsrec_bridge_table *bridge_table)
> >>>      {
> >>>     -    if (!ovs_idl_txn || !br_int) {
> >>>     +    if (!ovs_idl_txn || !ovnsb_idl_txn || !br_int) {
> >>>              return;
> >>>          }
> >>>
> >>>     diff --git a/controller/encaps.h b/controller/encaps.h
> >>>     index 6f9891ee5..3f4a82d68 100644
> >>>     --- a/controller/encaps.h
> >>>     +++ b/controller/encaps.h
> >>>     @@ -30,6 +30,7 @@ struct sset;
> >>>
> >>>      void encaps_register_ovs_idl(struct ovsdb_idl *);
> >>>      void encaps_run(struct ovsdb_idl_txn *ovs_idl_txn,
> >>>     +                struct ovsdb_idl_txn *ovnsb_idl_txn,
> >>>                      const struct ovsrec_bridge *br_int,
> >>>                      const struct sbrec_chassis_table *,
> >>>                      const struct sbrec_chassis *,
> >>>     diff --git a/controller/ovn-controller.c
> b/controller/ovn-controller.c
> >>>     index c2dab41c1..6ef32d6d5 100644
> >>>     --- a/controller/ovn-controller.c
> >>>     +++ b/controller/ovn-controller.c
> >>>     @@ -6728,6 +6728,9 @@ main(int argc, char *argv[])
> >>>          struct ovsdb_idl_index
> *sbrec_advertised_mac_binding_index_by_dp
> >>>              = ovsdb_idl_index_create1(ovnsb_idl_loop.idl,
> >>>
> >>>      &sbrec_advertised_mac_binding_col_datapath);
> >>>     +    struct ovsdb_idl_index *sbrec_encaps
> >>>
> >>>
> >>> nit: We should use the naming convention as for other indexes:
> >>> "sbrec_encaps_index_by_ip_and_type".
> >>>
> >>>     +        = ovsdb_idl_index_create2(ovnsb_idl_loop.idl,
> >>>     +                                  &sbrec_encap_col_type,
> >>>     &sbrec_encap_col_ip);
> >>>
> >>>          ovsdb_idl_track_add_all(ovnsb_idl_loop.idl);
> >>>          ovsdb_idl_omit_alert(ovnsb_idl_loop.idl,
> >>>     @@ -7450,7 +7453,7 @@ main(int argc, char *argv[])
> >>>
>  sbrec_chassis_private_by_name,
> >>>                                            ovs_table, chassis_id,
> >>>                                            br_int, &transport_zones,
> >>>     -                                      &chassis_private);
> >>>     +                                      &chassis_private,
> sbrec_encaps);
> >>>                  }
> >>>
> >>>                  /* If any OVS feature support changed, force a full
> >>>     recompute.
> >>>     @@ -7489,7 +7492,7 @@ main(int argc, char *argv[])
> >>>                      }
> >>>
> >>>                      if (chassis && ovs_feature_set_discovered()) {
> >>>     -                    encaps_run(ovs_idl_txn, br_int,
> >>>     +                    encaps_run(ovs_idl_txn, ovnsb_idl_txn, br_int,
> >>>
> >>>     sbrec_chassis_table_get(ovnsb_idl_loop.idl),
> >>>                                     chassis,
> >>>
> >>>     sbrec_sb_global_first(ovnsb_idl_loop.idl),
> >>>     diff --git a/lib/chassis-index.c b/lib/chassis-index.c
> >>>     index 4b38036cb..c1e0234e3 100644
> >>>     --- a/lib/chassis-index.c
> >>>     +++ b/lib/chassis-index.c
> >>>     @@ -115,3 +115,23 @@ ha_chassis_group_lookup_by_name(
> >>>
> >>>          return retval;
> >>>      }
> >>>     +
> >>>     +/* Finds and returns the encap with the given ip and type, or NULL
> >>>     if no such
> >>>     + * encap exists. */
> >>>     +struct sbrec_encap *
> >>>     +encap_lookup_by_ip_and_type(struct ovsdb_idl_index
> >>>     *sbrec_chassis_encaps,
> >>>     +                       const char *ip, const char *type)
> >>>     +{
> >>>     +    struct sbrec_encap *target = sbrec_encap_index_init_row(
> >>>     +        sbrec_chassis_encaps);
> >>>     +
> >>>     +    sbrec_encap_index_set_ip(target, ip);
> >>>     +    sbrec_encap_index_set_type(target, type);
> >>>     +
> >>>     +    struct sbrec_encap *retval = sbrec_encap_index_find(
> >>>     +        sbrec_chassis_encaps, target);
> >>>     +
> >>>     +    sbrec_encap_index_destroy_row(target);
> >>>     +
> >>>     +    return retval;
> >>>     +}
> >>>     diff --git a/lib/chassis-index.h b/lib/chassis-index.h
> >>>     index bc654da13..3c88ae631 100644
> >>>     --- a/lib/chassis-index.h
> >>>     +++ b/lib/chassis-index.h
> >>>     @@ -35,5 +35,8 @@ chassis_private_lookup_by_name(
> >>>      struct ovsdb_idl_index *ha_chassis_group_index_create(struct
> >>>     ovsdb_idl *idl);
> >>>      const struct sbrec_ha_chassis_group
> *ha_chassis_group_lookup_by_name(
> >>>          struct ovsdb_idl_index *sbrec_ha_chassis_grp_by_name, const
> >>>     char *name);
> >>>     +struct sbrec_encap *encap_lookup_by_ip_and_type(
> >>>     +    struct ovsdb_idl_index *sbrec_chassis_encaps,
> >>>     +    const char *ip, const char *type);
> >>>
> >>>      #endif /* lib/chassis-index.h */
> >>>     diff --git a/tests/ovn-controller.at <http://ovn-controller.at> b/
> >>>     tests/ovn-controller.at <http://ovn-controller.at>
> >>>     index bbf939d25..9842bd976 100644
> >>>     --- a/tests/ovn-controller.at <http://ovn-controller.at>
> >>>     +++ b/tests/ovn-controller.at <http://ovn-controller.at>
> >>>     @@ -277,7 +277,8 @@ ovs-vsctl -- set Open_vSwitch . external-
> >>>     ids:hostname="${new_sysid}" \
> >>>                -- set Open_vSwitch . external-ids:ovn-
> >>>     remote="${current_remote}"
> >>>
> >>>      OVS_WAIT_UNTIL([
> >>>     -    grep -q 'Transaction causes multiple rows in \\"Encap\\" table
> >>>     to have identical values' hv/ovn-controller.log
> >>>     +    grep -q "'hv' already has encap ip '192.168.0.1' and type
> >>>     'geneve', " \
> >>>     +            "cannot duplicate on 'hv-foo'" hv/ovn-controller.log
> >>>      ])
>
> Since Ilya pointed out "geneve" could also be "vxlan", should we replace
> this with:
>
> +grep -qE "'hv' already has encap ip '192.168.0.1' and type" \
> +        "'(geneve|vxlan)', cannot duplicate on 'hv-foo'" \
> +        hv/ovn-controller.log
>
> This doesn't work in the other case though, so I suppose the type could
> be removed entirely from the error message. I figured it was best to
> include as much information as possible, but if it's up to the order in
> the hashmap that doesn't seem very important...
>
> >>>
> >>>      # Destroy the stale entries manually and ovn-controller should now
> >>>     be able
> >>>     @@ -305,12 +306,12 @@ check ovn-sbctl destroy chassis_private . --
> >>>     destroy chassis .
> >>>      # - could not create datapath br-int of unknown type foobar
> >>>      # - unknown datapath type bar
> >>>      # - could not create datapath br-int of unknown type bar
> >>>     -# - Transaction causes multiple rows ...
> >>>      # - failed to create bridge br-int: Address family not supported
> by
> >>>     protocol # due to previous errors.
> >>>     +# - 'hv' already has encap ip '192.168.0.1' and type 'geneve',
> >>>     cannot duplicate on 'hv-foo'
> >>>      OVN_CLEANUP_SBOX([hv], ["/foo/d
> >>>      /bar/d
> >>>     -/Transaction causes multiple rows/d
> >>>     -/failed to create bridge/d"])
> >>>     +/failed to create bridge/d
> >>>     +/already has encap ip.*cannot duplicate on/d"])
> >>>
> >>>
> >>> nit: We shouldn't change the order.
> >>>
> >>>
> >>>
> >>>      OVN_CLEANUP_VSWITCH([main])
> >>>      as ovn-sb
> >>>     @@ -3788,3 +3789,59 @@ wait_ports 0
> >>>      OVN_CLEANUP([hv1
> >>>      /Invalid VXLAN port number.*/d])
> >>>      AT_CLEANUP
> >>>     +
> >>>     +OVN_FOR_EACH_NORTHD([
> >>>     +AT_SETUP([ovn-controller - two chassis set same encap ip and
> type])
> >>>     +AT_KEYWORDS([ovn])
> >>>     +ovn_start
> >>>     +net_add n1
> >>>     +
> >>>     +sim_add hv1
> >>>     +as hv1
> >>>     +check ovs-vsctl add-br br-phys
> >>>     +ovn_attach n1 br-phys 192.168.0.11
> >>>     +
> >>>     +sim_add hv2
> >>>     +as hv2
> >>>     +check ovs-vsctl add-br br-phys
> >>>     +ovn_attach n1 br-phys 192.168.0.12
> >>>     +
> >>>     +# Wait for tunnels to appear in OVS, encaps to appear in sb-db,
> and set
> >>>     +# controller's jsonrpc debug logging.
> >>>     +as hv2
> >>>     +OVS_WAIT_UNTIL([test 1 = $(ovs-vsctl find Interface type=geneve |
> >>>     grep -c _uuid)])
> >>>
> >>>
> >>> nit: We can use OVS_WAIT_FOR_OUTPUT instead.
> >>>
> >>>
> >>>     +wait_row_count Encap 4
> >>>     +as hv1
> >>>     +OVS_WAIT_UNTIL([test 1 = $(ovs-vsctl find Interface type=geneve |
> >>>     grep -c _uuid)])
> >>>     +wait_row_count Encap 4
> >>>     +ovn-appctl vlog/set jsonrpc:dbg
> >>>     +as hv2
> >>>     +ovn-appctl vlog/set jsonrpc:dbg
> >>>     +
> >>>     +# Give hv2 the same encap IP as hv1 (.11) with other valid IPs.
> >>>     +check ovs-vsctl set Open_vSwitch . \
> >>>     +    external-ids:ovn-encap-
> >>>     ip="192.168.0.10,192.168.0.11,192.168.0.12,192.168.0.13"
> >>>     +
> >>>     +as hv1 wait_row_count Encap 2 chassis_name="hv1"
> >>>     +
> >>>     +# Make sure tunnels are created for the other three valid IPs
> >>>     configured.
> >>>     +as hv2
> >>>     +OVS_WAIT_UNTIL([test 3 = $(ovs-vsctl find Interface type=geneve |
> >>>     grep -c _uuid)])
> >>>     +wait_row_count Encap 6 chassis_name="hv2"
> >>>     +OVS_WAIT_UNTIL([check test 1 = $(cat hv2/ovn-controller.log |
> >>>     +                grep -c "'hv1' already has encap ip '192.168.0.11'
> >>>     and type "
> >>>     +                "'geneve', cannot duplicate on 'hv2'")])
> >
> > This check fails with different compiler flags.  Depending on the
> > order inside the hash map, we'll get either geneve or vxlan here.
>
> Again, either remove the type from the warning altogether, or we could
> match on only the beginning of the warning.
>
> >
> >>>
> >>>     +
> >>>     +# Make sure controller logs have correct number of transactions.
> >>>     +OVS_WAIT_UNTIL([check test 2 = $(cat hv2/ovn-controller.log | grep
> >>>     transact | wc -l)])
> >>>     +OVS_WAIT_UNTIL([check test 1 = $(cat hv1/ovn-controller.log | grep
> >>>     transact | wc -l)])
> >>>
> >>>
> >>> I don't think the transaction count is helping TBF. IMO we can
> >>> completely skip it.
> >
> > We can't really skipt it.  For exmaple, if you make the follwoing change
> in
> > the code on top of the suggested follow up:
> >
> > diff --git a/controller/chassis.c b/controller/chassis.c
> > index 4afb6da93..8168a9e82 100644
> > --- a/controller/chassis.c
> > +++ b/controller/chassis.c
> > @@ -737,12 +737,12 @@ chassis_build_encaps(struct ovsdb_idl_txn
> *ovnsb_idl_txn,
> >
> >              /* Only if an Encap record does not already exist do we
> create a
> >               * new record. */
> > -            if (!encap_rec) {
> > +            //if (!encap_rec) {
> >                  encap_rec = sbrec_encap_insert(ovnsb_idl_txn);
> >                  sbrec_encap_set_type(encap_rec, encap_type);
> >                  sbrec_encap_set_ip(encap_rec, encap_ip);
> >                  sbrec_encap_set_chassis_name(encap_rec, chassis_id);
> > -            }
> > +            //}
> >
> >              if (!smap_is_empty(&encap_rec->options) ||
> >                  !smap_equal(&options, &encap_rec->options)) {
> > ---
> >
> > The test is still passing, but in the short time while the test is
> ruuning,
> > ovn-controller manages to send hundreds of transactions towards Sb,
> re-creating
> > all the encaps from scratch on every iteration.  Which is one of the
> issues this
> > patch is trying to fix, beside the transations sent towards OVS database.
>
> Ah yes, I was trying to figure out how to make the test pass wrongly and
> this is it.
>
> I thought removing the check for !ovnsb_idl_txn from encaps_run() would
> let it pass with extra transactions but it still failed at the check for
> # of tunnels.
>
> Yes we should keep the transaction count.
>
> >
> > Best regards, Ilya Maximets.
> >
>
> --
> Rosemarie O'Riorden
> Lowell, MA, United States
> [email protected]
>
>
Thank you Rosemarie and Ilya,

yeah that makes sense I have applied the diff, added back the transaction
check and made the warning check typeless, in the end it doesn't matter
what type of the tunnel it is. I went ahead and merged this into main and
backportred all the way down to 24.03.

Regards,
Ales
_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Reply via email to