On Thu, May 25, 2023 at 10:37 AM Dumitru Ceara <[email protected]> wrote:

> Whenever an OpenFlow error is returned by OvS, trigger a reconnect of
> the OpenFlow (rconn) connection.  This will clear any installed OpenFlow
> rules/groups. To ensure consistency, trigger a full I-P recompute too.
>
> An example of scenario that can result in an OpenFlow error returned by
> OvS follows (describing two main loop iterations in ovn-controller):
>   - Iteration I:
>     a. get updates from SB
>     b. process these updates and generate "desired" openflows (lets assume
>     this generates quite a lot of desired openflow modifications)
>     c.1. add bundle-open msg to rconn
>     c.2. add openflow mod msgs to rconn (only some of these make it
> through,
>     the rest gets queued, the rconn is backlogged at this point).
>     c.3. add bundle-commit msg to rconn (this gets queued)
>
>   - Iteration II:
>     a. get updates from SB (rconn is still backlogged)
>     b. process the updates and generate "desired" openflows (lets assume
>     this takes 10+ seconds for the specific SB updates)
>
> At some point, while step II.b was being executed OvS declared the bundle
> operation (started at I.c.1) timeout.  We now act on this error by
> reconnecting which in turn triggers a flush of the rconn backlog and
> gives more chance to the next full recompute to succeed in installing
> all flows.
>
> Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=2134880
> Reported-by: François Rigault <[email protected]>
> CC: Ilya Maximets <[email protected]>
> Signed-off-by: Dumitru Ceara <[email protected]>
>

Hi Dumitru,

sorry for being nitpicky, please see my comment below.


> ---
>  controller/ofctrl.c         | 17 ++++++++++++++---
>  controller/ofctrl.h         |  2 +-
>  controller/ovn-controller.c | 10 ++++++++--
>  3 files changed, 23 insertions(+), 6 deletions(-)
>
> diff --git a/controller/ofctrl.c b/controller/ofctrl.c
> index b1ba1c743a..1da23bc27e 100644
> --- a/controller/ofctrl.c
> +++ b/controller/ofctrl.c
> @@ -766,13 +766,18 @@ ofctrl_get_mf_field_id(void)
>
>  /* Runs the OpenFlow state machine against 'br_int', which is local to the
>   * hypervisor on which we are running.  Attempts to negotiate a Geneve
> option
> - * field for class OVN_GENEVE_CLASS, type OVN_GENEVE_TYPE. */
> -void
> + * field for class OVN_GENEVE_CLASS, type OVN_GENEVE_TYPE.
> + *
> + * Returns 'true' if an OpenFlow reconnect happened; 'false' otherwise.
> + */
> +bool
>  ofctrl_run(const struct ovsrec_bridge *br_int,
>             const struct ovsrec_open_vswitch_table *ovs_table,
>             struct shash *pending_ct_zones)
>  {
>      char *target = xasprintf("unix:%s/%s.mgmt", ovs_rundir(),
> br_int->name);
> +    bool reconnected = false;
> +
>      if (strcmp(target, rconn_get_target(swconn))) {
>          VLOG_INFO("%s: connecting to switch", target);
>          rconn_connect(swconn, target, target);
> @@ -782,10 +787,12 @@ ofctrl_run(const struct ovsrec_bridge *br_int,
>      rconn_run(swconn);
>
>      if (!rconn_is_connected(swconn)) {
> -        return;
> +        goto done;
>

We don't have to introduce the "done" label. AFIACT you can just return
the "reconnected" right here with the same effect.


>      }
> +
>      if (seqno != rconn_get_connection_seqno(swconn)) {
>          seqno = rconn_get_connection_seqno(swconn);
> +        reconnected = true;
>          state = S_NEW;
>
>          /* Reset the state of any outstanding ct flushes to resend them.
> */
> @@ -855,6 +862,9 @@ ofctrl_run(const struct ovsrec_bridge *br_int,
>           * point, so ensure that we come back again without waiting. */
>          poll_immediate_wake();
>      }
> +
> +done:
> +    return reconnected;
>  }
>
>  void
> @@ -909,6 +919,7 @@ ofctrl_recv(const struct ofp_header *oh, enum ofptype
> type)
>      } else if (type == OFPTYPE_ERROR) {
>          static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(30, 300);
>          log_openflow_rl(&rl, VLL_INFO, oh, "OpenFlow error");
> +        rconn_reconnect(swconn);
>      } else {
>          static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(30, 300);
>          log_openflow_rl(&rl, VLL_DBG, oh, "OpenFlow packet ignored");
> diff --git a/controller/ofctrl.h b/controller/ofctrl.h
> index f5751e3ee4..105f9370be 100644
> --- a/controller/ofctrl.h
> +++ b/controller/ofctrl.h
> @@ -51,7 +51,7 @@ struct ovn_desired_flow_table {
>  void ofctrl_init(struct ovn_extend_table *group_table,
>                   struct ovn_extend_table *meter_table,
>                   int inactivity_probe_interval);
> -void ofctrl_run(const struct ovsrec_bridge *br_int,
> +bool ofctrl_run(const struct ovsrec_bridge *br_int,
>                  const struct ovsrec_open_vswitch_table *,
>                  struct shash *pending_ct_zones);
>  enum mf_field_id ofctrl_get_mf_field_id(void);
> diff --git a/controller/ovn-controller.c b/controller/ovn-controller.c
> index 1151d36644..b301c50157 100644
> --- a/controller/ovn-controller.c
> +++ b/controller/ovn-controller.c
> @@ -5075,8 +5075,14 @@ main(int argc, char *argv[])
>
>              if (br_int) {
>                  ct_zones_data = engine_get_data(&en_ct_zones);
> -                if (ct_zones_data) {
> -                    ofctrl_run(br_int, ovs_table,
> &ct_zones_data->pending);
> +                if (ct_zones_data && ofctrl_run(br_int, ovs_table,
> +                                                &ct_zones_data->pending))
> {
> +                    static struct vlog_rate_limit rl
> +                            = VLOG_RATE_LIMIT_INIT(1, 1);
> +
> +                    VLOG_INFO_RL(&rl, "OVS OpenFlow connection
> reconnected,"
> +                                      "force recompute.");
> +                    engine_set_force_recompute(true);
>                  }
>
>                  if (chassis) {
> --
> 2.31.1
>
> _______________________________________________
> dev mailing list
> [email protected]
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>


I suppose the comment can be addressed during merge, with that being
said:

Reviewed-by: Ales Musil <[email protected]>

Thanks,
Ales

-- 

Ales Musil

Senior Software Engineer - OVN Core

Red Hat EMEA <https://www.redhat.com>

[email protected]    IM: amusil
<https://red.ht/sig>
_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Reply via email to