Hello, Roi,

Thank you for your prompt reply, see response in-line below.

On Thu, May 11, 2023 at 12:51 PM Roi Dayan <[email protected]> wrote:
> On 11/05/2023 13:03, Frode Nordahl wrote:
> > The tc module combines the use of the `tc_transact` helper
> > function for communication with the in-kernel tc infrastructure
> > with assertions on the reply data by `ofpbuf_at_assert` on the
> > received data prior to further processing.
> >
> > `tc_transact` in turn calls `nl_transact`, which via
> > `nl_transact_multiple__` ultimately calls and handles return
> > value from `recvmsg`.  On error a check for EAGAIN is performed
> > and a consequence of this condition is effectively to provide a
> > non-error (0) result and an empty reply buffer.
> >
> > Before this change, the `tc_transact` and, consumers of it, were
> > unaware of this condition.  The use of assertions on the reply
> > buffer can as such lead to a fatal crash of OVS.
> >
> > To be fair, the behavior of `nl_transact` when handling an EAGAIN
> > return is currently not documented, so this change also adds that
> > documentation.
> >
> > While fixing the problem, it led me to find potential problems
> > with several of the one-time initialization functions in
> > the netdev-offload-tc module.  Before this patch they would
> > happily accept temporary failure as permanent one, with
> > consequences for the remaining lifetime of the process.
>
> Good catch on EAGAIN returning empty buffer.
> I have a question about the change to loop EAGAIN in netdev-offload-tc calls.
> Why not to keep treat EAGAIN as an error? the revalidator will call
> again if needed.
> In current situation what will happen if kernel will keep giving EAGAIN?
> why to keep ovs in this loop and not resume normal operation until next 
> revalidation?
> This is so OVS could keep do other tasks as needed.

The change to loop was only made in one-time initialization functions
(i.e. `probe_multi_mask_per_prio`, `probe_insert_ct_state_rule`,
 `probe_ct_state_support`, `probe_tc_block_support`).

AFAICT, these are all ultimately called from within a one time
initialization block, i.e:

    if (ovsthread_once_start(&once)) {}

Which means they are only attempted once per ovs-vswitchd process invocation.

Or am I missing something?

-- 
Frode Nordahl

> >
> > For the record, the symptom of the crash is this in the log:
> > EMER|../include/openvswitch/ofpbuf.h:194: assertion offset + size <= 
> > b->size failed in ofpbuf_at_assert()
> >
> > And an excerpt of the backtrace looks like this:
> > 0x0000561dac1396d1 in ofpbuf_at_assert (b=0x7fb650005d20, b=0x7fb650005d20, 
> > offset=16, size=20) at ../include/openvswitch/ofpbuf.h:194
> > tc_replace_flower (id=<optimized out>, flower=<optimized out>) at 
> > ../lib/tc.c:3223
> > 0x0000561dac128155 in netdev_tc_flow_put (netdev=0x561dacf91840, 
> > match=<optimized out>, actions=<optimized out>, actions_len=<optimized out>,
> > ufid=<optimized out>, info=<optimized out>, stats=<optimized out>) at 
> > ../lib/netdev-offload-tc.c:2096
> > 0x0000561dac117541 in netdev_flow_put (stats=<optimized out>, 
> > info=0x7fb65b7ba780, ufid=<optimized out>, act_len=<optimized out>, 
> > actions=<optimized out>,
> > match=0x7fb65b7ba980, netdev=0x561dacf91840) at ../lib/netdev-offload.c:257
> > parse_flow_put (put=0x7fb65b7bcc50, dpif=0x561dad0ad550) at 
> > ../lib/dpif-netlink.c:2297
> > try_send_to_netdev (op=0x7fb65b7bcc48, dpif=0x561dad0ad550) at 
> > ../lib/dpif-netlink.c:2384
> >
> > Reported-At: https://launchpad.net/bugs/2018500
> > Fixes f98e418fbdb6 (tc: Add tc flower functions)
> > Fixes 407556ac6c90 (netlink-socket: Avoid forcing a reply for final message 
> > in a transaction.)
> > Signed-off-by: Frode Nordahl <[email protected]>
> > ---
> >  lib/dpif-netlink.c      |  4 +++-
> >  lib/dpif.c              |  2 +-
> >  lib/netdev-linux.c      | 40 ++++++++++++++++++++++++++--------------
> >  lib/netdev-offload-tc.c | 31 ++++++++++++++++++++++---------
> >  lib/netlink-socket.c    |  5 +++++
> >  lib/tc.c                | 18 ++++++++++++++++++
> >  6 files changed, 75 insertions(+), 25 deletions(-)
> >
> > diff --git a/lib/dpif-netlink.c b/lib/dpif-netlink.c
> > index 60bd39643..75537b2f8 100644
> > --- a/lib/dpif-netlink.c
> > +++ b/lib/dpif-netlink.c
> > @@ -2340,7 +2340,9 @@ parse_flow_put(struct dpif_netlink *dpif, struct 
> > dpif_flow_put *put)
> >              }
> >              netdev_set_hw_info(oor_netdev, HW_INFO_TYPE_OOR, true);
> >          }
> > -        level = (err == ENOSPC || err == EOPNOTSUPP) ? VLL_DBG : VLL_ERR;
> > +        level = (err == EAGAIN ||
> > +                 err == ENOSPC ||
> > +                 err == EOPNOTSUPP) ? VLL_DBG : VLL_ERR;
> >          VLOG_RL(&rl, level, "failed to offload flow: %s: %s",
> >                  ovs_strerror(err),
> >                  (oor_netdev ? oor_netdev->name : dev->name));
> > diff --git a/lib/dpif.c b/lib/dpif.c
> > index 3305401fe..2bbb0aac4 100644
> > --- a/lib/dpif.c
> > +++ b/lib/dpif.c
> > @@ -1751,7 +1751,7 @@ flow_message_log_level(int error)
> >       * Kernels that support flow wildcarding will reject these flows as
> >       * duplicates (EEXIST), so lower the log level to debug for these
> >       * types of messages. */
> > -    return (error && error != EEXIST) ? VLL_WARN : VLL_DBG;
> > +    return (error && error != EEXIST && error != EAGAIN) ? VLL_WARN : 
> > VLL_DBG;
> >  }
> >
> >  static bool
> > diff --git a/lib/netdev-linux.c b/lib/netdev-linux.c
> > index 36620199e..a6065bf69 100644
> > --- a/lib/netdev-linux.c
> > +++ b/lib/netdev-linux.c
> > @@ -5817,8 +5817,11 @@ tc_get_policer_action(uint32_t index, struct 
> > ofputil_meter_stats *stats)
> >
> >      error = tc_transact(&request, &replyp);
> >      if (error) {
> > -        VLOG_ERR_RL(&rl, "Failed to dump police action (index: %u), 
> > err=%d",
> > -                    index, error);
> > +        VLOG_RL(&rl,
> > +                (error == EAGAIN) ? VLL_DBG : VLL_ERR,
> > +                "Failed to dump police action (index: %u), err=%d",
> > +                index,
> > +                error);
> >          return error;
> >      }
> >
> > @@ -5849,8 +5852,11 @@ tc_del_policer_action(uint32_t index, struct 
> > ofputil_meter_stats *stats)
> >
> >      error = tc_transact(&request, &replyp);
> >      if (error) {
> > -        VLOG_ERR_RL(&rl, "Failed to delete police action (index: %u), 
> > err=%d",
> > -                    index, error);
> > +        VLOG_RL(&rl,
> > +                (error == EAGAIN) ? VLL_DBG : VLL_ERR,
> > +                "Failed to delete police action (index: %u), err=%d",
> > +                index,
> > +                error);
> >          return error;
> >      }
> >
> > @@ -6114,11 +6120,13 @@ tc_query_class(const struct netdev *netdev,
> >
> >      error = tc_transact(&request, replyp);
> >      if (error) {
> > -        VLOG_WARN_RL(&rl, "query %s class %u:%u (parent %u:%u) failed 
> > (%s)",
> > -                     netdev_get_name(netdev),
> > -                     tc_get_major(handle), tc_get_minor(handle),
> > -                     tc_get_major(parent), tc_get_minor(parent),
> > -                     ovs_strerror(error));
> > +        VLOG_RL(&rl,
> > +                (error == EAGAIN) ? VLL_DBG : VLL_WARN,
> > +                "query %s class %u:%u (parent %u:%u) failed (%s)",
> > +                netdev_get_name(netdev),
> > +                tc_get_major(handle), tc_get_minor(handle),
> > +                tc_get_major(parent), tc_get_minor(parent),
> > +                ovs_strerror(error));
> >      }
> >      return error;
> >  }
> > @@ -6140,10 +6148,12 @@ tc_delete_class(const struct netdev *netdev, 
> > unsigned int handle)
> >
> >      error = tc_transact(&request, NULL);
> >      if (error) {
> > -        VLOG_WARN_RL(&rl, "delete %s class %u:%u failed (%s)",
> > -                     netdev_get_name(netdev),
> > -                     tc_get_major(handle), tc_get_minor(handle),
> > -                     ovs_strerror(error));
> > +        VLOG_RL(&rl,
> > +                (error == EAGAIN) ? VLL_DBG : VLL_WARN,
> > +                "delete %s class %u:%u failed (%s)",
> > +                netdev_get_name(netdev),
> > +                tc_get_major(handle), tc_get_minor(handle),
> > +                ovs_strerror(error));
> >      }
> >      return error;
> >  }
> > @@ -6262,7 +6272,9 @@ tc_query_qdisc(const struct netdev *netdev_)
> >                  ops = &tc_ops_other;
> >              }
> >          }
> > -    } else if ((!error && !qdisc->size) || error == ENOENT) {
> > +    } else if ((!error && !qdisc->size) ||
> > +               error == ENOENT || error == EAGAIN)
> > +    {
> >          /* Either it's a built-in qdisc, or (on Linux pre-2.6.35) it's a 
> > qdisc
> >           * set up by some other entity that doesn't have a handle 1:0.  We 
> > will
> >           * assume that it's the system default qdisc. */
> > diff --git a/lib/netdev-offload-tc.c b/lib/netdev-offload-tc.c
> > index 4f26dd8cc..4c15989d6 100644
> > --- a/lib/netdev-offload-tc.c
> > +++ b/lib/netdev-offload-tc.c
> > @@ -2505,10 +2505,11 @@ netdev_tc_flow_get(struct netdev *netdev,
> >
> >      err = tc_get_flower(&id, &flower);
> >      if (err) {
> > -        VLOG_ERR_RL(&error_rl,
> > -                    "flow get failed (dev %s prio %d handle %d): %s",
> > -                    netdev_get_name(netdev), id.prio, id.handle,
> > -                    ovs_strerror(err));
> > +        VLOG_RL(&error_rl,
> > +                (err == EAGAIN) ? VLL_DBG : VLL_ERR,
> > +                "flow get failed (dev %s prio %d handle %d): %s",
> > +                netdev_get_name(netdev), id.prio, id.handle,
> > +                ovs_strerror(err));
> >          return err;
> >      }
> >
> > @@ -2594,7 +2595,9 @@ probe_multi_mask_per_prio(int ifindex)
> >      memset(&flower.mask.dst_mac, 0xff, sizeof flower.mask.dst_mac);
> >
> >      id1 = tc_make_tcf_id(ifindex, block_id, prio, TC_INGRESS);
> > -    error = tc_replace_flower(&id1, &flower);
> > +    do {
> > +        error = tc_replace_flower(&id1, &flower);
> > +    } while (error == EAGAIN);
> >      if (error) {
> >          goto out;
> >      }
> > @@ -2603,7 +2606,9 @@ probe_multi_mask_per_prio(int ifindex)
> >      memset(&flower.mask.src_mac, 0xff, sizeof flower.mask.src_mac);
> >
> >      id2 = tc_make_tcf_id(ifindex, block_id, prio, TC_INGRESS);
> > -    error = tc_replace_flower(&id2, &flower);
> > +    do {
> > +        error = tc_replace_flower(&id2, &flower);
> > +    } while (error == EAGAIN);
> >      tc_del_flower_filter(&id1);
> >
> >      if (error) {
> > @@ -2625,6 +2630,7 @@ probe_insert_ct_state_rule(int ifindex, uint16_t 
> > ct_state, struct tcf_id *id)
> >  {
> >      int prio = TC_RESERVED_PRIORITY_MAX + 1;
> >      struct tc_flower flower;
> > +    int error;
> >
> >      memset(&flower, 0, sizeof flower);
> >      flower.key.ct_state = ct_state;
> > @@ -2634,7 +2640,10 @@ probe_insert_ct_state_rule(int ifindex, uint16_t 
> > ct_state, struct tcf_id *id)
> >      flower.mask.eth_type = OVS_BE16_MAX;
> >
> >      *id = tc_make_tcf_id(ifindex, 0, prio, TC_INGRESS);
> > -    return tc_replace_flower(id, &flower);
> > +    do {
> > +        error = tc_replace_flower(id, &flower);
> > +    } while (error == EAGAIN);
> > +    return error;
> >  }
> >
> >  static void
> > @@ -2657,7 +2666,9 @@ probe_ct_state_support(int ifindex)
> >          goto out;
> >      }
> >
> > -    error = tc_get_flower(&id, &flower);
> > +    do {
> > +        error = tc_get_flower(&id, &flower);
> > +    } while (error == EAGAIN);
> >      if (error || flower.mask.ct_state != ct_state) {
> >          goto out_del;
> >      }
> > @@ -2732,7 +2743,9 @@ probe_tc_block_support(int ifindex)
> >      memset(&flower.mask.dst_mac, 0xff, sizeof flower.mask.dst_mac);
> >
> >      id = tc_make_tcf_id(ifindex, block_id, prio, TC_INGRESS);
> > -    error = tc_replace_flower(&id, &flower);
> > +    do {
> > +        error = tc_replace_flower(&id, &flower);
> > +    } while (error == EAGAIN);
> >
> >      tc_add_del_qdisc(ifindex, false, block_id, TC_INGRESS);
> >
> > diff --git a/lib/netlink-socket.c b/lib/netlink-socket.c
> > index 80da20d9f..dea060fc3 100644
> > --- a/lib/netlink-socket.c
> > +++ b/lib/netlink-socket.c
> > @@ -1798,6 +1798,11 @@ nl_pool_release(struct nl_sock *sock)
> >   *
> >   *      2. Resending the request causes it to be re-executed, so the 
> > request
> >   *         needs to be idempotent.
> > + *
> > + *      3. In the event that the kernel is too busy to handle the request 
> > to
> > + *         receive the response (i.e. EAGAIN), this function will still 
> > return
> > + *         0.  The caller can check for this condition by checking for a 
> > zero
> > + *         size of the 'replyp' ofpbuf buffer.
> >   */
> >  int
> >  nl_transact(int protocol, const struct ofpbuf *request,
> > diff --git a/lib/tc.c b/lib/tc.c
> > index 5c32c6f97..7df6a10c5 100644
> > --- a/lib/tc.c
> > +++ b/lib/tc.c
> > @@ -237,11 +237,29 @@ static void request_from_tcf_id(struct tcf_id *id, 
> > uint16_t eth_type,
> >      }
> >  }
> >
> > +/* The `tc_transact` function is a wrapper around `nl_transact` with the
> > + * addition of:
> > + *
> > + * 1. the 'request' ofpbuf buffer is freed after `nl_transact` returns,
> > + *    regardless of success or failure.
> > + *
> > + * 2. When a 'replyp' pointer is provided; in the event of the kernel
> > + *    being too busy to process the request for the response, a positive
> > + *    error return will be provided with the value of EAGAIN.
> > + *
> > + * Please acquaint yourself with the documentation of the `nl_transact`
> > + * function in the netlink-sock module before making use of this function.
> > + */
> >  int
> >  tc_transact(struct ofpbuf *request, struct ofpbuf **replyp)
> >  {
> >      int error = nl_transact(NETLINK_ROUTE, request, replyp);
> >      ofpbuf_uninit(request);
> > +
> > +    if (!error && replyp && !(*replyp)->size) {
> > +        return EAGAIN;
> > +    }
> > +
> >      return error;
> >  }
> >
_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Reply via email to