On Fri, Mar 25, 2022 at 12:08:20PM +0100, Eelco Chaudron wrote:
> 
> 
> On 22 Mar 2022, at 13:04, Tao Liu wrote:
> 
> > If netdev goes away(i.e. qemu with a vnet shutdown), kernel delete all tc
> > filters, those tcf_id related to the netdev will be left in ufid_to_tc
> > hashmap.
> > When qemu restart with a same vnet but different ifindex assigned,
> > a dup ufid may add. Especially after hashmap_expand, the old entry will
> > insert before the new one, delete or modify tc flow will always fail.
> >
> > So delete the stale entry to avoid a duplicated ufid in hashmap.
> 
> There are some comments below, as I still do not fully understand the fix.
> 
Please let me explain more.
As the scenery of qemu restart described above, filters are flushed by
kernel, but tcf_ids remain in ufid_to_tc hashmap.

Considering a ping running continuously from other host to VM, the first
reply from VM triggers upcall after qemu restart. A same ufid is generated
if the request processed on same handler. At this time, an old ufid exists
in hashmap, and flow del returns ENODEV(ifindex is wrong). However flow put
still succeeds because there is no filter in kernel. And a dup ufid adds in
hashmap.

After hashmap_expand, the old entry inserts before the new one.

If flow expires, revalidator fails to del flow bacause we always get the
old entry. There are constant warnings in ovs-vswitch.log:

    dpif(revalidator49)|WARN|system@ovs-system: failed to flow_del (No such
    file or directory) ufid:d81939b3-8c8d-4b35-8d58-88c098709b91 ...

In case of flow mod, flow del inside flow put still fails with ENODEV, and
tc_replace_flower also fails because old filter exists in kernel.

The idea of this patch is that, if a filter can't get from kernel, we can 
just delete the mapping, and no need to call del filter as it does not
exist. In this way, we can keep consistent between kernel and userspace.

> > Fixes: dd8ca104acd7 ("netdev-tc-offloads: Don't delete ufid mapping if fail 
> > to delete filter")))
> > Signed-off-by: Tao Liu <[email protected]>
> > ---
> >  lib/netdev-offload-tc.c | 37 ++++++++++++++++++++++---------------
> >  1 file changed, 22 insertions(+), 15 deletions(-)
> >
> > diff --git a/lib/netdev-offload-tc.c b/lib/netdev-offload-tc.c
> > index 12d0a9af3..e9dd66790 100644
> > --- a/lib/netdev-offload-tc.c
> > +++ b/lib/netdev-offload-tc.c
> > @@ -167,6 +167,9 @@ struct ufid_tc_data {
> >      struct netdev *netdev;
> >  };
> >
> > +static void parse_tc_flower_to_stats(struct tc_flower *flower,
> > +                                     struct dpif_flow_stats *stats);
> > +
> >  static void
> >  del_ufid_tc_mapping_unlocked(const ovs_u128 *ufid)
> >  {
> > @@ -200,11 +203,24 @@ del_ufid_tc_mapping(const ovs_u128 *ufid)
> >
> >  /* Wrapper function to delete filter and ufid tc mapping */
> >  static int
> > -del_filter_and_ufid_mapping(struct tcf_id *id, const ovs_u128 *ufid)
> > +del_filter_and_ufid_mapping(struct tcf_id *id, const ovs_u128 *ufid,
> > +                            struct dpif_flow_stats *stats)
> >  {
> > -    int err;
> > +    struct tc_flower flower;
> > +    int err = 0, get_err;
> > +
> > +    get_err = tc_get_flower(id, &flower);
> > +    if (!get_err) {
> 
> We should definitely add a comment here explaining the reasoning behind doing 
> the get first and then the delete!
> 
Will add it.

> However, it’s still not clear to me why this change does not undo, 
> dd8ca104acd7?
> 
> Because tc_get_flower() just finds out if the entry with the ID exists in the 
> kernel, and than only try to delete it.
> 
> For this can you just not check a specific return value in the 
> tc_del_filter() call?
> 
> I assume dd8ca104acd7 just want to skip removing the del_ufid_tc_mapping() if 
> the entry does NOT exist in the kernel?
> If not maybe Jianbo can explain a bit more the corner case, or maybe has a 
> test case for this?
> 
commit dd8ca104acd7 prevents the case of mapping deleted but filter exists
in kernel(del fails). This patch includes both caeses of the inconsistency.

> > +        err = tc_del_filter(id);
> > +    }
> > +
> > +    if (stats) {
> > +        memset(stats, 0, sizeof *stats);
> > +        if (!get_err) {
> > +            parse_tc_flower_to_stats(&flower, stats);
> > +        }
> > +    }
> 
> I guess this was moved here to avoid the additional tc_get_flower() call 
> below.
> 
Yes, it is.

> I would fold the stats part in the above if() case to avoid the need for the 
> get_err variable, i.e.
> 
>     if (!tc_get_flower(id, &flower)) {
>       err = tc_del_filter(id);
>       if (stats) {
>         memset(stats, 0, sizeof *stats);
>         if (!get_err) {
>             parse_tc_flower_to_stats(&flower, stats);
>         }
>       }
> 
We also need to zero stats if tc_get_flower fails.

> 
> >
> > -    err = tc_del_filter(id);
> >      if (!err) {
> >          del_ufid_tc_mapping(ufid);
> >      }
> > @@ -1946,7 +1962,8 @@ netdev_tc_flow_put(struct netdev *netdev, struct 
> > match *match,
> >      if (get_ufid_tc_mapping(ufid, &id) == 0) {
> >          VLOG_DBG_RL(&rl, "updating old handle: %d prio: %d",
> >                      id.handle, id.prio);
> > -        info->tc_modify_flow_deleted = !del_filter_and_ufid_mapping(&id, 
> > ufid);
> > +        info->tc_modify_flow_deleted =
> > +                    !del_filter_and_ufid_mapping(&id, ufid, NULL);
> >      }
> >
> >      prio = get_prio_for_tc_flower(&flower);
> > @@ -2017,7 +2034,6 @@ netdev_tc_flow_del(struct netdev *netdev OVS_UNUSED,
> >                     const ovs_u128 *ufid,
> >                     struct dpif_flow_stats *stats)
> >  {
> > -    struct tc_flower flower;
> >      struct tcf_id id;
> >      int error;
> >
> > @@ -2026,16 +2042,7 @@ netdev_tc_flow_del(struct netdev *netdev OVS_UNUSED,
> >          return error;
> >      }
> >
> > -    if (stats) {
> > -        memset(stats, 0, sizeof *stats);
> > -        if (!tc_get_flower(&id, &flower)) {
> > -            parse_tc_flower_to_stats(&flower, stats);
> > -        }
> > -    }
> > -
> > -    error = del_filter_and_ufid_mapping(&id, ufid);
> > -
> > -    return error;
> > +    return del_filter_and_ufid_mapping(&id, ufid, stats);
> >  }
> >
> >  static int
> > -- 
> > 2.23.0
> >
> > _______________________________________________
> > dev mailing list
> > [email protected]
> > https://mail.openvswitch.org/mailman/listinfo/ovs-dev
> 
> 

--
Best regards,
Tao Liu
_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Reply via email to