On Tue, Mar 29, 2022 at 11:25:38AM +0200, Eelco Chaudron wrote:
> 
> 
> On 29 Mar 2022, at 10:37, Tao Liu wrote:
> 
> > On Mon, Mar 28, 2022 at 04:08:29PM +0200, Eelco Chaudron wrote:
> >> On Mon, Mar 28, 2022 at 3:15 PM Eelco Chaudron <[email protected]> wrote:
> >>
> >>>
> >>>
> >>> On 25 Mar 2022, at 18:05, Tao Liu wrote:
> >>>
> >>>> On Fri, Mar 25, 2022 at 12:08:20PM +0100, Eelco Chaudron wrote:
> >>>>>
> >>>>>
> >>>>> On 22 Mar 2022, at 13:04, Tao Liu wrote:
> >>>>>
> >>>>>> If netdev goes away(i.e. qemu with a vnet shutdown), kernel delete all
> >>> tc
> >>>>>> filters, those tcf_id related to the netdev will be left in ufid_to_tc
> >>>>>> hashmap.
> >>>>>> When qemu restart with a same vnet but different ifindex assigned,
> >>>>>> a dup ufid may add. Especially after hashmap_expand, the old entry will
> >>>>>> insert before the new one, delete or modify tc flow will always fail.
> >>>>>>
> >>>>>> So delete the stale entry to avoid a duplicated ufid in hashmap.
> >>>>>
> >>>>> There are some comments below, as I still do not fully understand the
> >>> fix.
> >>>>>
> >>>> Please let me explain more.
> >>>> As the scenery of qemu restart described above, filters are flushed by
> >>>> kernel, but tcf_ids remain in ufid_to_tc hashmap.
> >>>>
> >>>> Considering a ping running continuously from other host to VM, the first
> >>>> reply from VM triggers upcall after qemu restart. A same ufid is
> >>> generated
> >>>> if the request processed on same handler. At this time, an old ufid
> >>> exists
> >>>> in hashmap, and flow del returns ENODEV(ifindex is wrong).
> >>>
> >>> Can this not be solved by modifying the previous fix to this?
> >>>
> >>> --- a/lib/netdev-offload-tc.c
> >>> +++ b/lib/netdev-offload-tc.c
> >>> @@ -198,7 +198,9 @@ del_filter_and_ufid_mapping(struct tcf_id *id, const
> >>> ovs_u128 *ufid)
> >>>      int err;
> >>>
> >>>      err = tc_del_filter(id);
> >>> -    if (!err) {
> >>> +    if (!err || err = ENODEV) {
> >>>         del_ufid_tc_mapping(ufid);
> >>>      }
> >>>      return err;
> >>>  }
> >>>
> >>>
> >>>> However flow put
> >>>> still succeeds because there is no filter in kernel. And a dup ufid adds
> >>> in
> >>>> hashmap.
> >>>>
> >>>> After hashmap_expand, the old entry inserts before the new one.
> >>>>
> >>>> If flow expires, revalidator fails to del flow bacause we always get the
> >>>> old entry. There are constant warnings in ovs-vswitch.log:
> >>>>
> >>>>     dpif(revalidator49)|WARN|system@ovs-system: failed to flow_del (No
> >>> such
> >>>>     file or directory) ufid:d81939b3-8c8d-4b35-8d58-88c098709b91 ...
> >>>>
> >>>> In case of flow mod, flow del inside flow put still fails with ENODEV,
> >>> and
> >>>> tc_replace_flower also fails because old filter exists in kernel.
> >>>>
> >>>> The idea of this patch is that, if a filter can't get from kernel, we can
> >>>> just delete the mapping, and no need to call del filter as it does not
> >>>> exist. In this way, we can keep consistent between kernel and userspace.
> >>>
> >>> Thanks for the details, and I think the problem is a bit more complex.
> >>>
> >>> As in theory duplicate ufid's could exist (guess even on the same port ;).
> >>> We could also fix this by including the interface in the lookup, in
> >>> get_ufid_tc_mapping()?
> >>>
> >>
> >> Not sure how OVS deals with the uniqueness of the ufid, if they assume its
> >> unique within the system, the fix can be simplified to something like this:
> >>
> >> --- a/lib/netdev-offload-tc.c
> >> +++ b/lib/netdev-offload-tc.c
> >>
> >> @@ -218,8 +222,21 @@ add_ufid_tc_mapping(struct netdev *netdev, const
> >> ovs_u128 *ufid,
> >>  {
> >>      struct ufid_tc_data *new_data = xzalloc(sizeof *new_data);
> >>      size_t ufid_hash = hash_bytes(ufid, sizeof *ufid, 0);
> >> +    struct tcf_id existing_id;
> >>      size_t tc_hash;
> >>
> >> +    /* ADD COMMENT WHY ITS SAFE TO DELETE THIS. */
> >> +    if (!get_ufid_tc_mapping(ufid, &existing_id)) {
> >> +        del_ufid_tc_mapping(ufid);
> >> +    }
> >> +
> >>      tc_hash = hash_int(hash_int(id->prio, id->handle), id->ifindex);
> >>      tc_hash = hash_int(id->chain, tc_hash);
> >>
> >>
> > This fix works.
> > And I make a minor modification to avoid multiple hash computation.
> >
> > --- a/lib/netdev-offload-tc.c
> > +++ b/lib/netdev-offload-tc.c
> > @@ -216,6 +216,7 @@ static void
> >  add_ufid_tc_mapping(struct netdev *netdev, const ovs_u128 *ufid,
> >                      struct tcf_id *id)
> >  {
> > +    struct ufid_tc_data *data;
> >      struct ufid_tc_data *new_data = xzalloc(sizeof *new_data);
> >      size_t ufid_hash = hash_bytes(ufid, sizeof *ufid, 0);
> >      size_t tc_hash;
> > @@ -228,6 +229,16 @@ add_ufid_tc_mapping(struct netdev *netdev, const 
> > ovs_u128 *ufid,
> >      new_data->netdev = netdev_ref(netdev);
> >
> >      ovs_mutex_lock(&ufid_lock);
> > +    /* Remove the stale ufid mapping to avoid a dup entry.
> > +     * ufid is unique in ovs which guaranteed by ukey_install__(), so it's
> > +     * safe to delete here. */
> > +    HMAP_FOR_EACH_WITH_HASH (data, ufid_to_tc_node, ufid_hash, 
> > &ufid_to_tc) {
> > +        if (ovs_u128_equals(*ufid, data->ufid)) {
> > +            del_ufid_tc_mapping_unlocked(ufid);
> > +            break;
> > +        }
> > +    }
> > +
> >      hmap_insert(&ufid_to_tc, &new_data->ufid_to_tc_node, ufid_hash);
> >      hmap_insert(&tc_to_ufid, &new_data->tc_to_ufid_node, tc_hash);
> >      ovs_mutex_unlock(&ufid_lock);
> >
> > But it's still a little confused, because we have called
> > get_ufid_tc_mapping() and del_filter_and_ufid_mapping() in flow put.
> > Watching netlink message sounds a good idea as you said.
> >
> 
> I agree doing this in adding the mapping sounds a bit confusing. We could 
> still do something like the below which is probably more in line, and safe’s 
> the extra lookup.
> 
> diff --git a/lib/netdev-offload-tc.c b/lib/netdev-offload-tc.c
> index 411a9a007..d256e427c 100644
> --- a/lib/netdev-offload-tc.c
> +++ b/lib/netdev-offload-tc.c
> @@ -1961,9 +1961,15 @@ netdev_tc_flow_put(struct netdev *netdev, struct match 
> *match,
>      }
> 
>      if (get_ufid_tc_mapping(ufid, &id) == 0) {
> -        VLOG_DBG_RL(&rl, "updating old handle: %d prio: %d",
> -                    id.handle, id.prio);
> -        info->tc_modify_flow_deleted = !del_filter_and_ufid_mapping(&id, 
> ufid);
> +        if (id.ifindex != ifindex) {
> +            VLOG_DBG_RL(&rl, "remove stale handle: %d ifindex: %d",
> +                        id.handle, id.ifindex);
> +            del_ufid_tc_mapping_unlocked(ufid);
> +        } else {
> +            VLOG_DBG_RL(&rl, "updating old handle: %d prio: %d",
> +                        id.handle, id.prio);
> +            info->tc_modify_flow_deleted = !del_filter_and_ufid_mapping(&id, 
> ufid);
> +        }
>      }
Agree with you. This is my first version.

> 
> Also finding a nice way to watch for netlink messages might be the real 
> solution.
> 
Agree.

> <SNIP>
> 
> 
_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Reply via email to