On 25 Mar 2022, at 18:05, Tao Liu wrote:

> On Fri, Mar 25, 2022 at 12:08:20PM +0100, Eelco Chaudron wrote:
>>
>>
>> On 22 Mar 2022, at 13:04, Tao Liu wrote:
>>
>>> If netdev goes away(i.e. qemu with a vnet shutdown), kernel delete all tc
>>> filters, those tcf_id related to the netdev will be left in ufid_to_tc
>>> hashmap.
>>> When qemu restart with a same vnet but different ifindex assigned,
>>> a dup ufid may add. Especially after hashmap_expand, the old entry will
>>> insert before the new one, delete or modify tc flow will always fail.
>>>
>>> So delete the stale entry to avoid a duplicated ufid in hashmap.
>>
>> There are some comments below, as I still do not fully understand the fix.
>>
> Please let me explain more.
> As the scenery of qemu restart described above, filters are flushed by
> kernel, but tcf_ids remain in ufid_to_tc hashmap.
>
> Considering a ping running continuously from other host to VM, the first
> reply from VM triggers upcall after qemu restart. A same ufid is generated
> if the request processed on same handler. At this time, an old ufid exists
> in hashmap, and flow del returns ENODEV(ifindex is wrong). However flow put
> still succeeds because there is no filter in kernel. And a dup ufid adds in
> hashmap.
>
> After hashmap_expand, the old entry inserts before the new one.
>
> If flow expires, revalidator fails to del flow bacause we always get the
> old entry. There are constant warnings in ovs-vswitch.log:
>
>     dpif(revalidator49)|WARN|system@ovs-system: failed to flow_del (No such
>     file or directory) ufid:d81939b3-8c8d-4b35-8d58-88c098709b91 ...
>
> In case of flow mod, flow del inside flow put still fails with ENODEV, and
> tc_replace_flower also fails because old filter exists in kernel.
>
> The idea of this patch is that, if a filter can't get from kernel, we can
> just delete the mapping, and no need to call del filter as it does not
> exist. In this way, we can keep consistent between kernel and userspace.

Thanks for the details, I’ll take another look next week, as it’s getting late 
here :)
One small additional comment below…

>>> Fixes: dd8ca104acd7 ("netdev-tc-offloads: Don't delete ufid mapping if fail 
>>> to delete filter")))
>>> Signed-off-by: Tao Liu <[email protected]>
>>> ---
>>>  lib/netdev-offload-tc.c | 37 ++++++++++++++++++++++---------------
>>>  1 file changed, 22 insertions(+), 15 deletions(-)
>>>
>>> diff --git a/lib/netdev-offload-tc.c b/lib/netdev-offload-tc.c
>>> index 12d0a9af3..e9dd66790 100644
>>> --- a/lib/netdev-offload-tc.c
>>> +++ b/lib/netdev-offload-tc.c
>>> @@ -167,6 +167,9 @@ struct ufid_tc_data {
>>>      struct netdev *netdev;
>>>  };
>>>
>>> +static void parse_tc_flower_to_stats(struct tc_flower *flower,
>>> +                                     struct dpif_flow_stats *stats);
>>> +
>>>  static void
>>>  del_ufid_tc_mapping_unlocked(const ovs_u128 *ufid)
>>>  {
>>> @@ -200,11 +203,24 @@ del_ufid_tc_mapping(const ovs_u128 *ufid)
>>>
>>>  /* Wrapper function to delete filter and ufid tc mapping */
>>>  static int
>>> -del_filter_and_ufid_mapping(struct tcf_id *id, const ovs_u128 *ufid)
>>> +del_filter_and_ufid_mapping(struct tcf_id *id, const ovs_u128 *ufid,
>>> +                            struct dpif_flow_stats *stats)
>>>  {
>>> -    int err;
>>> +    struct tc_flower flower;
>>> +    int err = 0, get_err;
>>> +
>>> +    get_err = tc_get_flower(id, &flower);
>>> +    if (!get_err) {
>>
>> We should definitely add a comment here explaining the reasoning behind 
>> doing the get first and then the delete!
>>
> Will add it.
>
>> However, it’s still not clear to me why this change does not undo, 
>> dd8ca104acd7?
>>
>> Because tc_get_flower() just finds out if the entry with the ID exists in 
>> the kernel, and than only try to delete it.
>>
>> For this can you just not check a specific return value in the 
>> tc_del_filter() call?
>>
>> I assume dd8ca104acd7 just want to skip removing the del_ufid_tc_mapping() 
>> if the entry does NOT exist in the kernel?
>> If not maybe Jianbo can explain a bit more the corner case, or maybe has a 
>> test case for this?
>>
> commit dd8ca104acd7 prevents the case of mapping deleted but filter exists
> in kernel(del fails). This patch includes both caeses of the inconsistency.
>
>>> +        err = tc_del_filter(id);
>>> +    }
>>> +
>>> +    if (stats) {
>>> +        memset(stats, 0, sizeof *stats);
>>> +        if (!get_err) {
>>> +            parse_tc_flower_to_stats(&flower, stats);
>>> +        }
>>> +    }
>>
>> I guess this was moved here to avoid the additional tc_get_flower() call 
>> below.
>>
> Yes, it is.
>
>> I would fold the stats part in the above if() case to avoid the need for the 
>> get_err variable, i.e.
>>
>>     if (!tc_get_flower(id, &flower)) {
>>       err = tc_del_filter(id);
>>       if (stats) {
>>         memset(stats, 0, sizeof *stats);
>>         if (!get_err) {
>>             parse_tc_flower_to_stats(&flower, stats);
>>         }
>>       }
>>
> We also need to zero stats if tc_get_flower fails.

If tc_get_flower() fails we can not clear the stats, as &flower might not be 
valid here!
However this should not be a problem as the code you moved is also not clearing 
the stats if tc_get_flower() fails:

 -    if (stats) {
 -        memset(stats, 0, sizeof *stats);
 -        if (!tc_get_flower(&id, &flower)) {
 -            parse_tc_flower_to_stats(&flower, stats);
 -        }
 -    }

>>
>>>
>>> -    err = tc_del_filter(id);
>>>      if (!err) {
>>>          del_ufid_tc_mapping(ufid);
>>>      }
>>> @@ -1946,7 +1962,8 @@ netdev_tc_flow_put(struct netdev *netdev, struct 
>>> match *match,
>>>      if (get_ufid_tc_mapping(ufid, &id) == 0) {
>>>          VLOG_DBG_RL(&rl, "updating old handle: %d prio: %d",
>>>                      id.handle, id.prio);
>>> -        info->tc_modify_flow_deleted = !del_filter_and_ufid_mapping(&id, 
>>> ufid);
>>> +        info->tc_modify_flow_deleted =
>>> +                    !del_filter_and_ufid_mapping(&id, ufid, NULL);
>>>      }
>>>
>>>      prio = get_prio_for_tc_flower(&flower);
>>> @@ -2017,7 +2034,6 @@ netdev_tc_flow_del(struct netdev *netdev OVS_UNUSED,
>>>                     const ovs_u128 *ufid,
>>>                     struct dpif_flow_stats *stats)
>>>  {
>>> -    struct tc_flower flower;
>>>      struct tcf_id id;
>>>      int error;
>>>
>>> @@ -2026,16 +2042,7 @@ netdev_tc_flow_del(struct netdev *netdev OVS_UNUSED,
>>>          return error;
>>>      }
>>>
>>> -    if (stats) {
>>> -        memset(stats, 0, sizeof *stats);
>>> -        if (!tc_get_flower(&id, &flower)) {
>>> -            parse_tc_flower_to_stats(&flower, stats);
>>> -        }
>>> -    }
>>> -
>>> -    error = del_filter_and_ufid_mapping(&id, ufid);
>>> -
>>> -    return error;
>>> +    return del_filter_and_ufid_mapping(&id, ufid, stats);
>>>  }
>>>
>>>  static int
>>> -- 
>>> 2.23.0
>>>
>>> _______________________________________________
>>> dev mailing list
>>> [email protected]
>>> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>>
>>
>
> --
> Best regards,
> Tao Liu

_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Reply via email to