On 25 Mar 2022, at 18:05, Tao Liu wrote:
> On Fri, Mar 25, 2022 at 12:08:20PM +0100, Eelco Chaudron wrote:
>>
>>
>> On 22 Mar 2022, at 13:04, Tao Liu wrote:
>>
>>> If netdev goes away(i.e. qemu with a vnet shutdown), kernel delete all tc
>>> filters, those tcf_id related to the netdev will be left in ufid_to_tc
>>> hashmap.
>>> When qemu restart with a same vnet but different ifindex assigned,
>>> a dup ufid may add. Especially after hashmap_expand, the old entry will
>>> insert before the new one, delete or modify tc flow will always fail.
>>>
>>> So delete the stale entry to avoid a duplicated ufid in hashmap.
>>
>> There are some comments below, as I still do not fully understand the fix.
>>
> Please let me explain more.
> As the scenery of qemu restart described above, filters are flushed by
> kernel, but tcf_ids remain in ufid_to_tc hashmap.
>
> Considering a ping running continuously from other host to VM, the first
> reply from VM triggers upcall after qemu restart. A same ufid is generated
> if the request processed on same handler. At this time, an old ufid exists
> in hashmap, and flow del returns ENODEV(ifindex is wrong). However flow put
> still succeeds because there is no filter in kernel. And a dup ufid adds in
> hashmap.
>
> After hashmap_expand, the old entry inserts before the new one.
>
> If flow expires, revalidator fails to del flow bacause we always get the
> old entry. There are constant warnings in ovs-vswitch.log:
>
> dpif(revalidator49)|WARN|system@ovs-system: failed to flow_del (No such
> file or directory) ufid:d81939b3-8c8d-4b35-8d58-88c098709b91 ...
>
> In case of flow mod, flow del inside flow put still fails with ENODEV, and
> tc_replace_flower also fails because old filter exists in kernel.
>
> The idea of this patch is that, if a filter can't get from kernel, we can
> just delete the mapping, and no need to call del filter as it does not
> exist. In this way, we can keep consistent between kernel and userspace.
Thanks for the details, I’ll take another look next week, as it’s getting late
here :)
One small additional comment below…
>>> Fixes: dd8ca104acd7 ("netdev-tc-offloads: Don't delete ufid mapping if fail
>>> to delete filter")))
>>> Signed-off-by: Tao Liu <[email protected]>
>>> ---
>>> lib/netdev-offload-tc.c | 37 ++++++++++++++++++++++---------------
>>> 1 file changed, 22 insertions(+), 15 deletions(-)
>>>
>>> diff --git a/lib/netdev-offload-tc.c b/lib/netdev-offload-tc.c
>>> index 12d0a9af3..e9dd66790 100644
>>> --- a/lib/netdev-offload-tc.c
>>> +++ b/lib/netdev-offload-tc.c
>>> @@ -167,6 +167,9 @@ struct ufid_tc_data {
>>> struct netdev *netdev;
>>> };
>>>
>>> +static void parse_tc_flower_to_stats(struct tc_flower *flower,
>>> + struct dpif_flow_stats *stats);
>>> +
>>> static void
>>> del_ufid_tc_mapping_unlocked(const ovs_u128 *ufid)
>>> {
>>> @@ -200,11 +203,24 @@ del_ufid_tc_mapping(const ovs_u128 *ufid)
>>>
>>> /* Wrapper function to delete filter and ufid tc mapping */
>>> static int
>>> -del_filter_and_ufid_mapping(struct tcf_id *id, const ovs_u128 *ufid)
>>> +del_filter_and_ufid_mapping(struct tcf_id *id, const ovs_u128 *ufid,
>>> + struct dpif_flow_stats *stats)
>>> {
>>> - int err;
>>> + struct tc_flower flower;
>>> + int err = 0, get_err;
>>> +
>>> + get_err = tc_get_flower(id, &flower);
>>> + if (!get_err) {
>>
>> We should definitely add a comment here explaining the reasoning behind
>> doing the get first and then the delete!
>>
> Will add it.
>
>> However, it’s still not clear to me why this change does not undo,
>> dd8ca104acd7?
>>
>> Because tc_get_flower() just finds out if the entry with the ID exists in
>> the kernel, and than only try to delete it.
>>
>> For this can you just not check a specific return value in the
>> tc_del_filter() call?
>>
>> I assume dd8ca104acd7 just want to skip removing the del_ufid_tc_mapping()
>> if the entry does NOT exist in the kernel?
>> If not maybe Jianbo can explain a bit more the corner case, or maybe has a
>> test case for this?
>>
> commit dd8ca104acd7 prevents the case of mapping deleted but filter exists
> in kernel(del fails). This patch includes both caeses of the inconsistency.
>
>>> + err = tc_del_filter(id);
>>> + }
>>> +
>>> + if (stats) {
>>> + memset(stats, 0, sizeof *stats);
>>> + if (!get_err) {
>>> + parse_tc_flower_to_stats(&flower, stats);
>>> + }
>>> + }
>>
>> I guess this was moved here to avoid the additional tc_get_flower() call
>> below.
>>
> Yes, it is.
>
>> I would fold the stats part in the above if() case to avoid the need for the
>> get_err variable, i.e.
>>
>> if (!tc_get_flower(id, &flower)) {
>> err = tc_del_filter(id);
>> if (stats) {
>> memset(stats, 0, sizeof *stats);
>> if (!get_err) {
>> parse_tc_flower_to_stats(&flower, stats);
>> }
>> }
>>
> We also need to zero stats if tc_get_flower fails.
If tc_get_flower() fails we can not clear the stats, as &flower might not be
valid here!
However this should not be a problem as the code you moved is also not clearing
the stats if tc_get_flower() fails:
- if (stats) {
- memset(stats, 0, sizeof *stats);
- if (!tc_get_flower(&id, &flower)) {
- parse_tc_flower_to_stats(&flower, stats);
- }
- }
>>
>>>
>>> - err = tc_del_filter(id);
>>> if (!err) {
>>> del_ufid_tc_mapping(ufid);
>>> }
>>> @@ -1946,7 +1962,8 @@ netdev_tc_flow_put(struct netdev *netdev, struct
>>> match *match,
>>> if (get_ufid_tc_mapping(ufid, &id) == 0) {
>>> VLOG_DBG_RL(&rl, "updating old handle: %d prio: %d",
>>> id.handle, id.prio);
>>> - info->tc_modify_flow_deleted = !del_filter_and_ufid_mapping(&id,
>>> ufid);
>>> + info->tc_modify_flow_deleted =
>>> + !del_filter_and_ufid_mapping(&id, ufid, NULL);
>>> }
>>>
>>> prio = get_prio_for_tc_flower(&flower);
>>> @@ -2017,7 +2034,6 @@ netdev_tc_flow_del(struct netdev *netdev OVS_UNUSED,
>>> const ovs_u128 *ufid,
>>> struct dpif_flow_stats *stats)
>>> {
>>> - struct tc_flower flower;
>>> struct tcf_id id;
>>> int error;
>>>
>>> @@ -2026,16 +2042,7 @@ netdev_tc_flow_del(struct netdev *netdev OVS_UNUSED,
>>> return error;
>>> }
>>>
>>> - if (stats) {
>>> - memset(stats, 0, sizeof *stats);
>>> - if (!tc_get_flower(&id, &flower)) {
>>> - parse_tc_flower_to_stats(&flower, stats);
>>> - }
>>> - }
>>> -
>>> - error = del_filter_and_ufid_mapping(&id, ufid);
>>> -
>>> - return error;
>>> + return del_filter_and_ufid_mapping(&id, ufid, stats);
>>> }
>>>
>>> static int
>>> --
>>> 2.23.0
>>>
>>> _______________________________________________
>>> dev mailing list
>>> [email protected]
>>> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>>
>>
>
> --
> Best regards,
> Tao Liu
_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev