On 25 Mar 2022, at 18:05, Tao Liu wrote:
> On Fri, Mar 25, 2022 at 12:08:20PM +0100, Eelco Chaudron wrote:
>>
>>
>> On 22 Mar 2022, at 13:04, Tao Liu wrote:
>>
>>> If netdev goes away(i.e. qemu with a vnet shutdown), kernel delete all tc
>>> filters, those tcf_id related to the netdev will be left in ufid_to_tc
>>> hashmap.
>>> When qemu restart with a same vnet but different ifindex assigned,
>>> a dup ufid may add. Especially after hashmap_expand, the old entry will
>>> insert before the new one, delete or modify tc flow will always fail.
>>>
>>> So delete the stale entry to avoid a duplicated ufid in hashmap.
>>
>> There are some comments below, as I still do not fully understand the fix.
>>
> Please let me explain more.
> As the scenery of qemu restart described above, filters are flushed by
> kernel, but tcf_ids remain in ufid_to_tc hashmap.
>
> Considering a ping running continuously from other host to VM, the first
> reply from VM triggers upcall after qemu restart. A same ufid is generated
> if the request processed on same handler. At this time, an old ufid exists
> in hashmap, and flow del returns ENODEV(ifindex is wrong).
Can this not be solved by modifying the previous fix to this?
--- a/lib/netdev-offload-tc.c
+++ b/lib/netdev-offload-tc.c
@@ -198,7 +198,9 @@ del_filter_and_ufid_mapping(struct tcf_id *id, const
ovs_u128 *ufid)
int err;
err = tc_del_filter(id);
- if (!err) {
+ if (!err || err = ENODEV) {
del_ufid_tc_mapping(ufid);
}
return err;
}
> However flow put
> still succeeds because there is no filter in kernel. And a dup ufid adds in
> hashmap.
>
> After hashmap_expand, the old entry inserts before the new one.
>
> If flow expires, revalidator fails to del flow bacause we always get the
> old entry. There are constant warnings in ovs-vswitch.log:
>
> dpif(revalidator49)|WARN|system@ovs-system: failed to flow_del (No such
> file or directory) ufid:d81939b3-8c8d-4b35-8d58-88c098709b91 ...
>
> In case of flow mod, flow del inside flow put still fails with ENODEV, and
> tc_replace_flower also fails because old filter exists in kernel.
>
> The idea of this patch is that, if a filter can't get from kernel, we can
> just delete the mapping, and no need to call del filter as it does not
> exist. In this way, we can keep consistent between kernel and userspace.
Thanks for the details, and I think the problem is a bit more complex.
As in theory duplicate ufid's could exist (guess even on the same port ;).
We could also fix this by including the interface in the lookup, in
get_ufid_tc_mapping()?
For example something like this:
diff --git a/lib/netdev-offload-tc.c b/lib/netdev-offload-tc.c
index 12d0a9af3..8be3c5dc3 100644
--- a/lib/netdev-offload-tc.c
+++ b/lib/netdev-offload-tc.c
@@ -239,14 +239,15 @@ add_ufid_tc_mapping(struct netdev *netdev, const ovs_u128
*ufid,
* Otherwise returns the error.
*/
static int
-get_ufid_tc_mapping(const ovs_u128 *ufid, struct tcf_id *id)
+get_ufid_tc_mapping(struct netdev *netdev, const ovs_u128 *ufid,
+ struct tcf_id *id)
{
size_t ufid_hash = hash_bytes(ufid, sizeof *ufid, 0);
struct ufid_tc_data *data;
ovs_mutex_lock(&ufid_lock);
HMAP_FOR_EACH_WITH_HASH (data, ufid_to_tc_node, ufid_hash, &ufid_to_tc) {
- if (ovs_u128_equals(*ufid, data->ufid)) {
+ if (netdev == data->netdev && ovs_u128_equals(*ufid, data->ufid) ) {
*id = data->id;
ovs_mutex_unlock(&ufid_lock);
return 0;
@@ -1943,7 +1944,7 @@ netdev_tc_flow_put(struct netdev *netdev, struct match
*match,
return EOPNOTSUPP;
}
- if (get_ufid_tc_mapping(ufid, &id) == 0) {
+ if (get_ufid_tc_mapping(netdev, ufid, &id) == 0) {
VLOG_DBG_RL(&rl, "updating old handle: %d prio: %d",
id.handle, id.prio);
info->tc_modify_flow_deleted = !del_filter_and_ufid_mapping(&id, ufid);
@@ -1986,7 +1987,7 @@ netdev_tc_flow_get(struct netdev *netdev,
struct tcf_id id;
int err;
- err = get_ufid_tc_mapping(ufid, &id);
+ err = get_ufid_tc_mapping(netdev, ufid, &id);
if (err) {
return err;
}
@@ -2013,7 +2014,7 @@ netdev_tc_flow_get(struct netdev *netdev,
}
static int
-netdev_tc_flow_del(struct netdev *netdev OVS_UNUSED,
+netdev_tc_flow_del(struct netdev *netdev,
const ovs_u128 *ufid,
struct dpif_flow_stats *stats)
{
@@ -2021,7 +2022,7 @@ netdev_tc_flow_del(struct netdev *netdev OVS_UNUSED,
struct tcf_id id;
int error;
- error = get_ufid_tc_mapping(ufid, &id);
+ error = get_ufid_tc_mapping(netdev, ufid, &id);
if (error) {
return error;
}
Hover this does not fix the root cause (same with your fix below). We need a
way to delete these stale mappings. Maybe we can watch some netlink messages,
or re-use the netdev layer, but I think we hold a reference in the mapping so
we won't delete either.
I think this is what we should try to fix, not working around the leftover
entries.
>>> Fixes: dd8ca104acd7 ("netdev-tc-offloads: Don't delete ufid mapping if fail
>>> to delete filter")))
>>> Signed-off-by: Tao Liu <[email protected]>
>>> ---
>>> lib/netdev-offload-tc.c | 37 ++++++++++++++++++++++---------------
>>> 1 file changed, 22 insertions(+), 15 deletions(-)
>>>
>>> diff --git a/lib/netdev-offload-tc.c b/lib/netdev-offload-tc.c
>>> index 12d0a9af3..e9dd66790 100644
>>> --- a/lib/netdev-offload-tc.c
>>> +++ b/lib/netdev-offload-tc.c
>>> @@ -167,6 +167,9 @@ struct ufid_tc_data {
>>> struct netdev *netdev;
>>> };
>>>
>>> +static void parse_tc_flower_to_stats(struct tc_flower *flower,
>>> + struct dpif_flow_stats *stats);
>>> +
>>> static void
>>> del_ufid_tc_mapping_unlocked(const ovs_u128 *ufid)
>>> {
>>> @@ -200,11 +203,24 @@ del_ufid_tc_mapping(const ovs_u128 *ufid)
>>>
>>> /* Wrapper function to delete filter and ufid tc mapping */
>>> static int
>>> -del_filter_and_ufid_mapping(struct tcf_id *id, const ovs_u128 *ufid)
>>> +del_filter_and_ufid_mapping(struct tcf_id *id, const ovs_u128 *ufid,
>>> + struct dpif_flow_stats *stats)
>>> {
>>> - int err;
>>> + struct tc_flower flower;
>>> + int err = 0, get_err;
>>> +
>>> + get_err = tc_get_flower(id, &flower);
>>> + if (!get_err) {
>>
>> We should definitely add a comment here explaining the reasoning behind
>> doing the get first and then the delete!
>>
> Will add it.
>
>> However, it’s still not clear to me why this change does not undo,
>> dd8ca104acd7?
>>
>> Because tc_get_flower() just finds out if the entry with the ID exists in
>> the kernel, and than only try to delete it.
>>
>> For this can you just not check a specific return value in the
>> tc_del_filter() call?
>>
>> I assume dd8ca104acd7 just want to skip removing the del_ufid_tc_mapping()
>> if the entry does NOT exist in the kernel?
>> If not maybe Jianbo can explain a bit more the corner case, or maybe has a
>> test case for this?
>>
> commit dd8ca104acd7 prevents the case of mapping deleted but filter exists
> in kernel(del fails). This patch includes both caeses of the inconsistency.
>
>>> + err = tc_del_filter(id);
>>> + }
>>> +
>>> + if (stats) {
>>> + memset(stats, 0, sizeof *stats);
>>> + if (!get_err) {
>>> + parse_tc_flower_to_stats(&flower, stats);
>>> + }
>>> + }
>>
>> I guess this was moved here to avoid the additional tc_get_flower() call
>> below.
>>
> Yes, it is.
>
>> I would fold the stats part in the above if() case to avoid the need for the
>> get_err variable, i.e.
>>
>> if (!tc_get_flower(id, &flower)) {
>> err = tc_del_filter(id);
>> if (stats) {
>> memset(stats, 0, sizeof *stats);
>> if (!get_err) {
>> parse_tc_flower_to_stats(&flower, stats);
>> }
>> }
>>
> We also need to zero stats if tc_get_flower fails.
>
>>
>>>
>>> - err = tc_del_filter(id);
>>> if (!err) {
>>> del_ufid_tc_mapping(ufid);
>>> }
>>> @@ -1946,7 +1962,8 @@ netdev_tc_flow_put(struct netdev *netdev, struct
>>> match *match,
>>> if (get_ufid_tc_mapping(ufid, &id) == 0) {
>>> VLOG_DBG_RL(&rl, "updating old handle: %d prio: %d",
>>> id.handle, id.prio);
>>> - info->tc_modify_flow_deleted = !del_filter_and_ufid_mapping(&id,
>>> ufid);
>>> + info->tc_modify_flow_deleted =
>>> + !del_filter_and_ufid_mapping(&id, ufid, NULL);
>>> }
>>>
>>> prio = get_prio_for_tc_flower(&flower);
>>> @@ -2017,7 +2034,6 @@ netdev_tc_flow_del(struct netdev *netdev OVS_UNUSED,
>>> const ovs_u128 *ufid,
>>> struct dpif_flow_stats *stats)
>>> {
>>> - struct tc_flower flower;
>>> struct tcf_id id;
>>> int error;
>>>
>>> @@ -2026,16 +2042,7 @@ netdev_tc_flow_del(struct netdev *netdev OVS_UNUSED,
>>> return error;
>>> }
>>>
>>> - if (stats) {
>>> - memset(stats, 0, sizeof *stats);
>>> - if (!tc_get_flower(&id, &flower)) {
>>> - parse_tc_flower_to_stats(&flower, stats);
>>> - }
>>> - }
>>> -
>>> - error = del_filter_and_ufid_mapping(&id, ufid);
>>> -
>>> - return error;
>>> + return del_filter_and_ufid_mapping(&id, ufid, stats);
>>> }
>>>
>>> static int
>>> --
>>> 2.23.0
>>>
>>> _______________________________________________
>>> dev mailing list
>>> [email protected]
>>> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>>
>>
>
> --
> Best regards,
> Tao Liu
_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev