On 29 Mar 2022, at 10:37, Tao Liu wrote:

> On Mon, Mar 28, 2022 at 04:08:29PM +0200, Eelco Chaudron wrote:
>> On Mon, Mar 28, 2022 at 3:15 PM Eelco Chaudron <[email protected]> wrote:
>>
>>>
>>>
>>> On 25 Mar 2022, at 18:05, Tao Liu wrote:
>>>
>>>> On Fri, Mar 25, 2022 at 12:08:20PM +0100, Eelco Chaudron wrote:
>>>>>
>>>>>
>>>>> On 22 Mar 2022, at 13:04, Tao Liu wrote:
>>>>>
>>>>>> If netdev goes away(i.e. qemu with a vnet shutdown), kernel delete all
>>> tc
>>>>>> filters, those tcf_id related to the netdev will be left in ufid_to_tc
>>>>>> hashmap.
>>>>>> When qemu restart with a same vnet but different ifindex assigned,
>>>>>> a dup ufid may add. Especially after hashmap_expand, the old entry will
>>>>>> insert before the new one, delete or modify tc flow will always fail.
>>>>>>
>>>>>> So delete the stale entry to avoid a duplicated ufid in hashmap.
>>>>>
>>>>> There are some comments below, as I still do not fully understand the
>>> fix.
>>>>>
>>>> Please let me explain more.
>>>> As the scenery of qemu restart described above, filters are flushed by
>>>> kernel, but tcf_ids remain in ufid_to_tc hashmap.
>>>>
>>>> Considering a ping running continuously from other host to VM, the first
>>>> reply from VM triggers upcall after qemu restart. A same ufid is
>>> generated
>>>> if the request processed on same handler. At this time, an old ufid
>>> exists
>>>> in hashmap, and flow del returns ENODEV(ifindex is wrong).
>>>
>>> Can this not be solved by modifying the previous fix to this?
>>>
>>> --- a/lib/netdev-offload-tc.c
>>> +++ b/lib/netdev-offload-tc.c
>>> @@ -198,7 +198,9 @@ del_filter_and_ufid_mapping(struct tcf_id *id, const
>>> ovs_u128 *ufid)
>>>      int err;
>>>
>>>      err = tc_del_filter(id);
>>> -    if (!err) {
>>> +    if (!err || err = ENODEV) {
>>>         del_ufid_tc_mapping(ufid);
>>>      }
>>>      return err;
>>>  }
>>>
>>>
>>>> However flow put
>>>> still succeeds because there is no filter in kernel. And a dup ufid adds
>>> in
>>>> hashmap.
>>>>
>>>> After hashmap_expand, the old entry inserts before the new one.
>>>>
>>>> If flow expires, revalidator fails to del flow bacause we always get the
>>>> old entry. There are constant warnings in ovs-vswitch.log:
>>>>
>>>>     dpif(revalidator49)|WARN|system@ovs-system: failed to flow_del (No
>>> such
>>>>     file or directory) ufid:d81939b3-8c8d-4b35-8d58-88c098709b91 ...
>>>>
>>>> In case of flow mod, flow del inside flow put still fails with ENODEV,
>>> and
>>>> tc_replace_flower also fails because old filter exists in kernel.
>>>>
>>>> The idea of this patch is that, if a filter can't get from kernel, we can
>>>> just delete the mapping, and no need to call del filter as it does not
>>>> exist. In this way, we can keep consistent between kernel and userspace.
>>>
>>> Thanks for the details, and I think the problem is a bit more complex.
>>>
>>> As in theory duplicate ufid's could exist (guess even on the same port ;).
>>> We could also fix this by including the interface in the lookup, in
>>> get_ufid_tc_mapping()?
>>>
>>
>> Not sure how OVS deals with the uniqueness of the ufid, if they assume its
>> unique within the system, the fix can be simplified to something like this:
>>
>> --- a/lib/netdev-offload-tc.c
>> +++ b/lib/netdev-offload-tc.c
>>
>> @@ -218,8 +222,21 @@ add_ufid_tc_mapping(struct netdev *netdev, const
>> ovs_u128 *ufid,
>>  {
>>      struct ufid_tc_data *new_data = xzalloc(sizeof *new_data);
>>      size_t ufid_hash = hash_bytes(ufid, sizeof *ufid, 0);
>> +    struct tcf_id existing_id;
>>      size_t tc_hash;
>>
>> +    /* ADD COMMENT WHY ITS SAFE TO DELETE THIS. */
>> +    if (!get_ufid_tc_mapping(ufid, &existing_id)) {
>> +        del_ufid_tc_mapping(ufid);
>> +    }
>> +
>>      tc_hash = hash_int(hash_int(id->prio, id->handle), id->ifindex);
>>      tc_hash = hash_int(id->chain, tc_hash);
>>
>>
> This fix works.
> And I make a minor modification to avoid multiple hash computation.
>
> --- a/lib/netdev-offload-tc.c
> +++ b/lib/netdev-offload-tc.c
> @@ -216,6 +216,7 @@ static void
>  add_ufid_tc_mapping(struct netdev *netdev, const ovs_u128 *ufid,
>                      struct tcf_id *id)
>  {
> +    struct ufid_tc_data *data;
>      struct ufid_tc_data *new_data = xzalloc(sizeof *new_data);
>      size_t ufid_hash = hash_bytes(ufid, sizeof *ufid, 0);
>      size_t tc_hash;
> @@ -228,6 +229,16 @@ add_ufid_tc_mapping(struct netdev *netdev, const 
> ovs_u128 *ufid,
>      new_data->netdev = netdev_ref(netdev);
>
>      ovs_mutex_lock(&ufid_lock);
> +    /* Remove the stale ufid mapping to avoid a dup entry.
> +     * ufid is unique in ovs which guaranteed by ukey_install__(), so it's
> +     * safe to delete here. */
> +    HMAP_FOR_EACH_WITH_HASH (data, ufid_to_tc_node, ufid_hash, &ufid_to_tc) {
> +        if (ovs_u128_equals(*ufid, data->ufid)) {
> +            del_ufid_tc_mapping_unlocked(ufid);
> +            break;
> +        }
> +    }
> +
>      hmap_insert(&ufid_to_tc, &new_data->ufid_to_tc_node, ufid_hash);
>      hmap_insert(&tc_to_ufid, &new_data->tc_to_ufid_node, tc_hash);
>      ovs_mutex_unlock(&ufid_lock);
>
> But it's still a little confused, because we have called
> get_ufid_tc_mapping() and del_filter_and_ufid_mapping() in flow put.
> Watching netlink message sounds a good idea as you said.
>

I agree doing this in adding the mapping sounds a bit confusing. We could still 
do something like the below which is probably more in line, and safe’s the 
extra lookup.

diff --git a/lib/netdev-offload-tc.c b/lib/netdev-offload-tc.c
index 411a9a007..d256e427c 100644
--- a/lib/netdev-offload-tc.c
+++ b/lib/netdev-offload-tc.c
@@ -1961,9 +1961,15 @@ netdev_tc_flow_put(struct netdev *netdev, struct match 
*match,
     }

     if (get_ufid_tc_mapping(ufid, &id) == 0) {
-        VLOG_DBG_RL(&rl, "updating old handle: %d prio: %d",
-                    id.handle, id.prio);
-        info->tc_modify_flow_deleted = !del_filter_and_ufid_mapping(&id, ufid);
+        if (id.ifindex != ifindex) {
+            VLOG_DBG_RL(&rl, "remove stale handle: %d ifindex: %d",
+                        id.handle, id.ifindex);
+            del_ufid_tc_mapping_unlocked(ufid);
+        } else {
+            VLOG_DBG_RL(&rl, "updating old handle: %d prio: %d",
+                        id.handle, id.prio);
+            info->tc_modify_flow_deleted = !del_filter_and_ufid_mapping(&id, 
ufid);
+        }
     }

Also finding a nice way to watch for netlink messages might be the real 
solution.

<SNIP>

_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Reply via email to