On 29 Mar 2022, at 10:37, Tao Liu wrote:

> On Mon, Mar 28, 2022 at 04:08:29PM +0200, Eelco Chaudron wrote:
>> On Mon, Mar 28, 2022 at 3:15 PM Eelco Chaudron <[email protected]> wrote:
>>>
>>> On 25 Mar 2022, at 18:05, Tao Liu wrote:
>>>
>>>> On Fri, Mar 25, 2022 at 12:08:20PM +0100, Eelco Chaudron wrote:
>>>>>
>>>>> On 22 Mar 2022, at 13:04, Tao Liu wrote:
>>>>>
>>>>>> If a netdev goes away (e.g. a qemu instance with a vnet shuts down),
>>>>>> the kernel deletes all of its tc filters, but the tcf_ids related to
>>>>>> that netdev are left behind in the ufid_to_tc hashmap. When qemu
>>>>>> restarts with the same vnet but a different ifindex assigned, a
>>>>>> duplicate ufid may be added. Especially after hmap_expand(), the old
>>>>>> entry ends up before the new one, so deleting or modifying the tc
>>>>>> flow will always fail.
>>>>>>
>>>>>> So delete the stale entry to avoid a duplicated ufid in the hashmap.
>>>>>
>>>>> There are some comments below, as I still do not fully understand
>>>>> the fix.
>>>>>
>>>> Let me explain in more detail.
>>>> In the qemu-restart scenario described above, the filters are flushed
>>>> by the kernel, but the tcf_ids remain in the ufid_to_tc hashmap.
>>>>
>>>> Consider a ping running continuously from another host to the VM: the
>>>> first reply from the VM triggers an upcall after the qemu restart. The
>>>> same ufid is generated if the request is processed on the same
>>>> handler. At this point an old ufid already exists in the hashmap, and
>>>> the flow del returns ENODEV (the ifindex is wrong).
>>>
>>> Can this not be solved by modifying the previous fix to this?
>>>
>>> --- a/lib/netdev-offload-tc.c
>>> +++ b/lib/netdev-offload-tc.c
>>> @@ -198,7 +198,9 @@ del_filter_and_ufid_mapping(struct tcf_id *id, const ovs_u128 *ufid)
>>>      int err;
>>>
>>>      err = tc_del_filter(id);
>>> -    if (!err) {
>>> +    if (!err || err == ENODEV) {
>>>          del_ufid_tc_mapping(ufid);
>>>      }
>>>      return err;
>>>  }
>>>
>>>> However, the flow put still succeeds because there is no filter in the
>>>> kernel, and a duplicate ufid is added to the hashmap.
>>>>
>>>> After hmap_expand(), the old entry is placed before the new one.
>>>>
>>>> If the flow expires, the revalidator fails to delete it because we
>>>> always get the old entry. There are constant warnings in
>>>> ovs-vswitchd.log:
>>>>
>>>>   dpif(revalidator49)|WARN|system@ovs-system: failed to flow_del (No such
>>>>   file or directory) ufid:d81939b3-8c8d-4b35-8d58-88c098709b91 ...
>>>>
>>>> In case of a flow mod, the flow del inside the flow put still fails
>>>> with ENODEV, and tc_replace_flower() also fails because the old filter
>>>> still exists in the kernel.
>>>>
>>>> The idea of this patch is that if a filter cannot be fetched from the
>>>> kernel, we can simply delete the mapping; there is no need to call del
>>>> filter, as the filter does not exist. This way we keep the kernel and
>>>> userspace consistent.
>>>
>>> Thanks for the details; I think the problem is a bit more complex.
>>>
>>> In theory, duplicate ufids could exist (I guess even on the same port
>>> ;). We could also fix this by including the interface in the lookup,
>>> in get_ufid_tc_mapping()?
>>>
>> Not sure how OVS deals with the uniqueness of the ufid; if it is assumed
>> to be unique within the system, the fix can be simplified to something
>> like this:
>>
>> --- a/lib/netdev-offload-tc.c
>> +++ b/lib/netdev-offload-tc.c
>> @@ -218,8 +222,21 @@ add_ufid_tc_mapping(struct netdev *netdev, const ovs_u128 *ufid,
>>  {
>>      struct ufid_tc_data *new_data = xzalloc(sizeof *new_data);
>>      size_t ufid_hash = hash_bytes(ufid, sizeof *ufid, 0);
>> +    struct tcf_id existing_id;
>>      size_t tc_hash;
>>
>> +    /* ADD COMMENT WHY ITS SAFE TO DELETE THIS. */
>> +    if (!get_ufid_tc_mapping(ufid, &existing_id)) {
>> +        del_ufid_tc_mapping(ufid);
>> +    }
>> +
>>      tc_hash = hash_int(hash_int(id->prio, id->handle), id->ifindex);
>>      tc_hash = hash_int(id->chain, tc_hash);
>>
> This fix works.
> I made a minor modification to avoid computing the hash multiple times:
>
> --- a/lib/netdev-offload-tc.c
> +++ b/lib/netdev-offload-tc.c
> @@ -216,6 +216,7 @@ static void
>  add_ufid_tc_mapping(struct netdev *netdev, const ovs_u128 *ufid,
>                      struct tcf_id *id)
>  {
> +    struct ufid_tc_data *data;
>      struct ufid_tc_data *new_data = xzalloc(sizeof *new_data);
>      size_t ufid_hash = hash_bytes(ufid, sizeof *ufid, 0);
>      size_t tc_hash;
> @@ -228,6 +229,16 @@ add_ufid_tc_mapping(struct netdev *netdev, const ovs_u128 *ufid,
>      new_data->netdev = netdev_ref(netdev);
>
>      ovs_mutex_lock(&ufid_lock);
> +    /* Remove the stale ufid mapping to avoid a duplicate entry.
> +     * The ufid is unique in OVS, which is guaranteed by
> +     * ukey_install__(), so it is safe to delete it here. */
> +    HMAP_FOR_EACH_WITH_HASH (data, ufid_to_tc_node, ufid_hash, &ufid_to_tc) {
> +        if (ovs_u128_equals(*ufid, data->ufid)) {
> +            del_ufid_tc_mapping_unlocked(ufid);
> +            break;
> +        }
> +    }
> +
>      hmap_insert(&ufid_to_tc, &new_data->ufid_to_tc_node, ufid_hash);
>      hmap_insert(&tc_to_ufid, &new_data->tc_to_ufid_node, tc_hash);
>      ovs_mutex_unlock(&ufid_lock);
>
> But it is still a little confusing, because we already call
> get_ufid_tc_mapping() and del_filter_and_ufid_mapping() in the flow put.
> Watching netlink messages sounds like a good idea, as you said.
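
As an aside, to make the shadowing discussed above concrete, here is a
small standalone toy; this is demo code only, not OVS's struct hmap, and
the names and values (key 42, ifindexes 7 and 9) are invented. It just
shows why a first-match bucket scan keeps returning the stale entry once
a duplicate key exists:

#include <stdio.h>
#include <stdlib.h>

struct entry {
    int key;                    /* Stands in for the ufid. */
    int ifindex;                /* Stands in for tcf_id.ifindex. */
    struct entry *next;
};

/* A single hash bucket; after a rehash the stale entry sits in front. */
static struct entry *bucket;

static void
insert_tail(int key, int ifindex)
{
    struct entry *e = malloc(sizeof *e);
    struct entry **p = &bucket;

    e->key = key;
    e->ifindex = ifindex;
    e->next = NULL;
    while (*p) {
        p = &(*p)->next;
    }
    *p = e;
}

static struct entry *
lookup(int key)
{
    struct entry *e;

    for (e = bucket; e; e = e->next) {
        if (e->key == key) {
            return e;           /* First match wins. */
        }
    }
    return NULL;
}

int
main(void)
{
    insert_tail(42, 7);         /* Stale mapping; its netdev is gone. */
    insert_tail(42, 9);         /* New mapping after the qemu restart. */

    /* Prints "lookup(42) -> ifindex 7": every lookup returns the stale
     * entry, so del/mod keeps using the dead ifindex and hits ENODEV. */
    printf("lookup(42) -> ifindex %d\n", lookup(42)->ifindex);
    return 0;
}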

I agree that doing this while adding the mapping sounds a bit confusing.
We could still do something like the below, which is probably more in
line and saves the extra lookup:

diff --git a/lib/netdev-offload-tc.c b/lib/netdev-offload-tc.c
index 411a9a007..d256e427c 100644
--- a/lib/netdev-offload-tc.c
+++ b/lib/netdev-offload-tc.c
@@ -1961,9 +1961,15 @@ netdev_tc_flow_put(struct netdev *netdev, struct match *match,
     }

     if (get_ufid_tc_mapping(ufid, &id) == 0) {
-        VLOG_DBG_RL(&rl, "updating old handle: %d prio: %d",
-                    id.handle, id.prio);
-        info->tc_modify_flow_deleted = !del_filter_and_ufid_mapping(&id, ufid);
+        if (id.ifindex != ifindex) {
+            VLOG_DBG_RL(&rl, "remove stale handle: %d ifindex: %d",
+                        id.handle, id.ifindex);
+            del_ufid_tc_mapping_unlocked(ufid);
+        } else {
+            VLOG_DBG_RL(&rl, "updating old handle: %d prio: %d",
+                        id.handle, id.prio);
+            info->tc_modify_flow_deleted = !del_filter_and_ufid_mapping(&id, ufid);
+        }
     }

Also, finding a nice way to watch for netlink messages might be the real
solution.

<SNIP>
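
For reference, here is a minimal standalone sketch of what watching for
such events could look like. This is demo code only, with error handling
trimmed; a real fix would presumably hook into OVS's existing netlink
notifier helpers rather than open its own socket. It subscribes to
rtnetlink link notifications and reports RTM_DELLINK, i.e. the point at
which stale ufid-to-tc mappings for the vanished ifindex could be
purged:

#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <linux/netlink.h>
#include <linux/rtnetlink.h>

int
main(void)
{
    struct sockaddr_nl addr;
    char buf[8192];
    int sock;

    sock = socket(AF_NETLINK, SOCK_RAW, NETLINK_ROUTE);
    if (sock < 0) {
        perror("socket");
        return 1;
    }

    memset(&addr, 0, sizeof addr);
    addr.nl_family = AF_NETLINK;
    addr.nl_groups = RTMGRP_LINK;       /* Link add/delete notifications. */
    if (bind(sock, (struct sockaddr *) &addr, sizeof addr) < 0) {
        perror("bind");
        return 1;
    }

    for (;;) {
        ssize_t len = recv(sock, buf, sizeof buf, 0);
        struct nlmsghdr *nlh;

        if (len < 0) {
            perror("recv");
            return 1;
        }
        for (nlh = (struct nlmsghdr *) buf; NLMSG_OK(nlh, len);
             nlh = NLMSG_NEXT(nlh, len)) {
            if (nlh->nlmsg_type == RTM_DELLINK) {
                struct ifinfomsg *ifi = NLMSG_DATA(nlh);

                /* Here the offload layer could drop every ufid_to_tc
                 * entry whose tcf_id.ifindex matches. */
                printf("link gone: ifindex %d\n", ifi->ifi_index);
            }
        }
    }
}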
