On Tue, Mar 29, 2022 at 11:25:38AM +0200, Eelco Chaudron wrote: > > > On 29 Mar 2022, at 10:37, Tao Liu wrote: > > > On Mon, Mar 28, 2022 at 04:08:29PM +0200, Eelco Chaudron wrote: > >> On Mon, Mar 28, 2022 at 3:15 PM Eelco Chaudron <[email protected]> wrote: > >> > >>> > >>> > >>> On 25 Mar 2022, at 18:05, Tao Liu wrote: > >>> > >>>> On Fri, Mar 25, 2022 at 12:08:20PM +0100, Eelco Chaudron wrote: > >>>>> > >>>>> > >>>>> On 22 Mar 2022, at 13:04, Tao Liu wrote: > >>>>> > >>>>>> If netdev goes away(i.e. qemu with a vnet shutdown), kernel delete all > >>> tc > >>>>>> filters, those tcf_id related to the netdev will be left in ufid_to_tc > >>>>>> hashmap. > >>>>>> When qemu restart with a same vnet but different ifindex assigned, > >>>>>> a dup ufid may add. Especially after hashmap_expand, the old entry will > >>>>>> insert before the new one, delete or modify tc flow will always fail. > >>>>>> > >>>>>> So delete the stale entry to avoid a duplicated ufid in hashmap. > >>>>> > >>>>> There are some comments below, as I still do not fully understand the > >>> fix. > >>>>> > >>>> Please let me explain more. > >>>> As the scenery of qemu restart described above, filters are flushed by > >>>> kernel, but tcf_ids remain in ufid_to_tc hashmap. > >>>> > >>>> Considering a ping running continuously from other host to VM, the first > >>>> reply from VM triggers upcall after qemu restart. A same ufid is > >>> generated > >>>> if the request processed on same handler. At this time, an old ufid > >>> exists > >>>> in hashmap, and flow del returns ENODEV(ifindex is wrong). > >>> > >>> Can this not be solved by modifying the previous fix to this? > >>> > >>> --- a/lib/netdev-offload-tc.c > >>> +++ b/lib/netdev-offload-tc.c > >>> @@ -198,7 +198,9 @@ del_filter_and_ufid_mapping(struct tcf_id *id, const > >>> ovs_u128 *ufid) > >>> int err; > >>> > >>> err = tc_del_filter(id); > >>> - if (!err) { > >>> + if (!err || err = ENODEV) { > >>> del_ufid_tc_mapping(ufid); > >>> } > >>> return err; > >>> } > >>> > >>> > >>>> However flow put > >>>> still succeeds because there is no filter in kernel. And a dup ufid adds > >>> in > >>>> hashmap. > >>>> > >>>> After hashmap_expand, the old entry inserts before the new one. > >>>> > >>>> If flow expires, revalidator fails to del flow bacause we always get the > >>>> old entry. There are constant warnings in ovs-vswitch.log: > >>>> > >>>> dpif(revalidator49)|WARN|system@ovs-system: failed to flow_del (No > >>> such > >>>> file or directory) ufid:d81939b3-8c8d-4b35-8d58-88c098709b91 ... > >>>> > >>>> In case of flow mod, flow del inside flow put still fails with ENODEV, > >>> and > >>>> tc_replace_flower also fails because old filter exists in kernel. > >>>> > >>>> The idea of this patch is that, if a filter can't get from kernel, we can > >>>> just delete the mapping, and no need to call del filter as it does not > >>>> exist. In this way, we can keep consistent between kernel and userspace. > >>> > >>> Thanks for the details, and I think the problem is a bit more complex. > >>> > >>> As in theory duplicate ufid's could exist (guess even on the same port ;). > >>> We could also fix this by including the interface in the lookup, in > >>> get_ufid_tc_mapping()? > >>> > >> > >> Not sure how OVS deals with the uniqueness of the ufid, if they assume its > >> unique within the system, the fix can be simplified to something like this: > >> > >> --- a/lib/netdev-offload-tc.c > >> +++ b/lib/netdev-offload-tc.c > >> > >> @@ -218,8 +222,21 @@ add_ufid_tc_mapping(struct netdev *netdev, const > >> ovs_u128 *ufid, > >> { > >> struct ufid_tc_data *new_data = xzalloc(sizeof *new_data); > >> size_t ufid_hash = hash_bytes(ufid, sizeof *ufid, 0); > >> + struct tcf_id existing_id; > >> size_t tc_hash; > >> > >> + /* ADD COMMENT WHY ITS SAFE TO DELETE THIS. */ > >> + if (!get_ufid_tc_mapping(ufid, &existing_id)) { > >> + del_ufid_tc_mapping(ufid); > >> + } > >> + > >> tc_hash = hash_int(hash_int(id->prio, id->handle), id->ifindex); > >> tc_hash = hash_int(id->chain, tc_hash); > >> > >> > > This fix works. > > And I make a minor modification to avoid multiple hash computation. > > > > --- a/lib/netdev-offload-tc.c > > +++ b/lib/netdev-offload-tc.c > > @@ -216,6 +216,7 @@ static void > > add_ufid_tc_mapping(struct netdev *netdev, const ovs_u128 *ufid, > > struct tcf_id *id) > > { > > + struct ufid_tc_data *data; > > struct ufid_tc_data *new_data = xzalloc(sizeof *new_data); > > size_t ufid_hash = hash_bytes(ufid, sizeof *ufid, 0); > > size_t tc_hash; > > @@ -228,6 +229,16 @@ add_ufid_tc_mapping(struct netdev *netdev, const > > ovs_u128 *ufid, > > new_data->netdev = netdev_ref(netdev); > > > > ovs_mutex_lock(&ufid_lock); > > + /* Remove the stale ufid mapping to avoid a dup entry. > > + * ufid is unique in ovs which guaranteed by ukey_install__(), so it's > > + * safe to delete here. */ > > + HMAP_FOR_EACH_WITH_HASH (data, ufid_to_tc_node, ufid_hash, > > &ufid_to_tc) { > > + if (ovs_u128_equals(*ufid, data->ufid)) { > > + del_ufid_tc_mapping_unlocked(ufid); > > + break; > > + } > > + } > > + > > hmap_insert(&ufid_to_tc, &new_data->ufid_to_tc_node, ufid_hash); > > hmap_insert(&tc_to_ufid, &new_data->tc_to_ufid_node, tc_hash); > > ovs_mutex_unlock(&ufid_lock); > > > > But it's still a little confused, because we have called > > get_ufid_tc_mapping() and del_filter_and_ufid_mapping() in flow put. > > Watching netlink message sounds a good idea as you said. > > > > I agree doing this in adding the mapping sounds a bit confusing. We could > still do something like the below which is probably more in line, and safe’s > the extra lookup. > > diff --git a/lib/netdev-offload-tc.c b/lib/netdev-offload-tc.c > index 411a9a007..d256e427c 100644 > --- a/lib/netdev-offload-tc.c > +++ b/lib/netdev-offload-tc.c > @@ -1961,9 +1961,15 @@ netdev_tc_flow_put(struct netdev *netdev, struct match > *match, > } > > if (get_ufid_tc_mapping(ufid, &id) == 0) { > - VLOG_DBG_RL(&rl, "updating old handle: %d prio: %d", > - id.handle, id.prio); > - info->tc_modify_flow_deleted = !del_filter_and_ufid_mapping(&id, > ufid); > + if (id.ifindex != ifindex) { > + VLOG_DBG_RL(&rl, "remove stale handle: %d ifindex: %d", > + id.handle, id.ifindex); > + del_ufid_tc_mapping_unlocked(ufid); > + } else { > + VLOG_DBG_RL(&rl, "updating old handle: %d prio: %d", > + id.handle, id.prio); > + info->tc_modify_flow_deleted = !del_filter_and_ufid_mapping(&id, > ufid); > + } > } Agree with you. This is my first version.
> > Also finding a nice way to watch for netlink messages might be the real > solution. > Agree. > <SNIP> > > _______________________________________________ dev mailing list [email protected] https://mail.openvswitch.org/mailman/listinfo/ovs-dev
