On 3/13/26 5:15 PM, Matteo Perin wrote:
> Thank you Ilya and Adrián for your comments and suggestions!
> Sorry for the late reply, but I wanted to dig deeper into this issue.
>
> I think I found the root cause, and it is actually not entirely the fault
> of the NETLINK_LISTEN_ALL_NSID flag alone.
>
> On Fri, 13 Mar 2026 at 09:33, Adrián Moreno <[email protected]> wrote:
>
> > On Thu, Mar 05, 2026 at 06:27:00PM +0100, Matteo Perin via dev wrote:
> > > For ports on non-system (i.e. userspace) datapaths, the
> > > dpif_netlink_vport_get() call in netdev_linux_netnsid_update() is
> > > meaningless, these ports are not kernel vports.
> > >
> > > Generalize the tap class check in netdev_linux_netnsid_update() with a
> > > dpif_type check: when dpif_type is set and is not "system", assume the
> > > device is local without attempting the vport lookup. This change will
> > > cover all device types on userspace datapaths (e.g. veth pairs).
> > >
> > > Additionally, bypass the nsid equality check in netdev_linux_update()
> > > for non-system datapaths. When NETLINK_LISTEN_ALL_NSID is enabled,
> > > local RTM events carry the kernel-assigned namespace ID rather than
> > > NETNSID_LOCAL, causing a mismatch with the locally-assumed nsid. For
> > > non-system datapaths, process all RTM events unconditionally (the
> > > interface name lookup already ensures only OVS-managed devices are
> > > affected).
> >
> > IIUC, local events come without nsid in the socket's auxiliary data and
> > nl_sock_recv__ should ensure in that case NETNSID_LOCAL is returned. Can
> > you give more details of your use case and how to see a local netdev
> > event with something different to NETNSID_LOCAL?
>
> Unfortunately, there is an instance where local events can come with an
> nsid, and that is when there is a self-referential nsid mapping in the
> namespace peer ID table (i.e. the root namespace has an entry that maps to
> itself).
>
> This can be a common occurrence, since container runtimes (this is true for
> LXD, for example, afaik) maintain nsid mappings so they can efficiently
> query network interface information across namespaces (e.g. retrieving
> container interface stats from the host without entering each container
> namespace).
>
> As a side effect of these cross-namespace link queries, the kernel allocates
> an nsid entry in the host namespace table that maps back to itself. This
> mapping is harmless under normal operation: it is simply an artifact of how
> the kernel tracks namespace relationships, and it persists for the lifetime
> of the system.
>
> When OVS enables NETLINK_LISTEN_ALL_NSID on its RTNL socket, the
> kernel decides whether to attach an nsid cmsg to each broadcast by
> looking up the sender's namespace in the receiver's nsid table.
>
> Normally the root namespace has no nsid entry for itself, so local events
> carry no cmsg, and OVS correctly records them as NETNSID_LOCAL (−1).
>
> But, given the precondition above, the kernel finds the self-referential
> entry when delivering any local RTM broadcast: the lookup returns the self
> nsid instead of "not assigned", so the kernel attaches a cmsg with a
> numerical nsid to local events as well.
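
The mechanism described above can be modelled in a few lines. This is a
minimal Python sketch, not OVS or kernel code: the table layout, the
namespace labels, the nsid value 1, and both function names are hypothetical.
It only illustrates how a self-referential entry makes a local event stop
comparing equal to NETNSID_LOCAL.

```python
NETNSID_LOCAL = -1  # value recorded for local events, per the thread


def nsid_attached_by_kernel(receiver_nsid_table, sender_ns):
    """Hypothetical model of the lookup: return the sender's nsid in the
    receiver's table, or None when none is assigned (no cmsg attached)."""
    return receiver_nsid_table.get(sender_ns)


def nsid_seen_by_receiver(cmsg_nsid):
    """Hypothetical model of the receive path: no cmsg means local."""
    return NETNSID_LOCAL if cmsg_nsid is None else cmsg_nsid


# Host table without a self-referential entry: local events stay local.
clean_table = {"container-a": 0}
local_clean = nsid_seen_by_receiver(
    nsid_attached_by_kernel(clean_table, "host"))

# Host table with a self-referential entry (assumed nsid 1 for "host").
self_mapped_table = {"container-a": 0, "host": 1}
local_self_mapped = nsid_seen_by_receiver(
    nsid_attached_by_kernel(self_mapped_table, "host"))

print(local_clean == NETNSID_LOCAL)        # True
print(local_self_mapped == NETNSID_LOCAL)  # False: nsid equality check fails
```

In the second case the event originates in the receiver's own namespace but
arrives tagged with nsid 1, which is the mismatch discussed in this thread.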
Hrm, OK. Thanks for digging into this!

> On Fri, 13 Mar 2026 at 14:28, Ilya Maximets <[email protected]> wrote:
>
> > Not sure why the list was not included in my previous reply, adding it
> > back.
> >
> > On 3/13/26 9:49 AM, Adrián Moreno wrote:
> > > On Wed, Mar 11, 2026 at 01:54:13PM +0100, Ilya Maximets wrote:
> > >> On 3/5/26 6:27 PM, Matteo Perin via dev wrote:
> > >>> For ports on non-system (i.e. userspace) datapaths, the
> > >>> dpif_netlink_vport_get() call in netdev_linux_netnsid_update() is
> > >>> meaningless, these ports are not kernel vports.
> > >>>
> > >>> Generalize the tap class check in netdev_linux_netnsid_update() with a
> > >>> dpif_type check: when dpif_type is set and is not "system", assume the
> > >>> device is local without attempting the vport lookup. This change will
> > >>> cover all device types on userspace datapaths (e.g. veth pairs).
> > >>>
> > >>> Additionally, bypass the nsid equality check in netdev_linux_update()
> > >>> for non-system datapaths. When NETLINK_LISTEN_ALL_NSID is enabled,
> > >>> local RTM events carry the kernel-assigned namespace ID rather than
> > >>> NETNSID_LOCAL, causing a mismatch with the locally-assumed nsid. For
> > >>> non-system datapaths, process all RTM events unconditionally (the
> > >>> interface name lookup already ensures only OVS-managed devices are
> > >>> affected).
> > >>
> > >> Hmm. I don't think this is right. The name is not unique across
> > >> namespaces, it can be a completely different device in a different
> > >> namespace. We can't rely on just a name.
> > >
> > > I think you're right. This can be problematic.
> > >
> > >>
> > >> This all-nsids listening functionality is as annoying as it is
> > >> useless... :)
> > >>
> > >
> > > Should we go ahead with the attempts to deprecate it?
> >
> > Let's see what comes from your question that nsid should not be present
> > in the local notifications.
> > But if it is present, then I don't think there is an actual way for us
> > to know what's local and what isn't, unless we check the actual ID of
> > the datapath interface and compare to that. But it sounds like more and
> > more hacks for questionably useful functionality. So, in this case it
> > might be better to just deprecate it.
>
> Maybe we could also try to fetch (with something like an RTM_GETNSID
> request with NETNSA_FD pointing to /proc/self/ns/net) and cache the static
> self-nsid, and treat that as NETNSID_LOCAL too?
>
> I do not think that it will be a very clean workaround, but it could be a
> possibility.

We can try this as a temporary workaround for now, so we don't block this
series. Just need to make sure we're not making such extra requests all the
time.

But in the grand scheme of things it may still be worth starting the
deprecation process, as this is a mess.

Best regards, Ilya Maximets.
_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev
