On 8/21/25 11:13 AM, David Woodhouse wrote:
On Thu, 2025-08-21 at 14:26 +0000, Jonah Palmer wrote:
Makes the net_hub_port_cleanup function idempotent to avoid double
removals by guarding its QLIST_REMOVE with a flag.
When using a Xen networking device with hubport backends, e.g.:
-accel kvm,xen-version=0x40011
-netdev hubport,...
-device xen-net-device,...
the shutdown order starts with net_cleanup, which walks the list and
deletes netdevs (including hubports). Then Xen's xen_device_unrealize is
called, which eventually leads to a second net_hub_port_cleanup call,
resulting in a segfault.
Fixes: e7891c57 ("net: move backend cleanup to NIC cleanup")
Tested-by: David Woodhouse <d...@amazon.co.uk>
But I hate it.
The lifetime of these objects is confusing, and this patch doesn't make
it nicer.
Why is it OK for the object to be taken off the list while it still
exists and is findable by other pointers? What does it *mean* for it to
be in that state? Doesn't it have a refcount? Can't it be unlisted
*and* freed only when that refcount goes to zero?
I believe this "off-list but still exists" state is intentional and is
the model for NIC peer backends. IIUC, it essentially means "detached
from hub broadcast but owned by the NIC until device teardown".
The net core explicitly transfers ownership of a backend from the global
list to the NIC without freeing it, so it can be torn down later by the
device itself. See the comments in qemu_del_net_client and net_cleanup
in net/net.c.
In other words, a backend can be removed from net_clients and still
exist (owned by the NIC) until qemu_del_nic cleans it up and frees it.
There's no refcount today and lifetime is manual.
The real bug here is that hubport's cleanup is not idempotent, not
necessarily the ownership model itself.
I do agree though that the lifetime of these objects is confusing. And a
refcount model would be theoretically cleaner, but current code
explicitly relies on ownership transfer without refcounts.
---
Actually, now that I think about it, perhaps we don't need this extra
'listed' state and instead can just check 'if (port->hub)' and set it to
NULL after it's removed from the list.