On 8/18/2025 11:04 AM, Chaney, Ben wrote:
  From: Steve Sistare <steven.sist...@oracle.com>

Tap and vhost devices can be preserved during cpr-transfer using
traditional live migration methods, wherein the management layer
creates new interfaces for the target and fiddles with 'ip link'
to deactivate the old interface and activate the new.

However, CPR can simply send the file descriptors to new QEMU,
with no special management actions required. The user enables
this behavior by specifying '-netdev tap,cpr=on'. The default
is cpr=off.
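
For readers unfamiliar with the mechanism, descriptor passing over a Unix
socket can be sketched as below. This is only an illustration, not QEMU's
code: the pipe stands in for a tap fd, and the names send_fd, recv_fd and
demo are made up for the example. It assumes Python 3.9+ on a Unix host.

```python
import os
import socket


def send_fd(sock, fd, name):
    # The name is ordinary payload; the descriptor itself travels as
    # SCM_RIGHTS ancillary data alongside it.
    socket.send_fds(sock, [name.encode()], [fd])


def recv_fd(sock):
    # Returns the payload and the first descriptor installed by the kernel.
    data, fds, _flags, _addr = socket.recv_fds(sock, 1024, 1)
    return data.decode(), fds[0]


def demo():
    # Hypothetical stand-ins: one socket end per "QEMU", a pipe for the tap.
    old_qemu, new_qemu = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
    read_end, write_end = os.pipe()

    send_fd(old_qemu, write_end, "net0")
    name, received = recv_fd(new_qemu)

    # The received descriptor refers to the same open file as write_end.
    os.write(received, b"ping")
    result = (name, os.read(read_end, 4))

    for fd in (read_end, write_end, received):
        os.close(fd)
    old_qemu.close()
    new_qemu.close()
    return result
```

The receiver gets a new descriptor number for the same open file, so no
interface has to be recreated on the receiving side.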


Hi Steve,

Thank you for sending this patch set. I tried testing it, and
the migration fails with the following error on the destination:


2025-08-07T18:14:30.564323Z qemu-system-x86_64: could not disable queue
qemu-system-x86_64: ../hw/net/virtio-net.c:767: virtio_net_set_queue_pairs: Assertion `!r' failed.


And the following error on the source:

vhost_reset_device failed: Operation not permitted (1)
vhost_reset_device failed: Operation not permitted (1)
2025-08-15T14:50:16.028494Z qemu-system-x86_64: Failed to connect to 'main.sock': Connection refused
2025-08-15T14:50:16.028552Z qemu-system-x86_64: vhost_set_owner failed: Device or resource busy (16)
2025-08-15T14:50:16.028565Z qemu-system-x86_64: vhost_set_owner failed: Device or resource busy (16)
2025-08-15T14:50:16.028578Z qemu-system-x86_64: vhost_set_owner failed: Device or resource busy (16)
2025-08-15T14:50:16.028590Z qemu-system-x86_64: vhost_set_owner failed: Device or resource busy (16)
2025-08-15T14:50:16.028604Z qemu-system-x86_64: vhost_set_owner failed: Device or resource busy (16)
2025-08-15T14:50:16.028629Z qemu-system-x86_64: vhost_set_owner failed: Device or resource busy (16)
2025-08-15T14:50:16.028641Z qemu-system-x86_64: vhost_set_owner failed: Device or resource busy (16)
2025-08-15T14:50:16.028844Z qemu-system-x86_64: vhost_set_owner failed: Device or resource busy (16)
2025-08-15T14:50:16.028856Z qemu-system-x86_64: vhost_set_owner failed: Device or resource busy (16)
2025-08-15T14:50:16.028868Z qemu-system-x86_64: vhost_set_owner failed: Device or resource busy (16)
2025-08-15T14:50:16.028880Z qemu-system-x86_64: vhost_set_owner failed: Device or resource busy (16)
2025-08-15T14:50:16.028893Z qemu-system-x86_64: vhost_set_owner failed: Device or resource busy (16)
2025-08-15T14:50:16.028904Z qemu-system-x86_64: vhost_set_owner failed: Device or resource busy (16)
2025-08-15T14:50:16.028916Z qemu-system-x86_64: vhost_set_owner failed: Device or resource busy (16)
2025-08-15T14:50:16.028928Z qemu-system-x86_64: vhost_set_owner failed: Device or resource busy (16)

I suspect the issue may be related to the fact that we are dropping
privileges (-run-with user=$USERNAME) as cpr transfer has run
into other issues with that in the past, but I haven't found anything
concrete there yet.

Some other information:

The full qemu arguments used for networking are:

-netdev tap,id=net0,ifname=tap.79874411_0,script=no,downscript=no,vhost=on,queues=8,cpr=on
-device virtio-net-pci,netdev=net0,id=netpci0,mac=$mac1,vectors=18,mq=on
-netdev tap,id=net1,ifname=tap.79874411_1,script=no,downscript=no,vhost=on,queues=8,cpr=on
-device virtio-net-pci,netdev=net1,id=netpci1,mac=$mac2,vectors=18,mq=on

I applied your patch on top of 7136352b40631b058dd0fe731a0d404e761e799f.
I also applied the pending arm interrupt fix.

Thanks very much Ben!  CPR for vhost is completely broken.  I did not notice
in my testing because the "falling back on userspace virtio" code maintains
connectivity to the guest.

I will reply to '[RFC V2 5/8] Revert "vhost-backend: remove
vhost_kernel_reset_device()"' and post a replacement patch which fixes the
set/reset owner failures, after which the remaining calls should work.

To use run-with, you will also need the patch that you posted for setting
the owner in unix_listen_saddr, before change_process_uid is called:
  
https://lore.kernel.org/qemu-devel/3d32b62f-29e2-4470-86a5-9a2b3b29e...@akamai.com/
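
The ordering described above can be sketched roughly as follows. This is not
the linked patch, only an illustration of "create the listening socket and
set its owner while still privileged, then drop privileges". The helper names
listen_owned_by and demo are hypothetical, and the demo uses the current
uid/gid so that it runs without root.

```python
import os
import socket
import tempfile


def listen_owned_by(path, uid, gid):
    # Bind first, then hand the socket file to the target user. In the real
    # flow this must happen before the process changes its uid, because an
    # unprivileged process cannot give a file to another user.
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    sock.bind(path)
    os.chown(path, uid, gid)
    sock.listen(1)
    return sock


def demo():
    # Assumption: current uid/gid stand in for the "-run-with user=..." target.
    with tempfile.TemporaryDirectory() as tmpdir:
        path = os.path.join(tmpdir, "main.sock")
        sock = listen_owned_by(path, os.getuid(), os.getgid())
        owner = os.stat(path).st_uid
        sock.close()
        # A privileged caller would drop privileges only at this point.
        return owner == os.getuid()
```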

- Steve

