Re: [PATCH V5 20/38] migration: close kvm after cpr
On 7/3/2025 5:58 PM, Peter Xu wrote:
> On Thu, Jul 03, 2025 at 11:21:38PM +0200, Cédric Le Goater wrote:
> > On 7/3/25 21:45, Peter Xu wrote:
> > > On Wed, Jul 02, 2025 at 03:41:08PM -0400, Steven Sistare wrote:
> > > > The irq producer is not closed, but it is detached from the kvm consumer.
> > > > Its eventfd is preserved in new QEMU, and interrupts that arrive during
> > > > transition are pended there.
> > >
> > > Ah I see, looks reasonable.
> > >
> > > So can I understand the core issue here is about the irq consumer /
> > > provider updates are atomic, meanwhile there's always the fallback paths
> > > ready, so before / after the update the irq won't get lost?
> > >
> > > E.g. in Posted-Interrupt context of Intel's, the irte will be updated
> > > atomically for these VFIO irqs, so that either it'll keep using the fast
> > > path (provided by the irqbypass mechanism), or slow path (eventfd_signal),
> > > so it's free of any kind of race that irq could trigger?
> > >
> > > I saw that there's already a new version and Cedric queued it. If possible
> > > add some explanation into commit message, either when repost, or when
> > > merge, would be nice, on explaining irq won't get lost.
> >
> > yes.
> >
> > Steve, just resend the patch. I will update the vfio queue.
> > Or we can address that with a follow up patch before QEMU 10.1
> > is released.
>
> I've just noticed maybe I was wrong that slow path was always present.
> We've closed the kvm so likely the slow path is gone..
>
> So I think I misunderstood, and Steve likely meant the irq will be
> persisted in eventfd, which is still true if the irq eventfds are persisted
> and passed over (I didn't check the patchset, but I'm assuming this is the
> case).
>
> Then I found, yes, indeed when irqfd is re-established on dest qemu, we
> have such tricky code:
>
> kvm_irqfd_assign():
>         /*
>          * Check if there was an event already pending on the eventfd
>          * before we registered, and trigger it as if we didn't miss it.
>          */
>         events = vfs_poll(fd_file(f), &irqfd->pt);
>
>         if (events & EPOLLIN)
>                 schedule_work(&irqfd->inject);
>
> I've no idea whether it was intended to do this as the code was there since
> 2009, maybe this chunk of code is the core of why irq won't get lost for
> CPR. But in all cases, it can be a pretty tricky spot to prove that cpr
> works and looks like an important piece of info.

Yes, this is the mechanism I rely on to preserve an interrupt pended to the
vfio eventfd.

- Steve

> Personally I'm ok doing it on top of what's queued. Maybe such explanation
> on how it works should be put directly into docs/../cpr.rst?
>
> Thanks,
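
[Editorial sketch, not part of the thread] The property discussed above (a
signal written to an eventfd stays pending until something reads it, so a
consumer that attaches late still sees it) can be observed from user space.
The following is only an analogy under stated assumptions: it uses poll(2)
and read(2) on a plain eventfd, not the in-kernel irqfd path, and the helper
names eventfd_has_pending() and late_consumer_demo() are hypothetical.

```c
#include <assert.h>
#include <poll.h>
#include <stdint.h>
#include <sys/eventfd.h>
#include <unistd.h>

/* Hypothetical helper: sample the eventfd state without blocking. */
static int eventfd_has_pending(int efd)
{
    struct pollfd pfd = { .fd = efd, .events = POLLIN };

    assert(efd >= 0);
    /* Timeout 0: report the current state and return immediately. */
    return poll(&pfd, 1, 0) == 1 && (pfd.revents & POLLIN) != 0;
}

/*
 * Hypothetical demo: signal the eventfd while no consumer is attached,
 * then attach a "late" consumer and drain whatever is pending.
 * Returns the drained counter value, or 0 on any failure.
 */
static uint64_t late_consumer_demo(void)
{
    uint64_t one = 1, count = 0;
    int efd = eventfd(0, EFD_NONBLOCK);

    if (efd < 0) {
        return 0;
    }

    /* Two signals arrive while nobody is polling or reading the eventfd. */
    if (write(efd, &one, sizeof(one)) != (ssize_t)sizeof(one) ||
        write(efd, &one, sizeof(one)) != (ssize_t)sizeof(one)) {
        close(efd);
        return 0;
    }

    /* The consumer attaches late; the counter still holds both signals. */
    if (eventfd_has_pending(efd) &&
        read(efd, &count, sizeof(count)) != (ssize_t)sizeof(count)) {
        count = 0;
    }

    close(efd);
    return count;
}
```

On Linux, late_consumer_demo() should return 2: the eventfd counter
accumulates writes until it is read, which is the user-space counterpart of
the EPOLLIN check that kvm_irqfd_assign() performs when the irqfd is
registered.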
Re: [PATCH V5 20/38] migration: close kvm after cpr
On Thu, Jul 03, 2025 at 11:21:38PM +0200, Cédric Le Goater wrote:
> On 7/3/25 21:45, Peter Xu wrote:
> > On Wed, Jul 02, 2025 at 03:41:08PM -0400, Steven Sistare wrote:
> > > The irq producer is not closed, but it is detached from the kvm consumer.
> > > Its eventfd is preserved in new QEMU, and interrupts that arrive during
> > > transition are pended there.
> >
> > Ah I see, looks reasonable.
> >
> > So can I understand the core issue here is about the irq consumer /
> > provider updates are atomic, meanwhile there's always the fallback paths
> > ready, so before / after the update the irq won't get lost?
> >
> > E.g. in Posted-Interrupt context of Intel's, the irte will be updated
> > atomically for these VFIO irqs, so that either it'll keep using the fast
> > path (provided by the irqbypass mechanism), or slow path (eventfd_signal),
> > so it's free of any kind of race that irq could trigger?
> >
> > I saw that there's already a new version and Cedric queued it. If possible
> > add some explanation into commit message, either when repost, or when
> > merge, would be nice, on explaining irq won't get lost.
>
> yes.
>
> Steve, just resend the patch. I will update the vfio queue.
> Or we can address that with a follow up patch before QEMU 10.1
> is released.

I've just noticed maybe I was wrong that slow path was always present.
We've closed the kvm so likely the slow path is gone..

So I think I misunderstood, and Steve likely meant the irq will be
persisted in eventfd, which is still true if the irq eventfds are persisted
and passed over (I didn't check the patchset, but I'm assuming this is the
case).

Then I found, yes, indeed when irqfd is re-established on dest qemu, we
have such tricky code:

kvm_irqfd_assign():
        /*
         * Check if there was an event already pending on the eventfd
         * before we registered, and trigger it as if we didn't miss it.
         */
        events = vfs_poll(fd_file(f), &irqfd->pt);

        if (events & EPOLLIN)
                schedule_work(&irqfd->inject);

I've no idea whether it was intended to do this as the code was there since
2009, maybe this chunk of code is the core of why irq won't get lost for
CPR. But in all cases, it can be a pretty tricky spot to prove that cpr
works and looks like an important piece of info.

Personally I'm ok doing it on top of what's queued. Maybe such explanation
on how it works should be put directly into docs/../cpr.rst?

Thanks,

--
Peter Xu
Re: [PATCH V5 20/38] migration: close kvm after cpr
On 7/3/25 21:45, Peter Xu wrote:
> On Wed, Jul 02, 2025 at 03:41:08PM -0400, Steven Sistare wrote:
> > The irq producer is not closed, but it is detached from the kvm consumer.
> > Its eventfd is preserved in new QEMU, and interrupts that arrive during
> > transition are pended there.
>
> Ah I see, looks reasonable.
>
> So can I understand the core issue here is about the irq consumer /
> provider updates are atomic, meanwhile there's always the fallback paths
> ready, so before / after the update the irq won't get lost?
>
> E.g. in Posted-Interrupt context of Intel's, the irte will be updated
> atomically for these VFIO irqs, so that either it'll keep using the fast
> path (provided by the irqbypass mechanism), or slow path (eventfd_signal),
> so it's free of any kind of race that irq could trigger?
>
> I saw that there's already a new version and Cedric queued it. If possible
> add some explanation into commit message, either when repost, or when
> merge, would be nice, on explaining irq won't get lost.

yes.

Steve, just resend the patch. I will update the vfio queue.
Or we can address that with a follow up patch before QEMU 10.1
is released.

Thanks,

C.
Re: [PATCH V5 20/38] migration: close kvm after cpr
On Wed, Jul 02, 2025 at 03:41:08PM -0400, Steven Sistare wrote:
> The irq producer is not closed, but it is detached from the kvm consumer.
> Its eventfd is preserved in new QEMU, and interrupts that arrive during
> transition are pended there.

Ah I see, looks reasonable.

So can I understand the core issue here is about the irq consumer /
provider updates are atomic, meanwhile there's always the fallback paths
ready, so before / after the update the irq won't get lost?

E.g. in Posted-Interrupt context of Intel's, the irte will be updated
atomically for these VFIO irqs, so that either it'll keep using the fast
path (provided by the irqbypass mechanism), or slow path (eventfd_signal),
so it's free of any kind of race that irq could trigger?

I saw that there's already a new version and Cedric queued it. If possible
add some explanation into commit message, either when repost, or when
merge, would be nice, on explaining irq won't get lost.

Thanks,

--
Peter Xu
Re: [PATCH V5 20/38] migration: close kvm after cpr
On 7/2/2025 12:02 PM, Peter Xu wrote:
> On Tue, Jul 01, 2025 at 11:25:23AM -0400, Steven Sistare wrote:
> > Hi Paolo, Peter, Fabiano,
> >
> > This patch needs review. CPR for vfio is broken without it.
> > Soft feature freeze July 15.
>
> Sorry to not have tried looking at this more even if this is marked
> "migration".. obviously I still almost see it as a KVM change..
>
> Questions inline below:
>
> > - Steve
Re: [PATCH V5 20/38] migration: close kvm after cpr
On Tue, Jul 01, 2025 at 11:25:23AM -0400, Steven Sistare wrote:
> Hi Paolo, Peter, Fabiano,
>
> This patch needs review. CPR for vfio is broken without it.
> Soft feature freeze July 15.
Sorry to not have tried looking at this more even if this is marked
"migration".. obviously I still almost see it as a KVM change..
Questions inline below:
>
> - Steve
>
> On 6/10/2025 11:39 AM, Steve Sistare wrote:
> > cpr-transfer breaks vfio network connectivity to and from the guest, and
> > the host system log shows:
> >irq bypass consumer (token a03c32e5) registration fails: -16
> > which is EBUSY. This occurs because KVM descriptors are still open in
> > the old QEMU process. Close them.
> >
> > Cc: Paolo Bonzini
> > Signed-off-by: Steve Sistare
> > ---
> > include/hw/vfio/vfio-device.h | 2 ++
> > include/migration/cpr.h | 2 ++
> > include/system/kvm.h | 1 +
> > accel/kvm/kvm-all.c | 32
> > accel/stubs/kvm-stub.c| 5 +
> > hw/vfio/helpers.c | 10 ++
> > hw/vfio/vfio-stubs.c | 13 +
> > migration/cpr-transfer.c | 18 ++
> > migration/cpr.c | 8
> > migration/migration.c | 1 +
> > hw/vfio/meson.build | 2 ++
> > 11 files changed, 94 insertions(+)
> > create mode 100644 hw/vfio/vfio-stubs.c
> >
> > diff --git a/include/hw/vfio/vfio-device.h b/include/hw/vfio/vfio-device.h
> > index 4e4d0b6..6eb6f21 100644
> > --- a/include/hw/vfio/vfio-device.h
> > +++ b/include/hw/vfio/vfio-device.h
> > @@ -231,4 +231,6 @@ void vfio_device_set_fd(VFIODevice *vbasedev, const
> > char *str, Error **errp);
> > void vfio_device_init(VFIODevice *vbasedev, int type, VFIODeviceOps *ops,
> > DeviceState *dev, bool ram_discard);
> > int vfio_device_get_aw_bits(VFIODevice *vdev);
> > +
> > +void vfio_kvm_device_close(void);
> > #endif /* HW_VFIO_VFIO_COMMON_H */
> > diff --git a/include/migration/cpr.h b/include/migration/cpr.h
> > index 07858e9..d09b657 100644
> > --- a/include/migration/cpr.h
> > +++ b/include/migration/cpr.h
> > @@ -32,7 +32,9 @@ void cpr_state_close(void);
> > struct QIOChannel *cpr_state_ioc(void);
> > bool cpr_incoming_needed(void *opaque);
> > +void cpr_kvm_close(void);
> > +void cpr_transfer_init(void);
> > QEMUFile *cpr_transfer_output(MigrationChannel *channel, Error **errp);
> > QEMUFile *cpr_transfer_input(MigrationChannel *channel, Error **errp);
> > diff --git a/include/system/kvm.h b/include/system/kvm.h
> > index 7cc60d2..4896a3c 100644
> > --- a/include/system/kvm.h
> > +++ b/include/system/kvm.h
> > @@ -195,6 +195,7 @@ bool kvm_has_sync_mmu(void);
> > int kvm_has_vcpu_events(void);
> > int kvm_max_nested_state_length(void);
> > int kvm_has_gsi_routing(void);
> > +void kvm_close(void);
> > /**
> >* kvm_arm_supports_user_irq
> > diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
> > index a317783..3d3a557 100644
> > --- a/accel/kvm/kvm-all.c
> > +++ b/accel/kvm/kvm-all.c
> > @@ -515,16 +515,23 @@ static int do_kvm_destroy_vcpu(CPUState *cpu)
> > goto err;
> > }
> > +/* If I am the CPU that created coalesced_mmio_ring, then discard it */
> > +if (s->coalesced_mmio_ring == (void *)cpu->kvm_run + PAGE_SIZE) {
> > +s->coalesced_mmio_ring = NULL;
> > +}
> > +
> > ret = munmap(cpu->kvm_run, mmap_size);
> > if (ret < 0) {
> > goto err;
> > }
> > +cpu->kvm_run = NULL;
> > if (cpu->kvm_dirty_gfns) {
> > ret = munmap(cpu->kvm_dirty_gfns, s->kvm_dirty_ring_bytes);
> > if (ret < 0) {
> > goto err;
> > }
> > +cpu->kvm_dirty_gfns = NULL;
> > }
> > kvm_park_vcpu(cpu);
> > @@ -608,6 +615,31 @@ err:
> > return ret;
> > }
> > +void kvm_close(void)
> > +{
> > +CPUState *cpu;
> > +
> > +if (!kvm_state || kvm_state->fd == -1) {
> > +return;
> > +}
> > +
> > +CPU_FOREACH(cpu) {
> > +cpu_remove_sync(cpu);
> > +close(cpu->kvm_fd);
> > +cpu->kvm_fd = -1;
> > +close(cpu->kvm_vcpu_stats_fd);
> > +cpu->kvm_vcpu_stats_fd = -1;
> > +}
> > +
> > +if (kvm_state && kvm_state->fd != -1) {
> > +close(kvm_state->vmfd);
> > +kvm_state->vmfd = -1;
> > +close(kvm_state->fd);
> > +kvm_state->fd = -1;
> > +}
> > +kvm_state = NULL;
> > +}
> > +
> > /*
> >* dirty pages logging control
> >*/
> > diff --git a/accel/stubs/kvm-stub.c b/accel/stubs/kvm-stub.c
> > index ecfd763..97dacb3 100644
> > --- a/accel/stubs/kvm-stub.c
> > +++ b/accel/stubs/kvm-stub.c
> > @@ -134,3 +134,8 @@ int kvm_create_guest_memfd(uint64_t size, uint64_t
> > flags, Error **errp)
> > {
> > return -ENOSYS;
> > }
> > +
> > +void kvm_close(void)
> > +{
> > +return;
> > +}
> > diff --git a/hw/vfio/helpers.c b/hw/vfio/helpers.c
> > inde
Re: [PATCH V5 20/38] migration: close kvm after cpr
Hi Paolo, Peter, Fabiano,
This patch needs review. CPR for vfio is broken without it.
Soft feature freeze July 15.
- Steve
Re: [PATCH V5 20/38] migration: close kvm after cpr
Steve Sistare writes:

> cpr-transfer breaks vfio network connectivity to and from the guest, and
> the host system log shows:
>   irq bypass consumer (token a03c32e5) registration fails: -16
> which is EBUSY. This occurs because KVM descriptors are still open in
> the old QEMU process. Close them.
>
> Cc: Paolo Bonzini
> Signed-off-by: Steve Sistare

Reviewed-by: Fabiano Rosas
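
[Editorial sketch, not part of the thread] kvm_close() in the patch closes
each descriptor and immediately stores -1 in its place, and it returns early
when kvm_state->fd is already -1, so a second call does nothing. A minimal
sketch of that close-and-invalidate pattern, assuming a hypothetical helper
close_and_invalidate() that does not exist in QEMU:

```c
#include <assert.h>
#include <stddef.h>
#include <unistd.h>

/*
 * Hypothetical helper, not part of the patch: close *fd once and mark it
 * invalid. Returns 1 if a descriptor was closed, 0 if it was already -1.
 */
static int close_and_invalidate(int *fd)
{
    assert(fd != NULL);

    if (*fd == -1) {
        return 0;       /* already closed: nothing to do */
    }
    close(*fd);
    *fd = -1;           /* guards against a double close later */
    return 1;
}
```

Resetting the stored value matters because a stale descriptor number may be
reused by a later open(); closing it a second time would then close an
unrelated file.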
[PATCH V5 20/38] migration: close kvm after cpr
cpr-transfer breaks vfio network connectivity to and from the guest, and
the host system log shows:
irq bypass consumer (token a03c32e5) registration fails: -16
which is EBUSY. This occurs because KVM descriptors are still open in
the old QEMU process. Close them.
Cc: Paolo Bonzini
Signed-off-by: Steve Sistare
---
 include/hw/vfio/vfio-device.h |  2 ++
 include/migration/cpr.h       |  2 ++
 include/system/kvm.h          |  1 +
 accel/kvm/kvm-all.c           | 32 ++++++++++++++++++++++++++++++++
 accel/stubs/kvm-stub.c        |  5 +++++
 hw/vfio/helpers.c             | 10 ++++++++++
 hw/vfio/vfio-stubs.c          | 13 +++++++++++++
 migration/cpr-transfer.c      | 18 ++++++++++++++++++
 migration/cpr.c               |  8 ++++++++
 migration/migration.c         |  1 +
 hw/vfio/meson.build           |  2 ++
 11 files changed, 94 insertions(+)
create mode 100644 hw/vfio/vfio-stubs.c
diff --git a/include/hw/vfio/vfio-device.h b/include/hw/vfio/vfio-device.h
index 4e4d0b6..6eb6f21 100644
--- a/include/hw/vfio/vfio-device.h
+++ b/include/hw/vfio/vfio-device.h
@@ -231,4 +231,6 @@ void vfio_device_set_fd(VFIODevice *vbasedev, const char *str, Error **errp);
 void vfio_device_init(VFIODevice *vbasedev, int type, VFIODeviceOps *ops,
                       DeviceState *dev, bool ram_discard);
 int vfio_device_get_aw_bits(VFIODevice *vdev);
+
+void vfio_kvm_device_close(void);
 #endif /* HW_VFIO_VFIO_COMMON_H */
diff --git a/include/migration/cpr.h b/include/migration/cpr.h
index 07858e9..d09b657 100644
--- a/include/migration/cpr.h
+++ b/include/migration/cpr.h
@@ -32,7 +32,9 @@ void cpr_state_close(void);
 struct QIOChannel *cpr_state_ioc(void);
 bool cpr_incoming_needed(void *opaque);
+void cpr_kvm_close(void);
+void cpr_transfer_init(void);
 QEMUFile *cpr_transfer_output(MigrationChannel *channel, Error **errp);
 QEMUFile *cpr_transfer_input(MigrationChannel *channel, Error **errp);
diff --git a/include/system/kvm.h b/include/system/kvm.h
index 7cc60d2..4896a3c 100644
--- a/include/system/kvm.h
+++ b/include/system/kvm.h
@@ -195,6 +195,7 @@ bool kvm_has_sync_mmu(void);
 int kvm_has_vcpu_events(void);
 int kvm_max_nested_state_length(void);
 int kvm_has_gsi_routing(void);
+void kvm_close(void);
 /**
  * kvm_arm_supports_user_irq
diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
index a317783..3d3a557 100644
--- a/accel/kvm/kvm-all.c
+++ b/accel/kvm/kvm-all.c
@@ -515,16 +515,23 @@ static int do_kvm_destroy_vcpu(CPUState *cpu)
         goto err;
     }
+    /* If I am the CPU that created coalesced_mmio_ring, then discard it */
+    if (s->coalesced_mmio_ring == (void *)cpu->kvm_run + PAGE_SIZE) {
+        s->coalesced_mmio_ring = NULL;
+    }
+
     ret = munmap(cpu->kvm_run, mmap_size);
     if (ret < 0) {
         goto err;
     }
+    cpu->kvm_run = NULL;
     if (cpu->kvm_dirty_gfns) {
         ret = munmap(cpu->kvm_dirty_gfns, s->kvm_dirty_ring_bytes);
         if (ret < 0) {
             goto err;
         }
+        cpu->kvm_dirty_gfns = NULL;
     }
     kvm_park_vcpu(cpu);
@@ -608,6 +615,31 @@ err:
     return ret;
 }
+void kvm_close(void)
+{
+    CPUState *cpu;
+
+    if (!kvm_state || kvm_state->fd == -1) {
+        return;
+    }
+
+    CPU_FOREACH(cpu) {
+        cpu_remove_sync(cpu);
+        close(cpu->kvm_fd);
+        cpu->kvm_fd = -1;
+        close(cpu->kvm_vcpu_stats_fd);
+        cpu->kvm_vcpu_stats_fd = -1;
+    }
+
+    if (kvm_state && kvm_state->fd != -1) {
+        close(kvm_state->vmfd);
+        kvm_state->vmfd = -1;
+        close(kvm_state->fd);
+        kvm_state->fd = -1;
+    }
+    kvm_state = NULL;
+}
+
 /*
  * dirty pages logging control
  */
diff --git a/accel/stubs/kvm-stub.c b/accel/stubs/kvm-stub.c
index ecfd763..97dacb3 100644
--- a/accel/stubs/kvm-stub.c
+++ b/accel/stubs/kvm-stub.c
@@ -134,3 +134,8 @@ int kvm_create_guest_memfd(uint64_t size, uint64_t flags, Error **errp)
 {
     return -ENOSYS;
 }
+
+void kvm_close(void)
+{
+    return;
+}
diff --git a/hw/vfio/helpers.c b/hw/vfio/helpers.c
index d0dbab1..af1db2f 100644
--- a/hw/vfio/helpers.c
+++ b/hw/vfio/helpers.c
@@ -117,6 +117,16 @@ bool vfio_get_info_dma_avail(struct vfio_iommu_type1_info *info,
 int vfio_kvm_device_fd = -1;
 #endif
+void vfio_kvm_device_close(void)
+{
+#ifdef CONFIG_KVM
+    if (vfio_kvm_device_fd != -1) {
+        close(vfio_kvm_device_fd);
+        vfio_kvm_device_fd = -1;
+    }
+#endif
+}
+
 int vfio_kvm_device_add_fd(int fd, Error **errp)
 {
 #ifdef CONFIG_KVM
diff --git a/hw/vfio/vfio-stubs.c b/hw/vfio/vfio-stubs.c
new file mode 100644
index 000..a4c8b56
--- /dev/null
+++ b/hw/vfio/vfio-stubs.c
@@ -0,0 +1,13 @@
+/*
+ * Copyright (c) 2025 Oracle and/or its affiliates.
+ *
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ */
+
+#include "qemu/osdep.h"
+#include "hw/vfio/vfio-device.h"
+
+void vfio_kvm_device_close(void)
+{
+    return;
+}
diff --git a/migration/cpr-transfer.c b/migration/cpr-transfer.c
index e1f1403..396558f 100644
--- a/migration/cpr-
