On 6/4/2025 7:59 AM, Cédric Le Goater wrote:
On 6/4/25 09:09, Cédric Le Goater wrote:
On 6/2/25 14:36, Steven Sistare wrote:
On 6/1/2025 3:07 PM, Michael S. Tsirkin wrote:
On Sun, Jun 01, 2025 at 06:38:43PM +0200, Cédric Le Goater wrote:
On 5/29/25 21:24, Steve Sistare wrote:
Do not reset a vfio-pci device during CPR.
Signed-off-by: Steve Sistare <steven.sist...@oracle.com>
---
include/hw/pci/pci_device.h | 3 +++
hw/pci/pci.c | 5 +++++
hw/vfio/pci.c | 7 +++++++
3 files changed, 15 insertions(+)
diff --git a/include/hw/pci/pci_device.h b/include/hw/pci/pci_device.h
index e41d95b..b481c5d 100644
--- a/include/hw/pci/pci_device.h
+++ b/include/hw/pci/pci_device.h
@@ -181,6 +181,9 @@ struct PCIDevice {
uint32_t max_bounce_buffer_size;
char *sriov_pf;
+
+ /* CPR */
+ bool skip_reset_on_cpr;
};
static inline int pci_intx(PCIDevice *pci_dev)
diff --git a/hw/pci/pci.c b/hw/pci/pci.c
index f5ab510..21eb11c 100644
--- a/hw/pci/pci.c
+++ b/hw/pci/pci.c
@@ -32,6 +32,7 @@
#include "hw/pci/pci_host.h"
#include "hw/qdev-properties.h"
#include "hw/qdev-properties-system.h"
+#include "migration/cpr.h"
#include "migration/qemu-file-types.h"
#include "migration/vmstate.h"
#include "net/net.h"
@@ -531,6 +532,10 @@ static void pci_reset_regions(PCIDevice *dev)
static void pci_do_device_reset(PCIDevice *dev)
{
+ if (dev->skip_reset_on_cpr && cpr_is_incoming()) {
+ return;
+ }
Since ->skip_reset_on_cpr is only true for vfio-pci devices, it could be
replaced by : object_dynamic_cast(OBJECT(dev), "vfio-pci")
Thanks,
C.
True but I don't really like driver dependent hacks.
what exactly about vfio makes it survive without this reset?
The kernel descriptors remain open and all the active kernel PCI state
remains in place. The device was never quiesced or de-configured in old QEMU.
The cast is fine with me; it depends on what Michael wants.
I don't see any good ways to avoid doing the reset when a cpr resume
is in progress. I agree the cast is pretty ugly. We could keep the
'skip_reset_on_cpr' attribute and make it a class attribute instead.
I don't see any advantage to making this a class attribute. I looked for
examples
of using such attributes for vfio to configure pci, and found very little. It
sounds like overkill since vfio already sets and gets PCIDevice members directly
in many places.
I defined skip_reset_on_cpr based on this existing example:
vfio_instance_init()
pci_dev->cap_present |= QEMU_PCI_CAP_EXPRESS
Also,
I wonder if the resettable interface, and more specifically the
RESET_TYPE_SNAPSHOT_LOAD type, might be useful. Have you explored
this alternative ?
RESET_TYPE_SNAPSHOT_LOAD (or a new type such as RESET_TYPE_CPR) would skip
reset for all devices, but we only skip for vfio_pci. All other devices
(including virtio) save and restore state using standard migration vmstate,
and must call reset.
- Steve