On 12.04.24 at 11:32, Fabian Grünbichler wrote:
> On March 19, 2024 4:08 pm, Hannes Duerr wrote:
>> When a snapshot is created with RAM, qemu attempts to save not only the
>> RAM content, but also the internal state of the PCI devices.
>>
>> However, as not all drivers support this, this can lead to the device
>> drivers in the VM not being able to handle the saved state during the
>> restore/rollback and in conclusion the VM might crash. For this reason,
>> we now generally prohibit snapshots with RAM for VMs with passthrough
>> devices.
>>
>> In the future, this prohibition can of course be relaxed for individual
>> drivers that we know support it, such as the vfio driver
>>

We're already using pci-vfio, see [0]. So I'm not sure what that relaxation
would look like. Probably it'd need to be a flag for the hostpci property,
similar to what's done in Dominik's "implement experimental vgpu live
migration" series for mapped devices.

That said, looking into this and wondering why QEMU doesn't check it,
there's an issue in that our savevm-async code does not properly check
for all migration blockers (only some of them)! I'll work out a patch
for that.

If we can be sure not to break any existing users with the below code,
we can still apply it too of course.

>> Signed-off-by: Hannes Duerr <h.du...@proxmox.com>
>> ---
>>  PVE/API2/Qemu.pm | 10 ++++++++++
>>  1 file changed, 10 insertions(+)
>>
>> diff --git a/PVE/API2/Qemu.pm b/PVE/API2/Qemu.pm
>> index 40b6c30..0acd1c7 100644
>> --- a/PVE/API2/Qemu.pm
>> +++ b/PVE/API2/Qemu.pm
>> @@ -5101,6 +5101,16 @@ __PACKAGE__->register_method({
>>          die "unable to use snapshot name 'pending' (reserved name)\n"
>>              if lc($snapname) eq 'pending';
>>
>> +        if ($param->{vmstate}) {
>> +            my $conf = PVE::QemuConfig->load_config($vmid);
>> +
>> +            for my $key (keys %$conf) {
>> +                next if $key !~ /^hostpci\d+/;
>> +                die "cannot snapshot VM with RAM due to passed-through PCI device(s), which lack"
>> +                    ." the possibility to save/restore their internal state\n";
>> +            }
>> +        }
>
> isn't the same also true of other local resources (e.g., passed-through
> USB?)?
>
> maybe we could find a way to unify the checks we do for live migration
> (PVE::QemuServer::check_local_resources), since that is almost the same
> code inside Qemu as a stateful snapshot+rollback?
>
> (not opposed to applying this before that happens though, just a
> question in general..)
>

Similarly, there is the suspend API endpoint that could benefit from
having a single helper. I assume this code was copied from there.
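
To make the "single helper" idea a bit more concrete, here is a rough, untested sketch. Both function names are hypothetical and do not exist in qemu-server, the usbN handling (including skipping SPICE redirection entries) is my assumption about what counts as a local resource, and the config is a hand-written hash rather than the result of PVE::QemuConfig->load_config(). It is only meant to show the shape of a check that snapshot and suspend could share, not a replacement for PVE::QemuServer::check_local_resources.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of qemu-server): return the sorted config
# keys that refer to passed-through host devices.
sub find_passthrough_keys {
    my ($conf) = @_;

    my @keys;
    for my $key (keys %$conf) {
        next if $key !~ /^(?:hostpci|usb)\d+$/;
        # Assumption: SPICE USB redirection is not a host-local device.
        next if $key =~ /^usb/ && $conf->{$key} =~ /^spice(?:,|$)/;
        push @keys, $key;
    }

    @keys = sort @keys;
    return @keys;
}

# Hypothetical wrapper that the snapshot and suspend endpoints could share.
# Dies with a message naming the offending keys, returns nothing otherwise.
sub assert_no_passthrough_for_vmstate {
    my ($conf, $action) = @_;

    my @keys = find_passthrough_keys($conf);
    return if !@keys;

    die "cannot $action VM with RAM due to passed-through device(s) ("
        . join(', ', @keys)
        . "), which lack the possibility to save/restore their internal state\n";
}

# Example with a hand-written config hash.
my $conf = {
    memory   => 2048,
    hostpci0 => '0000:01:00.0',
    net0     => 'virtio,bridge=vmbr0',
};

eval { assert_no_passthrough_for_vmstate($conf, 'snapshot') };
print $@ if $@;
```

Compared to the loop in the patch, this anchors the key pattern at both ends and reports which entries triggered the error, which should make the message easier to act on.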

[0]: https://git.proxmox.com/?p=qemu-server.git;a=blob;f=PVE/QemuServer/PCI.pm;h=1673041bbe7a5d638a0ee9c56ea6bbb31027023b;hb=HEAD#l625