As reported in the community forum [0], after migration, the VM might
not immediately be able to respond to QMP commands, which means the
VM could fail to resume and stay in a paused state on the target.
The reason seems to be that activating the block drives in QEMU can
take a bit of time. For example, it might be necessary to invalidate
the caches (where for raw devices a flush might be needed) and the
size of the block device needs to be queried. In [0], an external
Ceph cluster is used, but there doesn't seem to be a flush there. The
report also shows that the required timeout is a bit over 10 seconds,
so use 60 to be on the safe side for the future.

All callers are inside workers or invoked via the 'qm' CLI command,
so bumping the timeout beyond 30 seconds is fine.

[0]: https://forum.proxmox.com/threads/149610/

Signed-off-by: Fiona Ebner <f.eb...@proxmox.com>
---
 PVE/QemuServer.pm | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/PVE/QemuServer.pm b/PVE/QemuServer.pm
index bf59b091..9e840912 100644
--- a/PVE/QemuServer.pm
+++ b/PVE/QemuServer.pm
@@ -6461,7 +6461,9 @@ sub vm_resume {
     my ($vmid, $skiplock, $nocheck) = @_;
 
     PVE::QemuConfig->lock_config($vmid, sub {
-        my $res = mon_cmd($vmid, 'query-status');
+        # After migration, the VM might not immediately be able to respond to QMP commands, because
+        # activating the block devices might take a bit of time.
+        my $res = mon_cmd($vmid, 'query-status', timeout => 60);
         my $resume_cmd = 'cont';
         my $reset = 0;
         my $conf;
-- 
2.39.2
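
For context, a minimal sketch (not part of the patch) of the call
pattern the hunk changes: query-status with the bumped timeout,
followed by 'cont' to resume. It assumes mon_cmd is imported from
PVE::QemuServer::Monitor as elsewhere in qemu-server; the
resume_example helper and its simplified status handling are
illustrative only.

    use PVE::QemuServer::Monitor qw(mon_cmd);

    # Illustrative helper (hypothetical), mirroring the vm_resume() hunk above.
    sub resume_example {
        my ($vmid) = @_;

        # Right after migration, query-status can block while QEMU activates
        # the block drives, so pass an explicit 60-second timeout instead of
        # relying on the shorter default.
        my $res = mon_cmd($vmid, 'query-status', timeout => 60);

        # Only send 'cont' if the guest is actually paused.
        mon_cmd($vmid, 'cont') if ($res->{status} // '') eq 'paused';
    }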