I'm using libvirt/qemu under Debian 12 and trying to set up live
migration of a VM with an SR-IOV pass-through Ethernet interface
(libvirt 9.0.0 from stable and qemu 9.2.0 from backports).
Live migration completes successfully, but there is a long delay at the
end during which the VM is frozen and unresponsive. This only happens
when an SR-IOV NIC is present; if the only NIC is virtio, there is no
perceptible unavailability of the same VM during a live migration.
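For context, the SR-IOV VF is paired with a virtio standby device using
libvirt's <teaming> failover mechanism (that is the ua-sr-iov-backup
device the receiving side reports below); the interface pair in the
domain XML looks roughly like the following, with the MAC address,
source network and VF PCI address replaced by placeholders:

  <!-- virtio standby device; the alias matches the device-id in the logs -->
  <interface type='network'>
    <alias name='ua-sr-iov-backup'/>
    <mac address='52:54:00:xx:xx:xx'/>
    <source network='hostbridge'/>
    <model type='virtio'/>
    <teaming type='persistent'/>
  </interface>
  <!-- SR-IOV VF with the same MAC, detached from the guest for migration -->
  <interface type='hostdev' managed='yes'>
    <mac address='52:54:00:xx:xx:xx'/>
    <source>
      <address type='pci' domain='0x0000' bus='0xNN' slot='0xNN' function='0xN'/>
    </source>
    <teaming type='transient' persistent='ua-sr-iov-backup'/>
  </interface>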
Reviewing the libvirt logs, on the sending side a QEMU event is
received:
2025-05-03 02:40:28.540+0000: 265296: info :
qemuMonitorJSONIOProcessLine:203 : QEMU_MONITOR_RECV_EVENT:
mon=0x7fef8c0add00 event={"timestamp": {"seconds": 1746240028, "micr
oseconds": 540071}, "event": "MIGRATION_PASS", "data": {"pass": 4}}
then a migration status query is issued:
2025-05-03 02:40:29.361+0000: 265296: info : qemuMonitorIOWrite:366 :
QEMU_MONITOR_IO_WRITE: mon=0x7fef8c0add00
buf={"execute":"query-migrate","id":"libvirt-632"}
len=48 ret=48 errno=0
then a delay of almost a minute before the response is received:
2025-05-03 02:41:21.518+0000: 265296: debug :
qemuMonitorJSONIOProcessLine:189 : Line [{"return":
{"expected-downtime": 368, "vfio": {"transferred": 0}, "status":
"device", "setup-time": 295, "total-time": 88133, "ram": {"total":
137452265472, "postcopy-requests": 0, "dirty-sync-count": 4,
"multifd-bytes": 2828465664, "pages-per-second": 249661,
"downtime-bytes": 13190, "page-size": 4096, "remaining": 0,
"postcopy-bytes": 0, "mbps": 8203.3794380165291, "transferred":
3128949385, "dirty-sync-missed-zero-copy": 0, "precopy-bytes":
295878716, "duplicate": 32876281, "dirty-pages-rate": 112,
"normal-bytes": 2807078912, "normal": 685322}}, "id": "libvirt-632"}]
2025-05-03 02:41:21.518+0000: 265296: info :
qemuMonitorJSONIOProcessLine:208 : QEMU_MONITOR_RECV_REPLY:
mon=0x7fef8c0add00 reply={"return": {"expected-downtime": 368, "vfio":
{"transferred": 0}, "status": "device", "setup-time": 295, "total-time":
88133, "ram": {"total": 137452265472, "postcopy-requests": 0,
"dirty-sync-count": 4, "multifd-bytes": 2828465664, "pages-per-second":
249661, "downtime-bytes": 13190, "page-size": 4096, "remaining": 0,
"postcopy-bytes": 0, "mbps": 8203.3794380165291, "transferred":
3128949385, "dirty-sync-missed-zero-copy": 0, "precopy-bytes":
295878716, "duplicate": 32876281, "dirty-pages-rate": 112,
"normal-bytes": 2807078912, "normal": 685322}}, "id": "
On the receiving side, there is an event regarding the SR-IOV NIC:
2025-05-03 02:40:28.783+0000: 333347: info :
qemuMonitorJSONIOProcessLine:203 : QEMU_MONITOR_RECV_EVENT:
mon=0x7fb6bc0ac2f0 event={"timestamp": {"seconds": 1746240028,
"microseconds": 783269}, "event": "FAILOVER_NEGOTIATED", "data":
{"device-id": "ua-sr-iov-backup"}}
and then, again almost a minute later, a completion notification:
2025-05-03 02:41:21.534+0000: 333347: debug :
qemuMonitorJSONIOProcessLine:189 : Line [{"timestamp": {"seconds":
1746240081, "microseconds": 534094}, "event": "MIGRATION", "data":
{"status": "completed"}}]
Throughout the interval between these messages the VM is frozen and
unresponsive. Any thoughts on why this minute-long delay in the
migration is occurring?
Thanks much…