On Wed, Apr 30, 2025 at 11:23 AM Paul B. Henson via Users <users@lists.libvirt.org> wrote:
> I'm using libvirt under Debian 12 (9.0.0-4+deb12u2 w/qemu
> 7.2+dfsg-7+deb12u12).
>
> I have a vm using sr-iov, and configured it with a failover macvtap
> interface so I could live migrate it. However, there is a significant
> delay at the end of the migration resulting in a lot of lost traffic.
> If I only have the macvtap interface, migration completes immediately
> at the end of the transfer of memory with no loss of traffic.
>
> I enabled debug logging, and found the following. On the source system,
> it logs that the system is paused for the cutover:
>
> 2025-04-30 01:08:12.526+0000: 1696180: debug :
> qemuMigrationAnyCompleted:1957 : Migration paused before switchover
>
> at that point, for almost a minute, the source system just keeps
> printing the same statistics:
>
> 2025-04-30 01:08:12.923+0000: 1696272: info :
> qemuMonitorJSONIOProcessLine:208 : QEMU_MONITOR_RECV_REPLY:
> mon=0x7f8fdc0ad2f0 reply={"return": {"expected-downtime": 300,
> "status": "device", "setup-time": 297, "total-time": 26107, "ram":
> {"total": 137452265472, "postcopy-requests": 0, "dirty-sync-count": 3,
> "multifd-bytes": 2821784576, "pages-per-second": 297855,
> "downtime-bytes": 13208, "page-size": 4096, "remaining": 0,
> "postcopy-bytes": 0, "mbps": 9786.9158461538464, "transferred":
> 3117658825, "dirty-sync-missed-zero-copy": 0, "precopy-bytes":
> 295861041, "duplicate": 32874480, "dirty-pages-rate": 56, "skipped": 0,
> "normal-bytes": 2804301824, "normal": 684644}}, "id": "libvirt-577"}
>
> [...]
> 2025-04-30 01:09:06.290+0000: 1696272: info :
> qemuMonitorJSONIOProcessLine:208 : QEMU_MONITOR_RECV_REPLY:
> mon=0x7f8fdc0ad2f0 reply={"return": {"expected-downtime": 300,
> "status": "device", "setup-time": 297, "total-time": 79474, "ram":
> {"total": 137452265472, "postcopy-requests": 0, "dirty-sync-count": 3,
> "multifd-bytes": 2821784576, "pages-per-second": 297855,
> "downtime-bytes": 13208, "page-size": 4096, "remaining": 0,
> "postcopy-bytes": 0, "mbps": 9786.9158461538464, "transferred":
> 3117658825, "dirty-sync-missed-zero-copy": 0, "precopy-bytes":
> 295861041, "duplicate": 32874480, "dirty-pages-rate": 56, "skipped": 0,
> "normal-bytes": 2804301824, "normal": 684644}}, "id": "libvirt-629"}
>
> until finally it completes:
>
> 2025-04-30 01:09:06.327+0000: 1696272: info :
> qemuMonitorJSONIOProcessLine:203 : QEMU_MONITOR_RECV_EVENT:
> mon=0x7f8fdc0ad2f0 event={"timestamp": {"seconds": 1745975346,
> "microseconds": 327382}, "event": "MIGRATION", "data": {"status":
> "completed"}}
>
> On the destination side, it says something about negotiating failover
> for the network link:
>
> 2025-04-30 01:08:12.923+0000: 1384503: info :
> qemuMonitorJSONIOProcessLine:203 : QEMU_MONITOR_RECV_EVENT:
> mon=0x7fc7900ab2f0 event={"timestamp": {"seconds": 1745975292,
> "microseconds": 922783}, "event": "FAILOVER_NEGOTIATED", "data":
> {"device-id": "ua-sr-iov-backup"}}
>
> Then nothing happens for about a minute until it says it is done:
>
> 2025-04-30 01:09:06.328+0000: 1384503: debug :
> qemuMonitorJSONIOProcessLine:189 : Line [{"timestamp": {"seconds":
> 1745975346, "microseconds": 327991}, "event": "MIGRATION", "data":
> {"status": "completed"}}]
>
> Any thoughts on what is going on here to cause this delay? It's
> clearly somehow related to the sr-iov component of the migration.

What model of SR-IOV card are you using, and what is the XML
configuration of the SR-IOV interface? The additional downtime could be
caused by VFIO device migration.
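For reference, libvirt expresses a virtio/VF failover pair with `<teaming>` elements in the domain XML. A minimal sketch of such a pairing is below; the network name, MAC, and PCI address are placeholder values, and only the alias `ua-sr-iov-backup` is taken from the FAILOVER_NEGOTIATED event in your log:

```xml
<!-- Persistent virtio device: traffic fails over to it during migration -->
<interface type='network'>
  <source network='macvtap-net'/>
  <mac address='52:54:00:11:22:33'/>
  <model type='virtio'/>
  <teaming type='persistent'/>
  <alias name='ua-sr-iov-backup'/>
</interface>
<!-- Transient SR-IOV VF: detached before migration, re-attached after.
     Both interfaces must share the same MAC address. -->
<interface type='hostdev'>
  <source>
    <address type='pci' domain='0x0000' bus='0x42' slot='0x02' function='0x1'/>
  </source>
  <mac address='52:54:00:11:22:33'/>
  <teaming type='transient' persistent='ua-sr-iov-backup'/>
</interface>
```

Seeing your actual `<interface>` elements (e.g. from `virsh dumpxml`) would help narrow down whether the VF itself is what stalls the switchover.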
VFIO migration downtime was reduced starting with libvirt 10.5.0
(https://gitlab.com/libvirt/libvirt/-/commit/1cc7737f69) and QEMU 8.1
(https://github.com/qemu/qemu/blob/master/qapi/migration.json#L462).
You can try upgrading to those versions or later to see whether the
downtime improves.

> Thanks much…
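The version comparison above is easy to script; a small sketch using GNU `sort -V` (the `version_ge` helper name is mine, not a libvirt tool, and the first arguments are the Debian 12 versions from your report):

```shell
# Return success if version $1 >= $2, comparing as dotted version numbers.
version_ge() {
  [ "$(printf '%s\n' "$2" "$1" | sort -V | head -n1)" = "$2" ]
}

# Compare the reported versions against the releases carrying the
# VFIO downtime improvements.
version_ge "9.0.0" "10.5.0" || echo "libvirt too old: want >= 10.5.0"
version_ge "7.2.0" "8.1.0"  || echo "QEMU too old: want >= 8.1"
```

On Debian 12 both checks fail, so a backport or upgrade would be needed to pick up the improvement.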