I live-migrated 300 VMs with:
migration: insecure
max_workers: 30
and 10 parallel workers
(as described here https://forum.proxmox.com/threads/live-migration.127355/#post-557181)

I had zero issues with the patch applied;
without the patch I had ~30 errors.

Tested-by: Hannes Duerr <h.du...@proxmox.com>

On 12/20/23 13:32, Thomas Lamprecht wrote:
On 19/12/2023 14:44, Fiona Ebner wrote:
Currently, volume activation, PCI reservation and resetting systemd
scope happen in between, so the 5 second expiretime used for port
reservation is not always enough.
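
To illustrate the timing problem described above, here is a minimal Python sketch of a time-limited reservation. It is a toy model only: the class, the function names and the port number are hypothetical and do not mirror the actual pve-common code. Only the 5 second expiry is taken from the text.

```python
from dataclasses import dataclass, field

EXPIRE_SECONDS = 5  # the expiry time mentioned in the commit message


@dataclass
class PortReservations:
    """Toy model of time-limited port reservations (hypothetical, not pve-common)."""

    expires_at: dict = field(default_factory=dict)

    def reserve(self, port: int, now: float) -> None:
        # A reservation is only honoured until `now + EXPIRE_SECONDS`.
        self.expires_at[port] = now + EXPIRE_SECONDS

    def is_reserved(self, port: int, now: float) -> bool:
        return self.expires_at.get(port, 0.0) > now


def port_still_reserved_at_bind(setup_seconds: float) -> bool:
    """Reserve at t=0, then use the port only after the setup work is done."""
    table = PortReservations()
    table.reserve(60000, now=0.0)  # arbitrary example port
    # Volume activation, PCI reservation, etc. take `setup_seconds` in total.
    return table.is_reserved(60000, now=setup_seconds)
```

With one second of setup work the reservation still holds; with seven seconds it has already lapsed by the time the port is used, so another migration could have picked the same port in the meantime.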

It's possible to defer telling QEMU where it should listen for
migration and do so after it has been started via QMP. Therefore, the
port reservation can be moved very close to the actual usage.
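
The deferred approach can be sketched roughly as follows, in Python for brevity (qemu-server itself is written in Perl). The sketch assumes QEMU's `-incoming defer` option and the QMP `migrate-incoming` command; the helper names are hypothetical, and the URI formatting assumes an IPv4 address or hostname (an IPv6 address would need brackets).

```python
import json


def incoming_defer_args() -> list:
    """Command-line fragment that starts QEMU without a fixed listen address."""
    return ["-incoming", "defer"]


def migrate_incoming_command(host: str, port: int) -> dict:
    """QMP command telling an already running QEMU where to listen."""
    return {
        "execute": "migrate-incoming",
        "arguments": {"uri": f"tcp:{host}:{port}"},
    }


def wire_format(command: dict) -> str:
    """Serialize the command as it would be written to the QMP socket."""
    return json.dumps(command)
```

The point of the ordering is that the port only has to be reserved right before `migrate_incoming_command` is sent, after all the slow setup steps have already finished.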

Mentioned here for completeness and can still be done as an additional
change later if desired: next_migrate_port could be modified to
optionally return the open socket and it should be possible to pass
the file descriptor directly to QEMU, but that would require accepting
the connection on the Perl side first (otherwise it leads to ENOTCONN
107). While it would avoid any races, it's not the most elegant
and the change at hand should be enough in all practical situations.

Signed-off-by: Fiona Ebner <f.eb...@proxmox.com>
---

Discussion for v1:
https://lists.proxmox.com/pipermail/pve-devel/2023-November/060149.html

Changes in v2:
     * move reservation+usage much closer together than was done in v1
       of the qemu-server patch
     * drop other partial fix attempts for pve-common

I find this approach more than just an OK'ish stop-gap; this should
fix most such issues we can have in practice.

If you can get someone to additionally test this it's fine to apply as
is IMO.

The one thing that might be worse (didn't check reservation logic)
compared to FD passing is when there would be no migration ports
available, as then we would have already spent slightly more time and
resources by having the VM already started. We could side-step this a
bit by looping for requesting a reserved port for a few seconds.
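
Such a loop could look roughly like the following Python sketch. Everything here is hypothetical: `try_reserve` stands in for whatever reserves a migration port and is assumed to return `None` when no port is free, and the timeout and interval values are placeholders.

```python
import time
from typing import Callable, Optional


def reserve_with_retry(
    try_reserve: Callable[[], Optional[int]],
    timeout: float = 5.0,
    interval: float = 0.5,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[int]:
    """Keep asking for a port until one is free or `timeout` seconds passed."""
    deadline = clock() + timeout
    while True:
        port = try_reserve()
        if port is not None:
            return port
        if clock() >= deadline:
            # Give up; the caller would then have to stop the started VM.
            return None
        sleep(interval)
```

The clock and sleep functions are parameters only so the loop can be exercised without real waiting.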

But IMO we're not very likely to run out of such ports; most actions
that can spawn multiple migrations (like HA) are capped by default.

So once this has been tested in a few general migration situations, consider this:

Acked-by: Thomas Lamprecht <t.lampre...@proxmox.com>


_______________________________________________
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel



