On 11.11.21 at 15:07, Fabian Grünbichler wrote:
this series adds remote migration for VMs.

both live and offline migration including NBD and storage-migrated disks
should work.


I played around with it for a while. The biggest issue is that migration fails if there is no 'meta' property in the config. Most of the other things I wish for boil down to better error handling, but otherwise it seems to be in good shape!


Error "storage does not exist" if the real issue is missing access rights. But that error also appears if missing access for /cluster/resources or if the target node does not exists.


For the 'config' command, 'Sys.Modify' seems to be required:
failed to handle 'config' command - 403 Permission check failed (/, Sys.Modify)
but it still creates an empty configuration file, leading to
    target_vmid: Guest with ID '5678' already exists on remote cluster
on the next attempt.
It also already allocates the disks, but doesn't clean them up, because it gets the wrong lock (since the config is empty) and therefore aborts the 'quit' command.


If the config is not recent enough to have a 'meta' property:
failed to handle 'config' command - unable to parse value of 'meta' - got undefined value
Same issue with disk+config cleanup as above.


The local VM stays locked with 'migrate'. Is that how it should be?
Also the __migration__ snapshot will stay around, resulting in an error when trying to migrate again.


For live migration I always got a (cosmetic?) "WS closed unexpectedly" error:
tunnel: -> sending command "quit" to remote
tunnel: <- got reply
tunnel: Tunnel to https://192.168.20.142:8006/api2/json/nodes/rob2/qemu/5678/mtunnelwebsocket?ticket=PVETUNNEL%3A<SNIP>&socket=%2Frun%2Fqemu-server%2F5678.mtunnel failed - WS closed unexpectedly
2021-11-30 13:49:39 migration finished successfully (duration 00:01:02)
UPID:pve701:0000D8AD:000CB782:61A61DA5:qmigrate:111:root@pam:


Fun fact: the identity storage mapping is used for storages that don't appear in the explicit mapping. E.g. it's possible to migrate a VM that only has disks on storeA with --target-storage storeB:storeB (provided storeA exists on the target, of course). Yet specifying an explicit identity mapping is prohibited.
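
A minimal sketch of how such an implicit identity fallback might look in a
map_id-style helper (the name and structure here are assumptions, not the
actual qemu-server code):

    use strict;
    use warnings;

    # hypothetical helper - an explicit entry wins, otherwise fall back to
    # the identity mapping, i.e. assume the same ID exists on the target
    sub map_id_with_fallback {
        my ($map, $source_id) = @_;

        return $map->{entries}->{$source_id}
            if $map->{entries} && defined($map->{entries}->{$source_id});

        return $source_id;
    }

    # e.g. with a mapping of only { storeB => 'storeB' }, a disk on storeA
    # would still be mapped to 'storeA' on the target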


When the target bridge is not present (should that be detected before starting the migration?), and likely for any other startup failure, the only error in the log is:
2021-11-30 14:43:10 ERROR: online migrate failure - error - tunnel command '{"cmd":"star<SNIP>
failed to handle 'start' command - start failed: QEMU exited with code 1
For non-remote migration we are more verbose in this case and log the QEMU output.
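
As an aside, an up-front check for the mapped bridge on the target could be
as simple as a sysfs lookup; a minimal sketch, not tied to any existing
helper (the function name is made up):

    use strict;
    use warnings;

    # hypothetical pre-check - a plain sysfs lookup, not an existing PVE helper
    sub assert_bridge_exists {
        my ($bridge) = @_;
        die "bridge '$bridge' does not exist on target node\n"
            if !-d "/sys/class/net/$bridge";
    }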


Can/should an interrupt be handled more gracefully, so that remote cleanup still happens?
^CCMD websocket tunnel died: command 'proxmox-websocket-tunnel' failed: interrupted by signal

2021-11-30 14:39:07 ERROR: interrupted by signal
2021-11-30 14:39:07 aborting phase 1 - cleanup resources
2021-11-30 14:39:08 ERROR: writing to tunnel failed: broken pipe
2021-11-30 14:39:08 ERROR: migration aborted (duration 00:00:10): interrupted by signal


Besides lots of rebases, implemented TODOs and fixed issues, the main
difference to the previous RFC is that we no longer define remote
entries in a config file, but instead expect the caller/client to give us
all the required information to connect to the remote cluster.
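
For illustration only, the connection information passed by the caller could
look roughly like the following; the field names are assumptions, not the
actual parameter schema:

    # purely illustrative - field names are assumptions, not the real schema
    my $remote = {
        host        => '192.168.20.142',  # API endpoint of the target cluster
        port        => 8006,
        apitoken    => 'PVEAPIToken=migrate@pve!remote=...',
        fingerprint => 'AA:BB:...',       # TLS certificate fingerprint
    };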

new in v2: dropped parts already applied, incorporated Fabian's and
Dominik's feedback (thanks!)

Overview of affected repos and changes; see individual patches for
more details.

proxmox-websocket-tunnel:

new tunnel helper tool for forwarding commands and data over websocket
connections, required by qemu-server on source side

pve-access-control:

new ticket type, required by qemu-server on target side

pve-guest-common:

handle remote migration (no SSH) in AbstractMigrate,
required by qemu-server

pve-storage:

extend 'pvesm import' to allow import from UNIX socket, required on
target node by qemu-server

qemu-server:

some refactoring, new mtunnel endpoints, new remote_migration endpoints
TODO: handle pending changes and snapshots
TODO: proper CLI for remote migration
potential TODO: precond endpoint?

pve-http-server:

fix for handling unflushed proxy streams

As usual, some of the patches are best viewed with '-w', especially in
qemu-server.

Required dependencies are noted; qemu-server also requires a build-dep
on a patched pve-common, since the required options/formats would be
missing otherwise.

proxmox-websocket-tunnel

Fabian Grünbichler (4):
   initial commit
   add tunnel implementation
   add fingerprint validation
   add packaging

pve-access-control

Fabian Grünbichler (2):
   tickets: add tunnel ticket
   ticket: normalize path for verification

  src/PVE/AccessControl.pm | 52 ++++++++++++++++++++++++++++++----------
  1 file changed, 40 insertions(+), 12 deletions(-)

pve-http-server

Fabian Grünbichler (1):
   webproxy: handle unflushed write buffer

  src/PVE/APIServer/AnyEvent.pm | 10 ++++++----
  1 file changed, 6 insertions(+), 4 deletions(-)

qemu-server

Fabian Grünbichler (8):
   refactor map_storage to map_id
   schema: use pve-bridge-id
   update_vm: allow simultaneous setting of boot-order and dev
   nbd alloc helper: allow passing in explicit format
   mtunnel: add API endpoints
   migrate: refactor remote VM/tunnel start
   migrate: add remote migration handling
   api: add remote migrate endpoint

  PVE/API2/Qemu.pm   | 826 ++++++++++++++++++++++++++++++++++++++++++++-
  PVE/QemuMigrate.pm | 813 ++++++++++++++++++++++++++++++++++++--------
  PVE/QemuServer.pm  |  80 +++--
  debian/control     |   2 +
  4 files changed, 1539 insertions(+), 182 deletions(-)


