On 11.11.21 at 15:07, Fabian Grünbichler wrote:
this series adds remote migration for VMs.

both live and offline migration including NBD and storage-migrated disks
should work.


I played around with it for a while. The biggest issue is that migration fails if there is no 'meta' property in the config. Most of the other things I wish for boil down to better error handling, but otherwise it seems to be in good shape!


Error "storage does not exist" if the real issue is missing access rights. But that error also appears if missing access for /cluster/resources or if the target node does not exists.


For the 'config' command, 'Sys.Modify' seems to be required:
failed to handle 'config' command - 403 Permission check failed (/, Sys.Modify)
but it still creates an empty configuration file, leading to
    target_vmid: Guest with ID '5678' already exists on remote cluster
on the next attempt.
It also already allocates the disks, but doesn't clean them up, because it gets the wrong lock (since the config is empty) and therefore aborts the 'quit' command.


If the config is not recent enough to have a 'meta' property:
failed to handle 'config' command - unable to parse value of 'meta' - got undefined value
Same issue with disk+config cleanup as above.


The local VM stays locked with 'migrate'. Is that how it should be?
Also the __migration__ snapshot will stay around, resulting in an error when trying to migrate again.


For live migration I always got a (cosmetic?) "WS closed unexpectedly" error:
tunnel: -> sending command "quit" to remote
tunnel: <- got reply
tunnel: Tunnel to https://192.168.20.142:8006/api2/json/nodes/rob2/qemu/5678/mtunnelwebsocket?ticket=PVETUNNEL%3A<SNIP>&socket=%2Frun%2Fqemu-server%2F5678.mtunnel failed - WS closed unexpectedly
2021-11-30 13:49:39 migration finished successfully (duration 00:01:02)
UPID:pve701:0000D8AD:000CB782:61A61DA5:qmigrate:111:root@pam:


Fun fact: the identity storage mapping is used for storages that don't appear in the explicit mapping. E.g. it's possible to migrate a VM that only has disks on storeA with --target-storage storeB:storeB (provided storeA exists on the target, of course). Yet specifying an explicit identity mapping is prohibited.
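
A minimal sketch of how such an implicit identity fallback might look in a
map_id-style helper (the name and structure here are assumptions, not the
actual qemu-server code):

    use strict;
    use warnings;

    # hypothetical helper - an explicit entry wins, otherwise fall back to
    # the identity mapping, i.e. assume the same ID exists on the target
    sub map_id_with_fallback {
        my ($map, $source_id) = @_;

        return $map->{entries}->{$source_id}
            if $map->{entries} && defined($map->{entries}->{$source_id});

        return $source_id;
    }

    # e.g. with a mapping of only { storeB => 'storeB' }, a disk on storeA
    # would still be mapped to 'storeA' on the target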


When the target bridge is not present (should that be detected before starting the migration?), and likely for any other startup failure, the only error in the log is:
2021-11-30 14:43:10 ERROR: online migrate failure - error - tunnel command '{"cmd":"star<SNIP>
failed to handle 'start' command - start failed: QEMU exited with code 1
For non-remote migration we are more verbose in this case and log the QEMU output.
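
As an aside, an up-front check for the mapped bridge on the target could be
as simple as a sysfs lookup; a minimal sketch, not tied to any existing
helper (the function name is made up):

    use strict;
    use warnings;

    # hypothetical pre-check - a plain sysfs lookup, not an existing PVE helper
    sub assert_bridge_exists {
        my ($bridge) = @_;
        die "bridge '$bridge' does not exist on target node\n"
            if !-d "/sys/class/net/$bridge";
    }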


Can/should an interrupt be handled more gracefully, so that remote cleanup still happens?
^CCMD websocket tunnel died: command 'proxmox-websocket-tunnel' failed: interrupted by signal

2021-11-30 14:39:07 ERROR: interrupted by signal
2021-11-30 14:39:07 aborting phase 1 - cleanup resources
2021-11-30 14:39:08 ERROR: writing to tunnel failed: broken pipe
2021-11-30 14:39:08 ERROR: migration aborted (duration 00:00:10): interrupted by signal


Besides lots of rebases, implemented TODOs and fixed issues, the main
difference to the previous RFC is that we no longer define remote
entries in a config file, but instead expect the caller/client to give us
all the required information to connect to the remote cluster.
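
For illustration only, the connection information passed by the caller could
look roughly like the following; the field names are assumptions, not the
actual parameter schema:

    # purely illustrative - field names are assumptions, not the real schema
    my $remote = {
        host        => '192.168.20.142',  # API endpoint of the target cluster
        port        => 8006,
        apitoken    => 'PVEAPIToken=migrate@pve!remote=...',
        fingerprint => 'AA:BB:...',       # TLS certificate fingerprint
    };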

new in v2: dropped parts already applied, incorporated Fabian's and
Dominik's feedback (thanks!)

Overview of affected repos and changes; see individual patches for
more details.

proxmox-websocket-tunnel:

new tunnel helper tool for forwarding commands and data over websocket
connections, required by qemu-server on source side

pve-access-control:

new ticket type, required by qemu-server on target side

pve-guest-common:

handle remote migration (no SSH) in AbstractMigrate,
required by qemu-server

pve-storage:

extend 'pvesm import' to allow import from UNIX socket, required on
target node by qemu-server

qemu-server:

some refactoring, new mtunnel endpoints, new remote_migration endpoints
TODO: handle pending changes and snapshots
TODO: proper CLI for remote migration
potential TODO: precond endpoint?

pve-http-server:

fix for handling unflushed proxy streams

As usual, some of the patches are best viewed with '-w', especially in
qemu-server.

Required dependencies are noted; qemu-server also requires a build-dep
on a patched pve-common, since the required options/formats would be
missing otherwise.

proxmox-websocket-tunnel

Fabian Grünbichler (4):
   initial commit
   add tunnel implementation
   add fingerprint validation
   add packaging

pve-access-control

Fabian Grünbichler (2):
   tickets: add tunnel ticket
   ticket: normalize path for verification

  src/PVE/AccessControl.pm | 52 ++++++++++++++++++++++++++++++----------
  1 file changed, 40 insertions(+), 12 deletions(-)

pve-http-server

Fabian Grünbichler (1):
   webproxy: handle unflushed write buffer

  src/PVE/APIServer/AnyEvent.pm | 10 ++++++----
  1 file changed, 6 insertions(+), 4 deletions(-)

qemu-server

Fabian Grünbichler (8):
   refactor map_storage to map_id
   schema: use pve-bridge-id
   update_vm: allow simultaneous setting of boot-order and dev
   nbd alloc helper: allow passing in explicit format
   mtunnel: add API endpoints
   migrate: refactor remote VM/tunnel start
   migrate: add remote migration handling
   api: add remote migrate endpoint

  PVE/API2/Qemu.pm   | 826 ++++++++++++++++++++++++++++++++++++++++++++-
  PVE/QemuMigrate.pm | 813 ++++++++++++++++++++++++++++++++++++--------
  PVE/QemuServer.pm  |  80 +++--
  debian/control     |   2 +
  4 files changed, 1539 insertions(+), 182 deletions(-)


