On 11.11.21 at 15:07, Fabian Grünbichler wrote:
> this series adds remote migration for VMs.
>
> both live and offline migration including NBD and storage-migrated disks
> should work.
I played around with it for a while. The biggest issue is that migration
fails if there is no 'meta' property in the config. Most of my other
wishes concern better error handling, but it seems to be in good shape
otherwise!
The error is "storage does not exist" even if the real issue is missing
access rights. The same error also appears if access to
/cluster/resources is missing or if the target node does not exist.
For the 'config' command, 'Sys.Modify' seems to be required:
    failed to handle 'config' command - 403 Permission check failed (/, Sys.Modify)
but it does create an empty configuration file, leading to
    target_vmid: Guest with ID '5678' already exists on remote cluster
on the next attempt.
It also already allocates the disks, but doesn't clean them up, because
it gets the wrong lock (since the config is empty) and aborts the 'quit'
command.
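One way to avoid the leftovers described above would be to roll back
everything created so far when a later step fails. A minimal Python
sketch of that idea follows; it is not qemu-server code, and all names
in it (TargetState, receive_guest, apply_config) are hypothetical:

```python
class TargetState:
    """Hypothetical stand-in for the target node's configs and disks."""

    def __init__(self):
        self.configs = {}
        self.disks = []


def receive_guest(state, vmid, disks, apply_config):
    """Reserve the VMID, allocate disks, apply the config; undo all on failure."""
    if vmid in state.configs:
        raise RuntimeError(f"Guest with ID '{vmid}' already exists")

    state.configs[vmid] = {}  # reserve the VMID with an empty config
    allocated = []
    try:
        for disk in disks:
            state.disks.append(disk)
            allocated.append(disk)
        # apply_config may fail, e.g. on a permission check
        state.configs[vmid] = apply_config()
    except Exception:
        # Roll back in reverse order so a retry starts from a clean slate.
        for disk in reversed(allocated):
            state.disks.remove(disk)
        del state.configs[vmid]
        raise
```

With something along these lines, a failed 'config' step would leave
neither an empty config nor allocated disks behind, so the next attempt
with the same VMID would not run into "already exists".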
If the config is not recent enough to have a 'meta' property:
    failed to handle 'config' command - unable to parse value of 'meta' - got undefined value
Same issue with disk+config cleanup as above.
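To illustrate what I would expect here: an absent 'meta' property could
be treated like an empty one instead of failing the parse. A minimal
Python sketch, assuming a 'key=value,key=value' property string; the
function names and the sample keys are made up and this is not the
actual parser:

```python
def parse_property_string(value):
    """Parse 'key=value,key=value'; a missing (None) value gives an empty dict."""
    if value is None:
        return {}
    result = {}
    for part in value.split(","):
        key, sep, val = part.partition("=")
        if not sep or not key:
            raise ValueError(f"unable to parse '{part}'")
        result[key] = val
    return result


def load_meta(config):
    """Return the parsed 'meta' property, tolerating configs that lack it."""
    return parse_property_string(config.get("meta"))


# An old config without 'meta' no longer raises:
old_config = {"name": "vm"}
meta = load_meta(old_config)
```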
The local VM stays locked with 'migrate'. Is that how it should be?
Also the __migration__ snapshot will stay around, resulting in an error
when trying to migrate again.
For live migration I always got a (cosmetic?) "WS closed unexpectedly"
error:
    tunnel: -> sending command "quit" to remote
    tunnel: <- got reply
    tunnel: Tunnel to https://192.168.20.142:8006/api2/json/nodes/rob2/qemu/5678/mtunnelwebsocket?ticket=PVETUNNEL%3A<SNIP>&socket=%2Frun%2Fqemu-server%2F5678.mtunnel failed - WS closed unexpectedly
    2021-11-30 13:49:39 migration finished successfully (duration 00:01:02)
    UPID:pve701:0000D8AD:000CB782:61A61DA5:qmigrate:111:root@pam:
Fun fact: the identity storage mapping will be used for storages that
don't appear in the explicit mapping. E.g. it's possible to migrate a VM
that only has disks on storeA with --target-storage storeB:storeB (if
storeA exists on the target of course). But the explicit identity
mapping is prohibited.
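The fallback behaviour described above can be sketched as follows. This
is a Python illustration only, not the qemu-server implementation; the
function names and the 'source:target' parsing are assumptions based on
the --target-storage example:

```python
def parse_mapping(spec):
    """Parse 'src:dst,src2:dst2' into a dict of explicit mapping entries."""
    entries = {}
    for pair in spec.split(","):
        source, sep, target = pair.partition(":")
        if not sep or not source or not target:
            raise ValueError(f"invalid mapping entry '{pair}'")
        entries[source] = target
    return entries


def map_id(entries, source):
    """Use the explicit entry if there is one, else fall back to identity."""
    return entries.get(source, source)


mapping = parse_mapping("storeB:storeB")
# storeA has no explicit entry, so it maps to itself on the target.
target_for_store_a = map_id(mapping, "storeA")
```

So with only storeB:storeB given, a disk on storeA would still end up
on a storage called storeA on the target.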
When a target bridge is not present (should that be detected ahead of
starting the migration?), and likely for any other startup failure, the
only error in the log is:
    2021-11-30 14:43:10 ERROR: online migrate failure - error - tunnel command '{"cmd":"star<SNIP>
    failed to handle 'start' command - start failed: QEMU exited with code 1
For non-remote migration we are more verbose in this case and log the
QEMU output.
Can/should an interrupt be handled more gracefully, so that remote
cleanup still happens?
    ^CCMD websocket tunnel died: command 'proxmox-websocket-tunnel' failed: interrupted by signal
    2021-11-30 14:39:07 ERROR: interrupted by signal
    2021-11-30 14:39:07 aborting phase 1 - cleanup resources
    2021-11-30 14:39:08 ERROR: writing to tunnel failed: broken pipe
    2021-11-30 14:39:08 ERROR: migration aborted (duration 00:00:10): interrupted by signal
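What I have in mind is roughly the following ordering: on interrupt,
still tell the remote side to clean up, and only then tear down the
tunnel. A minimal Python sketch with a fake in-memory tunnel; the
'cleanup' argument to "quit" and all class and function names are
assumptions made for illustration:

```python
class FakeTunnel:
    """Hypothetical stand-in for the websocket tunnel helper."""

    def __init__(self):
        self.sent = []
        self.is_open = True

    def send(self, command):
        if not self.is_open:
            raise BrokenPipeError("writing to tunnel failed")
        self.sent.append(command)

    def close(self):
        self.is_open = False


def migrate(tunnel, work):
    """Run the migration work; on ^C, request remote cleanup before closing."""
    try:
        work()
        tunnel.send({"cmd": "quit", "cleanup": False})
        return "ok"
    except KeyboardInterrupt:
        # The tunnel is still open at this point, so the request gets through.
        tunnel.send({"cmd": "quit", "cleanup": True})
        return "aborted"
    finally:
        tunnel.close()
```

In the log above the tunnel helper seems to die from the same signal,
which is why the later write fails with a broken pipe; the sketch only
works if the helper survives until the cleanup request has been sent.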
> besides lots of rebases, implemented todos and fixed issues the main
> difference to the previous RFC is that we no longer define remote
> entries in a config file, but just expect the caller/client to give us
> all the required information to connect to the remote cluster.
>
> new in v2: dropped parts already applied, incorporated Fabian's and
> Dominik's feedback (thanks!)
>
> overview over affected repos and changes, see individual patches for
> more details.
>
> proxmox-websocket-tunnel:
>
> new tunnel helper tool for forwarding commands and data over websocket
> connections, required by qemu-server on source side
>
> pve-access-control:
>
> new ticket type, required by qemu-server on target side
>
> pve-guest-common:
>
> handle remote migration (no SSH) in AbstractMigrate,
> required by qemu-server
>
> pve-storage:
>
> extend 'pvesm import' to allow import from UNIX socket, required on
> target node by qemu-server
>
> qemu-server:
>
> some refactoring, new mtunnel endpoints, new remote_migration endpoints
> TODO: handle pending changes and snapshots
> TODO: proper CLI for remote migration
> potential TODO: precond endpoint?
>
> pve-http-server:
>
> fix for handling unflushed proxy streams
>
> as usual, some of the patches are best viewed with '-w', especially in
> qemu-server..
>
> required dependencies are noted, qemu-server also requires a build-dep
> on patched pve-common since the required options/formats would be
> missing otherwise..
>
> proxmox-websocket-tunnel
>
> Fabian Grünbichler (4):
>   initial commit
>   add tunnel implementation
>   add fingerprint validation
>   add packaging
>
> pve-access-control
>
> Fabian Grünbichler (2):
>   tickets: add tunnel ticket
>   ticket: normalize path for verification
>
>  src/PVE/AccessControl.pm | 52 ++++++++++++++++++++++++++++++----------
>  1 file changed, 40 insertions(+), 12 deletions(-)
>
> pve-http-server
>
> Fabian Grünbichler (1):
>   webproxy: handle unflushed write buffer
>
>  src/PVE/APIServer/AnyEvent.pm | 10 ++++++----
>  1 file changed, 6 insertions(+), 4 deletions(-)
>
> qemu-server
>
> Fabian Grünbichler (8):
>   refactor map_storage to map_id
>   schema: use pve-bridge-id
>   update_vm: allow simultaneous setting of boot-order and dev
>   nbd alloc helper: allow passing in explicit format
>   mtunnel: add API endpoints
>   migrate: refactor remote VM/tunnel start
>   migrate: add remote migration handling
>   api: add remote migrate endpoint
>
>  PVE/API2/Qemu.pm   | 826 ++++++++++++++++++++++++++++++++++++++++++++-
>  PVE/QemuMigrate.pm | 813 ++++++++++++++++++++++++++++++++++++--------
>  PVE/QemuServer.pm  |  80 +++--
>  debian/control     |   2 +
>  4 files changed, 1539 insertions(+), 182 deletions(-)