Applied whole series.

On Fri, Aug 04, 2017 at 02:53:57PM +0200, Fabian Grünbichler wrote:
> this patch series attempts to reduce the downtime occurring during
> live-migration of VMs to sane levels by
> - conditionalizing potentially unneeded SSH connections
> - replacing commands over SSH with new 'qm mtunnel' commands
> - reducing the polling interval to notice a completed migration faster
>
> attempts to monitor down time via ping produced rather unreliable results,
> probably because of ARP? but old to old is reliably slowest there too..
>
> following are durations in 'paused' state, between 'paused inmigrate' and
> 'running', measured with qmp status with 0.1 sleep in between, tests repeated 5
> times each on a network-rate-limited virtual cluster.
>
> with old polling, 2G RAM (actual RAM transfer in <2s, so no auto-reduction of
> polling interval happens):
>
> old code: average 3.2s
> new to old: average 1.6s (skips pvesr set-state)
> new to new: average 1.2s
>
> with old polling, 8G RAM (auto-reduction of polling interval kicks in,
> slightly better results):
>
> old code: average 2.7s
> new to old: 1s
> new to new: 0.7s
>
> with reduced polling interval (last patch applied), 2G and 8G RAM:
> new to old: 0.4s
> new to new: one single instance of logged paused state over 5 migrations!
>
> with reduced polling interval, 8G RAM, old code but with last patch applied:
> 2s
>
> so it seems like this is the right combination of changes to get downtime back
> to acceptable levels without sacrificing consistency.
>
> commands which might be integrated into mtunnel as well in the future:
> - pvesr set-state
> - qm nbdstop
> - qm unlock
>
> changes from v1, based on Thomas' feedback:
>
> ------8<------8<------8<------8<------8<------8<------
>
> diff --git a/PVE/CLI/qm.pm b/PVE/CLI/qm.pm
> index 1792cb0..5dce10f 100755
> --- a/PVE/CLI/qm.pm
> +++ b/PVE/CLI/qm.pm
> @@ -273,7 +273,7 @@ __PACKAGE__->register_method ({
>              };
>
>              $tunnel_write->("tunnel online");
> -            $tunnel_write->("ver 1.0");
> +            $tunnel_write->("ver 1");
>
>              while (my $line = <>) {
>                  chomp $line;
> diff --git a/PVE/QemuMigrate.pm b/PVE/QemuMigrate.pm
> index ac9ac22..fc847cc 100644
> --- a/PVE/QemuMigrate.pm
> +++ b/PVE/QemuMigrate.pm
> @@ -124,7 +124,7 @@ sub write_tunnel {
>      };
>      die "writing to tunnel failed: $@\n" if $@;
>
> -    if ($tunnel->{version} && $tunnel->{version} >= 1.0) {
> +    if ($tunnel->{version} && $tunnel->{version} >= 1) {
>          my $res = eval { $self->read_tunnel($tunnel, 10); };
>          die "no reply to command '$command': $@\n" if $@;
>
> @@ -156,9 +156,12 @@ sub fork_tunnel {
>
>      eval {
>          my $ver = $self->read_tunnel($tunnel, 10);
> -        $ver =~ /^ver (\d+\.\d+)$/;
> -        $tunnel->{version} = $1 if $1;
> -        $self->log('info', "ssh tunnel version: $tunnel->{version}\n");
> +        if ($ver =~ /^ver (\d+)$/) {
> +            $tunnel->{version} = $1;
> +            $self->log('info', "ssh tunnel $ver\n");
> +        } else {
> +            $err = "received invalid tunnel version string '$ver'\n" if !$err;
> +        }
>      };
>
>      if ($err) {
> @@ -923,7 +926,7 @@ sub phase3_cleanup {
>      die "Failed to move config to node '$self->{node}' - rename failed: $!\n"
>          if !rename($conffile, $newconffile);
>
> -    $self->switch_replication_job_target() if $self->{replicated_volumes};;
> +    $self->switch_replication_job_target() if $self->{replicated_volumes};
>
>      if ($self->{livemigration}) {
>          if ($self->{storage_migration}) {
> @@ -943,7 +946,7 @@ sub phase3_cleanup {
>          }
>
>          # config moved and nbd server stopped - now we can resume vm on target
> -        if ($tunnel && $tunnel->{version} && $tunnel->{version} >= 1.0) {
> +        if ($tunnel && $tunnel->{version} && $tunnel->{version} >= 1) {
>              eval {
>                  $self->write_tunnel($tunnel, 30, "resume $vmid");
>              };
> @@ -953,13 +956,11 @@ sub phase3_cleanup {
>              }
>          } else {
>              my $cmd = [@{$self->{rem_ssh}}, 'qm', 'resume', $vmid, '--skiplock', '--nocheck'];
> -            eval {
> -                my $logf = sub {
> -                    my $line = shift;
> -                    $self->log('err', $line);
> -                };
> -                PVE::Tools::run_command($cmd, outfunc => sub {}, errfunc => $logf);
> +            my $logf = sub {
> +                my $line = shift;
> +                $self->log('err', $line);
>              };
> +            eval { PVE::Tools::run_command($cmd, outfunc => sub {}, errfunc => $logf); };
>              if (my $err = $@) {
>                  $self->log('err', $err);
>                  $self->{errors} = 1;
>
> ------>8------>8------>8------>8------>8------>8------
>
> Fabian Grünbichler (10):
>   migrate: switch back to qm mtunnel
>   migrate: refactor mtunnel read/write
>   qm mtunnel: add tunnel version
>   migrate: read mtunnel version
>   qm mtunnel: add write helper
>   mtunnel: add and handle OK/ERR replies
>   qm mtunnel/migrate: add resume VMID command
>   migrate: finish tunnel in phase 3
>   migrate: keep track of replication
>   migrate: reduce polling intervals
>
>  PVE/CLI/qm.pm      |  28 ++++++++++--
>  PVE/QemuMigrate.pm | 127 ++++++++++++++++++++++++++++++++++++++---------------
>  2 files changed, 117 insertions(+), 38 deletions(-)
>
> --
> 2.11.0
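
For anyone reading this in the archive: the version handling touched by the v1 changes quoted above can be sketched in isolation roughly as follows. This is only a minimal sketch and not the code from the series. The helper names parse_tunnel_version and tunnel_supports_replies are hypothetical; only the 'ver N' line format, the error message and the '>= 1' comparison are taken from the diff.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the series): extract the integer version
# from a handshake line such as "ver 1", or die on anything else.
sub parse_tunnel_version {
    my ($line) = @_;
    $line //= '';
    if ($line =~ /^ver (\d+)$/) {
        return int($1);
    }
    die "received invalid tunnel version string '$line'\n";
}

# Hypothetical helper: true if the peer announced tunnel version 1 or later,
# i.e. a reply is expected after each command. No version means an old peer.
sub tunnel_supports_replies {
    my ($tunnel) = @_;
    return ($tunnel->{version} && $tunnel->{version} >= 1) ? 1 : 0;
}

my $tunnel = {};
$tunnel->{version} = eval { parse_tunnel_version("ver 1") };
warn $@ if $@;
print "tunnel version: ", ($tunnel->{version} // 'none'), "\n";
print "replies expected\n" if tunnel_supports_replies($tunnel);

# The old "ver 1.0" format does not match the integer-only pattern.
eval { parse_tunnel_version("ver 1.0") };
print "rejected: $@" if $@;
```

Using a plain integer keeps the comparison free of floating-point version semantics, and an unparsable handshake line becomes an explicit error instead of silently leaving the version unset.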