On December 16, 2022 2:36 pm, Daniel Tschlatscher wrote:
> In some cases the VM API start method would return before the detached
> KVM process would have exited. This is especially problematic with HA,
> because the HA manager would think the VM started successfully, later
> see that it exited and start it again in an endless loop.
>
> Moreover, another case exists when resuming a hibernated VM. In this
> case, the qemu thread will attempt to load the whole vmstate into
> memory before exiting.
> Depending on vmstate size, disk read speed, and similar factors this
> can take quite a while though and it is not possible to start the VM
> normally during this time.
>
> To get around this, this patch intercepts the error, looks whether a
> corresponding KVM thread is still running, and waits for/kills it,
> before continuing.
>
> Signed-off-by: Daniel Tschlatscher <d.tschlatsc...@proxmox.com>
> ---
>
> Changes from v2:
> * Rebased to current master
> * Changed warn to use 'log_warn' instead
> * Reworded log message when waiting for lingering qemu process
>
>  PVE/QemuServer.pm | 40 +++++++++++++++++++++++++++++++++-------
>  1 file changed, 33 insertions(+), 7 deletions(-)
>
> diff --git a/PVE/QemuServer.pm b/PVE/QemuServer.pm
> index 2adbe3a..f63dc3f 100644
> --- a/PVE/QemuServer.pm
> +++ b/PVE/QemuServer.pm
> @@ -5884,15 +5884,41 @@ sub vm_start_nolock {
>                  $tpmpid = start_swtpm($storecfg, $vmid, $tpm, $migratedfrom);
>              }
>
> -            my $exitcode = run_command($cmd, %run_params);
> -            if ($exitcode) {
> -                if ($tpmpid) {
> -                    warn "stopping swtpm instance (pid $tpmpid) due to QEMU startup error\n";
> -                    kill 'TERM', $tpmpid;
> +            eval {
> +                my $exitcode = run_command($cmd, %run_params);
> +
> +                if ($exitcode) {
> +                    if ($tpmpid) {
> +                        log_warn "stopping swtpm instance (pid $tpmpid) due to QEMU startup error\n";

this warn -> log_warn change kind of slipped in, it's not really part of
this patch?

> +                        kill 'TERM', $tpmpid;
> +                    }
> +                    die "QEMU exited with code $exitcode\n";
>                  }
> -                die "QEMU exited with code $exitcode\n";
> +            };
> +
> +            if (my $err = $@) {
> +                my $pid = PVE::QemuServer::Helpers::vm_running_locally($vmid);
> +
> +                if ($pid ne "") {

can be combined:

if (my $pid = ...) {

}

(empty string evaluates to false in perl ;))

> +                    my $count = 0;
> +                    my $timeout = 300;
> +
> +                    print "Waiting $timeout seconds for detached qemu process $pid to exit\n";
> +                    while (($count < $timeout) &&
> +                        PVE::QemuServer::Helpers::vm_running_locally($vmid)) {
> +                        $count++;
> +                        sleep(1);
> +                    }
> +

either here

> +                    if ($count >= $timeout) {
> +                        log_warn "Reached timeout. Terminating now with SIGKILL\n";

or here, recheck that VM is still running and still has the same PID,
and log accordingly instead of KILLing if not..

the same is also true in _do_vm_stop

> +                        kill(9, $pid);
> +                    }
> +                }
> +
> +                die $err;
>              }
> -        };
> +        }
>      };
>
>      if ($conf->{hugepages}) {
> --
> 2.30.2
>
>
>
> _______________________________________________
> pve-devel mailing list
> pve-devel@lists.proxmox.com
> https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
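
To illustrate the two review comments (combining the PID lookup with the condition, and re-checking the PID before sending SIGKILL), here is a rough, untested-in-PVE sketch. The name wait_for_exit_or_kill and both callbacks are hypothetical and not part of PVE: $get_pid merely stands in for PVE::QemuServer::Helpers::vm_running_locally($vmid) and is assumed to return the PID, or a false value if the VM is not running, and plain warn is used instead of log_warn to keep the example self-contained.

```perl
use strict;
use warnings;

# Hypothetical helper, for illustration only.
#   $get_pid - coderef standing in for vm_running_locally($vmid); assumed to
#              return the PID of the VM process, or a false value if none runs
#   $timeout - seconds to wait before resorting to SIGKILL
#   $kill    - optional coderef that does the actual kill (injectable for tests)
sub wait_for_exit_or_kill {
    my ($get_pid, $timeout, $kill) = @_;
    $kill //= sub { kill(9, $_[0]) };    # default: send SIGKILL

    # Lookup and check combined: a false value (undef, '' or 0) means
    # that there is nothing to wait for.
    my $pid = $get_pid->() or return 'not-running';

    for (my $count = 0; $count < $timeout; $count++) {
        return 'exited' if !$get_pid->();
        sleep(1);
    }

    # Re-check right before killing: the process may have exited in the
    # meantime, or another process may now be running for this VM.
    my $current = $get_pid->();
    return 'exited' if !$current;
    if ($current != $pid) {
        warn "PID changed from $pid to $current, not sending SIGKILL\n";
        return 'pid-changed';
    }

    warn "Reached timeout. Terminating $pid now with SIGKILL\n";
    $kill->($pid);
    return 'killed';
}
```

The return value tells the caller which case was hit, so it can log accordingly; the same shape could probably be reused for the check in _do_vm_stop.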