Because of a systemd issue [0], when a service that is 'PartOf' a scope
fails, the scope itself might end up being left over, even after all
processes in the scope have exited. In particular, this can happen for the
'$vmid.scope' when the 'pve-dbus-vmstate@$vmid.service' fails.

Running 'systemctl reset-failed' on the failed 'PartOf' service causes the
left-over scope to be cleaned up as well. Without this, users in that
situation would get a confusing "timeout waiting on systemd" error
message.

[0]: https://github.com/systemd/systemd/issues/39141

Signed-off-by: Fiona Ebner <[email protected]>
---
 src/PVE/QemuServer.pm | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/src/PVE/QemuServer.pm b/src/PVE/QemuServer.pm
index 7d5ab718..8e2f03dc 100644
--- a/src/PVE/QemuServer.pm
+++ b/src/PVE/QemuServer.pm
@@ -5802,6 +5802,12 @@ sub vm_start_nolock {
     }
 
     my %silence_std_outs = (outfunc => sub { }, errfunc => sub { });
+    eval { # See systemd GH #39141, need to reset failed PartOf units too, or the scope might be blocked
+        run_command(
+            ['/bin/systemctl', 'reset-failed', "pve-dbus-vmstate\@$vmid.service"],
+            %silence_std_outs,
+        );
+    };
     eval { run_command(['/bin/systemctl', 'reset-failed', "$vmid.scope"], %silence_std_outs) };
     eval { run_command(['/bin/systemctl', 'stop', "$vmid.scope"], %silence_std_outs) };
     # Issues with the above 'stop' not being fully completed are extremely rare, a very low
-- 
2.47.3



_______________________________________________
pve-devel mailing list
[email protected]
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
