If the lock directory is not removed after a failure caused by a
signal, the lock cannot be acquired again until the 120 second timeout
that pmxcfs imposes on the lock has expired. A second, unrelated task
can easily run into this in production, which is quite surprising.
Install a signal handler that releases the lock if it was already
acquired. If an old handler is defined, it is invoked; otherwise, the
signal is raised again. Simply using 'die' would change the execution
flow compared to before the change.
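A minimal, self-contained Perl sketch of the handler chaining described
above, for illustration only. The temporary lock path, the `with_lock`
helper and the INT/TERM subset are hypothetical and not part of
PVE::Cluster. The sketch re-delivers the signal with `kill $signame, $$`
and checks `ref(...) eq 'CODE'` before calling the previous handler, so
string dispositions such as 'DEFAULT' are never invoked as code.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical lock directory standing in for the pmxcfs lock path.
my $base    = tempdir(CLEANUP => 1);
my $lockdir = "$base/lock-demo";

# Simulate a caller that already installed its own TERM handler.
my $old_called = '';
$SIG{TERM} = sub { $old_called = $_[0] };

# Hypothetical helper: run $code while holding the lock directory.
sub with_lock {
    my ($code) = @_;
    my $got_lock = 0;
    # Remember the handlers that were active before we take over.
    my $old_sig = { map { $_ => $SIG{$_} } qw(INT TERM) };

    eval {
        local $SIG{INT} = local $SIG{TERM} = sub {
            my $signame = $_[0];
            # Release the lock first: the old handler might never return.
            rmdir $lockdir if $got_lock;
            if (ref($old_sig->{$signame}) eq 'CODE') {
                $old_sig->{$signame}->(@_);    # defer to the previous handler
            } else {
                $SIG{$signame} = 'DEFAULT';
                kill $signame, $$;             # re-deliver with the default action
            }
            die "interrupted by signal\n";
        };

        mkdir $lockdir or die "could not create $lockdir: $!\n";
        $got_lock = 1;
        $code->();
        rmdir $lockdir;
        $got_lock = 0;
    };
    return $@;    # empty string on success, error message otherwise
}

# Send ourselves SIGTERM while the lock is held.
my $err = with_lock(sub { kill 'TERM', $$; sleep 1; });
```

Releasing the lock before deferring to the previous handler matters,
because that handler may exit the process instead of returning.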

Signed-off-by: Fiona Ebner <[email protected]>
---
 src/PVE/Cluster.pm | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)

diff --git a/src/PVE/Cluster.pm b/src/PVE/Cluster.pm
index bdb465f..7165d1c 100644
--- a/src/PVE/Cluster.pm
+++ b/src/PVE/Cluster.pm
@@ -615,6 +615,22 @@ my $cfs_lock = sub {
 
     my $is_code_err = 0;
     eval {
+        # catch signals to release the lock - further defer to old handler if one was set
+        my $old_sig;
+        $old_sig->{$_} = $SIG{$_} for qw(INT TERM QUIT HUP PIPE);
+
+        local $SIG{INT} = local $SIG{TERM} = local $SIG{QUIT} = local $SIG{HUP} =
+            local $SIG{PIPE} = sub {
+                my $signame = $_[0];
+                rmdir $filename if $got_lock; # if we held the lock always unlock again
+                if ($old_sig->{$signame}) {
+                    $old_sig->{$signame}->(@_);
+                } else {
+                    $SIG{$signame} = 'DEFAULT';
+                    POSIX::raise($signame);
+                }
+                die "interrupted by signal\n";
+            };
 
         mkdir $lockdir;
 
-- 
2.47.3
