On 18.02.26 at 16:45, Fiona Ebner wrote:
> If the lock directory is not removed after failing because of a
> signal, it won't be possible to acquire the lock anymore before the
> 120 second timeout imposed on the lock by pmxcfs. This can easily
> happen to a second, unrelated task in production and is quite
> surprising. Install a signal handler that releases the lock if it was
> already acquired. If an old handler is defined, it is invoked,
> otherwise the signal is raised again. Just using 'die' would change
> the execution flow compared to before the change.
> 
> Signed-off-by: Fiona Ebner <[email protected]>
> ---
>  src/PVE/Cluster.pm | 16 ++++++++++++++++
>  1 file changed, 16 insertions(+)
> 
> diff --git a/src/PVE/Cluster.pm b/src/PVE/Cluster.pm
> index bdb465f..7165d1c 100644
> --- a/src/PVE/Cluster.pm
> +++ b/src/PVE/Cluster.pm
> @@ -615,6 +615,22 @@ my $cfs_lock = sub {
>  
>      my $is_code_err = 0;
>      eval {
> +        # catch signals to release the lock - further defer to old handler if one was set
> +        my $old_sig;
> +        $old_sig->{$_} = $SIG{$_} for qw(INT TERM QUIT HUP PIPE);

really a non-issue in practice and basically the same thing under the hood, but
this could probably just be a map, something like (untested):

my $old_sig = { map { $_ => $SIG{$_} } qw(INT TERM QUIT HUP PIPE) };
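
Note that the map block needs its own closing brace before the qw() list. A small standalone check (plain Perl, not run against PVE::Cluster; the HUP handler is only a stand-in for whatever a caller might have installed) that both forms capture the same values from %SIG:

```perl
use strict;
use warnings;

# Pretend a caller already installed a HUP handler; the other signals keep
# whatever disposition the process started with.
my $hup_handler = sub { warn "caller's HUP handler\n" };
$SIG{HUP} = $hup_handler;

my @signals = qw(INT TERM QUIT HUP PIPE);

# Form used in the patch: statement-modifier loop filling a hash ref.
my $old_sig_loop;
$old_sig_loop->{$_} = $SIG{$_} for @signals;

# Suggested form: one map expression, map block closed before the list.
my $old_sig_map = { map { $_ => $SIG{$_} } @signals };

# Print what was captured (entries may be undef if nothing was set).
for my $sig (@signals) {
    my $val = $old_sig_map->{$sig};
    printf "%-4s %s\n", $sig, defined($val) ? $val : 'undef';
}
```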

> +
> +        local $SIG{INT} = local $SIG{TERM} = local $SIG{QUIT} = local $SIG{HUP} =
> +            local $SIG{PIPE} = sub {
> +                my $signame = $_[0];
> +                rmdir $filename if $got_lock; # if we held the lock always unlock again

Could be nice to output a warning if the above rmdir fails?
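
Something along these lines (untested in PVE; release_lock_dir is just a hypothetical helper name) would keep the handler short while still surfacing a failed release:

```perl
use strict;
use warnings;

# Hypothetical helper, not part of PVE::Cluster: remove the lock directory and
# warn (instead of failing silently) if that does not work. Returns 1 if the
# directory was removed, 0 otherwise.
sub release_lock_dir {
    my ($lockdir) = @_;
    return 1 if rmdir($lockdir);
    warn "failed to release lock '$lockdir' - $!\n";
    return 0;
}
```

In the handler that would then be `release_lock_dir($filename) if $got_lock;`.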

> +                if ($old_sig->{$signame}) {
> +                    $old_sig->{$signame}->(@_);
> +                } else {
> +                    $SIG{$signame} = 'DEFAULT';
> +                    POSIX::raise($signame);

hmm, this reads alright, but then I'm wondering if it should be added elsewhere too?
I did not find a single "POSIX::raise" or "raise\(" instance in our Perl code
inside the /usr/share/perl5/{PVE,Proxmox} directories on a recent PVE 9 system,
but we have quite a few signal overrides, and while I did not check those, I do
believe I remember that some of them fall back to the handler defined by the
calling site.

Describing how exactly the code flow changes would be nice in any case.
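
To make the flow easier to discuss, a minimal standalone sketch of the pattern (untested against PVE::Cluster; with_lock_dir and its plain mkdir/rmdir lock are hypothetical stand-ins for the relevant part of $cfs_lock, not its actual code). If the caller had a code-ref handler installed, the lock is released, that handler runs, and then the die aborts the locked code, which the surrounding eval turns into a regular error. Without such a handler, the default disposition is restored and the signal is re-sent, which for these signals normally ends the process. The sketch only calls the old value if it is a CODE ref, since %SIG entries can also be strings like 'IGNORE' or 'DEFAULT', and it re-sends the signal with the built-in kill, which takes the signal name directly, instead of POSIX::raise.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the locking part of $cfs_lock: take a directory
# lock, run $code and make sure the lock is released, also if one of the
# handled signals arrives while $code runs.
sub with_lock_dir {
    my ($lockdir, $code) = @_;

    my $got_lock = 0;
    my $res = eval {
        # remember what was installed before, to defer to it later
        my $old_sig = { map { $_ => $SIG{$_} } qw(INT TERM QUIT HUP PIPE) };

        my $handler = sub {
            my ($signame) = @_;
            # release the lock first, so later lockers don't hit a stale lock
            if ($got_lock) {
                rmdir $lockdir or warn "failed to release lock '$lockdir' - $!\n";
                $got_lock = 0;
            }
            my $old = $old_sig->{$signame};
            if (ref($old) eq 'CODE') {
                $old->(@_); # defer to the caller's Perl-level handler
            } else {
                # no Perl-level handler: restore the default action and re-send
                # the signal, which normally terminates the process
                $SIG{$signame} = 'DEFAULT';
                kill $signame, $$;
            }
            die "interrupted by signal\n";
        };
        local $SIG{INT} = local $SIG{TERM} = local $SIG{QUIT} = local $SIG{HUP} =
            local $SIG{PIPE} = $handler;

        mkdir $lockdir or die "could not acquire lock '$lockdir' - $!\n";
        $got_lock = 1;

        $code->();
    };
    my $err = $@;

    rmdir $lockdir if $got_lock; # normal exit or non-signal error
    die $err if $err;
    return $res;
}
```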

> +                }
> +                die "interrupted by signal\n";
> +            };
>  
>          mkdir $lockdir;
>  
