Otherwise this option is not really useful. First, sending a SIGTERM makes
the children exit, which is not quite what "leave_children_open_on_reload"
promises.
The problem this causes is that we may get a time window where no worker
is active, so that, for example, our API daemon would not accept
connections during a restart (or rather, a reload).

So, if this option is set, don't request termination of any child worker.
Instead, just restart (re-exec) ourselves, start up a new set of workers,
and only then request termination of the old ones, allowing a fully
seamless reload.

This is only done on `$daemon-exe restart`, and thus on
`systemctl reload $daemon`. `systemctl restart` or any other stop/start
cycle always exits all workers first.

This expects that the workers can terminate gracefully on SIGTERM, which
is already the case for anything using our AnyEvent based class (which is
the base of our HTTPServer module). Graceful termination here means: the
worker accepts no new work and exits immediately after the currently
queued work is done.

Signed-off-by: Thomas Lamprecht <t.lampre...@proxmox.com>
---
 src/PVE/Daemon.pm | 31 ++++++++++++++++++-------------
 1 file changed, 18 insertions(+), 13 deletions(-)

diff --git a/src/PVE/Daemon.pm b/src/PVE/Daemon.pm
index 9d72c32..a6b58d1 100644
--- a/src/PVE/Daemon.pm
+++ b/src/PVE/Daemon.pm
@@ -184,6 +184,19 @@ my $start_workers = sub {
     }
 };
 
+my $terminate_old_workers = sub {
+    my ($self) = @_;
+
+    my $cpids = [ keys %{$self->{old_workers}} ];
+
+    return if !($cpids && scalar(@$cpids) > 0);
+
+    # request graceful exit, no need for waitpid we have a SIGCHLD handler
+    foreach my $cpid (@$cpids) {
+        kill 15 => $cpid;
+    }
+};
+
 my $terminate_server = sub {
     my ($self, $allow_open_children) = @_;
 
@@ -198,19 +211,13 @@ my $terminate_server = sub {
     eval { $self->shutdown(); };
     warn $@ if $@;
 
-    # we have workers - send TERM signal
-
-    foreach my $cpid (keys %{$self->{workers}}) {
-        kill(15, $cpid); # TERM childs
-    }
 
     # if configured, leave children running on HUP
-    return if $allow_open_children &&
-        $self->{leave_children_open_on_reload};
+    return if $allow_open_children && $self->{leave_children_open_on_reload};
 
-    # else, send TERM to old workers
-    foreach my $cpid (keys %{$self->{old_workers}}) {
-        kill(15, $cpid); # TERM childs
+    # else send TERM to all (old and current) child workers
+    foreach my $cpid (keys %{$self->@{'workers','old_workers'}}) {
+        kill(15, $cpid);
     }
 
     # nicely shutdown childs (give them max 10 seconds to shut down)
@@ -395,13 +402,11 @@ my $server_run = sub {
             &$old_sig_chld(@_) if $old_sig_chld;
         };
 
-        # catch worker finished during restart phase
-        &$finish_workers($self);
-
         # now loop forever (until we receive terminate signal)
         for (;;) {
             &$start_workers($self);
             sleep(5);
+            &$terminate_old_workers($self);
             &$finish_workers($self);
             last if $self->{terminate};
         }
-- 
2.11.0

_______________________________________________
pve-devel mailing list
pve-devel@pve.proxmox.com
https://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
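
The ordering described in the commit message (bring up the replacement
workers first, then ask the old ones to exit) can be illustrated outside
of PVE::Daemon with a small standalone sketch. It does not use the
module's API: `spawn_worker` and `count_alive` are hypothetical helpers
made up for this illustration, the "work" is just a short sleep, and the
old workers are reaped with a plain waitpid, whereas the real daemon
re-execs itself and relies on its SIGCHLD handler.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use POSIX qw(SIGTERM SIG_BLOCK SIG_SETMASK sigprocmask);

# Hypothetical worker: loops over small units of "work" and leaves the loop
# only between two units once TERM was received (graceful termination).
sub spawn_worker {
    # Block TERM across fork() so it cannot arrive before the child has
    # installed its handler; a blocked signal stays pending until unblocked.
    my $block = POSIX::SigSet->new(SIGTERM);
    my $saved = POSIX::SigSet->new();
    sigprocmask(SIG_BLOCK, $block, $saved) or die "sigprocmask failed: $!\n";

    my $pid = fork();
    die "fork failed: $!\n" if !defined($pid);

    if ($pid == 0) {    # child
        my $terminate = 0;
        $SIG{TERM} = sub { $terminate = 1 };
        sigprocmask(SIG_SETMASK, $saved);
        while (!$terminate) {
            select(undef, undef, undef, 0.05);    # stand-in for one unit of work
        }
        POSIX::_exit(0);
    }

    sigprocmask(SIG_SETMASK, $saved);    # parent: restore the signal mask
    return $pid;
}

# Counts how many of the given PIDs still exist (signal 0 only probes).
sub count_alive { return scalar grep { kill(0, $_) } @_; }

my @old_workers = map { spawn_worker() } 1 .. 2;    # workers from before the reload

# Reload: bring up the replacement set first ...
my @new_workers = map { spawn_worker() } 1 .. 2;
my $alive_during_reload = count_alive(@old_workers, @new_workers);

# ... and only then ask the old set to finish up and exit.
kill('TERM', @old_workers);
waitpid($_, 0) for @old_workers;    # the real daemon reaps via SIGCHLD instead

my $old_alive = count_alive(@old_workers);
my $new_alive = count_alive(@new_workers);
print "during reload: $alive_during_reload alive, "
    . "afterwards: old=$old_alive new=$new_alive\n";

# Clean up the workers this demo left running.
kill('TERM', @new_workers);
waitpid($_, 0) for @new_workers;
```

At no point in this sketch is the number of live workers lower than the
size of one full set, which is the property the patch is after.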