On 14.12.20 14:00, Fabian Ebner wrote: > By remembering the instance via PID and start time and checking for that > information in later instances. If the old instance can't be found, the new > one > will continue and register itself in the state. > > After updating, if there is a waiting instance running the old version, one > more > might be created, because there is no instance_id yet. But the new instance > will > set the instance_id, which any later instance will see. > > More importantly, if the state is wrongly 'waiting' or 'syncing', e.g. > because an instance was terminated before finishing, we don't abort anymore > and > recover from the wrong state, thus fixing the bug. > > Signed-off-by: Fabian Ebner <f.eb...@proxmox.com> > --- > > I couldn't find a better unique identifier that can be easily verfied from > within another instance, but PID and start time should be good enough for the > intended purpose. > > Another alternative would be to introduce job-specific locking around the > whole > sync() block, but then we would have some three lock-level deep code... > > @Thomas: I felt like this was more complete than the "clear state after boot"- > solution, because it also works when the processes are killed for different > reasons than during shutdown.
that's true, and it seems like a quite nice and short approach to me, great! > > pve-zsync | 40 +++++++++++++++++++++++++++++++++++++++- > 1 file changed, 39 insertions(+), 1 deletion(-) > > diff --git a/pve-zsync b/pve-zsync > index f3b98c4..506bfab 100755 > --- a/pve-zsync > +++ b/pve-zsync > @@ -266,6 +266,7 @@ sub add_state_to_job { > $job->{state} = $state->{state}; > $job->{lsync} = $state->{lsync}; > $job->{vm_type} = $state->{vm_type}; > + $job->{instance_id} = $state->{instance_id}; > > for (my $i = 0; $state->{"snap$i"}; $i++) { > $job->{"snap$i"} = $state->{"snap$i"}; > @@ -365,6 +366,7 @@ sub update_state { > if ($job->{state} ne "del") { > $state->{state} = $job->{state}; > $state->{lsync} = $job->{lsync}; > + $state->{instance_id} = $job->{instance_id}; > $state->{vm_type} = $job->{vm_type}; > > for (my $i = 0; $job->{"snap$i"} ; $i++) { > @@ -584,6 +586,33 @@ sub destroy_job { > }); > } > > +sub get_process_start_time { > + my ($pid) = @_; > + > + return eval { run_cmd(['ps', '-o', 'lstart=', '-p', "$pid"]); }; instead of fork+exec do a much cheaper file read? I.e., copying over file_read_firstline from PVE::Tools then: sub get_process_start_time { my $stat_str = file_read_firstline("/proc/$pid/stat"); my $stat = [ split(/\s+/, $stat_str) ]; return $stat->[21]; } plus some error handling (note I did not test above) > +} > + > +sub get_instance_id { > + my ($pid) = @_; > + > + my $starttime = get_process_start_time($pid) > + or die "could not determine start time for process '$pid'\n"; > + > + return "${pid}:${starttime}"; > +} > + > +sub instance_exists { > + my ($instance_id) = @_; > + > + if (defined($instance_id) && $instance_id =~ m/^([1-9][0-9]*):(.*)$/) { > + my ($pid, $starttime) = ($1, $2); > + my $actual_starttime = get_process_start_time($pid); > + return defined($actual_starttime) && $starttime eq $actual_starttime; > + } > + > + return 0; > +} > + > sub sync { > my ($param) = @_; > > @@ -593,11 +622,18 @@ sub sync { > eval { $job = get_job($param) }; > > if ($job) { > - if (defined($job->{state}) && ($job->{state} eq "syncing" || > $job->{state} eq "waiting")) { > + my $state = $job->{state} // 'ok'; > + $state = 'ok' if !instance_exists($job->{instance_id}); > + > + if ($state eq "syncing" || $state eq "waiting") { > die "Job --source $param->{source} --name $param->{name} is > already scheduled to sync\n"; > } > > $job->{state} = "waiting"; > + > + eval { $job->{instance_id} = get_instance_id($$); }; I'd query and cache the local instance ID from the current process on startup, this would have the nice side effect of avoiding error potential here completely > + warn "Could not set instance ID - $@" if $@; > + > update_state($job); > } > }); > @@ -671,6 +707,7 @@ sub sync { > eval { $job = get_job($param); }; > if ($job) { > $job->{state} = "error"; > + delete $job->{instance_id}; > update_state($job); > } > }); > @@ -687,6 +724,7 @@ sub sync { > $job->{state} = "ok"; > } > $job->{lsync} = $date; > + delete $job->{instance_id}; > update_state($job); > } > }); > _______________________________________________ pve-devel mailing list pve-devel@lists.proxmox.com https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel