Dean Cording wrote:
I've come across an issue with the way that rdiff-backup ensures that only one
server is accessing a backup dataset.
...
Recently I had a backup fail, probably because of a network outage. All
subsequent backups refuse to run because rdiff-backup believes the failed
rdiff-backup instance is still running - even though this is clearly
impossible, since each backup runs on a completely fresh instance of the
virtual server.
This had me stumped for a while, but I finally figured out what is happening.
Because I start a new virtual server instance each time and run the backup
from a script, everything happens in a consistent order. As a result, the
instance of rdiff-backup running on the server for each backup session almost
always has the same PID. So when a backup fails, the subsequent backup looks
at the metadata, finds the PID of the failed backup, and sees that that PID is
still running - not realising that the process holding that PID is actually
itself.
A cursory look at regress.py seems to confirm this behavior.
Specifically, in check_pids() it says:
if pid is not None and pid_running(pid):
This could instead say:
if pid is not None and pid != os.getpid() and pid_running(pid):
(note the != rather than "is not" - identity comparison is not reliable for
integers such as PIDs).
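To illustrate, here's a standalone sketch of the guarded check. This is not
the actual regress.py code - pid_running here is my own stand-in that uses
os.kill with signal 0 to test process existence on POSIX systems:

```python
import errno
import os

def pid_running(pid):
    """Return True if a process with this PID exists (POSIX only).
    Sending signal 0 performs the existence check without actually
    delivering a signal."""
    try:
        os.kill(pid, 0)
    except OSError as err:
        # EPERM means the process exists but belongs to another user;
        # ESRCH (and anything else) means no such process.
        return err.errno == errno.EPERM
    return True

def stale_session_running(pid):
    """A recorded PID only indicates a live conflicting session if it
    is set, is not our own PID, and maps to a running process."""
    return pid is not None and pid != os.getpid() and pid_running(pid)
```

With this guard, a backup that inherits the PID recorded by the failed
session no longer locks itself out: stale_session_running(os.getpid())
is always False.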
I'm not sure how to work around this problem, as the virtual machine is
always started from a known state and hasn't been running long enough to
build up the entropy needed to generate different random numbers between
sessions.
The current time adds a little randomness. A silly workaround would be
to call the following perl script before running rdiff-backup:
#!/usr/bin/perl
# Run /bin/true a random number of times to advance the kernel's PID counter
`/bin/true` for 0..int(rand(100));
This will advance the PID counter and should stop your job from failing
continuously.
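The same trick in Python, if you'd rather avoid perl - the /bin/true path
and the fork count of 100 are arbitrary choices, just as in the script
above:

```python
import random
import subprocess

def burn_pids(max_forks=100):
    """Spawn a random number of short-lived child processes so that the
    next process started (the real rdiff-backup run) lands on a
    different PID than the deterministic boot sequence would give it."""
    n = random.randint(1, max_forks)
    for _ in range(n):
        subprocess.call(["/bin/true"])  # assumes /bin/true exists
    return n
```

Call burn_pids() from your backup script immediately before launching
rdiff-backup.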
Steven
_______________________________________________
rdiff-backup-users mailing list at [email protected]
http://lists.nongnu.org/mailman/listinfo/rdiff-backup-users
Wiki URL: http://rdiff-backup.solutionsfirst.com.au/index.php/RdiffBackupWiki