I've come across an issue with the way that rdiff-backup ensures that only one server is accessing a backup dataset.
When rdiff-backup starts, it checks the metadata to see whether another instance of rdiff-backup is performing a backup. If it finds one, it checks the recorded PID to see whether that instance is still running. If it is not, rdiff-backup assumes the other instance crashed and regresses the previous, incomplete backup.

I am running rdiff-backup on Amazon's cloud computing service. Whenever I want to back up, I start a new virtual server, run rdiff-backup and then shut the server down. Recently a backup failed, probably because of a network outage. Every subsequent backup refuses to run because rdiff-backup believes the failed instance is still running. This is clearly impossible, because each backup runs on a completely separate instance of the virtual server.

This had me stumped for a while, but I finally figured out what is happening. Because I start a new virtual server instance each time and run the backup from a script, everything happens in the same order on every boot. As a result, the rdiff-backup process for each backup session almost always gets the same PID. So when a backup fails, the next backup reads the metadata, finds the PID of the failed run and sees that a process with that PID is running, without realising that the process is itself.

I'm not sure how to work around this, because the virtual machine always starts from a known state and hasn't been running long enough to build up the entropy needed to generate random numbers that differ between sessions.

_______________________________________________
rdiff-backup-users mailing list
http://lists.nongnu.org/mailman/listinfo/rdiff-backup-users
Wiki URL: http://rdiff-backup.solutionsfirst.com.au/index.php/RdiffBackupWiki
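To make the failure mode concrete, here is a minimal Python sketch of a PID-only liveness check like the one described in the post. It is a hypothetical model, not rdiff-backup's actual code: the function names `pid_is_running` and `other_run_looks_alive` are made up for illustration, and the probe uses `os.kill(pid, 0)`, which is POSIX-only (on Windows, `os.kill` with signal 0 would terminate the target instead of probing it). The demo simulates the fresh-VM case, where the PID recorded by the crashed run equals the PID of the new run.

```python
import os


def pid_is_running(pid: int) -> bool:
    """POSIX-only: True if a process with this PID exists on this machine."""
    try:
        os.kill(pid, 0)  # signal 0 sends nothing; it only checks existence/permission
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def other_run_looks_alive(recorded_pid: int) -> bool:
    """Hypothetical model of the lock check: the recorded PID is the only evidence."""
    return pid_is_running(recorded_pid)


if __name__ == "__main__":
    # Simulate the fresh-VM case: the crashed run wrote its PID into the metadata,
    # and the new run, started by the same script on a freshly booted machine,
    # happened to get exactly the same PID.
    recorded_pid = os.getpid()
    print("other run alive?", other_run_looks_alive(recorded_pid))
    print("...but that PID is us:", recorded_pid == os.getpid())
```

Both lines print `True`: the probe only learns that *some* process with that PID exists, and nothing in a PID-only check distinguishes "the crashed run is still alive" from "the PID now belongs to this very process on a different machine".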
