(second post after bounce) I found a solution, but have not fully characterized the problem. The rdiff-backup destination node is part of a cluster. The cluster frontend was down during the failures. It provides services to the client nodes, namely DNS and NTP.
With the frontend down, I tried fixing DNS on the rdiff-backup destination node by pointing it to another DNS server. Then I retried the rdiff-backup from the source node (which was fine). The rdiff-backup failed again in the same way. When I turned the frontend back on, the only service I could think of that would matter was NTP. I waited several minutes until NTP had synced, then retried the rdiff-backup. It succeeded normally, from all nodes.

Is time sensitivity a known behavior of rdiff-backup?

Thanks,
Federico

_____________________________________________
From: Sacerdoti, Federico
Sent: Monday, November 06, 2006 11:30 AM
To: Sacerdoti, Federico; '[email protected]'
Subject: RE: strange traceback

Here is a nicer traceback of the problem. Perhaps the old logs should be renamed before the regression is attempted?

Thanks,
Federico

Previous backup seems to have failed, regressing destination now.
Traceback (most recent call last):
  File "/usr/bin/rdiff-backup", line 23, in ?
    rdiff_backup.Main.Main(sys.argv[1:])
  File "/usr/lib64/python2.3/site-packages/rdiff_backup/Main.py", line 285, in Main
    take_action(rps)
  File "/usr/lib64/python2.3/site-packages/rdiff_backup/Main.py", line 255, in take_action
    elif action == "backup": Backup(rps[0], rps[1])
  File "/usr/lib64/python2.3/site-packages/rdiff_backup/Main.py", line 305, in Backup
    backup.Mirror_and_increment(rpin, rpout, incdir)
  File "/usr/lib64/python2.3/site-packages/rdiff_backup/backup.py", line 51, in Mirror_and_increment
    DestS.patch_and_increment(dest_rpath, source_diffiter, inc_rpath)
  File "/usr/lib64/python2.3/site-packages/rdiff_backup/backup.py", line 229, in patch_and_increment
    ITR(diff.index, diff)
  File "/usr/lib64/python2.3/site-packages/rdiff_backup/rorpiter.py", line 285, in __call__
    last_branch.fast_process(*args)
  File "/usr/lib64/python2.3/site-packages/rdiff_backup/backup.py", line 617, in fast_process
    if self.patch_to_temp(rp, diff_rorp, tf):
  File "/usr/lib64/python2.3/site-packages/rdiff_backup/backup.py", line 511, in patch_to_temp
    (diff_rorp, new)) == 0: return 0
  File "/usr/lib64/python2.3/site-packages/rdiff_backup/robust.py", line 39, in check_common_error
    if error_handler: return error_handler(exc, *args)
  File "/usr/lib64/python2.3/site-packages/rdiff_backup/robust.py", line 71, in error_handler
    log.ErrorLog.write_if_open(error_type, rp, exc)
  File "/usr/lib64/python2.3/site-packages/rdiff_backup/log.py", line 253, in write_if_open
    if cls.isopen(): cls.write(error_type, rp, exc)
  File "/usr/lib64/python2.3/site-packages/rdiff_backup/log.py", line 234, in write
    Log(s, 2)
  File "/usr/lib64/python2.3/site-packages/rdiff_backup/log.py", line 119, in __call__
    if verbosity <= self.verbosity: self.log_to_file(message)
  File "/usr/lib64/python2.3/site-packages/rdiff_backup/log.py", line 128, in log_to_file
    self.logfp.flush()
IOError: [Errno 13] Permission denied
Exception exceptions.TypeError: "'NoneType' object is not callable" in <bound method GzipFile.__del__ of <gzip open file '/backup/drdsb01/drdsa01/rdiff-backup-data/file_statistics.2006-11-06T10:39:18-05:00.data.gz', mode 'wb' at 0x2aaaade2b9d0 0x2aaaaf1b9440>> ignored
Exception exceptions.TypeError: "'NoneType' object is not callable" in <bound method GzipFile.__del__ of <gzip open file '/backup/drdsb01/drdsa01/rdiff-backup-data/error_log.2006-11-06T10:39:18-05:00.data.gz', mode 'wb' at 0x2aaaad9f5500 0x2aaaaab387a0>> ignored
Exception exceptions.TypeError: "'NoneType' object is not callable" in <bound method GzipFile.__del__ of <gzip open file '/backup/drdsb01/drdsa01/rdiff-backup-data/mirror_metadata.2006-11-06T10:39:18-05:00.snapshot.gz', mode 'wb' at 0x2aaaada06570 0x2aaaaf1b9cf8>> ignored
error - exceptions.Exception: backup:rc=256

_____________________________________________
From: Sacerdoti, Federico
Sent: Monday, November 06, 2006 10:52 AM
To: '[email protected]'
Subject: strange traceback

Hi,

I am using rdiff-backup successfully to back up very large NAS servers (~3TB each). The setup is one primary NAS and one secondary NAS. I use the "local" method of rdiff-backup, where I mount the secondary's volume on the primary via NFS, then tell rdiff-backup to use the mountpoint as the destination.

Things have been humming along as expected for several weeks, until last night. My nightly rdiff-backup script emails errors if rdiff-backup returns non-zero. All 5 of my NAS primary servers emailed this traceback (I apologize for the loss of newlines). Note these are independent servers and the tracebacks are identical. It is possible we experienced a network partition between the primary and backup last night.

Thanks,
Federico
D. E. Shaw Research LLC

[clip]
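
Since the thread leaves open whether unsynchronized clocks played a role, one cheap check before a backup run is to compare the local clock with the timestamp that the NFS-mounted destination assigns to a freshly created file. The following is only a minimal sketch, not part of rdiff-backup: the function name estimate_clock_skew is hypothetical, and it assumes that the NFS server (not the client) stamps new files, which is typical but depends on the NFS version and mount options. It does not establish that clock skew caused the "Permission denied" error above.

```python
import os
import tempfile
import time


def estimate_clock_skew(directory):
    """Return (new file's mtime - local time) in seconds for `directory`.

    Hypothetical helper. Assumption: on an NFS mount the mtime of a newly
    created file is assigned by the server, so a large absolute value hints
    that the server and client clocks disagree.
    """
    before = time.time()
    # Create a scratch file inside the directory being checked.
    fd, path = tempfile.mkstemp(prefix=".skewcheck-", dir=directory)
    try:
        os.close(fd)
        mtime = os.stat(path).st_mtime
    finally:
        # Always remove the scratch file, even if stat() fails.
        os.unlink(path)
    after = time.time()
    # Compare against the midpoint of the local window to reduce the
    # effect of the round trip to the server.
    return mtime - (before + after) / 2.0
```

Called with the destination mountpoint (for example the /backup/drdsb01 directory seen in the traceback), a result of more than a second or two in either direction would suggest checking NTP on both machines before retrying. On a local filesystem the result should be close to zero.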
