[rdiff-backup-users] Feature requests questions/discussion

Wiebe Cazemier Fri, 21 Oct 2005 09:27:53 -0700

Ben Escoto,

I would like to discuss two feature requests. I'd like your input on the matter. It's quite some text, so I hope you'll bear with me :)

First, I would really like an option like --store-checksums, so that rdiff-backup calculates md5 hashes when doing a backup and that checksum is used for integrity checks upon restoration. At restoration time, however, the check should not be optional: it should always be done if the file being restored has hash info. This is to prevent user mistakes. I once severely corrupted a partition on an external HD because of USB2 transfer errors (which I haven't been able to solve, BTW). A lot was damaged, but for every file that resided in a compressed archive I was told about the corruption, because of the hashing usually done in zips/gzips/etc. The rdiff-backup repository became useless, because it had no idea the files were damaged. Such an option would of course be annoying to most people, because it's quite slow, but most of my backups are done through cron, so it doesn't matter for me. I think that, as an option, this could be very valuable. It doesn't even have to be that slow if it's cleverly integrated with the copy routine.
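To illustrate what "integrated with the copy routine" could mean, here is a minimal Python sketch in which the hash is fed from the same blocks that are being copied, so the source is read only once. It is not taken from rdiff-backup's code; the function names (copy_with_md5, file_md5, verify) and the chunk size are made up for the example.

```python
import hashlib
import os
import tempfile

CHUNK = 64 * 1024  # arbitrary block size chosen for this sketch


def copy_with_md5(src, dst):
    """Copy src to dst and return the MD5 hex digest of the bytes read.

    Hypothetical helper: the digest is updated with each block as it is
    copied, so hashing adds no second pass over the source file.
    """
    digest = hashlib.md5()
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        while True:
            block = fin.read(CHUNK)
            if not block:
                break
            digest.update(block)
            fout.write(block)
    return digest.hexdigest()


def file_md5(path):
    """Return the MD5 hex digest of an existing file."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(CHUNK), b""):
            digest.update(block)
    return digest.hexdigest()


def verify(path, stored_digest):
    """Return True if the file at path still matches the stored digest."""
    return file_md5(path) == stored_digest
```

At restore time, a call like verify(restored_path, stored_digest) would be made for every file that has a stored digest, and a mismatch would be reported instead of silently restoring damaged data.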

The other thing I'd like to discuss is how rdiff-backup detects change. I noted earlier on this list that mtime+size checking, which rdiff-backup does IIRC, is not very reliable. Mtimes can be changed. For example, when I install a new GCC on Gentoo, the package manager looks for hardcoded file paths in a whole bunch of files on the system and changes them to reflect the newly installed GCC. Portage (the Gentoo package manager) uses the mtimes of files to determine whether a file still belongs to a package. So, when you uninstall a package and some file has a different mtime than is stored in the meta-file, it is assumed that this is a new file, meaning it doesn't belong to the package, and it is not uninstalled. Now, about those hardcoded file paths. When Portage changes them, the mtimes of the files are also changed. I don't know if Portage then restores the mtimes to what they were, to avoid orphaned files, but it should. And if it doesn't, it may in the future. In that case, when you run rdiff-backup again on your system, those files are not detected as changed and they are left alone. This is of course not desirable behaviour.
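The failure mode can be reproduced with a small Python sketch. It assumes a plain mtime+size comparison, which is my understanding of the approach and not a copy of rdiff-backup's actual logic; the function names are invented for the example.

```python
import os


def signature(path):
    """Return the (mtime, size) pair a simple change detector would record."""
    st = os.stat(path)
    return (st.st_mtime_ns, st.st_size)


def looks_unchanged(path, recorded):
    """Return True if mtime and size both match the recorded pair."""
    return signature(path) == recorded


def rewrite_keeping_mtime(path, new_bytes):
    """Overwrite the file, then put the old timestamps back.

    This mimics a tool that edits a file in place and restores its mtime.
    """
    st = os.stat(path)
    with open(path, "wb") as f:
        f.write(new_bytes)
    os.utime(path, ns=(st.st_atime_ns, st.st_mtime_ns))
```

If a file holding "/usr/lib/gcc-3.3" is rewritten this way to hold "/usr/lib/gcc-3.4" (same length), looks_unchanged still returns True although the contents differ.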

A different way of checking for change would be checking the ctimes. But this of course has the problem that not all filesystems have ctimes. And when you restore your backup to a new disk and run rdiff-backup again, the entire system is considered changed. This is far from ideal.

A different approach would be to use the checksums feature described above. Rdiff-backup could calculate a hash of every file (or perhaps only of those files with unchanged mtimes, because when the mtime has changed, the file needs to be backed up anyway) and use that for change comparison. This of course has the disadvantage of yet more slowdown, because now, even if little has changed in what you're backing up, its contents are read completely. But perhaps this behaviour could also reside under an option, an option besides --store-checksums, like --checksum-diffs (with the latter requiring the former to be present, for example).
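A rough Python sketch of that comparison follows. Both --store-checksums and --checksum-diffs are proposed options, not existing ones, and the record layout and function names here are assumptions made for illustration. The file is hashed only when mtime and size are unchanged, since a differing mtime or size already marks it as changed.

```python
import hashlib
import os


def file_md5(path, chunk=65536):
    """Return the MD5 hex digest of a file, read in blocks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk), b""):
            digest.update(block)
    return digest.hexdigest()


def snapshot(path):
    """Record what a backup run with stored checksums might keep per file."""
    st = os.stat(path)
    return {"mtime_ns": st.st_mtime_ns, "size": st.st_size,
            "md5": file_md5(path)}


def has_changed(path, recorded, checksum_diffs=False):
    """Decide whether a file needs to be backed up again.

    With checksum_diffs=False this is the plain mtime+size check.
    With checksum_diffs=True, files that pass the cheap check are also
    hashed and compared against the stored digest.
    """
    st = os.stat(path)
    if (st.st_mtime_ns, st.st_size) != (recorded["mtime_ns"], recorded["size"]):
        return True  # the cheap check already says changed; no hashing needed
    if not checksum_diffs:
        return False
    return file_md5(path) != recorded["md5"]
```

The cost is as described: with checksum_diffs enabled, every file whose mtime and size are unchanged is read in full on each run.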

To summarize, --store-checksums would calculate checksum info for integrity checks, and --checksum-diffs would use checksums for change detection, instead of mtime+size.


I'm very curious to find out if you find my requests valid.

Regards,

Wiebe Cazemier


_______________________________________________
rdiff-backup-users mailing list at rdiff-backup-users@nongnu.org
http://lists.nongnu.org/mailman/listinfo/rdiff-backup-users
Wiki URL: http://rdiff-backup.solutionsfirst.com.au/index.php/RdiffBackupWiki
