Ben Escoto,
I would like to discuss two feature requests and hear your input on them.
It's quite a lot of text, so I hope you'll bear with me :)
First, I would really like an option like --store-checksums, so that rdiff-backup
calculates MD5 hashes when doing a backup and uses those checksums for
integrity checks upon restoration. At restoration time, though, the check should
not be optional: it should always be done if the file being restored has hash
info. This is to prevent user mistakes. I once severely corrupted a partition on
an external HD because of USB2 transfer errors (which I haven't been able to
solve, BTW). A lot was damaged, but for every file that resided in a compressed
archive I was told about the corruption, because of the checksumming usually done
in zips/gzips/etc. The rdiff-backup repository became useless, because it had no
idea the files were damaged. Such an option would of course be annoying to most
people, because it's quite slow, but most of my backups are done through cron,
so it doesn't matter for me. I think as an option, this could be very valuable.
It doesn't even have to be that slow, if it's cleverly integrated with the copy
routine.
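
To make "integrated with the copy routine" concrete, here is a minimal Python
sketch. The function names (copy_with_md5, file_md5, verify_restored) are made
up for illustration and are not rdiff-backup's actual code; the point is only
that the hash can be updated chunk by chunk while the data is being copied, so
the source is read just once.

```python
import hashlib


def copy_with_md5(src_path, dst_path, chunk_size=64 * 1024):
    """Copy src_path to dst_path; return the MD5 hex digest of the copied data.

    Each chunk is fed to the hash as it is copied, so the source file is
    read only once.
    """
    md5 = hashlib.md5()
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        for chunk in iter(lambda: src.read(chunk_size), b""):
            md5.update(chunk)
            dst.write(chunk)
    return md5.hexdigest()


def file_md5(path, chunk_size=64 * 1024):
    """Return the MD5 hex digest of the file at path."""
    md5 = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            md5.update(chunk)
    return md5.hexdigest()


def verify_restored(path, stored_digest):
    """Raise ValueError if the restored file does not match the stored digest."""
    actual = file_md5(path)
    if actual != stored_digest:
        raise ValueError(
            "checksum mismatch for %s: expected %s, got %s"
            % (path, stored_digest, actual)
        )
```

The digest describes what was read from the source at backup time, so damage
that happens later on the backup disk would show up as a mismatch when
verify_restored is called during restoration.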
The other thing I'd like to discuss is how rdiff-backup detects change. I noted
earlier in this list that mtime+size checking, which rdiff-backup does IIRC, is
not very reliable, because mtimes can be changed. For example, when I install a
new GCC on Gentoo, the package manager looks for hardcoded file paths in a whole
bunch of files on the system and changes them to reflect the newly installed GCC.
Portage (the Gentoo package manager) uses the mtimes of files to determine
whether a file still belongs to a package. So, when you uninstall a package and
some file has a different mtime than the one stored in the meta-file, it is
assumed that this is a new file, meaning it doesn't belong to the package, and it
is not uninstalled. Now, about those hardcoded file paths. When Portage changes
them, the mtimes of the files change as well. I don't know whether Portage then
sets the mtimes back to what they were, to avoid orphaned files, but it should.
And if it doesn't, it may in the future. If it does, then the next time you run
rdiff-backup on your system, those files are not detected as changed (as long as
their size is also the same) and they are left alone. This is of course not
desirable behaviour.
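
The failure mode can be reproduced with a small Python sketch. This is not how
rdiff-backup or Portage are implemented; snapshot, looks_unchanged and
rewrite_keeping_mtime are hypothetical helpers that only imitate an mtime+size
check and a tool that edits a file and then puts the old timestamps back.

```python
import os


def snapshot(path):
    """Record the (size, mtime) pair an mtime+size checker would store."""
    st = os.stat(path)
    return st.st_size, st.st_mtime_ns


def looks_unchanged(path, recorded):
    """Return True if size and mtime both match the recorded pair."""
    return snapshot(path) == recorded


def rewrite_keeping_mtime(path, old, new):
    """Replace old with new in the file, then set atime/mtime back.

    This imitates a tool that edits a file and afterwards restores the
    original timestamps.
    """
    st = os.stat(path)
    with open(path, "rb") as f:
        data = f.read()
    with open(path, "wb") as f:
        f.write(data.replace(old, new))
    os.utime(path, ns=(st.st_atime_ns, st.st_mtime_ns))
```

If old and new have the same length, looks_unchanged still returns True after
rewrite_keeping_mtime has run, even though the contents differ.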
A different way of checking for change would be to check the ctimes. But this
of course has the problem that not all filesystems have ctimes. Also, ctimes
cannot be set back by a restore, so when you restore your backup to a new disk
and run rdiff-backup again, the entire system is considered changed. This is far
from ideal.
A different approach would be to use the checksum feature described above.
rdiff-backup could calculate a hash of every file (or perhaps only of those
files with unchanged mtimes, because when the mtime has changed, the file needs
to be backed up anyway) and use that for change comparison. This of course has
the disadvantage of yet more slowdown, because now even if little has changed in
what you're backing up, its contents are read completely. But perhaps this
behaviour could also be put behind an option, a second one alongside
--store-checksums, such as --checksum-diffs (with the latter requiring the former
to be present, for example).
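
As a rough Python sketch of that comparison (hypothetical helper names and a
made-up record layout, not rdiff-backup's metadata format): the cheap mtime+size
check runs first, and the file is only hashed when that check claims nothing
has changed.

```python
import hashlib
import os


def file_md5(path, chunk_size=64 * 1024):
    """Return the MD5 hex digest of the file at path."""
    md5 = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            md5.update(chunk)
    return md5.hexdigest()


def record(path):
    """Build the per-file record that would be stored at backup time."""
    st = os.stat(path)
    return {
        "size": st.st_size,
        "mtime_ns": st.st_mtime_ns,
        "md5": file_md5(path),
    }


def has_changed(path, recorded):
    """Return True if the file differs from its recorded state."""
    st = os.stat(path)
    if (st.st_size, st.st_mtime_ns) != (recorded["size"], recorded["mtime_ns"]):
        # The cheap check already says "changed"; no need to read the file.
        return True
    # Size and mtime match, so only the contents can tell us the truth.
    return file_md5(path) != recorded["md5"]
```

With this ordering, files whose mtime or size differ cost nothing extra, while
every apparently unchanged file is read in full, which is where the slowdown
comes from.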
To summarize, --store-checksums would calculate checksum info for integrity
checks, and --checksum-diffs would use checksums for change detection, instead
of mtime+size.
I'm very curious to find out if you find my requests valid.
Regards,
Wiebe Cazemier
_______________________________________________
rdiff-backup-users mailing list at rdiff-backup-users@nongnu.org
http://lists.nongnu.org/mailman/listinfo/rdiff-backup-users
Wiki URL: http://rdiff-backup.solutionsfirst.com.au/index.php/RdiffBackupWiki