The Verification Treadmill

Dominic Raferd Thu, 15 Feb 2024 07:48:03 -0800

I wondered if those who know the rdiff-backup code from the inside can confirm or correct my understanding about verification of rdiff-backup repositories, which is as follows:

'rdiff-backup verify' verifies the integrity of all files/directories (etc) in a single backup session at the specified datetime. Implicitly it verifies the integrity of later versions of any files that existed at that datetime. (This is because rdiff-backup uses reverse diffs and when recovering from, or verifying a file at, the given datetime it must build it by taking the latest version that it holds and then applying reverse diffs sequentially to regress the file to the form it had at that datetime.)
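To make the reverse-diff reasoning concrete, here is a toy sketch in Python. It is not rdiff-backup's code: the function `regress`, the string "files" and the lambda "diffs" are all invented for illustration, and real reverse diffs are binary deltas. It only shows which stored items get read when a file is regressed to an earlier session.

```python
# Toy model only. Real rdiff-backup stores binary reverse deltas on disk;
# here a "reverse diff" is just a function from a newer version to an older one.

def regress(mirror, reverse_diffs, target):
    """Rebuild the version a file had at session index `target`.

    mirror        -- content at the latest session (index len(reverse_diffs))
    reverse_diffs -- reverse_diffs[i] turns the version at session i+1
                     into the version at session i
    Returns (content, list of stored items that had to be read).
    """
    content = mirror
    touched = ["mirror"]
    # Walk backwards from the newest reverse diff down to the target session.
    for i in range(len(reverse_diffs) - 1, target - 1, -1):
        content = reverse_diffs[i](content)
        touched.append(f"diff {i + 1}->{i}")
    return content, touched


# Hypothetical history: sessions 0..3 held "a", "ab", "abc", "abcd".
# Each reverse diff simply drops the last character.
reverse_diffs = [lambda s: s[:-1]] * 3

content, touched = regress("abcd", reverse_diffs, 1)
print(content)   # version at session 1
print(touched)   # everything read on the way there
```

In this model, regressing to session 1 reads the mirror and every reverse diff newer than session 1, which is why a successful check at an old datetime also exercises the later versions of the same file.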

But this verification at a given datetime does not verify any file that was created later than that datetime. Such a file, if subsequently deleted from the original source, could prove irrecoverable from the repository even though the earlier session had been verified, if corruption in more recent session(s) had affected this file but not the files that existed at the earlier datetime.
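The gap described above can be sketched the same way. Again this is only a model of my understanding, with made-up file names and session numbers; `covered_by_verify` is a hypothetical helper, not anything in rdiff-backup.

```python
def covered_by_verify(lifetimes, session):
    """Return the files that exist at `session`, i.e. the only files a
    verification at that session would read in this toy model.

    lifetimes -- hypothetical map: name -> (first_session, last_session),
                 both inclusive session indexes
    """
    return {
        name
        for name, (first, last) in lifetimes.items()
        if first <= session <= last
    }


# old.txt is present in every session (0..5).
# late.txt was created at session 3 and deleted from the source after session 4.
lifetimes = {"old.txt": (0, 5), "late.txt": (3, 4)}

print(covered_by_verify(lifetimes, 0))  # earliest session
print(covered_by_verify(lifetimes, 5))  # current mirror
print(covered_by_verify(lifetimes, 3))  # a session where late.txt exists
```

Here neither the earliest session nor the current mirror includes late.txt, so in this model it is only checked by verifying session 3 or session 4.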

So the only way to be confident about *all* the data in a repository is to use 'rdiff-backup verify' to verify each and every backup session in each repository; and this includes verifying the current 'mirror' session (even though it is held in the clear in the repository). This needs to be done with reasonable frequency to ensure that backed-up data has not deteriorated (e.g. through media bitrot).
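For what it's worth, the "verify every session" routine might be scripted roughly as below. This is a sketch under assumptions: I am assuming the newer command-line form 'rdiff-backup verify --at TIME REPO', the repository path and session times are placeholders, and the session list is supplied by hand rather than read from the repository. The commands are only built and printed here, not run.

```python
import subprocess


def verify_commands(repo, session_times):
    """Build one verify command per backup session.

    Assumption: the CLI accepts `rdiff-backup verify --at TIME REPO`.
    Check the man page of your installed version before relying on this.
    """
    return [["rdiff-backup", "verify", "--at", t, repo] for t in session_times]


def run_all(commands):
    """Run each command in turn; return the commands that exited non-zero."""
    failed = []
    for cmd in commands:
        if subprocess.run(cmd).returncode != 0:
            failed.append(cmd)
    return failed


# Placeholder repository path and session times, oldest first.
# "now" is used here to stand for the current mirror session.
sessions = ["2024-01-01T02:00:00", "2024-01-08T02:00:00", "now"]
commands = verify_commands("/backups/example-repo", sessions)

for cmd in commands:
    print(" ".join(cmd))
```

Calling run_all(commands) would then execute them one after another and report any session whose verification failed.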

All of which takes a lot of computing power and time, much of which is duplication of effort (because, as stated above, the verification of the earliest session in a repository confirms the integrity of all later versions of files that it contains, but it is not possible to exclude these files from re-verification for more recent sessions).
