Hey folks,
thanks for the feedback. :-) More comments below...

On 2/7/22 8:25 PM, Robert Nichols wrote:
> On 2/7/22 7:23 PM, Leland Best wrote:
>> Hi Cliff,
>>
>> On Mon, 2022-02-07 at 11:45 -0800, Mr. Clif wrote:
>>> Hey Eric,
>>> any ideas on this? How do these diff files normally work?
>>> [...]
>>
>> I'm not an 'rdiff-backup' developer or anything, so all you experts out
>> there correct me if I'm wrong, but...
>>
>> IIRC 'rdiff-backup' keeps inode info as part of the metadata for each
>> file. When you mount a filesystem, Linux assigns "fake" inode numbers to
>> avoid collisions between filesystems on different devices/partitions/etc.
>> So if you change the mount point, every file could potentially get a new
>> inode number and, consequently, have changed metadata. That results in
>> 'rdiff-backup' creating a '*.diff*' file for every source file.
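
If it helps to make that concrete, here's a quick Python check of what
device/inode pair the same file reports through two different mount points
(the /mnt/old and /mnt/new paths are just placeholders for my before/after
setup):

import os

# Compare the device/inode pair the kernel reports for the same file seen
# through two different mount points (placeholder paths, adjust to taste).
for path in ("/mnt/old/etc/hosts", "/mnt/new/etc/hosts"):
    st = os.stat(path)
    # st_dev changes when the mount is treated as a different device;
    # st_ino can change too on filesystems that synthesize inode numbers.
    print(f"{path}: dev={st.st_dev} ino={st.st_ino}")
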
> Device and inode metadata is kept only for files with multiple hard
> links. That's to keep track of which links reference the same file. That
> information is not needed for files with just a single hard link, and
> unless something has changed in the latest release, that metadata is not
> kept. You can look in the mirror_metadata file (it's compressed ASCII)
> and see what fields are present for each file.

Cool, are these the diff.gz files? I tried gunzipping them, but the first
"line" of data still seems to be binary. Is it encoded somehow?

>> In addition, since 'rdiff-backup' now thinks the files may have changed,
>> it spends a lot of time checking whether anything other than metadata has
>> changed, which _might_ account for the apparently low throughput.
> That would definitely be true, and the presence of all those "zero diff"
> files shows that it is, in fact, happening.
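
For what it's worth, something along these lines should list the per-file
diff increments a run leaves behind and flag the near-empty ones (the
increments layout and the size cutoff are assumptions on my part):

import os

# Walk the increments tree and count *.diff.gz files, flagging the tiny
# ones that presumably carry no real data change (layout assumed).
INCREMENTS = "/backup/rdiff-backup-data/increments"
TINY = 64  # bytes; arbitrary cutoff for "probably metadata-only"

small = total = 0
for root, _dirs, files in os.walk(INCREMENTS):
    for name in files:
        if name.endswith(".diff.gz"):
            total += 1
            if os.path.getsize(os.path.join(root, name)) < TINY:
                small += 1
print(f"{small} of {total} diff increments look like metadata-only changes")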

OK, let's see if I understand: inode data is usually not stored, but
because the source is on a different mount point, rdiff-backup thinks
"something" has changed, so it thoroughly checks everything. I'm wondering,
though, whether afterwards it could compare the current state to the
previous one and not create new mirror_metadata entries for the files that
haven't actually changed. Or... is there something I'm missing? Maybe they
serve a purpose, or the way it's written makes it hard to get rid of them.
Just curious. :-)
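
Just to make my question concrete: assuming the metadata entries are blocks
that start with a "File <path>" line followed by indented attribute lines
(my guess at the format, not verified), comparing two snapshots to see how
many entries come out identical would look roughly like this:

import gzip

# Parse a mirror_metadata file into {path: attribute lines}, assuming a
# "File <path>" header per entry (format assumed, not verified).
def load_entries(path):
    entries, current, lines = {}, None, []
    with gzip.open(path, "rt", errors="replace") as fh:
        for line in fh:
            if line.startswith("File "):
                if current is not None:
                    entries[current] = lines
                current, lines = line[5:].rstrip(), []
            else:
                lines.append(line.rstrip())
    if current is not None:
        entries[current] = lines
    return entries

# Hypothetical file names; point these at two real snapshots.
old = load_entries("mirror_metadata.old.snapshot.gz")
new = load_entries("mirror_metadata.new.snapshot.gz")
unchanged = sum(1 for p in new if p in old and old[p] == new[p])
print(f"{unchanged} of {len(new)} entries are byte-for-byte identical")

If most entries came out identical, that would back up my hunch that the
new mirror_metadata is largely redundant after a mount-point change.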

Yes, Eric Lavarde, it probably is a one-time effect. However... I don't
think a read-only filesystem could be confused with file metadata changes,
and I think the last access time of a file is probably ignored in the
metadata comparison.

By the way, I was going to say I didn't have a problem with slow
throughput; that was Eric Robinson's thread. Though now that I think about
it, it did take quite a while to thoroughly check the whole filesystem, if
that's what it was doing, and of course there's the additional 500M. ;-)
Thanks for a great project,
Clif