On 7/7/23 11:42 AM, home user wrote:
> When I try to verify a back-up, I use "diff -r". The directory trees being compared contain about
> 870 files (mostly binary, like PNG, JPG, and so on), and take up about 707 megabytes. The trees being
> compared are on the hard drive and on a USB-3 stick. When I run the "diff -r" command, it seems to
> finish too quickly - it seems like less than a half of a second. I saw similar results a few weeks ago
> comparing about 30 gigabyte trees on the hard drive vs. on a USB-3.1 stick; the results were practically
> instantaneous. Is diff actually checking every bit (or byte), or is it using some "short cut"?

Before opening this thread, I had already spent a lot of time and effort verifying that
"diff" worked correctly on binary files. The issue was that diff seemed to compare large
directory trees of files too quickly, which led me to believe it was using a "short cut"
rather than actually comparing file contents. I believe that some short cuts should be used:
* files of different sizes should be reported as being different without
comparing contents.
* once one bit is found to differ between two files, they should be reported as
different without comparing the remaining contents.
But contents should be compared even if two files have the same name, sizes,
creation/modification histories, permissions, and other meta-data values. This
was not happening.
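
As a rough sketch of the comparison described above (the function name
same_content is made up for illustration, not part of any tool): check the
sizes first, then let cmp compare the bytes. cmp stops at the first byte
that differs, and it looks only at contents, never at names, timestamps,
or permissions.

```shell
# same_content FILE_A FILE_B
# Returns 0 if the two files have identical contents, 1 if they differ,
# 2 if either file cannot be read. (Hypothetical helper, for illustration.)
same_content() {
    # Short cut 1: files of different sizes must differ.
    size_a=$(wc -c < "$1") || return 2
    size_b=$(wc -c < "$2") || return 2
    [ "$size_a" -eq "$size_b" ] || return 1
    # Short cut 2: cmp stops at the first differing byte.
    # -s suppresses output; only the exit status is used.
    cmp -s "$1" "$2"
}
```

For example, "same_content photo.png /run/media/stick/photo.png && echo same"
would print "same" only when every byte matches (the paths are placeholders).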
Ron's tests showed that my suspicion was correct: an inappropriate (in my
opinion) short cut was being used. So, Roberto, short cuts are sometimes used.
Ron also provided the solution, doing as root:
    sync ; echo 3 > /proc/sys/vm/drop_caches
Patrick's point that diff wasn't meant for binary files is correct, but without a
recursive option, cmp doesn't really help unless I want to write a script to do the
recursive traversal of the 2 trees, calling cmp on every file that's in both trees. I
struggle with recursion; trying it makes me curse and re-curse and curse yet more.
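
For what it's worth, the traversal does not have to be hand-written
recursion: find can walk the tree and hand each file to cmp. The sketch
below is only that, a sketch. The name compare_trees is invented here, it
assumes no file name contains a newline, and it only checks files that
exist in the first tree (extra files in the second tree go unreported).

```shell
# compare_trees SRC DST
# Byte-compares every regular file under SRC with the file of the same
# relative path under DST. Prints one line per problem and returns 1 if
# any problem was found, 0 otherwise. (Hypothetical helper.)
compare_trees() {
    src=$1
    dst=$2
    status=0
    # The here-document feeds find's output to the loop without a pipe,
    # so the loop can update $status in the current shell.
    while IFS= read -r rel; do
        [ -n "$rel" ] || continue          # skip the empty line of an empty tree
        if [ ! -f "$dst/$rel" ]; then
            echo "missing: $rel"
            status=1
        elif ! cmp -s "$src/$rel" "$dst/$rel"; then
            echo "differs: $rel"
            status=1
        fi
    done <<EOF
$(cd "$src" && find . -type f)
EOF
    return $status
}
```

It would be called as "compare_trees ~/Pictures /run/media/stick/Pictures"
(placeholder paths); no output and exit status 0 mean every file in the
first tree has a byte-identical copy in the second.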
Patrick's suggestion to use rsync is a good one. Robert's suggestion to use the
"-c" option is also good. But wikipedia claims that checksums are not perfect,
that it is remotely possible for files with identical checksums to differ.
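
A minimal sketch of the rsync check, wrapped in a function whose name
(verify_backup) is invented for this example. With -c, rsync decides by
checksum rather than by size and modification time, so a file whose
contents changed while its size and timestamp stayed the same is still
listed. Nothing is copied because of -n.

```shell
# verify_backup SRC DST
# Lists files under SRC whose copy under DST is missing or has a different
# checksum. Copies nothing. (Hypothetical wrapper around real rsync options.)
verify_backup() {
    # -r  recurse into directories
    # -c  compare files by checksum instead of size and mtime
    # -n  dry run (short form of --dry-run)
    # -i  itemize: print one line per file that would be updated
    # The trailing slashes make rsync compare the contents of the two
    # directories rather than the directories themselves.
    rsync -rcni "$1"/ "$2"/
}
```

As written, files that exist only in the destination are not reported.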
Years ago, when I worked on the AWIPS program at the National Weather Service,
I needed a file restored from the regular back-up done by the sysadmins.
They couldn't do it. That taught me the importance of checking back-ups.
George's early June comments (in a different thread) about USB sticks taught me
the importance of back-up checks being deep, at least occasionally.
I've tagged this thread SOLVED. "rsync --dry-run -c" seems to be a good solution in many
cases, but "diff -r" is better when a truly deep check is preferred. I thank everyone
for their contributions.