https://bugzilla.samba.org/show_bug.cgi?id=12570

            Bug ID: 12570
           Summary: Problems with --checksum --existing
           Product: rsync
           Version: 3.1.1
          Hardware: All
                OS: All
            Status: NEW
          Severity: normal
          Priority: P5
         Component: core
          Assignee: way...@samba.org
          Reporter: a...@smasher.org
        QA Contact: rsync...@samba.org

Problem:

I've got an sd-card with some movies, a few of which are corrupted files.

I want to replace only the files on the card that differ from the known-good
copies in "/movies/".

command:
 rsync --checksum --existing -vhriP /movies/ /media/128-SD/Movies/

The problem here is that *all* files in "/movies/" are hashed before anything
else happens. This can be verified with lsof: "lsof +D /movies".

I've got <100GB in "/media/128-SD/Movies/".

I've got >1.5TB in "/movies/", and hashing all of those files is just a huge
waste of time and system resources.

When "--existing" and "--checksum" are both used, the algorithm should first
make a list of candidate files, then start hashing. It should *not* start
hashing everything on the send-side and then figure out which files might be
needed.
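
Until rsync behaves that way, the candidate-first idea can be approximated
from the outside. The following is only a sketch under stated assumptions:
the helper name "list_candidates" and the temporary file path are made up for
illustration, file names are assumed not to contain newlines, and it relies on
rsync limiting its sender file list to the entries given via --files-from. It
walks the (small) destination tree, keeps only the names that also exist on
the source, and prints them one per line.

```shell
# Sketch: print the relative paths of regular files under DEST ($2)
# that also exist as regular files under SRC ($1), one per line.
# Runs in a subshell so the cd does not affect the caller.
list_candidates() (
    src=$(cd "$1" && pwd) || exit 1   # make SRC absolute before changing dir
    cd "$2" || exit 1
    find . -type f -exec sh -c '
        src=$1; shift
        for f in "$@"; do
            rel=${f#./}
            # Keep only names that have a counterpart on the source side.
            if [ -f "$src/$rel" ]; then
                printf "%s\n" "$rel"
            fi
        done' sh "$src" {} +
)

# Possible usage (paths taken from the report, list file name is hypothetical):
#   list_candidates /movies /media/128-SD/Movies > /tmp/candidates.txt
#   rsync --checksum --existing -vhriP --files-from=/tmp/candidates.txt \
#       /movies/ /media/128-SD/Movies/
```

With the list built from the destination side, only the files named in it
should need hashing on the source side, rather than all of "/movies/".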

Workaround for me:
 diff -r /movies/ /media/128-SD/Movies/ | grep differ | awk '{print "pv " $3" > "$5}' | sh

NB: that workaround requires "pv" and only works with file names that do not
contain spaces, but for me it's a quick and easy way to see progress while
files are being copied. "cp" would work fine in place of "pv".
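
The space limitation comes from awk splitting the diff output on whitespace.
A variant that avoids parsing diff output altogether might look like the
sketch below. The function name "copy_differing" is hypothetical, and it uses
"cmp" and "cp" instead of "diff" and "pv", so there is no progress display.
Like "diff -r", it still reads both copies of every candidate file in full.

```shell
# Sketch: for every regular file under DEST ($2) that also exists under
# SRC ($1), overwrite the DEST copy if its contents differ from SRC.
# File names are passed as arguments, never re-parsed, so spaces are safe.
copy_differing() (
    src=$(cd "$1" && pwd) || exit 1   # make SRC absolute before changing dir
    cd "$2" || exit 1
    find . -type f -exec sh -c '
        src=$1; shift
        for f in "$@"; do
            s=$src/${f#./}
            # Skip files that have no counterpart in the source tree.
            [ -f "$s" ] || continue
            # cmp -s prints nothing and exits non-zero when contents differ.
            if ! cmp -s "$s" "$f"; then
                printf "copying %s\n" "${f#./}"
                cp "$s" "$f"
            fi
        done' sh "$src" {} +
)

# Possible usage with the paths from the report:
#   copy_differing /movies /media/128-SD/Movies
```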

On my system, that workaround saved me about 1-2 days of hashing, and completed
in less than an hour.
