Hello,

I have followed the discussion about speeding up rsync when there are lots of files, and I have a proposal that I think would greatly speed up rsync for routine mirroring of large filesystems.

One of the factors limiting rsync's speed is having to send huge file lists when mirroring large filesystems, even for incremental updates where only a small part of the filesystem has changed. My proposal is to first send a checksum of the file list for each directory. If it is identical to the corresponding checksum on the remote side, then the file list need not be sent for that directory! That would greatly reduce the amount of file-list data sent when there are directories containing many files that do not change from one rsync run to the next.
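To make the idea concrete, here is a minimal sketch of how one directory's file list might be reduced to a single checksum. It assumes each entry is summarized as a (name, size, mtime, mode) tuple; those fields, the helper names `dir_list_checksum` and `scan_dir`, and the choice of MD5 are illustrative assumptions, not rsync's actual file-list format. Each directory's checksum covers only its immediate entries, so a change deep in the tree only affects the directories along the way that actually changed.

```python
import hashlib
import os


def dir_list_checksum(entries):
    """Reduce one directory's file list to a single MD5 hex digest.

    entries: iterable of (name, size, mtime, mode) tuples. These fields are an
    assumed stand-in for the per-file attributes kept in the file list.
    """
    h = hashlib.md5()
    # Sort so the digest does not depend on the order the OS returns entries.
    for name, size, mtime, mode in sorted(entries):
        encoded = name.encode("utf-8", "surrogateescape")
        # Length-prefix the name so adjacent fields cannot run together.
        h.update(len(encoded).to_bytes(4, "big"))
        h.update(encoded)
        h.update(size.to_bytes(8, "big"))
        h.update(mtime.to_bytes(8, "big", signed=True))
        h.update(mode.to_bytes(4, "big"))
    return h.hexdigest()


def scan_dir(path):
    """Collect (name, size, mtime, mode) for the immediate entries of path."""
    entries = []
    with os.scandir(path) as it:
        for entry in it:
            st = entry.stat(follow_symlinks=False)
            # Truncate mtime to whole seconds (an assumption of this sketch).
            entries.append((entry.name, st.st_size, int(st.st_mtime), st.st_mode))
    return entries
```

Each side could call `dir_list_checksum(scan_dir(d))` for every directory `d` and exchange only the digests (16 bytes per directory in binary form) instead of one entry per file.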

Here's an example:

             remote                            local
             dir1                              dir1 - file list checksum same as on remote -> don't send file list for dir1
             dir2                              dir2 - file list checksum same as on remote -> don't send file list for dir2
             dir3                              dir3 - file list checksum different from remote -> send file list for dir3
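Assuming the remote side is the source being mirrored and both sides have already computed one checksum per directory, the per-directory decision in the example above reduces to a dictionary comparison. The function name and the made-up digests below are hypothetical. Directories that exist only locally are left out, since removing them would be a deletion question rather than a matter of sending a list.

```python
def plan_file_list_transfer(remote_sums, local_sums):
    """Decide, per directory, whether its full file list must be sent.

    remote_sums, local_sums: dicts mapping a directory path to the checksum of
    that directory's file list on each side (hypothetical inputs; any digest works).
    Returns a dict mapping each remote directory to True if its list must be sent.
    """
    plan = {}
    for directory, remote_sum in remote_sums.items():
        # Missing locally (get returns None) or checksum differs: send the list.
        plan[directory] = local_sums.get(directory) != remote_sum
    return plan


# The dir1/dir2/dir3 example, with made-up digests standing in for real checksums.
remote = {"dir1": "3f2a", "dir2": "9b10", "dir3": "c4d7"}
local = {"dir1": "3f2a", "dir2": "9b10", "dir3": "e81c"}
for directory, send in sorted(plan_file_list_transfer(remote, local).items()):
    print(directory, "send file list" if send else "skip file list")
```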

It might even be possible to apply the rsync checksum algorithm to the directory lists themselves to determine which portions of the lists need to be sent, in the case of directories that are nearly identical.
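As a rough illustration, here is a minimal sketch of rsync-style block matching applied to a directory listing serialized as bytes (one line per file, a format assumed here). The local side's old listing is cut into fixed-size blocks; the remote side slides a window over its new listing, using the two-part weak checksum from the rsync algorithm (rolled forward one byte at a time) to find candidate blocks, and MD5 as the strong check. The function names, the 64-byte block size, and MD5 are assumptions for illustration; this is not rsync's actual code or wire format.

```python
import hashlib

M = 1 << 16  # modulus of the rsync-style weak checksum


def weak_checksum(block):
    """Return (a, b): a = sum of bytes, b = position-weighted sum, both mod 2**16."""
    n = len(block)
    a = sum(block) % M
    b = sum((n - i) * x for i, x in enumerate(block)) % M
    return a, b


def roll(a, b, out_byte, in_byte, n):
    """Slide an n-byte window forward one byte in O(1)."""
    a = (a - out_byte + in_byte) % M
    b = (b - n * out_byte + a) % M
    return a, b


def delta(old, new, block_size=64):
    """Encode new as ("copy", block_index) / ("literal", bytes) ops against old."""
    # Signature of old: weak checksum -> [(block index, strong checksum)].
    sig = {}
    for start in range(0, len(old) - block_size + 1, block_size):
        block = old[start:start + block_size]
        sig.setdefault(weak_checksum(block), []).append(
            (start // block_size, hashlib.md5(block).digest()))

    ops, literal = [], bytearray()
    i, ab = 0, None
    while i + block_size <= len(new):
        if ab is None:
            ab = weak_checksum(new[i:i + block_size])
        # Cheap weak match first; confirm any candidates with the strong checksum.
        candidates = sig.get(ab, [])
        match = None
        if candidates:
            strong = hashlib.md5(new[i:i + block_size]).digest()
            match = next((idx for idx, s in candidates if s == strong), None)
        if match is not None:
            if literal:
                ops.append(("literal", bytes(literal)))
                literal = bytearray()
            ops.append(("copy", match))
            i += block_size
            ab = None  # window jumped, so recompute from scratch
        else:
            literal.append(new[i])
            if i + block_size < len(new):
                ab = roll(*ab, new[i], new[i + block_size], block_size)
            i += 1
    literal.extend(new[i:])
    if literal:
        ops.append(("literal", bytes(literal)))
    return ops


def apply_delta(old, ops, block_size=64):
    """Rebuild the new listing from old plus the ops produced by delta()."""
    out = bytearray()
    for kind, value in ops:
        if kind == "copy":
            out += old[value * block_size:(value + 1) * block_size]
        else:
            out += value
    return bytes(out)
```

For a 200-entry listing in which one file's size changed, only the block containing that entry (plus any short tail) travels as literal bytes; everything else becomes a short copy reference to data the local side already has.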

I would appreciate hearing from the rsync developers whether this is feasible with the current implementation and whether they think it would help.

Thanks,

Peter Salameh


--
Please use reply-all for most replies to avoid omitting the mailing list.
To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html
