On 9/30/07, Stephen Zemlicka <[EMAIL PROTECTED]> wrote:
> OK, let's say this is the first sync and every file is being transferred.
> The checksum for each of the files is cached on the local drive. Then, the
> next time you sync, it checks the checksum from the cache against the file
> to be copied. If it matches, it skips it. If it doesn't match, it transfers
> just the difference. It then updates the cache with the checksum of the
> transferred file. That way one could have a remote data store and not have
> to run rsync on the remote system. I.e., you could have a mapped drive or
> FTP folder or S3 storage area that would all be rsyncable.
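A minimal Python sketch of the quoted scheme, assuming a fixed block size and an in-memory cache dict; the names (`BLOCK`, `block_sums`, `sync_file`) are illustrative, not rsync internals:

```python
# Sketch of a client-side checksum cache: skip unchanged files,
# rewrite only mismatched blocks in place on the remote store.
import hashlib
import os

BLOCK = 4096  # illustrative block size


def block_sums(path):
    """Return a list of per-block MD5 digests for a file."""
    sums = []
    with open(path, "rb") as f:
        while chunk := f.read(BLOCK):
            sums.append(hashlib.md5(chunk).hexdigest())
    return sums


def sync_file(src, dst, cache):
    """Push src to dst, writing only blocks whose cached checksum changed."""
    old = cache.get(dst, [])
    new = block_sums(src)
    if old == new and os.path.exists(dst):
        return  # checksums match the cache: skip the file entirely
    mode = "r+b" if os.path.exists(dst) else "wb"
    with open(src, "rb") as s, open(dst, mode) as d:
        for i, digest in enumerate(new):
            if i >= len(old) or old[i] != digest:
                s.seek(i * BLOCK)
                d.seek(i * BLOCK)
                d.write(s.read(BLOCK))  # rewrite only this block
        d.truncate(os.path.getsize(src))
    cache[dst] = new  # update the cache with the new checksums
```

Here `dst` stands in for a path on the mapped drive; only the mismatched blocks cross the wire, which is the bandwidth saving the proposal is after.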
That's a very clever idea, but I'd like to point out two caveats:

(1) You're assuming nobody else modifies the files on the mapped drive. To remove that assumption, the checksum cache for each remote file could store the mtime of the revision of the file for which the cache is valid. Then, a destination file whose checksum cache is invalid could be identified and updated with a whole-file transfer. Optionally, you could store the caches on the mapped drive instead of the client, allowing anyone to push efficiently to the drive.

(2) --inplace must be used. Furthermore, you save bandwidth only when a block of the destination file matches the source file *at the same offset*. If the offsets differ, a real delta transfer can just instruct the receiver to move the data, but in your case the data has to be written over again at the new offset. Thus, your scheme will give almost as much benefit as a real delta transfer for a database-style file that is modified in place, but if a single byte is inserted or deleted at the beginning of the source file, your scheme has to rewrite the entire destination file. You could overcome this by uploading a delta instead of updating the file itself, but that complicates matters for readers, who then have to pull the file and all its deltas.

If the remote filesystem supports efficient copying of a range of data from one offset to another, then caveat (2) is moot: a smart client can do pushes efficiently using your scheme and pulls efficiently using zsync's "reverse" delta-transfer algorithm. S3 doesn't appear to support any kind of range manipulation; perhaps Amazon could be convinced to add the necessary support.

Matt

--
To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html
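P.S. The offset sensitivity in caveat (2) is easy to demonstrate. In this hypothetical helper, `changed_blocks` counts how many fixed-grid blocks of a new file differ from the old file at the same offset; an in-place edit dirties one block, while a one-byte insertion at the front shifts every offset and dirties them all:

```python
# Illustration of caveat (2): with a fixed block grid, only blocks that
# match at the *same offset* can be skipped.
import hashlib

BLOCK = 1024  # illustrative block size


def changed_blocks(old: bytes, new: bytes) -> int:
    """Count blocks of `new` that differ from `old` at the same offset."""
    n = (len(new) + BLOCK - 1) // BLOCK
    count = 0
    for i in range(n):
        a = old[i * BLOCK:(i + 1) * BLOCK]
        b = new[i * BLOCK:(i + 1) * BLOCK]
        if hashlib.md5(a).digest() != hashlib.md5(b).digest():
            count += 1
    return count


base = bytes(range(256)) * 64            # 16 KiB test file (16 blocks)
inplace = bytearray(base)
inplace[8000] = 0                        # database-style in-place edit
shifted = b"X" + base                    # one byte inserted at the front

# The in-place edit dirties a single block; the insertion misaligns
# every block, so the whole destination file must be rewritten.
```

A real rsync delta transfer would handle the shifted case with a single "move" instruction, which is exactly the capability a dumb remote store lacks.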