Re: [tahoe-dev] Tahoe performance

Brian Warner Thu, 19 Feb 2009 14:19:47 -0800

On Thu, 19 Feb 2009 13:30:55 -0600
Luke Scharf <[email protected]> wrote:


> > When rsync decides that the file might have changed, over a regular
> > network (e.g. ssh), it uses a clever differencing algorithm.

> That clears it up a bit! I don't know any of rsync's tricks that I
> couldn't discover with lsof or ls -a....

http://samba.anu.edu.au/rsync/tech_report/ has a paper on the
technique, which also spawned Andrew Tridgell's PhD thesis. The
"rolling checksum" is the snazziest bit, along with the observation
that a fast (but no longer cryptographically secure) hash like MD4 is
good enough for this use case, for two reasons. The first is that the
comparison scope is so small: one source file vs one destination file.
The second is that the threat model is so limited: the only party in a
position to take advantage of the hash's flaws is yourself. If you can
find two files A1 and A2 that have the same "strong" hash (as well as
the same weak checksums), and put A1 on the destination machine and A2
on the source, then 'rsync ./A2 remote:A1' would see matching
checksums and fail to replace A1 with A2. But they're both your own
files anyway. Heck, MD*1* (if there ever even was such a thing) would
be good enough for this purpose.
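
To make the "rolling" part concrete, here is a rough sketch in Python
of the weak checksum as the tech report describes it: two running
sums kept modulo 2^16, which can be slid forward one byte at a time
without rescanning the block. The function names are made up for
illustration, and this is not rsync's actual code (which is C and has
its own details), just the arithmetic from the paper.

```python
M = 1 << 16  # both running sums are kept modulo 2^16, as in the tech report


def weak_checksum(block):
    """Compute the two sums (a, b) for one block from scratch."""
    n = len(block)
    a = sum(block) % M
    # The byte at position i in the block is weighted by (n - i).
    b = sum((n - i) * x for i, x in enumerate(block)) % M
    return a, b


def roll(a, b, out_byte, in_byte, n):
    """Slide a window of length n one byte to the right in O(1)."""
    a_new = (a - out_byte + in_byte) % M
    b_new = (b - n * out_byte + a_new) % M
    return a_new, b_new


def combine(a, b):
    """Pack the two 16-bit sums into one 32-bit weak checksum."""
    return a + (b << 16)


def rolling_checksums(data, n):
    """Weak checksum of every n-byte window of data, computed by rolling."""
    if len(data) < n:
        return []
    a, b = weak_checksum(data[:n])
    out = [combine(a, b)]
    for k in range(len(data) - n):
        a, b = roll(a, b, data[k], data[k + n], n)
        out.append(combine(a, b))
    return out
```

The point is that roll() does a constant amount of work per byte, so
the sender can test every offset of its file against the receiver's
block checksums cheaply, and only compute the slower "strong" hash
when the weak one matches.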

cheers,
 -Brian
_______________________________________________
tahoe-dev mailing list
[email protected]
http://allmydata.org/cgi-bin/mailman/listinfo/tahoe-dev
