Re: Optimising the Rsync algorithm for speed by reverting to MD4 hashing

2010-08-04 Thread Mike Bombich
> quicker) hashing algorithm. > > Regards, > > Andrew Marlow

Re: Optimising the Rsync algorithm for speed by reverting to MD4 hashing

2010-08-04 Thread andrew . marlow
rew Marlow > Hi, From v3.0.0 onwards the hash function implemented by Rs

Optimising the Rsync algorithm for speed by reverting to MD4 hashing

2010-08-04 Thread Nick McCarthy
Hi, From v3.0.0 onwards the hash function implemented by Rsync was changed from MD4 to MD5 (http://rsync.samba.org/ftp/rsync/src/rsync-3.0.0-NEWS). My understanding is that MD5 is a more secure, slower version of MD4 but I am not convinced that the added security of MD5 would alone have merited
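The question in this thread is whether MD5's extra security was worth its speed cost relative to MD4. A rough way to compare raw digest throughput is the sketch below, assuming the local OpenSSL build still exposes MD4 to Python's hashlib (newer builds may not); it says nothing about rsync's own code, only about the underlying hashes.

    import hashlib
    import os
    import time

    def throughput_mb_s(name, payload, rounds=20):
        # hashlib.new() raises ValueError if the algorithm is unavailable.
        start = time.perf_counter()
        for _ in range(rounds):
            hashlib.new(name, payload).digest()
        elapsed = time.perf_counter() - start
        return (len(payload) * rounds) / elapsed / (1024 * 1024)

    data = os.urandom(8 * 1024 * 1024)   # 8 MB of random input
    for algo in ("md4", "md5"):
        try:
            print(f"{algo}: {throughput_mb_s(algo, data):.1f} MB/s")
        except ValueError:
            print(f"{algo}: not available in this OpenSSL build")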

Re: rsync algorithm

2010-07-22 Thread Henri Shustak
>> Checking file size makes sense, but how does rsync check times? If a file is >> copied from one side to another remote side, the time will be >> different, right? > > Sorry, wrong question; the copied file should be able to preserve the mtime. If the rsync --times option is used then rsync will attempt
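The "quick check" being discussed compares file size and modification time, which is why preserving mtime with --times lets later runs skip unchanged files. A minimal sketch of that decision, ignoring options such as --checksum, --size-only and --modify-window:

    import os

    def quick_check_same(src_path, dst_path):
        # A file is treated as unchanged when size and mtime both match.
        try:
            src, dst = os.stat(src_path), os.stat(dst_path)
        except FileNotFoundError:
            return False                      # destination missing: must transfer
        return (src.st_size == dst.st_size and
                int(src.st_mtime) == int(dst.st_mtime))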

Re: rsync algorithm

2010-07-21 Thread hce
On Thu, Jul 22, 2010 at 2:43 PM, hce wrote: > On Wed, Jul 21, 2010 at 10:35 PM,   wrote: >> How many of the files are changing? > > Not many. > >> If the file size and times haven't changed then rsync won't be comparing >> things and calculating checksums.  When rsync sees that a file has changed

Re: rsync algorithm

2010-07-21 Thread hce
On Wed, Jul 21, 2010 at 10:35 PM, wrote: > How many of the files are changing? Not many. > If the file size and times haven't changed then rsync won't be comparing > things and calculating checksums.  When rsync sees that a file has changed > then it compares checksums of the chunks to reduce

rsync algorithm

2010-07-20 Thread hce
Hi, I am learning the rsync mechanism. My understanding is that rsync reads a file in multiple chunks and calculates an MD5 and a rolling checksum over every byte of each chunk. On the other side of the file system, it compares the two checksums for each chunk in the same file to decide whether the two files are equal o
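The two checksums mentioned here are the heart of the algorithm: a cheap rolling (weak) checksum to find candidate block matches and a strong hash to confirm them. The sketch below illustrates per-block signature generation; the block size and the Adler-style weak sum are illustrative simplifications, not rsync's exact code.

    import hashlib

    BLOCK = 700  # illustrative; rsync chooses a block size based on file length

    def weak_sum(block):
        # Adler-style weak checksum, similar in spirit to rsync's rolling sum.
        a = sum(block) & 0xFFFF
        b = sum((len(block) - i) * x for i, x in enumerate(block)) & 0xFFFF
        return (b << 16) | a

    def signatures(data):
        # Per-block (weak, strong) signatures of the receiver's copy of the file.
        return [(weak_sum(data[i:i + BLOCK]),
                 hashlib.md5(data[i:i + BLOCK]).hexdigest())
                for i in range(0, len(data), BLOCK)]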

Value of M in rsync algorithm

2010-02-12 Thread Hasanat Kazmi
Hi, I am a little curious about the value of M. It is used in the rsync algorithm as the modulus (http://samba.anu.edu.au/rsync/tech_report/node3.html). In rsync it is kept at 2^32. What if, like Adler-32, M were changed to the nearest prime to 2^32; any idea what effect it would have on the resulting check-sums (In t
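For reference, the tech report linked above defines the weak checksum as a(k,l) = (sum of x_i) mod M and b(k,l) = (sum of (l-i+1)*x_i) mod M, with M = 2^16 there so the reduction is a simple mask. The experimental sketch below keeps M configurable so the prime-modulus variant suggested here could be tried out; it is an illustration, not rsync's implementation.

    M = 2 ** 16     # swap in e.g. 65521 (largest prime below 2**16) to experiment

    def weak(block):
        a = sum(block) % M
        b = sum((len(block) - i) * x for i, x in enumerate(block)) % M
        return a, b

    def roll(a, b, old_byte, new_byte, block_len):
        # Slide the window one byte: drop old_byte, append new_byte.
        a = (a - old_byte + new_byte) % M
        b = (b - block_len * old_byte + a) % M
        return a, b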

Re: Optimizing RSync algorithm using techniques Google used in Courgette

2009-09-08 Thread Shachar Shemesh
"we", I mean Wayne, or anyone else brabe enough to pick this task up) were to implement such a functionality, I can think of quite a few things that would have a lot more to gain than executables. In particular, something that would uncompress both source and destination, and apply

Optimizing RSync algorithm using techniques Google used in Courgette

2009-09-07 Thread Hasanat Kazmi
Hi, I am a student at LUMS SSE (http://cs.lums.edu.pk) and an active RSync user. Just a few days ago, Google wrote about Courgette*: an algorithm specially written for syncing executables. By using Courgette, Google made the diff size 1/10th that of previous techniques. I was wondering if this (or

RE: rsync algorithm for large files

2009-09-05 Thread eharvey
Yup, by doing --inplace, I got down from 30 mins to 24 mins... So that's slightly better than resending the whole file again. However, this doesn't really do what I was hoping to do. Perhaps it can't be done, or somebody would like to recommend some other product that is better suited for m

Re: rsync algorithm for large files

2009-09-04 Thread Shachar Shemesh
ehar...@lyricsemiconductors.com wrote: I thought rsync would calculate checksums of large files that have changed timestamps or filesizes, and send only the chunks which changed. Is this not correct? My goal is to come up with a reasonable (fast and efficient) way for me to do a daily increment

Re: rsync algorithm for large files

2009-09-04 Thread Carlos Carvalho
Matthias Schniedermeyer (m...@citd.de) wrote on 5 September 2009 00:34: >On 04.09.2009 18:00, ehar...@lyricsemiconductors.com wrote: >> >> Why does it take longer the 3rd time I run it? Shouldn't the performance >> always be **at least** as good as the initial sync? > >Not per se. > >Firs

Re: rsync algorithm for large files

2009-09-04 Thread Matthias Schniedermeyer
On 04.09.2009 18:00, ehar...@lyricsemiconductors.com wrote: > > Why does it take longer the 3rd time I run it? Shouldn't the performance > always be **at least** as good as the initial sync? Not per se. First you have to determine THAT the file has changed, then the file is synced if there was

rsync algorithm for large files

2009-09-04 Thread eharvey
I thought rsync would calculate checksums of large files that have changed timestamps or filesizes, and send only the chunks which changed. Is this not correct? My goal is to come up with a reasonable (fast and efficient) way for me to do a daily incremental backup of my Parallels virtual machine (a d

Re: How much does the rsync algorithm differ from Andrew Tridgell's thesis

2009-04-16 Thread Daniel.Li
tion is "How many differs in rsync algorithm from Andrew > Tridgell's thesis". > > Thanks in advance. Daniel -- Please use reply-all for most replies to avoid omitting the mailing list. To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync Befor

How much does the rsync algorithm differ from Andrew Tridgell's thesis

2009-04-15 Thread Daniel.Li
Dear List, I would like to take a look at rsync's algorithm. As a newbie to this algorithm, I just downloaded Andrew Tridgell's original thesis from 1999. So my question is "How much does the rsync algorithm differ from Andrew Tridgell's thesis?". Thanks in advance. -- Dan

Re: does the incremental rsync algorithm save on storage?

2007-07-17 Thread Wayne Davison
On Tue, Jul 17, 2007 at 07:27:51PM -0400, Matt McCutchen wrote: > I can't think of an easy way to produce a chain of forward deltas. A chain of forward deltas requires an extra copy of the backup data. So, you'd need a start point, an end point, and the deltas would be generated while updating the

Re: does the incremental rsync algorithm save on storage?

2007-07-17 Thread Matt McCutchen
On 7/17/07, Noah Leaman <[EMAIL PROTECTED]> wrote: >From what I understand, the incremental rsync algorithm saves on network bandwidth, but does rsync then just merge that delta data to end up with the new version and full sized file on the destination filesystem? Correct. I h

does the incremental rsync algorithm save on storage?

2007-07-17 Thread Noah Leaman
From what I understand, the incremental rsync algorithm saves on network bandwidth, but does rsync then just merge that delta data to end up with the new version and full sized file on the destination filesystem? I have these Microsoft Entourage database files that are modified ofte

Re: discussing a "reverse rsync algorithm"

2006-05-19 Thread Matt McCutchen
Note: the reverse rsync algorithm with cached checksums is essentially what zsync uses. Zsync may be found here: http://zsync.moria.org.uk/ I like the flexibility of rsync, but I also like zsync's reversed algorithm. I would love to see the two programs merged into an rsync that

Re: discussing a "reverse rsync algorithm"

2006-04-28 Thread Wayne Davison
On Thu, Apr 27, 2006 at 05:11:41PM -0400, Matt McCutchen wrote: > I believe the sender should compute and send the block hashes. This has come up before, and it may well be something to make available as an option, but I don't think it should be the default for a number of reasons: (1) This incre

Re: Problem with --partial and rsync algorithm

2006-03-12 Thread Matias Surdi
John Van Essen wrote: > On Sun, 12 Mar 2006, Matias Surdi <[EMAIL PROTECTED]> wrote: >> I'm running the following command for a remote host backup: >> >> /usr/local/bin/rsync -a --delete --delete-excluded -v --timeout=120 -z >> --no-whole-file -partial --partial-dir .rsync-partial --exclude=/sys/*

Re: Problem with --partial and rsync algorithm

2006-03-12 Thread John Van Essen
On Sun, 12 Mar 2006, Matias Surdi <[EMAIL PROTECTED]> wrote: > I'm running the following command for a remote host backup: > > /usr/local/bin/rsync -a --delete --delete-excluded -v --timeout=120 -z > --no-whole-file -partial --partial-dir .rsync-partial --exclude=/sys/* [ snip ] BTW, you are usin

Problem with --partial and rsync algorithm

2006-03-12 Thread Matias Surdi
Hi, I'm running the following command for a remote host backup: /usr/local/bin/rsync -a --delete --delete-excluded -v --timeout=120 -z --no-whole-file -partial --partial-dir .rsync-partial --exclude=/sys/* --exclude=/tmp/* --exclude=/stuff/distfiles/* --exclude=/stuff/sistema/* --exclude=/stuff2/

Re: rsync algorithm improvements

2000-12-28 Thread John Langford
>At the risk of boring other readers, I'm curious what numbers you were >getting during your test - I just tried a --stats myself on a 51MB rsync version = 2.4.4 Working with a 100MB file and doing a null sync, I see: rsync --stats -e ssh -a --block-size=64000 says: wrote 9866 bytes read 6659 b

RE: rsync algorithm improvements

2000-12-28 Thread David Bolen
John Langford [[EMAIL PROTECTED]] writes: > >(no compression at all) you'd have to transmit 6-6.6MB of data - how > >do you arrive at 20MB? > > I ran rsync --stats on two identical files of size 100MB with a 64KB > block size and extrapolated to 20GB. The files themselves are > incompressible.

Re: rsync algorithm improvements

2000-12-28 Thread John Langford
>(no compression at all) you'd have to transmit 6-6.6MB of data - how >do you arrive at 20MB? I ran rsync --stats on two identical files of size 100MB with a 64KB block size and extrapolated to 20GB. The files themselves are incompressible. >That's sort of what I was getting at.. for example,

RE: rsync algorithm improvements

2000-12-28 Thread David Bolen
John Langford [[EMAIL PROTECTED]] writes: > > For a 20GB file (assuming large 64K blocks, and with compression > > enabled), that's probably about 2MB of data being transmitted, which > > I get about 20MB by extrapolation - and compression is not possible here. If I'm counting correctly, 20GB
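The 6-6.6MB estimate falls out of the per-block signature size: at 64KB blocks a 20GB file is about 327,680 blocks, and each block contributes a 4-byte rolling checksum plus a strong checksum of up to 16 bytes (rsync can truncate the strong sum, which would shrink the total). A quick back-of-the-envelope check:

    file_size  = 20 * 2**30          # 20 GB
    block_size = 64 * 2**10          # 64 KB
    per_block  = 4 + 16              # weak + strong checksum bytes per block

    blocks = file_size // block_size
    signature_bytes = blocks * per_block
    print(f"{blocks} blocks, ~{signature_bytes / 1e6:.1f} MB of checksums")
    # -> 327680 blocks, ~6.6 MB of checksums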

RE: rsync algorithm improvements

2000-12-28 Thread John Langford
Ok, I think I figured out how to combine the algorithms. Start with a base (like 8 or 256) and a minimum block size (like 256 bytes). The improved rsync will use the old rsync algorithm as a subroutine. I'll call the old rsync algorithm orsync and the new one irsync. Instead of act
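The message is cut off before the scheme is fully spelled out, but one plausible reading of the recursive idea is sketched below: match at a large block size first, then re-run the matcher at block_size divided by the base only inside the regions that failed to match, stopping at the minimum block size. The match_blocks callback and its return shape are assumptions for illustration, not rsync's or irsync's actual interface.

    MIN_BLOCK = 256
    BASE = 8

    def recursive_match(match_blocks, src, dst, block_size):
        # match_blocks(src, dst, block_size) is assumed to return
        # (matches, unmatched): matched (src_off, dst_off, length) triples and
        # unmatched (src_off, dst_off, src_len, dst_len) region pairs.
        matches, unmatched = match_blocks(src, dst, block_size)
        if block_size // BASE >= MIN_BLOCK:
            for (s_off, d_off, s_len, d_len) in unmatched:
                sub = recursive_match(match_blocks,
                                      src[s_off:s_off + s_len],
                                      dst[d_off:d_off + d_len],
                                      block_size // BASE)
                matches.extend((s_off + ms, d_off + md, ln) for ms, md, ln in sub)
        return matches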

RE: rsync algorithm improvements

2000-12-28 Thread John Langford
..), so it might actually be more > time efficient to just transmit the data even on a slow link. The computational cost is O(file size) if changes are rare and O(file size * log(file size)) if they are not. The rsync algorithm will also have to run the strong checksum on O(file size) bits if th

RE: rsync algorithm improvements

2000-12-27 Thread David Bolen
John Langford [[EMAIL PROTECTED]] writes: > This is particularly salient to me because I'm pondering running > rsync on files of size 20GB with a few MB of changes. The standard > rsync algorithm will do poorly here - resulting in a predicted > minimum of around 200MB of networ

rsync algorithm improvements

2000-12-24 Thread John Langford
er of changed bytes. The rsync algorithm appears to actually be O(F+C) although the O notation is misleading here because the constants differ greatly. The rsync algorithm has several advantages. It is robust against insertions and avoids incurring high latency by using a two-round protocol.

pysync: Python implementation of rsync algorithm.

2000-12-10 Thread Donovan Baarda
G'day again, The first version of pysync turned out to be a bit buggy... release early and release often is my excuse :-) Actually, I tripped up over an obscure bug in Python's zlib to do with decompression of sync flushes. The workaround is to decompress in smaller chunks. This makes the code mor
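The workaround described, decompressing in smaller chunks, looks roughly like the sketch below; it shows the general pattern with zlib.decompressobj rather than pysync's actual code.

    import zlib

    def decompress_in_chunks(compressed, chunk_size=1024):
        # Feed the stream to the decompressor a little at a time instead of
        # handing it the whole buffer in one call.
        d = zlib.decompressobj()
        out = []
        for i in range(0, len(compressed), chunk_size):
            out.append(d.decompress(compressed[i:i + chunk_size]))
        out.append(d.flush())
        return b"".join(out)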

Re: [rproxy-devel] Python implementation of rsync algorithm, was Re: rdiff - does it exist?

2000-12-07 Thread Martin Pool
On 8 Dec 2000, Donovan Baarda <[EMAIL PROTECTED]> wrote: > G'day, G'day :-) > Well, I finally got around to finishing this off enough to seek comment. I > have implemented the rsync algorithm in pure python. It's not fast, but it > works, and is very simple. T

Python implementation of rsync algorithm, was Re: rdiff - does it exist?

2000-12-07 Thread Donovan Baarda
> > I am 90% of the way through an "example implementation" of exactly this in pure > python. I should be finished within a week (it's only a couple of hours work, Well, I finally got around to finishing this off enough to seek comment. I have implemented the rsync algorith