On Fri, Jan 12, 2001 at 07:14:37PM +0100, Richard Atterer wrote:
> On Fri, Jan 12, 2001 at 04:03:24PM +1100, Martin Pool wrote:
> [Rsync 3.0]
> > It will be a more standard request-response protocol, with some kind
> > of stream multiplexing inside a single tcp or ssh connection. 
> > Requests might be `LIST n FILES', or `DELTA file basis'. It's still
> > being discussed -- unfortunately largely face-to-face in Canberra. 
> > rusty has some pre-release code for the framework, and I have part
> > of the encoding library in libhsync.
> [snip]
> > Your suggestions are welcome.
> 
> Well, this is probably not too exciting, but I'd like to mention it
> nevertheless: The quality of the rolling checksum can be increased a
> bit by not just accumulating the byte values, and neither with a
> non-zero CHAR_OFFSET (cf. current rsync sources). Instead, use a
> 256-word lookup table and accumulate the contents of the word at index
> x instead of x itself. E.g., adding one byte to the end of the
> checksum then becomes:
> 
>   uint32 a = sum;
>   uint32 b = sum >> 16;
>   a += charTable[x];
>   b += a;
>   sum = ((a & 0xffff) + (b << 16)) & 0xffffffff;
> 
> The values in the table should be purely random.
> 
> This adds an overhead of one table lookup per byte. Since the table
> ends up in the processor's cache at some point, this should not
> constitute much of a performance hit.

This is exactly what xdelta uses...

Personally, I fail to see how much benefit it provides, though no doubt
others who have worked on this can demonstrate it. The original adler32
takes its sums "mod prime" (modulo 65521), which I also see as being of
limited benefit.
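
For anyone who wants to experiment, here is a rough sketch of the
table-lookup scheme quoted above, extended with the step that drops the
oldest byte so the sum can roll over a fixed-length window. This is only my
reading of the idea, not code from rsync or xdelta: the names char_table,
init_char_table, weak_sum and weak_roll are made up for illustration, and
the fixed-seed xorshift generator merely stands in for "purely random"
table values.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical lookup table: one pseudo-random 16-bit word per byte value. */
static uint16_t char_table[256];

/* Fill the table from a fixed-seed xorshift32 generator (an assumption;
 * any good source of random words would do). */
static void init_char_table(void)
{
    uint32_t s = 0x9e3779b9u;
    for (int i = 0; i < 256; i++) {
        s ^= s << 13;
        s ^= s >> 17;
        s ^= s << 5;
        char_table[i] = (uint16_t)(s >> 8);
    }
}

/* Checksum of buf[0..len): a is the sum of the table words, b is the sum
 * of the running values of a, both kept modulo 2^16. */
static uint32_t weak_sum(const unsigned char *buf, size_t len)
{
    uint32_t a = 0, b = 0;
    for (size_t i = 0; i < len; i++) {
        a = (a + char_table[buf[i]]) & 0xffff;
        b = (b + a) & 0xffff;
    }
    return a | (b << 16);
}

/* Slide a window of len bytes forward by one: remove byte `out` from the
 * front and append byte `in` at the end. */
static uint32_t weak_roll(uint32_t sum, size_t len,
                          unsigned char out, unsigned char in)
{
    uint32_t a = sum & 0xffff;
    uint32_t b = sum >> 16;
    uint32_t t_out = char_table[out];

    a = (a - t_out + char_table[in]) & 0xffff;
    /* The outgoing byte was counted len times in b; unsigned wraparound
     * keeps the subtraction correct modulo 2^16. */
    b = (b - (uint32_t)len * t_out + a) & 0xffff;
    return a | (b << 16);
}
```

Call init_char_table() once, compute weak_sum() over the first window, then
call weak_roll() for each following offset. The extra cost over the plain
byte sum is the one table lookup per byte that Richard mentions.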

Actually, in my preliminary fiddlings with adler32 in pysync, I was amazed
at how effective it was. What I originally thought was a number of adler32
collisions in a large datastream turned out to be genuinely identical blocks
according to the md5sums.

-- 
----------------------------------------------------------------------
ABO: finger [EMAIL PROTECTED] for more info, including pgp key
----------------------------------------------------------------------
