Ok, I think I figured out how to combine the algorithms.
Start with a base (like 8 or 256) and a minimum block size (like 256
bytes). The improved rsync will use the old rsync algorithm as a
subroutine. I'll call the old rsync algorithm orsync and the new one
irsync. Instead of actually transmitting blocks across the network
connection, the orsync algorithm will just return the list of blocks it
would transmit.
irsync starts by invoking orsync with a block size of (file size)/base and
is returned the set of blocks that orsync would have tranmitted across the
network. Then irsync recurses on the blocks which do not match with a
smaller block size until the minimum block size is reached and the
remaining unmatched blocks are sent over the network. The algorithm looks
like this:
irsync(int base, int block_size, int min_block_size, file)
{
unmatched_blocks = orsync(block_size,file);
if (block_size <= min_block_size) transmit(unmatched_blocks);
else irsync(base, block_size/base, min_block_size, unmatched_blocks);
}
And it is invoked like:
irsync(8, length(file)/8, 256, file)
I'm eliding some significant details here - for example, you want the
entire invokation of irsync to be one transaction and clearly the matching
algorithm will want to work on unmatched subsets of the file rather then
the whole file. But I think all of these details can be worked out and
hopefully you understand what I'm getting at.
The irsync algorithm should get the asymptotics right and be able to
handle insertion/deletion changes. It also reduces to the orsync
algorithm if the base is made large enough.
-John