Olivier,

> Well, the first comment: during my work, I wanted to verify that the
> theoretical optimal block size sqrt(24*n/Q) given by Andrew Tridgell in
> his PhD thesis was actually the right one. When doing the tests on
> randomly generated & modified files, I discovered that sqrt(78*n/Q) is
> the actual optimal block size. I tried to understand this by reading the
> whole thesis, then quite a lot of documentation about rsync, but I just
> can't figure out why the theoretical & experimental optimal block sizes
> diverge so much. I _really_ don't think it's coming from my tests; there
> must be something else.
First off, you need to make sure you are taking into account the conditions
I mentioned for that optimal size to be correct. In particular I assumed:

  If, for example, we assume that the two files are the same except for Q
  sequences of bytes, with each sequence smaller than the block size and
  separated by more than the block size from the next sequence

In practice there is no 'correct' model for real files, so I chose a simple
model that I thought would give a reasonable approximation while being easy
to analyse.

Also, you didn't take into account that the function I gave was for the
simpler version of rsync that I introduced in chapter 3. Later in the
thesis I discuss how s_s can be reduced without compromising the algorithm
(see 'Smaller Signatures' in chapter 4). That changes the calculation of
the optimal block size quite a bit.

Thanks for looking at this, though. I haven't thought closely about this
algorithm in a long time!

Cheers, Tridge
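[Editorial note: the trade-off behind a sqrt(c*n/Q) optimum can be sketched
numerically. Under the model above, total transfer is roughly one signature
per block plus up to two blocks of literal data per changed sequence, i.e.
C(L) = (n/L)*s + 2*Q*L, which is minimised at L* = sqrt(n*s/(2*Q)). The
per-block signature cost `sig_bytes = 48` below is an assumed value chosen
only so that L* comes out as sqrt(24*n/Q); the true constant depends on the
chapter 3 signature format.]

```python
import math

def transfer_cost(block_size, n, q, sig_bytes):
    """Approximate bytes sent: one signature per block, plus (at most)
    two blocks of literal data for each of the q changed sequences,
    assuming each change is smaller than a block and changes are
    separated by more than a block (the thesis's simple model)."""
    return (n / block_size) * sig_bytes + 2 * q * block_size

def optimal_block_size(n, q, sig_bytes):
    """Minimising a/L + b*L over L gives L* = sqrt(a/b),
    here sqrt(n * sig_bytes / (2 * q))."""
    return math.sqrt(n * sig_bytes / (2 * q))

# Assumed example numbers: a 10 MB file with 100 changed sequences.
n, q, sig = 10_000_000, 100, 48
best = optimal_block_size(n, q, sig)

# Sanity check: nearby block sizes cost at least as much.
assert transfer_cost(best, n, q, sig) <= transfer_cost(0.9 * best, n, q, sig)
assert transfer_cost(best, n, q, sig) <= transfer_cost(1.1 * best, n, q, sig)
```

Note that changing either the signature size or the literal-data cost per
change rescales the constant under the square root, which is one way a
measured optimum could differ from sqrt(24*n/Q) without the sqrt(n/Q)
shape being wrong.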
