Hello,

Another French student on the rsync mailing list. I have been working on rsync this year for a school documentation project. I would first like to comment on rsync block size optimization, and then propose a way for rsync to choose the optimal block size by itself when updating a large number of files.
First comment: during my work, I wanted to verify that the theoretical optimal block size sqrt(24*n/Q), given by Andrew Tridgell in his PhD thesis, was actually the right one. When running tests on randomly generated and modified files, I found that sqrt(78*n/Q) is the actual optimal block size, i.e. about 1.8 times larger. I tried to understand this by reading the whole thesis, then quite a lot of documentation about rsync, but I just can't figure out why the theoretical and experimental optimal block sizes differ so much. I _really_ don't think it comes from my tests; there must be something else. Maybe the rsync developers have changed some part of the algorithm. Also, even without data compression during the sync, rsync is always more efficient than it should be in theory, by a factor of 1.5 to 2. Nobody will complain about that, but I'd be happy if someone would be kind enough to explain this to me.

Now the auto-optimization algorithm for updating many files at a time. Consider a set of files to be updated. We consider only the files that have changed since the last update (e.g. we can identify the unchanged ones by sending an MD5 sum for each file and trying to match it). We sync the first file, but the client keeps the old local version, so it can count how many differences there are between the two versions and from that estimate the optimal block size. We assume that the proportion of differences is roughly the same across the files of a given set. So for the second file we use the optimal size found for the first file. For the third file we use the (arithmetic or geometric?) average of the optimal sizes of the first two files, and so on. Once we have synced a certain number of files (10? 100?), we always use the same size, which is supposed to be the best one.

Sorry for being so long, I hope you'll understand everything,

Olivier

P.S. I am not a programmer of any kind, so don't expect me to write a single line of C (I know I'm a bad boy).
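For illustration only, the averaging scheme described above could be sketched roughly as follows, in Python rather than C. This is not rsync code. It assumes that n is the file size in bytes and Q the number of differences between the old and new versions, and the names `optimal_block_size`, `BlockSizeEstimator`, `default_size` and `freeze_after` are all hypothetical.

```python
import math


def optimal_block_size(n, q, k=24.0):
    """Return sqrt(k * n / q).

    n: file size in bytes (assumed meaning of n)
    q: number of differences between old and new file (assumed meaning of Q)
    k: 24 is the theoretical constant, 78 the one measured in the tests above
    """
    if n <= 0 or q <= 0:
        raise ValueError("n and q must be positive")
    return math.sqrt(k * n / q)


class BlockSizeEstimator:
    """Hypothetical helper: averages per-file optimal sizes, then freezes."""

    def __init__(self, default_size, freeze_after=10, geometric=False, k=24.0):
        self.default_size = default_size  # used for the very first file
        self.freeze_after = freeze_after  # stop learning after this many files
        self.geometric = geometric        # geometric instead of arithmetic mean
        self.k = k
        self.samples = []

    def record(self, n, q):
        """Store the optimal size of a file that has just been synced."""
        if len(self.samples) >= self.freeze_after or q <= 0:
            return  # estimate is frozen, or the file had no differences
        self.samples.append(optimal_block_size(n, q, self.k))

    def next_block_size(self):
        """Block size to use for the next file."""
        if not self.samples:
            return self.default_size
        count = len(self.samples)
        if self.geometric:
            mean = math.exp(sum(math.log(s) for s in self.samples) / count)
        else:
            mean = sum(self.samples) / count
        return max(1, round(mean))
```

A caller would ask `next_block_size()` before each file and call `record(n, q)` after it, once the differences have been counted. The starting value passed as `default_size` (for example 700) is only a placeholder, and the choice between the two means is left open here, as in the description above.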
_______
Olivier Lachambre
2, rue Roger Courtois
25 200 MONTBELIARD
FRANCE
e-mail : [EMAIL PROTECTED]