Re: Any way to predict the amount of data to be copied when re-copying a file?
I can't answer your question directly, but I can say that it is not strictly the number of bytes that differ that matters, but also how the differences are distributed in the file. Unless you explicitly set the block size, rsync uses a size that is the square root of the size of the file, thus bounding the worst case for the total volume of data transmitted (block summaries *plus* block data for changed blocks). If many of these sqrt(n)-sized blocks are affected, then many will be transmitted. If you know more about what tends to happen with your files, you can adjust the block size.

(This is all from memory from reading the rsync tech report some time ago, but I think it remains sound. I'm sure someone will correct me if I am off base.)

Best wishes -- Eliot Moss

--
Please use reply-all for most replies to avoid omitting the mailing list.
To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html
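The trade-off Eliot describes can be shown with a toy cost model (an illustration only, not rsync's actual accounting): for an n-byte file split into b-byte blocks, the receiver sends one per-block summary and the sender sends b literal bytes for each changed block. The summary size of 20 bytes here is an assumption for the sketch.

```python
import math

def transfer_cost(n, b, changed_blocks, summary_bytes=20):
    """Approximate bytes on the wire for an n-byte file with b-byte blocks."""
    num_blocks = math.ceil(n / b)
    # One summary per block, plus literal data for each changed block.
    return num_blocks * summary_bytes + changed_blocks * b

# A single small edit touches one block. With b near sqrt(n), the summary
# term (n/b) and the literal term (b) are balanced, so neither dominates.
n = 1_000_000
for b in (512, math.isqrt(n), 65_536):
    print(f"block size {b:6d}: ~{transfer_cost(n, b, changed_blocks=1)} bytes")
```

Smaller blocks inflate the summary overhead; larger blocks inflate the cost of each change. The sqrt choice keeps both terms O(sqrt(n)) in the single-change case, which is why it bounds the worst case.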
Re: Any way to predict the amount of data to be copied when re-copying a file?
On Sun, 2009-11-29 at 16:07 +, Andrew Gideon wrote:
> Is there some way to run rsync in some type of "dry run" mode but where
> an actual determination of what pages should be copied is performed?
>
> The current --dry-run doesn't go down to this level of detail as far as
> I can see. It determines which files will need to be copied, but not
> which pages of those files need to be copied.
>
> So is there something that goes that next step in detail?

Try --only-write-batch.

-- Matt
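Matt's suggestion works because --only-write-batch records the delta rsync would send into a batch file without touching the destination, so the batch file's size approximates the transfer volume. A minimal sketch of that workflow (the paths and helper names here are illustrative, not part of rsync):

```python
import os
import subprocess

def batch_command(src, dst, batch):
    """Build the rsync invocation that writes the delta to a batch file."""
    return ["rsync", "-a", f"--only-write-batch={batch}", src, dst]

def estimate_transfer(src, dst, batch="/tmp/rsync-batch"):
    """Run rsync in batch-only mode and return the batch file's size in bytes.

    The destination is left unmodified; the batch file captures roughly
    what rsync would have sent over the wire.
    """
    subprocess.run(batch_command(src, dst, batch), check=True)
    return os.path.getsize(batch)
```

Note the batch file also contains some framing metadata, so treat the size as an estimate rather than an exact byte count of the delta.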
Any way to predict the amount of data to be copied when re-copying a file?
I do backups using rsync, and - every so often - a file takes far longer to transfer than it normally does. These are large data files which typically change only a little over time.

I'm guessing that these large transfers are caused by occasional changes that "break" (i.e., yield poor performance from) the "copy only changed pages" algorithm. But I've no proof of this. And I view unusual things as possible warning flags of serious problems that will rise to bite me on a Friday evening. I'd prefer to address them before they become real problems (which makes for a far less stressful life {8^).

So I'd like to *know* that these occasional slow transfers are just artifacts of how rsync's "copy only changed pages" algorithm works.

Is there some way to run rsync in some type of "dry run" mode but where an actual determination of what pages should be copied is performed?

The current --dry-run doesn't go down to this level of detail as far as I can see. It determines which files will need to be copied, but not which pages of those files need to be copied.

So is there something that goes that next step in detail?

Note that this doesn't even have to work across a network to meet my needs, though that would be ideal. I could always run it after the transfer is completed (which means I'll have both copies of the file on the same system).

Thanks...

Andrew
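Since both copies end up on the same system after the transfer, a rough local check is possible without rsync at all. The sketch below compares the two files block by block at fixed offsets; it is deliberately naive (not rsync's rolling-checksum algorithm), so an insertion early in the file makes every later block look changed, which is exactly the pathology that can make a delta transfer degrade. A high changed-block count from this check is therefore a hint, not proof, of what rsync actually sent.

```python
import hashlib
import math
import os

def changed_blocks(old_path, new_path, block_size=None):
    """Count fixed-offset blocks that differ between two local files.

    With block_size=None, uses sqrt(file size) to mimic rsync's default.
    Shifted (inserted/deleted) data inflates the count, since blocks are
    compared at fixed offsets rather than matched by rolling checksum.
    """
    size = max(os.path.getsize(old_path), os.path.getsize(new_path))
    if block_size is None:
        block_size = max(1, math.isqrt(size))
    changed = total = 0
    with open(old_path, "rb") as old, open(new_path, "rb") as new:
        while True:
            a, b = old.read(block_size), new.read(block_size)
            if not a and not b:
                break
            total += 1
            if hashlib.md5(a).digest() != hashlib.md5(b).digest():
                changed += 1
    return changed, total
```

If this reports only a handful of changed blocks but the transfer was slow, the slowdown likely came from shifted data defeating fixed-block matching or from elsewhere (network, checksum pass), rather than from the volume of genuinely new data.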