PATCH: --write-devices to allow synchronising to a block device
Hi List,

I had a need recently to efficiently synchronise some large LUNs (boot drive disks) between two different datacentres. Solutions like drbd and $proprietary_array_vendors_software were overkill - we only needed (wanted!) to periodically synchronise these LUNs whenever major changes were generated on the source. On the other hand, re-sending the entire disk contents each time would have been prohibitive. So, I immediately thought of rsync. However, I discovered two problems:

1) The default build doesn't want to read from block device files
2) The default build doesn't want to write to block device files

It turned out that (1) was easy to solve, as there is a --copy-devices patch in the rsync-patches distribution that delivers exactly this functionality (reading from block device files). It seems, however, that nobody before me had needed/wanted to be able to do (2). So I wrote a patch (--write-devices) which fulfils (2). An example usage scenario, synchronising one disk to another, would be:

$ rsync --copy-devices --write-devices /dev/sda /dev/sdb

I want to stress that --write-devices obviously implies --inplace, and I am exceptionally grateful to both the rsync developers and the authors of the original --inplace code for making this possible. Additionally, I used the --copy-devices patch for clues and some of the device sizing code - thanks!

I have included the patch at the bottom of this mail. I would appreciate any constructive critique etc. to improve its robustness and quality.
regards,
Darryl Dixon
Winterhouse Consulting Ltd
http://www.winterhouseconsulting.com

8<---[snip]
diff -ru rsync-3.0.6/generator.c rsync-3.0.6-writedev/generator.c
--- rsync-3.0.6/generator.c	2009-04-27 02:51:50.0 +1200
+++ rsync-3.0.6-writedev/generator.c	2009-10-15 20:54:07.0 +1300
@@ -39,6 +39,7 @@
 extern int preserve_xattrs;
 extern int preserve_links;
 extern int preserve_devices;
+extern int write_devices;
 extern int preserve_specials;
 extern int preserve_hard_links;
 extern int preserve_executability;
@@ -1733,7 +1734,7 @@
 	fnamecmp = fname;
 	fnamecmp_type = FNAMECMP_FNAME;

-	if (statret == 0 && !S_ISREG(sx.st.st_mode)) {
+	if (statret == 0 && !(S_ISREG(sx.st.st_mode) || (write_devices && IS_DEVICE(sx.st.st_mode)))) {
 		if (delete_item(fname, sx.st.st_mode, del_opts | DEL_FOR_FILE) != 0)
 			goto cleanup;
 		statret = -1;
diff -ru rsync-3.0.6/options.c rsync-3.0.6-writedev/options.c
--- rsync-3.0.6/options.c	2009-04-13 08:01:14.0 +1200
+++ rsync-3.0.6-writedev/options.c	2009-10-15 20:56:18.0 +1300
@@ -48,6 +48,7 @@
 int keep_dirlinks = 0;
 int copy_dirlinks = 0;
 int copy_links = 0;
+int write_devices = 0;
 int preserve_links = 0;
 int preserve_hard_links = 0;
 int preserve_acls = 0;
@@ -350,6 +351,7 @@
   rprintf(F," -o, --owner                 preserve owner (super-user only)\n");
   rprintf(F," -g, --group                 preserve group\n");
   rprintf(F,"     --devices               preserve device files (super-user only)\n");
+  rprintf(F," -w, --write-devices         write to devices as regular files (implies --inplace)\n");
   rprintf(F,"     --specials              preserve special files\n");
   rprintf(F," -D                          same as --devices --specials\n");
   rprintf(F," -t, --times                 preserve modification times\n");
@@ -508,6 +510,7 @@
  {"no-D",            0,  POPT_ARG_NONE,   0, OPT_NO_D, 0, 0 },
  {"devices",         0,  POPT_ARG_VAL,    &preserve_devices, 1, 0, 0 },
  {"no-devices",      0,  POPT_ARG_VAL,    &preserve_devices, 0, 0, 0 },
+ {"write-devices",  'w', POPT_ARG_NONE,   0, 'w', 0, 0 },
  {"specials",        0,  POPT_ARG_VAL,    &preserve_specials, 1, 0, 0 },
  {"no-specials",     0,  POPT_ARG_VAL,    &preserve_specials, 0, 0, 0 },
  {"links",          'l', POPT_ARG_VAL,    &preserve_links, 1, 0, 0 },
@@ -1261,6 +1264,11 @@
 			return 0;
 #endif

+		case 'w':
+			write_devices = 1;
+			inplace = 1;
+			break;
+
 		default:
 			/* A large opt value means that set_refuse_options()
 			 * turned this option off. */
@@ -2069,6 +2077,9 @@
 	else if (remove_source_files)
 		args[ac++] = "--remove-sent-files";

+	if (write_devices)
+		args[ac++] = "--write-devices";
+
 	if (ac >= MAX_SERVER_ARGS) { /* Not possible... */
 		rprintf(FERROR, "argc overflow in server_options().\n");
 		exit_cleanup(RERR_MALLOC);
diff -ru rsync-3.0.6/receiver.c rsync-3.0.6-writedev/receiver.c
--- rsync-3.0.6/receiver.c	2009-04-13 07:48:59.0 +1200
+++ rsync-3.0.6-writedev/receiver.c	2009-10-15 20:54:22.0 +1300
@@ -38,6 +38,7 @@
 extern int relative_paths;
 extern int
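For readers unfamiliar with why --write-devices must imply --inplace: a normal rsync transfer writes the new file contents to a temporary file and renames it over the destination, which is impossible when the destination is a device node. A minimal sketch of the in-place alternative, with hypothetical helper names and an ordinary file standing in for the device:

```python
import os


def apply_delta_inplace(dst_path, changed_blocks, block_size=4096):
    """Overwrite only the changed blocks of dst_path at their original
    offsets -- no temp file, no rename -- the only viable strategy when
    the target is something like /dev/sdb that cannot be replaced.

    changed_blocks maps a block index to its replacement bytes.
    """
    fd = os.open(dst_path, os.O_WRONLY)
    try:
        for index, data in changed_blocks.items():
            os.pwrite(fd, data, index * block_size)
    finally:
        os.close(fd)
```

In the actual patch the receiver reuses rsync's existing --inplace write path; the sketch only illustrates the seek-and-overwrite pattern that path implements.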
Re: Delay for --remove-source-files
On Tue, 2009-10-13 at 03:51 -0700, Martin Scharrer wrote:

I'm using rsync with -aP --remove-source-files to move files from one machine to another while watching the progress. I'm under the impression that rsync deletes the transmitted source files on the fly, not at the very end, but with a delay of 2-3 files, i.e. if 10 files are moved, the first source file is deleted after the third or fourth file has been transmitted. However, if rsync is aborted (CTRL+C), all fully transmitted source files are deleted. Can anyone tell me if this delay is intended behaviour and if it can be manipulated/configured differently? My source machine has only limited space (10GB), my files are rather big (0.4-1.2GB each) and my bandwidth around 300-500KB, so sometimes I want to get rid of the sent files as fast as possible without waiting for the next files to be transmitted.

IIUC, the delay occurs due to the pipelining in rsync; since it is not a problem for most users, no special effort was made to avoid it. If it is a problem for you, you might consider running rsync once per file. It sounds like the files are big enough that doing so wouldn't be unreasonably wasteful.

-- Matt

-- Please use reply-all for most replies to avoid omitting the mailing list. To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html
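Matt's run-rsync-once-per-file suggestion can be sketched as follows; the helper name is hypothetical and the commands are only built here, not executed:

```python
import os


def per_file_rsync_commands(src_dir, dest, opts=("-aP", "--remove-source-files")):
    """Build one rsync invocation per regular file in src_dir, so each
    source file is removed as soon as its own transfer completes,
    avoiding the 2-3 file pipeline delay of a single batched run."""
    commands = []
    for name in sorted(os.listdir(src_dir)):
        path = os.path.join(src_dir, name)
        if os.path.isfile(path):  # skip subdirectories
            commands.append(["rsync", *opts, path, dest])
    return commands
```

Each entry could then be run with subprocess.run(cmd, check=True); the per-invocation overhead is negligible for files in the 0.4-1.2GB range.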
Re: Delay for --remove-source-files
Matt McCutchen-7 wrote:

On Tue, 2009-10-13 at 03:51 -0700, Martin Scharrer wrote: I'm using rsync with -aP --remove-source-files to move files from one machine to another while watching the progress. I'm under the impression that rsync is deleting the transmitted source files on-the-fly, not at the very end. [...]

IIUC, the delay occurs due to the pipelining in rsync; since it is not a problem for most users, no special effort was made to avoid it. If it is a problem for you, you might consider running rsync once per file. It sounds like the files are big enough that doing so wouldn't be unreasonably wasteful.

Thanks Matt for pointing this out. I will consider changing my rsync script.

Martin

-- View this message in context: http://www.nabble.com/Delay-for---remove-source-files-tp25870695p25908907.html
Sent from the Samba - rsync mailing list archive at Nabble.com.
Re: retrieve files without local dir compare
I agree with the ideas below and would like to add the following: under Linux/Unix, hard links can be used to copy the files on the receiving side. This works around the deletion problem and is far more efficient than actually copying and deleting the files. I.e.: rsync to folder 'A', then hardlink the folder content recursively to folder 'B', where the other software processes and deletes it.

About the other way: --files-from accepts a remote file path if it starts with ':'; see the rsync manual for more details.

Best, Martin

Fabian Cenedese wrote:

We receive meteorological data from a remote server to a local directory (every 5 min). Once the data is here, it is imported by special software, and after the import it is deleted from that directory. The deleting can't be disabled. Normally I would say, OK, download everything again, but we get 80GB of data per day. If rsync compares against the local dir, it will download everything again, because the dir is empty. So rsync has to know what has already been downloaded and only get the new files WITHOUT the dir compare. Does anybody know a way to realize that?

One way: keep a local copy which rsync can update, so only the new or changed files get transferred. Then from this copy (e.g. driven by the rsync output), copy the new files into a separate folder where they will be imported from (and deleted).

Another way: as Paul mentioned, you first need to find out which files to copy, e.g. have a remote script gather all new files into a text file. Then you first fetch this file and then feed it to rsync with --files-from. I thought there was a way to tell rsync to only sync files from a specific period (as cp can do), but I couldn't find it; maybe it's not possible.

bye Fabi
-- View this message in context: http://www.nabble.com/retrieve-files-without-local-dir-compare-tp25888739p25909069.html
Sent from the Samba - rsync mailing list archive at Nabble.com.
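Martin's hardlink trick can be sketched like this (hypothetical helper name; os.link shares the inode, so no data is copied, and deleting the 'B' copy leaves 'A' intact):

```python
import os


def hardlink_tree(src, dst):
    """Recursively mirror src into dst, hard-linking each file.

    The importing software can then delete from dst freely while the
    rsync-updated copy in src stays complete for the next compare.
    Hard links require src and dst to be on the same filesystem.
    """
    for root, dirs, files in os.walk(src):
        rel = os.path.relpath(root, src)
        target = dst if rel == "." else os.path.join(dst, rel)
        os.makedirs(target, exist_ok=True)
        for name in files:
            os.link(os.path.join(root, name), os.path.join(target, name))
```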
DO NOT REPLY [Bug 6816] New: Delta-transfer algorithm does not reuse already transmitted identical blocks
https://bugzilla.samba.org/show_bug.cgi?id=6816

Summary: Delta-transfer algorithm does not reuse already transmitted identical blocks
Product: rsync
Version: 3.0.5
Platform: Other
OS/Version: All
Status: NEW
Severity: enhancement
Priority: P3
Component: core
AssignedTo: way...@samba.org
ReportedBy: mar...@scharrer-online.de
QAContact: rsync...@samba.org

Hi, I observed the following behaviour of rsync: if a file contains identical blocks (e.g. all-zero, etc.), these blocks are not re-transferred but reused by the delta-transfer algorithm - BUT only if one of these blocks is already in the destination file. If not, or if the destination file does not exist yet, all identical blocks are copied over and over again. In some special cases (e.g. large sparse files which are rsync'ed --inplace, i.e. -S can't be used) it is much better to interrupt the rsync operation after a while and restart it so that the identical blocks are reused, not re-transferred.

A good (but somewhat trivial) example would be a big file (say 1GB) containing only zeros (dd if=/dev/zero of=file bs=1M count=1k) which is transferred without the -S option. If the file does not exist at the destination, it is copied as a whole, like e.g. 'scp' would do; in my case it is copied at about 2MB/s. But if the file already exists, even with only a very small size, the identical blocks are reused and the transfer speed is around the destination hard drive I/O speed (in my case 60-120MB/s; the target is a tmpfs ramdisk).

I also tested this with a file with pseudo-random but repeating content (dd if=/dev/urandom of=temp bs=1M count=10; cat temp temp ... temp > file). If the first rsync process is aborted and restarted after the first repeating block has been transferred, the second rsync process sends only meta-data, because the existing content is just replicated.

It would be great if the delta-transfer algorithm were extended to account for identical to-be-sent data blocks, i.e. first send the first appearance of such a block and then simply reuse it during the same rsync process. IMHO this should not be so difficult to implement, because most of the needed functionality is already there.

-- Configure bugmail: https://bugzilla.samba.org/userprefs.cgi?tab=email --- You are receiving this mail because: --- You are the QA contact for the bug, or are watching the QA contact.
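The potential saving the report describes can be illustrated with a small sketch (hypothetical helper name; MD5 here is just a stand-in for rsync's block checksums): count the bytes that would not need to cross the wire if the sender remembered which block contents it had already sent in the same run.

```python
import hashlib


def duplicate_block_bytes(data, block_size=700):
    """Return the number of bytes occupied by blocks whose content
    already appeared earlier in the stream -- bytes a sender that
    remembered its own transmitted blocks could avoid resending."""
    seen = set()
    saved = 0
    for off in range(0, len(data), block_size):
        block = data[off:off + block_size]
        digest = hashlib.md5(block).digest()
        if digest in seen:
            saved += len(block)
        else:
            seen.add(digest)
    return saved
```

For the all-zero 1GB example in the report, every block after the first is a duplicate, so nearly the whole file could be elided.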
DO NOT REPLY [Bug 6816] Delta-transfer algorithm does not reuse already transmitted identical blocks
https://bugzilla.samba.org/show_bug.cgi?id=6816

--- Comment #1 from mar...@scharrer-online.de 2009-10-15 10:29 CST --- This enhancement would also effectively solve bug #5801, also reported by me.
how can this be: file has vanished in --list-only?
I'm getting "file has vanished" messages during a (recursive) --list-only. I find it strange, because I'd expect rsync to access each file only once when just sending the receiver the file list. In which circumstances can this happen? This is with 3.0.6 on Linux.
Re: how can this be: file has vanished in --list-only?
On Thu, 2009-10-15 at 17:40 -0300, Carlos Carvalho wrote: I'm getting "file has vanished" messages during a (recursive) --list-only. I find it strange because I'd expect rsync to access each file only once when just sending the receiver the file list.

There's still a small gap between the readdir that returns a file's name and the stat on that file. I bet that's what you're seeing. If you want to know for sure, you can strace rsync.

-- Matt
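The gap Matt describes is easy to reproduce outside rsync. A sketch with hypothetical helper names, where the two functions mirror the readdir-then-stat sequence:

```python
import os


def readdir_names(directory):
    """Step 1: enumerate directory entries, as readdir does."""
    return sorted(os.listdir(directory))


def stat_size_or_vanished(directory, name):
    """Step 2: stat a name obtained in step 1.  A file deleted in the
    window between the two steps is exactly the condition that makes
    rsync print 'file has vanished'."""
    try:
        return os.stat(os.path.join(directory, name)).st_size
    except FileNotFoundError:
        return None
```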
Nice little performance improvement
Hi,

In my situation I'm using rsync to back up a server with (currently) about 570,000 files. These are all little files, and maybe 0.1% of them change or new ones are added in any 15-minute period. I've split the main tree up so rsync can run on sub-subdirectories of the main tree. It does each of these sub-subdirectories sequentially. I would have liked to run some of these in parallel, but that seems to increase I/O on the main server too much.

Today I tried the following. For all sub-subdirectories:
a) Fork a "du -s" on the destination sub-subdirectory
b) Run rsync on the sub-subdirectory
c) Repeat until done

This seems to have improved the time it takes by about 25-30%. It looks like the du can run ahead of the rsync, so that while rsync is building its file list, the du is warming up the file cache on the destination. Then when rsync looks to see what it needs to do on the destination, it can do this more efficiently. Looks like a keeper so far. Any other suggestions? (I was thinking of a previous suggestion of setting /proc/sys/vm/vfs_cache_pressure to a low value.)

Thanks, Mike
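Mike's du-then-rsync loop can be sketched as follows. All names are hypothetical: a Python stat walk stands in for "du -s", and run_rsync is whatever launches the real transfer.

```python
import os
import threading


def warm_cache(directory):
    """Stand-in for 'du -s': stat every file under directory so the
    kernel has the metadata cached before rsync asks for it."""
    count = 0
    for root, dirs, files in os.walk(directory):
        for name in files:
            os.stat(os.path.join(root, name))
            count += 1
    return count


def backup_with_warmup(subdirs, dest_root, run_rsync):
    """For each source sub-subdirectory, start warming the matching
    destination directory in the background, then run rsync on the
    source; the warmer runs ahead while rsync builds its file list."""
    for d in subdirs:
        warmer = threading.Thread(
            target=warm_cache,
            args=(os.path.join(dest_root, os.path.basename(d)),))
        warmer.start()
        run_rsync(d)
        warmer.join()
```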
Re: Nice little performance improvement
In my situation I'm using rsync to backup a server with (currently) about 570,000 files. These are all little files and maybe 0.1% of them change or new ones are added in any 15 minute period.

Hi Mike,

We have three filesystems that between them have approx 22 million files, and around 10-20,000 new or changed files every business day. In order to expeditiously move these new files offsite, we use a modified version of pyinotify to log all added/altered files across the entire filesystem(s) and then every five minutes feed the list to rsync with the --files-from option. This works very effectively and quickly.

regards,
Darryl Dixon
Winterhouse Consulting Ltd
http://www.winterhouseconsulting.com
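The batching step of Darryl's pipeline might look like the sketch below. The helper name is hypothetical, and the pyinotify event handling itself is out of scope; by default the command is only constructed, not executed.

```python
import subprocess
import tempfile


def flush_changes(changed_paths, src_root, remote_dest, execute=False):
    """Write the latest batch of changed paths (relative to src_root)
    to a list file and build the rsync --files-from invocation."""
    list_file = tempfile.NamedTemporaryFile("w", suffix=".lst", delete=False)
    with list_file as f:
        for path in sorted(set(changed_paths)):  # dedupe repeated events
            f.write(path + "\n")
    argv = ["rsync", "-a", "--files-from=" + list_file.name,
            src_root, remote_dest]
    if execute:
        subprocess.run(argv, check=True)
    return argv
```

Called every five minutes with the paths accumulated since the last run, this keeps each rsync invocation proportional to the change set rather than to the 22 million files on disk.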
Re: Nice little performance improvement
On Thu, 2009-10-15 at 19:07 -0700, Mike Connell wrote: Today I tried the following. For all sub-subdirectories: a) Fork a "du -s" on the destination sub-subdirectory b) Run rsync on the sub-subdirectory c) Repeat until done. Seems to have improved the time it takes by about 25-30%. It looks like the du can run ahead of the rsync, so that while rsync is building its file list, the du is warming up the file cache on the destination. Then when rsync looks to see what it needs to do on the destination, it can do this more efficiently.

Interesting. If you're not using incremental recursion (the default in rsync >= 3.0.0), I can see that the du would help by forcing the destination I/O to overlap the file-list building in time. But with incremental recursion, the du shouldn't be necessary, because rsync actually overlaps the checking of destination files with the file-list building on the source.

-- Matt
Re: Nice little performance improvement
Hi,

In order to expeditiously move these new files offsite, we use a modified version of pyinotify to log all added/altered files across the entire filesystem(s) and then every five minutes feed the list to rsync with the --files-from option. This works very effectively and quickly.

Interesting... How do you tell rsync to delete files that were deleted from the source, or is that not part of your use case?

Thanks, Mike