PATCH: --write-devices to allow synchronising to a block device

2009-10-15 Thread Darryl Dixon - Winterhouse Consulting
Hi List,

I had a need recently to efficiently synchronise between some large LUNs
(boot drive disks) at two different datacentres. Solutions like drbd and
$proprietary_array_vendors_software were overkill - we only needed
(wanted!) to periodically synchronise these LUNs whenever major changes
were generated on the source. On the other hand, re-sending the
entire disk contents each time would have been prohibitive.

So, I immediately thought about rsync. However, I discovered two problems:
1) The default build doesn't want to read from block device files
2) The default build doesn't want to write to block device files

It turned out that (1) was easy to solve as there is a --copy-devices
patch in the rsync-patches distribution that delivers this functionality
(read from block device files). Seems that nobody before me however had
needed/wanted to be able to do (2).

So I wrote a patch (--write-devices) which fulfills (2). An example usage
for synchronising one disk to another would be:

$ rsync --copy-devices --write-devices /dev/sda /dev/sdb

I want to stress that --write-devices obviously implies --inplace, and I
am exceptionally grateful to both the rsync developers and the authors of
the original --inplace code for making this possible. Additionally, I used
the --copy-devices patch for clues and some of the device sizing code -
thanks!

I have included the patch at the bottom of this mail. I would appreciate
any constructive critique etc to improve the robustness and quality of the
patch.

regards,
Darryl Dixon
Winterhouse Consulting Ltd
http://www.winterhouseconsulting.com

8<---[snip]
diff -ru rsync-3.0.6/generator.c rsync-3.0.6-writedev/generator.c
--- rsync-3.0.6/generator.c 2009-04-27 02:51:50.0 +1200
+++ rsync-3.0.6-writedev/generator.c	2009-10-15 20:54:07.0 +1300
@@ -39,6 +39,7 @@
 extern int preserve_xattrs;
 extern int preserve_links;
 extern int preserve_devices;
+extern int write_devices;
 extern int preserve_specials;
 extern int preserve_hard_links;
 extern int preserve_executability;
@@ -1733,7 +1734,7 @@
fnamecmp = fname;
fnamecmp_type = FNAMECMP_FNAME;

-	if (statret == 0 && !S_ISREG(sx.st.st_mode)) {
+	if (statret == 0 && !(S_ISREG(sx.st.st_mode) || (write_devices && IS_DEVICE(sx.st.st_mode)))) {
 		if (delete_item(fname, sx.st.st_mode, del_opts | DEL_FOR_FILE) != 0)
 			goto cleanup;
 		statret = -1;
diff -ru rsync-3.0.6/options.c rsync-3.0.6-writedev/options.c
--- rsync-3.0.6/options.c   2009-04-13 08:01:14.0 +1200
+++ rsync-3.0.6-writedev/options.c  2009-10-15 20:56:18.0 +1300
@@ -48,6 +48,7 @@
 int keep_dirlinks = 0;
 int copy_dirlinks = 0;
 int copy_links = 0;
+int write_devices = 0;
 int preserve_links = 0;
 int preserve_hard_links = 0;
 int preserve_acls = 0;
@@ -350,6 +351,7 @@
   rprintf(F," -o, --owner                 preserve owner (super-user only)\n");
   rprintf(F," -g, --group                 preserve group\n");
   rprintf(F,"     --devices               preserve device files (super-user only)\n");
+  rprintf(F," -w, --write-devices         write to devices as regular files (implies --inplace)\n");
   rprintf(F,"     --specials              preserve special files\n");
   rprintf(F," -D                          same as --devices --specials\n");
   rprintf(F," -t, --times                 preserve modification times\n");
@@ -508,6 +510,7 @@
   {"no-D",             0,  POPT_ARG_NONE,   0, OPT_NO_D, 0, 0 },
   {"devices",          0,  POPT_ARG_VAL,    &preserve_devices, 1, 0, 0 },
   {"no-devices",       0,  POPT_ARG_VAL,    &preserve_devices, 0, 0, 0 },
+  {"write-devices",   'w', POPT_ARG_NONE,   0, 'w', 0, 0 },
   {"specials",         0,  POPT_ARG_VAL,    &preserve_specials, 1, 0, 0 },
   {"no-specials",      0,  POPT_ARG_VAL,    &preserve_specials, 0, 0, 0 },
   {"links",           'l', POPT_ARG_VAL,    &preserve_links, 1, 0, 0 },
@@ -1261,6 +1264,11 @@
return 0;
 #endif

+		case 'w':
+			write_devices = 1;
+			inplace = 1;
+			break;
+
default:
/* A large opt value means that set_refuse_options()
 * turned this option off. */
@@ -2069,6 +2077,9 @@
 	else if (remove_source_files)
 		args[ac++] = "--remove-sent-files";
 
+	if (write_devices)
+		args[ac++] = "--write-devices";
+
 	if (ac >= MAX_SERVER_ARGS) { /* Not possible... */
 		rprintf(FERROR, "argc overflow in server_options().\n");
 		exit_cleanup(RERR_MALLOC);
diff -ru rsync-3.0.6/receiver.c rsync-3.0.6-writedev/receiver.c
--- rsync-3.0.6/receiver.c  2009-04-13 07:48:59.0 +1200
+++ rsync-3.0.6-writedev/receiver.c 2009-10-15 20:54:22.0 +1300
@@ -38,6 +38,7 @@
 extern int relative_paths;
 extern int 

Re: Delay for --remove-source-files

2009-10-15 Thread Matt McCutchen
On Tue, 2009-10-13 at 03:51 -0700, Martin Scharrer wrote:
 I'm using rsync with -aP --remove-source-files to move files from one
 machine to another while watching the progress. I'm under the impression
 that rsync is deleting the transmitted source files on-the-fly, not at the
 very end, but with a delay of 2-3 files, i.e. if 10 files are moved the
 first source file is deleted after the third or fourth file got transmitted.
 However, if rsync is aborted (CTRL+C) all fully transmitted source files are
 deleted.
 Can anyone tell me if this delay is intended behavior and if it can be
 manipulated/configured differently?
 My source machine has only limited space (10GB), my files are rather big (0.4
 - 1.2GB each) and my bandwidth around 300-500KB/s, so sometimes I want to get
 rid of the sent files as fast as possible without waiting for the next files
 to be transmitted.

IIUC, the delay occurs due to the pipelining in rsync; since it is not a
problem for most users, no special effort was made to avoid it.  If it
is a problem for you, you might consider running rsync once per file.
It sounds like the files are big enough that doing so wouldn't be
unreasonably wasteful.
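If the per-file approach appeals, the loop can be as small as this sketch. The host and paths are made up, and the echo turns it into a preview of the commands rather than running them; drop the echo (or pipe to sh) to actually transfer:

```shell
# One rsync invocation per file: each source file is removed as soon as
# its own transfer finishes, instead of a few files later.
# The echo makes this a dry preview; remove it to really run rsync.
per_file_move() {
    src=$1 dest=$2
    for f in "$src"/*; do
        [ -f "$f" ] || continue
        echo rsync -aP --remove-source-files "$f" "$dest/"
    done
}
# e.g. (hypothetical host/paths):
#   per_file_move /srv/outgoing backuphost:/srv/incoming | sh
```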

-- 
Matt

-- 
Please use reply-all for most replies to avoid omitting the mailing list.
To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html


Re: Delay for --remove-source-files

2009-10-15 Thread Martin Scharrer



Matt McCutchen-7 wrote:
 
 On Tue, 2009-10-13 at 03:51 -0700, Martin Scharrer wrote:
 I'm using rsync with -aP --remove-source-files to move files from one
 machine to another while watching the progress. I'm under the impression
 that rsync is deleting the transmitted source files on-the-fly, not at
 the
 very end.
 [...]
 
 IIUC, the delay occurs due to the pipelining in rsync; since it is not a
 problem for most users, no special effort was made to avoid it.  If it
 is a problem for you, you might consider running rsync once per file.
 It sounds like the files are big enough that doing so wouldn't be
 unreasonably wasteful.
 
 

Thanks, Matt, for pointing this out.
I will consider changing my rsync script.

Martin


-- 
View this message in context: 
http://www.nabble.com/Delay-for---remove-source-files-tp25870695p25908907.html
Sent from the Samba - rsync mailing list archive at Nabble.com.



Re: retrieve files without local dir compare

2009-10-15 Thread Martin Scharrer

I agree with the ideas below and would like to add the following:

Under Linux/Unix, hard links can be used to duplicate the files on the
receiving side. This works around the deletion problem and is far more
efficient than actually copying and deleting the files. I.e.: rsync to folder
'A', then hard-link the folder content recursively to folder 'B', where the
other software processes and deletes it.
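A concrete sketch of the hard-link trick (assumes GNU cp for the -l flag; the directory names A and B are illustrative):

```shell
# Receive into A with rsync, then populate B with hard links to the same
# inodes; the importer can process and delete from B without touching A's
# copies, so the next rsync still finds everything in A.
link_snapshot() {
    a=$1 b=$2
    cp -al "$a" "$b"   # GNU cp: -l makes hard links instead of copying data
}
# e.g.: rsync -a remote:/feed/ A/ && link_snapshot A B
```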

About the other way: --files-from accepts a remote file path if it starts
with ':', see the rsync manual for more details. 

Best,
Martin

 

Fabian Cenedese wrote:
 
 
We receive meteorological data from a remote server to a local directory
(every 5 min).
If the data is there, it is imported by special software; after the
import it is deleted from that directory. The deleting can't be
disabled.
Normally I would say, ok download all again, but we get 80GB data per day.

If rsync compares the local dir it will download all again, because it's
empty.
So rsync has to know what is already downloaded, and only get the new
files WITHOUT the dir compare.
Does anybody know a way how to realize that?
 
 One way: keep a local copy which rsync can update. Therefore only the new
 or changed files will get transferred. Then from this (e.g. from the rsync
 output)
 copy the new files into a separate folder where they will be imported from
 (and
 deleted).
 
 Another way: As Paul mentioned you first need to find out the files to
 copy,
 e.g. have a remote script that gathers all new files into a textfile. Then
 you
 first get this file and then feed it into rsync with --files-from.
 
 I thought there was a way to tell rsync to only sync files from a specific
 period (as cp can do) but I couldn't find it, maybe not possible.
 
 bye  Fabi
 
 
 




DO NOT REPLY [Bug 6816] New: Delta-transfer algorithm does not reuse already transmitted identical blocks

2009-10-15 Thread samba-bugs
https://bugzilla.samba.org/show_bug.cgi?id=6816

   Summary: Delta-transfer algorithm does not reuse already
transmitted identical blocks
   Product: rsync
   Version: 3.0.5
  Platform: Other
OS/Version: All
Status: NEW
  Severity: enhancement
  Priority: P3
 Component: core
AssignedTo: way...@samba.org
ReportedBy: mar...@scharrer-online.de
 QAContact: rsync...@samba.org


Hi,

I observed the following behavior of rsync: If a file contains identical blocks
(e.g. all-zero, etc.) then these blocks are not re-transferred but reused by the
delta-transfer algorithm - BUT only if one of these blocks is already in the
destination file. If not, or if the destination file does not exist yet, all
identical blocks are copied over and over again. In some special cases (e.g.
large sparse files which are rsync'ed --inplace, i.e. -S can't be used) it is
much better to interrupt the rsync operation after a while and restart it so
that the identical blocks are reused, not re-transferred.

A good (but kind of trivial) example would be a big file (say 1GB) only
containing zeros (dd if=/dev/zero of=file bs=1M count=1k) which is transferred
without the -S option. If the file does not exist at the destination it is
copied as a whole, like e.g. 'scp' would do it. In my case it is copied at
about 2MB/s. But if the file already exists, even with only a very small size,
the identical blocks are reused and the transfer speed is around the
destination hard drive I/O speed (in my case 60-120MB/s, target is a tmpfs
ramdisk).
I also tested this with a file with pseudo-random, but repeating content (dd
if=/dev/urandom of=temp bs=1M count=10; cat temp temp ... temp > file). If the
first rsync process is aborted and restarted after the first repeating block
was transferred, the second rsync process only sends meta-data, because the
existing content is just replicated.
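For anyone wanting to reproduce the repeating-content case, here is a self-contained version of the recipe above with the dd sizes shrunk (64k x 4 instead of 1M x 10) so it runs quickly:

```shell
# Build a file of three identical pseudo-random chunks, as in the report.
work=$(mktemp -d)
dd if=/dev/urandom of="$work/temp" bs=64k count=4 2>/dev/null
cat "$work/temp" "$work/temp" "$work/temp" > "$work/file"
# rsync'ing "$work/file" to an empty destination re-sends every chunk;
# aborting after the first chunk and restarting lets the rest be reused.
```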

It would be great if the delta-transfer algorithm were extended to account
for identical to-be-sent data blocks, i.e. first send the first appearance of
such a block and then simply reuse it during the same rsync process. IMHO this
should not be too difficult to implement, because most of the needed
functionality is already there.


-- 
Configure bugmail: https://bugzilla.samba.org/userprefs.cgi?tab=email
--- You are receiving this mail because: ---
You are the QA contact for the bug, or are watching the QA contact.


DO NOT REPLY [Bug 6816] Delta-transfer algorithm does not reuse already transmitted identical blocks

2009-10-15 Thread samba-bugs
https://bugzilla.samba.org/show_bug.cgi?id=6816





--- Comment #1 from mar...@scharrer-online.de  2009-10-15 10:29 CST ---
This enhancement would also effectively solve bug #5801, also reported by me.




how can this be: file has vanished in --list-only?

2009-10-15 Thread Carlos Carvalho
I'm getting "file has vanished" messages during a (recursive)
--list-only. I find it strange because I'd expect rsync to access each
file only once when just sending the receiver the file list. In which
circumstances can this happen? This is with 3.0.6 on Linux.


Re: how can this be: file has vanished in --list-only?

2009-10-15 Thread Matt McCutchen
On Thu, 2009-10-15 at 17:40 -0300, Carlos Carvalho wrote:
 I'm getting file has vanished messages during a (recursive)
 --list-only. I find it strange because I'd expect rsync to access each
 file only once when just sending the receiver the file list.

There's still a small gap between the readdir that returns a file's
name and the stat on that file.  I bet that's what you're seeing.  If
you want to know for sure, you can strace rsync.

-- 
Matt



Nice little performance improvement

2009-10-15 Thread Mike Connell
Hi,

In my situation I'm using rsync to back up a server with (currently) about
570,000 files. These are all little files, and maybe 0.1% of them change or
new ones are added in any 15-minute period.

I've split the main tree up so rsync can run on sub-subdirectories of the main
tree. It does each of these sub-subdirectories sequentially. I would have
liked to run some of these in parallel, but that seems to increase I/O on the
main server too much.


Today I tried the following:

For all sub-subdirectories:
a) Fork a "du -s subsubdirectory" on the destination sub-subdirectory
b) Run rsync on the sub-subdirectory
c) Repeat until done
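The a)-c) loop above might look like this sketch; the paths are placeholders, and the rsync line is echoed as a preview so nothing is transferred by accident:

```shell
# For each sub-subdirectory: fork "du -s" against the destination copy to
# warm its metadata cache, then run rsync on the same subtree while the
# du's stat calls get ahead of it.
warm_and_sync() {
    srcroot=$1 dstroot=$2
    for d in "$srcroot"/*/; do
        sub=$(basename "$d")
        du -s "$dstroot/$sub" >/dev/null 2>&1 &          # cache warming, backgrounded
        echo rsync -a "$srcroot/$sub/" "$dstroot/$sub/"  # preview; drop echo to run
        wait                                             # let the du finish before the next round
    done
}
```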

Seems to have improved the time it takes by about 25-30%. It looks like the du
can run ahead of the rsync... so that while rsync is building its file list,
the du is warming up the file cache on the destination. Then when rsync looks
to see what it needs to do on the destination, it can do this more efficiently.

Looks like a keeper so far. Any other suggestions? (was thinking of a previous
suggestion of setting /proc/sys/vm/vfs_cache_pressure to a low value).

Thanks,

Mike

Re: Nice little performance improvement

2009-10-15 Thread Darryl Dixon - Winterhouse Consulting
 Hi,

 In my situation I'm using rsync to backup a server with (currently) about
 570,000 files.
 These are all little files and maybe .1% of them change or new ones are
 added in
 any 15 minute period.


Hi Mike,

We have three filesystems that between them have approx 22 million files,
and around 10-20,000 new or changed files every business day.

In order to expeditiously move these new files offsite, we use a modified
version of pyinotify to log all added/altered files across the entire
filesystem(s) and then every five minutes feed the list to rsync with the
--files-from option. This works very effectively and quickly.
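This is not the poster's pyinotify setup, but the same shape can be sketched with inotify-tools; the watch root /data, the host and the five-minute interval are all made up. The helper strips the watch root from each event path so the batch is suitable for --files-from:

```shell
# Turn absolute event paths into paths relative to the watch root, as
# --files-from expects when the rsync source argument is the root itself.
to_files_from() {
    root=$1
    while IFS= read -r p; do
        printf '%s\n' "${p#"$root"/}"
    done
}
# Roughly, every five minutes (hypothetical root/host):
#   inotifywait -m -r -e close_write,create,moved_to --format '%w%f' /data \
#     | to_files_from /data > changed.list &
#   sleep 300; kill %1
#   rsync -a --files-from=changed.list /data/ backuphost:/backup/data/
```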

regards,
Darryl Dixon
Winterhouse Consulting Ltd
http://www.winterhouseconsulting.com


Re: Nice little performance improvement

2009-10-15 Thread Matt McCutchen
On Thu, 2009-10-15 at 19:07 -0700, Mike Connell wrote:
 Today I tried the following:
  
 For all subsub directories
 a) Fork a du -s subsubdirectory on the destination
 subsubdirectory
 b) Run rsync on the subsubdirectory
 c) repeat until done
  
 Seems to have improved the time it takes by about 25-30%. It looks
 like the du can
 run ahead of the rsync...so that while rsync is building its file
 list, the du is warming up
 the file cache on the destination. Then when rsync looks to see what
 it needs to do
 on the destination, it can do this more efficiently.

Interesting.  If you're not using incremental recursion (the default in
rsync >= 3.0.0), I can see that the du would help by forcing the
destination I/O to overlap the file-list building in time.  But with
incremental recursion, the du shouldn't be necessary because rsync
actually overlaps the checking of destination files with the file-list
building on the source.

-- 
Matt



Re: Nice little performance improvement

2009-10-15 Thread Mike Connell

Hi,


In order to expeditiously move these new files offsite, we use a modified
version of pyinotify to log all added/altered files across the entire
filesystem(s) and then every five minutes feed the list to rsync with the
--files-from option. This works very effectively and quickly.


Interesting...

How do you tell rsync to delete files that were deleted from the source, 
or is that not part of your use case?


Thanks,

Mike