Re: Reliability and robustness problems

2004-06-17 Thread John Van Essen
On Tue, 15 Jun 2004, John [EMAIL PROTECTED] wrote:
 
 It may be that there will be files from different systems that are
 identical - think system binaries, fonts etc.
 
 If these are in /var/local/backups/{host1,host2} etc, and I've run a
 script to identify these dupes and eliminate them using hard links,  can
 rsync preserve these hard links even though it can't see them all?

Yes and no.  Here are some examples.

Assume files A, B and Z are hardlinked on the source and on the target,
and the source paths being synced include files A and B, but not Z.

1. If there was no change to A and B, then there is no issue - rsync
   leaves them alone and all 3 files remain hardlinked on the target.

2. If A changes and becomes independent of an unchanged B, then the
   new A content is transferred as an individual file and B is left
   hardlinked to Z.

3. If A (and B and Z) change and remain hardlinked, the new content
   will be transferred, A and B will be hardlinked, and the hardlink
   with Z is now lost because rsync doesn't know anything about it.
   Z on the target now contains old content and becomes independent.
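Scenario 3 can be demonstrated with plain coreutils, no rsync required (the file names follow the example above; this is my illustration, not from the thread):

```shell
# Demonstrate scenario 3 locally: A, B and Z share one inode; simulating
# what rsync -H does on the target (recreate A with new content, re-link
# B to it) leaves Z behind with the old content and a link count of 1.
set -e
dir=$(mktemp -d)
cd "$dir"
echo "old content" > A
ln A B
ln A Z                      # A, B, Z: one inode, link count 3

# rsync transfers the new A as a fresh file, then hard-links B to it:
echo "new content" > A.tmp
mv A.tmp A                  # A is now a new inode
ln -f A B                   # B re-linked to the new A

cat Z                       # still the old content; the link to Z is lost
stat -c '%h' Z              # link count 1: Z is now independent
```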
-- 
John Van Essen  Univ of MN Alumnus  [EMAIL PROTECTED]

-- 
To unsubscribe or change options: http://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html


Re: Reliability and robustness problems

2004-06-14 Thread John
Wayne Davison wrote:
On Thu, Jun 10, 2004 at 07:21:41AM +0800, John wrote:
 

flist.c: In function `send_file_entry':
flist.c:349: `lastdir_len' undeclared (first use in this function)
   

It patched the wrong function, which is really hard to understand
because the line numbers in the patch are right for the 2.6.2 version of
flist.c.  If you read the @@ line before each hunk, you'll see the
function name it should have patched.  The first hunk makes its change
in receive_file_entry(), and the second makes its change in make_file().
Both changes are simple enough that you can patch them by hand, if
needed.
 

Wayne suggested off-list to check whether the patch is already in place.
Grumble grumble.
I've installed 2.6.2 in both sites.
We've also discovered that Telstra has "improved" some configuration
item in its DSLAMs or somewhere, and this leads to a lack of reliability
of the connexion. I've now made the necessary adjustment to the DSL-300
(don't believe the dlink website; they all talk telnet on 192.168.1.1)
and we live in hopes the DSL link will stay up for weeks instead of hours.

I've implemented some rudimentary performance monitoring: each hour I 
run these commands:
ifconfig tun0 | mail -s "Traffic report" [EMAIL PROTECTED]
ps ww u -C rsync | mail -s "rsync report" [EMAIL PROTECTED]

This shows me that we're not transferring enormous amounts of data (so I 
guess the hard-link problem's gone), but we're still using lots of memory:

USER   PID %CPU %MEM   VSZ  RSS TTY  STAT START   TIME COMMAND
root 15191  4.0 68.4 323900 131352 ? S01:23   6:22 rsync --recursive 
--links --hard-links --perms --owner --group --devices --times --sparse 
--one-file-system --rsh=/usr/bin/ssh --delete --delete-excluded --delete-after 
--max-delete=80 --relative --stats --numeric-ids --timeout=3600 /var/local/backups 
192.168.0.1:/var/local/backups/
It doesn't seem to be doing a lot of paging. (Note I'm not at all sure 
that my understanding of paging is the same as is meant in Linux - I've 
seen systems reporting paging where there was no swap file, and my 
understanding of paging prohibits this).

The memory usage is a concern: not because I can't reduce it for this 
run (I've not yet made the refinements suggested, or implemented 
deleting old backups), but because there are other systems that need to 
be backed up too.

It may be that there will be files from different systems that are 
identical - think system binaries, fonts etc.

If these are in /var/local/backups/{host1,host2} etc, and I've run a 
script to identify these dupes and eliminate them using hard links,  can 
rsync preserve these hard links even though it can't see them all?

If not, I'll simply run the script in all locations whenever I feel the 
need. This uncertainty on my part is the reason I'm exposing the whole 
backup directory hierarchy to rsync rather than parts of it.
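The thread never shows the dedup script itself, so here is a minimal sketch of the idea (GNU md5sum assumed; it compares by checksum only and ignores ownership/permission differences and paths containing whitespace):

```shell
# Hypothetical dedup sketch: replace byte-identical files under the
# given backup trees (e.g. /var/local/backups/host1 host2 ...) with
# hard links to a single copy.
dedup() {
    find "$@" -type f -exec md5sum {} + | sort |
    while read -r sum path; do
        if [ "$sum" = "$prev_sum" ]; then
            # Same checksum as the previous file: replace it with a link.
            ln -f "$prev_path" "$path"
        else
            prev_sum=$sum
            prev_path=$path
        fi
    done
}
```

Run once per backup filesystem (hard links cannot cross filesystems).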


Re: Reliability and robustness problems

2004-06-10 Thread John
Wayne Davison wrote:
On Wed, Jun 09, 2004 at 09:42:08PM +0800, John wrote:
 

I will install 2.6.2 when the backup run has completed, but I want the 
current run to complete first.
   

Since you're using multiple sources with --delete, make sure the 2.6.2
code you compiled has been patched with the simple fix attached to this
bug report:
https://bugzilla.samba.org/show_bug.cgi?id=1413
..wayne..
 

I don't yet have that patch in place, so my next run won't be on the new 
version.

This one has completed, with a timeout:
+ rsync --recursive --links --hard-links --perms --owner --group 
--devices --times --sparse --one-file-system --rsh=/usr/bin/ssh --delete 
--delete-excluded --delete-after --max-delete=80 --relative --stats 
--numeric-ids --timeout=3600 /var/local/backups 
192.168.0.1:/var/local/backups/
io timeout after 3600 seconds - exiting
rsync error: timeout in data send/receive (code 30) at io.c(85)

real    3001m51.871s
user    14m32.230s
sys     4m26.440s
As you can see, I don't have any stats on its performance.
I'm about to restart it; based on this I expect the next run to finish 
catching up.

Regarding the patch, I intend to put together a procedure that works, or 
reliably fails. I'll report back on this thread when I have an outcome.



Re: Reliability and robustness problems

2004-06-09 Thread John
I got the axe out and sharpened it, and leaned it against the garden 
shed, and the plant is growing.

Canberrans here may recall the advice of David Young who had a gardening 
talkback program on ABC radio in the early 80s.

2.6.2 has fixes for unnecessary transfers.
 

I added --timeout=3600, and set it running.
I then did a little work. First, I built 2.6.2 from Sid for Woody.
Second, I located Fedora 2, where I discovered the source rpm for rsync 
2.6.2. It built as easily as one could wish on RHL 7.3.

The backup has been chugging away for over 24 hours now. When it 
falters, I have 2.6.2 ready to install in both locations.

Seriously, I think the significant difference has been running it on the 
VPN. I'm running OpenVPN because, when I was researching my problems 
with VTUND, I discovered it doesn't cope with firewalls: the 
recommendation is to use TCP instead of UDP, and my reading at the CIPE 
home page suggests that's not a good idea. I figured I might as well use 
PPP over SSH. However, someone reported on the VTUND list that he'd got 
OpenVPN going with a minimum of bother, and that's my experience now too.

OpenVPN tunnels using UDP, and can survive outages that cause rsync/ssh 
to hang. It also does adaptive compression - it turns compression on/off 
from time to time based on current traffic.

It can also do bandwidth limiting.
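For reference, the knobs described here map to OpenVPN 1.x config options roughly like this (option names are from the OpenVPN manual; the remote name and values are examples only, not the actual config from this thread):

```
dev tun
remote home.example.com
comp-lzo          # adaptive LZO compression: turned on/off based on traffic
shaper 25000      # limit outgoing bandwidth to ~25 kB/s
ping 15           # keepalives so the UDP tunnel survives quiet periods
ping-restart 120  # restart the tunnel after 120s of silence
```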
I will install 2.6.2 when the backup run has completed, but I want the 
current run to complete first.

Thanks for your help. I'll review  the email with the object of filing 
some bug reports for anything I think outstanding so you folk can deal 
appropriately with them.



Re: Reliability and robustness problems

2004-06-09 Thread Wayne Davison
On Wed, Jun 09, 2004 at 09:42:08PM +0800, John wrote:
 I will install 2.6.2 when the backup run has completed, but I want the 
 current run to complete first.

Since you're using multiple sources with --delete, make sure the 2.6.2
code you compiled has been patched with the simple fix attached to this
bug report:

https://bugzilla.samba.org/show_bug.cgi?id=1413

..wayne..


Re: Reliability and robustness problems

2004-06-09 Thread John Taylor
On Wed, Jun 09, 2004 at 09:42:08PM +0800, John wrote:
 OpenVPN tunnels using UDP, and can survive outages that cause rsync/ssh 
 to hang. It also does adaptive compression - it turns compression on/off 
 from time to time based on current traffic.
 

If you are strictly using the VPN for rsync, you will be better off
using rsync's compression (-z option) than using the VPN's compression.
I realize that it is adaptive, but if you turn it completely off,
the VPN won't even have to consider whether it should compress the stream
or not. OTOH, if you are rsyncing a lot of compressed data, turning off
rsync's compression and using the VPN's adaptive compression might work
out better.

-John


Re: Reliability and robustness problems

2004-06-09 Thread John
Wayne Davison wrote:
On Wed, Jun 09, 2004 at 09:42:08PM +0800, John wrote:
 

I will install 2.6.2 when the backup run has completed, but I want the 
current run to complete first.
   

Since you're using multiple sources with --delete, make sure the 2.6.2
code you compiled has been patched with the simple fix attached to this
bug report:
https://bugzilla.samba.org/show_bug.cgi?id=1413
 

sob Here's why I don't launch into wholesale changes to rsync
First I cut and pasted the patch. It half fitted, so I reversed it:
vim flist.patch
patch < flist.patch
patch -R < flist.patch
I then saved it and copied it with scp. Patched again:
patch < attachment.cgi
Build:
[EMAIL PROTECTED]:~/packages/rsync-2.6.2$ dpkg-buildpackage -rfakeroot -uc
gcc -I. -I. -Wall -O2  -c flist.c -o flist.o
flist.c: In function `send_file_entry':
flist.c:349: `lastdir_len' undeclared (first use in this function)
flist.c:349: (Each undeclared identifier is reported only once
flist.c:349: for each function it appears in.)
make[1]: *** [flist.o] Error 1
make[1]: Leaving directory `/home/summer/packages/rsync-2.6.2'
At least I seem to have solved the most pressing problem: it's still 
chugging away despite the ADSL link at home going down for upwards of 30 
seconds at a time, that I've noticed.



Re: Reliability and robustness problems

2004-06-09 Thread Wayne Davison
On Thu, Jun 10, 2004 at 07:21:41AM +0800, John wrote:
 flist.c: In function `send_file_entry':
 flist.c:349: `lastdir_len' undeclared (first use in this function)

It patched the wrong function, which is really hard to understand
because the line numbers in the patch are right for the 2.6.2 version of
flist.c.  If you read the @@ line before each hunk, you'll see the
function name it should have patched.  The first hunk makes its change
in receive_file_entry(), and the second makes its change in make_file().
Both changes are simple enough that you can patch them by hand, if
needed.

..wayne..


Reliability and robustness problems

2004-06-07 Thread John
I am trying to use rsync to back up from a site we will call office to 
another we will call home.

Both sites have DSL accounts provided by Arachnet.
At present, not all the files being backed up need to be backed up but, 
OTOH, we wish to back up lots more files that aren't being backed up now.

First, we create a local backup on our office machine which happens to 
be called mail. We have this directory structure:
drwxr-xr-x   20 root 4096 May 17 23:06 20040517-1500-mon
drwxr-xr-x   20 root 4096 May 18 23:06 20040518-1500-tue
drwxr-xr-x   20 root 4096 May 19 23:09 20040519-1500-wed
drwxr-xr-x   20 root 4096 May 20 23:09 20040520-1500-thu
drwxr-xr-x   20 root 4096 May 21 23:09 20040521-1500-fri
drwxr-xr-x   20 root 4096 May 22 23:10 20040522-1500-sat
drwxr-xr-x   20 root 4096 May 23 23:09 20040523-1500-sun
drwxr-xr-x   20 root 4096 May 24 23:10 20040524-1500-mon
drwxr-xr-x   20 root 4096 May 25 23:10 20040525-1500-tue
drwxr-xr-x   20 root 4096 May 26 23:10 20040526-1500-wed
drwxr-xr-x   20 root 4096 May 27 23:10 20040527-1500-thu
drwxr-xr-x   20 root 4096 May 28 23:11 20040528-1500-fri
drwxr-xr-x   20 root 4096 May 29 23:11 20040529-1500-sat
drwxr-xr-x   20 root 4096 May 30 23:10 20040530-1500-sun
drwxr-xr-x   20 root 4096 May 31 23:11 20040531-1500-mon
drwxr-xr-x3 root 4096 Jun  1 14:10 20040601-0603-tue
drwxr-xr-x3 root 4096 Jun  1 23:07 20040601-1500-tue
drwxr-xr-x3 root 4096 Jun  2 07:42 20040601-2323-tue
drwxr-xr-x3 root 4096 Jun  2 23:07 20040602-1500-wed
drwxr-xr-x3 root 4096 Jun  3 14:04 20040603-0555-thu
drwxr-xr-x3 root 4096 Jun  3 23:06 20040603-1500-thu
drwxr-xr-x3 root 4096 Jun  4 23:07 20040604-1500-fri
drwxr-xr-x3 root 4096 Jun  5 23:08 20040605-1500-sat
drwxr-xr-x3 root 4096 Jun  7 14:19 20040607-0610-mon
drwxr-xr-x3 root 4096 Jun  8 05:01 20040607-2054-mon
drwxr-xr-x3 root 4096 Jun  8 05:35 20040607-2128-mon
drwxr-xr-x   20 root 4096 Jun  1 14:06 latest

The timestamps in the directory names are UTC times.
We maintain the contents of latest thus:
+ rsync --recursive --links --hard-links --perms --owner --group 
--devices --times --sparse --one-file-system --rsh=/usr/bin/ssh --delete 
--delete-excluded --delete-after --max-delete=80 --relative --stats 
--numeric-ids --exclude-from=/etc/local/backup/system-backup.excludes 
/boot/ / /home/ /var/ /var/local/backups/office//latest

and create the backup-du-jour:
+ cp -rl /var/local/backups/office//latest 
/var/local/backups/office//20040607-2128-mon

That part works well, and the rsync part generally takes about seven 
minutes.
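The cp -rl trick copies the directory tree but hard-links the files, so each daily snapshot costs almost no extra space. A quick local demonstration (my sketch, using a made-up payload and the thread's snapshot name):

```shell
# Snapshot via cp -rl: new directory entries, but the files themselves
# are hard links to the same inodes as in "latest".
set -e
snap=$(mktemp -d)
mkdir "$snap/latest"
echo "payload" > "$snap/latest/file"

cp -rl "$snap/latest" "$snap/20040607-2128-mon"   # backup-du-jour

stat -c '%h' "$snap/latest/file"   # link count 2: one inode, two names
```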

To copy office to home we try this:
+ rsync --recursive --links --hard-links --perms --owner --group 
--devices --times --sparse --one-file-system --rsh=/usr/bin/ssh --delete 
--delete-excluded --delete-after --max-delete=80 --relative --stats 
--numeric-ids /var/local/backups 192.168.0.1:/var/local/backups/

Prior to this run that is in progress, we used home's external host 
name. I've created a VPN between the two sites (for other reasons) using 
OpenVPN: all the problems we've had so far occurred with, we'll say, 
hostname home.arach.net.au, as that's the default way Arachnet 
assigns hostnames.

I'm hoping that OpenVPN will provide a more robust recovery from network 
problems.

Problems we've had include:
1. ADSL connexion at one end or the other dropping for a while. rsync 
doesn't notice and mostly hangs. I have seen rsync at home still 
running but with no relevant files open.

2. rsync uses an enormous amount of virtual memory, with the result that 
the Linux kernel lashes out at lots of processes, mostly innocent, until 
it lucks on rsync. This can cause rsync to terminate without a useful 
message.
2a. Sometimes the rsync that does this is at home.
I've alleviated this at office by allocating an unreasonable amount of 
swap: unreasonable because if it gets used, performance will be truly 
dreadful.

3. rsync does not detect when its partner has vanished. I don't 
understand why this should be so: it seems to me that, at office, it 
should be able to detect by the fact {r,s}sh has terminated or by 
timeout, and at home by timeout.

3a. I'd like to see rsync have the ability to retry in the case where it 
initiated the transfer. It can take some time to collect together the 
information as to what needs to be done: if I retry in its wrapper script, 
then this has to be redone whereas, I surmise, rsync doing the retry 
would not need to.

4. I've already mentioned this, but as I've had no feedback I'll try again.
As you can see from the above, the source directories for the transfer 
from office to home are chock-full of hard links. As best I can tell, 
rsync is transferring each copy fresh instead of recognising the hard 
link before the transfer and getting the destination rsync to make a 
new hard link.

Re: Reliability and robustness problems

2004-06-07 Thread Wayne Davison
On Tue, Jun 08, 2004 at 07:37:32AM +0800, John wrote:
 1. ADSL connexion at one end or the other dropping for a while. rsync 
 doesn't notice and mostly  hangs. I have seen rsync at home still 
 running but with no relevant files open.

There are two aspects of this: (1) Your remote shell should be setup to
timeout appropriately (which is why rsync doesn't timeout by default) --
see your remote-shell's docs for how to do this; (2) you can tell rsync
to timeout after a certain amount of inactivity (see --timeout).
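For OpenSSH in particular, the usual knobs are the keepalive settings in ~/.ssh/config (a sketch; the host alias and values are examples, and ServerAliveInterval requires a reasonably recent OpenSSH client):

```
Host backup-home
    HostName 192.168.0.1
    ServerAliveInterval 60    # probe the server after 60s of silence
    ServerAliveCountMax 3     # give up after 3 unanswered probes (~3 min)
```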

 2. rsync uses an enormous amount of virtual memory

Yes, it uses something like 80-100 bytes or so per file in the
transferred hierarchy (depending on options) plus a certain base amount
of memory.  Your options are to (1) copy smaller sections of the
hierarchy at a time, (2) add more memory, or (3) help code something
better.  This is one of the big areas that I've wanted to solve by
completely replacing the current rsync protocol with something better
(as I did in my rZync testbed protocol project a while back -- it
transfers the hierarchy incrementally, so it never has more than a
handful of directories in action at any one time).  At some point I will
get back to working on an rsync-replacement project.
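That per-file figure squares reasonably well with the ~128 MB RSS reported elsewhere in the thread; a back-of-the-envelope check (using the upper end of the 80-100 byte estimate and ignoring the base memory):

```shell
# How many files would account for the observed RSS at ~100 bytes each?
rss_kb=131352                               # RSS from the ps output, in kB
bytes_per_file=100                          # upper end of the estimate
echo $(( rss_kb * 1024 / bytes_per_file ))  # ~1.3 million files
```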

 3. rsync does not detect when its partner has vanished.

That seems unlikely unless the remote shell is still around.  If the
shell has terminated, the socket would return an EOF and rsync would
exit.  So, I'll assume (until shown otherwise) that this is a case of
the remote shell still hanging around.

 3a. I'd like to see rsync have the ability to retry in the case where it's 
 initiated the transfer.

There has been some talk of this recently.  It doesn't seem like it
would be too hard to do, but it's not trivial either.  If someone wanted
to code something up, I'd certainly appreciate the assistance.  Or feel
free to put an enhancement request into bugzilla.  (BTW:  has anyone
heard from J.W. Schultz anytime recently?  He seems to have dropped off
the net without any explanation about 3 months ago -- I hope he's OK.)

 4. [...] As best I can tell, rsync is transferring each copy fresh
 instead of recognising the hard link before the transfer and getting
 the destination rsync to make a new hard link.

This should not be the case if you use the -H option.  (It also helps
to use 2.6.2 on both ends, as the memory-consumption was reduced
considerably from older releases.)  If you're seeing a problem with
this, you should provide full details on what command you're running,
what versions you're using, and as small a test case as you can that
shows the problem.

..wayne..


Re: Reliability and robustness problems

2004-06-07 Thread Wayne Davison
On Mon, Jun 07, 2004 at 07:40:22PM -0700, Wayne Davison wrote:
 So, I'll assume (until shown otherwise) that this is a case of
 the remote shell still hanging around.

There's one other possibility I thought of.  You mentioned that your
kernel has gone around killing processes when memory is low.  If one
rsync process is just sitting around waiting to be killed by its sibling
rsync process, but that sibling process got killed before it had a
chance to generate the "all done" signal, a do-nothing rsync process
could be left hanging around indefinitely.  This is pretty rare, though,
as most of the time rsync is actively interacting with the open socket
and it notices when something goes wrong.

..wayne..


Re: Reliability and robustness problems

2004-06-07 Thread John
Hmm. I subscribed to the list before I sent this, and I've not seen 
either the email confirmation request or my mail to the list.

Wayne Davison wrote:
On Tue, Jun 08, 2004 at 07:37:32AM +0800, John wrote:
 

1. ADSL connexion at one end or the other dropping for a while. rsync 
doesn't notice and mostly  hangs. I have seen rsync at home still 
running but with no relevant files open.
   

There are two aspects of this: (1) Your remote shell should be setup to
timeout appropriately (which is why rsync doesn't timeout by default) --
see your remote-shell's docs for how to do this; (2) you can tell rsync
to timeout after a certain amount of inactivity (see --timeout).
 

I'm pretty sure that ssh times out properly: certainly it disconnects 
some time after my dialup line goes down.

I'd managed to overlook the --timeout option on rsync.
I'll look more closely for ssh hanging round next time it happens, but 
I'm skeptical.

2. rsync uses an enormous amount of virtual memory
   

Yes, it uses something like 80-100 bytes or so per file in the
transferred hierarchy (depending on options) plus a certain base amount
of memory.  Your options are to (1) copy smaller sections of the
hierarchy at a time, (2) add more memory, or (3) help code something
better.  This is one of the big areas that I've wanted to solve by
completely replacing the current rsync protocol with something better
(as I did in my rZync testbed protocol project a while back -- it
transfers the hierarchy incrementally, so it never has more than a
handful of directories in action at any one time).  At some point I will
get back to working on an rsync-replacement project.
 

1. No chance, I'd think, of it handling hard links properly if it can't 
see at least one of the other copies. However, I could easily be wrong.
2. May require new boxes all round. Money may be an issue.
3. Yeah. My C skills are pretty rudimentary; I can barely read the stuff. 
Time for me to learn it was 30 or more years ago, but I don't think it 
had been invented then.

 

3. rsync does not detect when its partner has vanished.
   

That seems unlikely unless the remote shell is still around.  If the
shell has terminated, the socket would return an EOF and rsync would
exit.  So, I'll assume (until shown otherwise) that this is a case of
the remote shell still hanging around.
 

I've been known to have unlikely failures before:-) It could be the 
bloody Billion in the way. I know that if a TCP session is quiet too 
long the Billion forgets all about it.

The fact I'm now trying a VPN using UDP should overcome that issue.

3a. I'd like to see rsync have the ability to retry in the case where it's 
initiated the transfer.
   

There has been some talk of this recently.  It doesn't seem like it
would be too hard to do, but it's not trivial either.  If someone wanted
to code something up, I'd certainly appreciate the assistance.  Or feel
free to put an enhancement request into bugzilla.  (BTW:  has anyone
heard from J.W. Schultz anytime recently?  He seems to have dropped off
the net without any explanation about 3 months ago -- I hope he's OK.)
 

4. [...] As best I can tell, rsync is transferring each copy fresh
instead of recognising the hard link before the transfer and getting
the destination rsync to make a new hard link.
   

This should not be the case if you use the -H option.  (It also helps
to use 2.6.2 on both ends, as the memory-consumption was reduced
considerably from older releases.)  If you're seeing a problem with
this, you should provide full details on what command you're running,
what versions you're using, and as small a test case as you can that
shows the problem.
 

Well, I cut and pasted the commandline (with only a minor edit to 
disguise relevant sites). Here 'tis again:
rsync --recursive --links --hard-links --perms --owner --group --devices 
--times --sparse --one-file-system --rsh=/usr/bin/ssh --delete 
--delete-excluded --delete-after --max-delete=80 --relative --stats 
--numeric-ids

At the moment I'm running the standard latest versions for Woody 
(office) and Red Hat Linux 9 (home).

Actually, Woody may be one fix behind latest; there was a new rsync out 
in the past few days that went in this morning. Nah, looks like the 
update went in before the last run started; the rsync binary it's got 
open isn't deleted.
Office:
rsync  version 2.5.6cvs  protocol version 26
Home:
rsync  version 2.5.7  protocol version 26

Is there a source of pre-built binaries? I didn't see it on the Rsync site.
..wayne..
 



Re: Reliability and robustness problems

2004-06-07 Thread John
Wayne Davison wrote:
On Mon, Jun 07, 2004 at 07:40:22PM -0700, Wayne Davison wrote:
 

So, I'll assume (until shown otherwise) that this is a case of
the remote shell still hanging around.
   

There's one other possibility I thought of.  You mentioned that your
kernel has gone around killing processes when memory is low.  If one
rsync process is just sitting around waiting to be killed by its sibling
rsync process, but that sibling process got killed before it had a
chance to generate the "all done" signal, a do-nothing rsync process
could be left hanging around indefinitely.  This is pretty rare, though,
as most of the time rsync is actively interacting with the open socket
and it notices when something goes wrong.
..wayne..
 

I don't know.
Kernels at both office and home have done this: most recently it was 
home, but by now I've destroyed the information needed to know.

On the subject of signals, when rsync dies for any signal-related 
reason, it does not produce the stats.

Most recently this occurred this morning, when I very carefully chose to 
kill -HUP it.

It also misreported the signal as USR1 or INT. Whichever, it could have 
reported the stats.

A stat I don't see is how much memory was used. This would be very 
helpful in estimating what our memory requirements might be, especially 
as I don't see any guidelines elsewhere.

I might also add here that the stats I see seem targeted at hackers. I 
find them next to incomprehensible and so mostly useless.

Numbers I do understand include megabytes transferred (accuracy to the 
last byte is meaningless on my runs) and transfer speed. Some of these 
numbers are beyond easy comprehension:
Total file size: 1850665035 bytes
Total transferred file size: 3064385 bytes
Literal data: 3065439 bytes

I prefer megabytes, and punctuation.
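For what it's worth, a one-line filter makes those counts readable (my sketch, taking 1 MB = 1048576 bytes):

```shell
# Convert a raw byte count from rsync --stats into megabytes.
mb() { awk -v b="$1" 'BEGIN { printf "%.1f MB\n", b / 1048576 }'; }

mb 1850665035    # Total file size
mb 3064385       # Total transferred file size
```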


Re: Reliability and robustness problems

2004-06-07 Thread John Van Essen
(I see there's already been an exchange between you and Wayne, but
I'll still send this reply that I composed to your original email.)

On Tue, 08 Jun 2004, John [EMAIL PROTECTED] wrote:

 We maintain the contents of latest thus:
 + rsync --recursive --links --hard-links --perms --owner --group
 --devices --times --sparse --one-file-system --rsh=/usr/bin/ssh --delete
 --delete-excluded --delete-after --max-delete=80 --relative --stats
 --numeric-ids --exclude-from=/etc/local/backup/system-backup.excludes
 /boot/ / /home/ /var/ /var/local/backups/office//latest

Why the double slash before latest?

 and create the backup-du-jour:
 + cp -rl /var/local/backups/office//latest
 /var/local/backups/office//20040607-2128-mon
 
 That part works well, and the rsync part generally takes about seven
 minutes.
 
 To copy office to home we try this:
 + rsync --recursive --links --hard-links --perms --owner --group
 --devices --times --sparse --one-file-system --rsh=/usr/bin/ssh --delete
 --delete-excluded --delete-after --max-delete=80 --relative --stats
 --numeric-ids /var/local/backups 192.168.0.1:/var/local/backups/

I can see where you will have a dreadful number of files to process
if you are also processing all the previous backups.

 Problems we've had include
 1. ADSL connexion at one end or the other dropping for a while. rsync
 doesn't notice and mostly  hangs. I have seen rsync at home still
 running but with no relevant files open.
 
 2. rsync uses an enormous amount of  virtual memory with the result the
 Linux kernel lashes out at lots of processes, mostly innocent, until it
 lucks on rsync. This can cause rsync to terminate without a useful message.
 2a. Sometimes the rsync that does this is at home.
 I've alleviated this at office by allocating an unreasonable amount of
 swap: unreasonable because if it gets used, performance will be truly
 dreadful.

In neither this nor your previous post have you mentioned the
version of rsync or the OSes involved.

Versions of rsync prior to 2.6.2 (skipping 2.6.1) have non-optimized
hard-link processing that uses twice as much memory (!) and sometimes
copies hard-linked files when there is already a match on the receiver.

If you are not using 2.6.2, install that on both ends and try it
again.

 3. rsync does not detect when its partner has vanished. I don't
 understand why this should be so: it seems to me that, at office, it
 should be able to detect by the fact {r,s}sh has terminated or by
 timeout, and at home by timeout.

There are two timeouts - a relatively short internal socket I/O
timeout and a user-controlled client-server communications timeout.
If you are not using --timeout and the link goes down at the wrong
time, rsync can sit there forever waiting for the next item from the
other end.

Use --timeout set to some number of seconds that seems long enough
to get the job done.  If it times out, then either bump it or try
to solve the cause of the timeout.

 3a. I'd like to see rsync have the ability to retry in the case where it's
 initiated the transfer. It can take some time to collect together the
 information as to what needs to be done: if I try in its wrapper script,
 then this has to be redone whereas, I surmise, rsync doing the retry
 would not need to.

You need to avoid the kinds of rsync runs where this becomes a major factor.

 4. I've already mentioned this, but as I've had no feedback I'll try again.
 As you can see from the above, the source directories for the transfer
 from office to home are chock-full of hard links. As best I can tell,
 rsync is transferring each copy fresh instead of recognising the hard
 link before the transfer and getting the destination rsync to make a new
 hard link. It is so that it _can_ do this that I present the backup
 directory as a whole and not the individual day's backup. That, and I
 have hopes that today's unfinished work will be done tomorrow.

2.6.2 has fixes for unnecessary transfers.

 btw the latest directory contains 1.5 Gbytes of data. The system is
 still calculating that today's backup contains 1.5 Gbytes, so it seems
 the startup costs are considerable.

It's not the size of the data that hurts, it's the number of files
and directories involved.

Here's what I suggest.

Since you have wisely made a static snapshot of the content that
you wish to back up, do the office-to-home rsync in two steps.

First, only rsync the latest directory, using your original rsync
arguments with the source and destination as:

  /var/local/backups/latest 192.168.0.1:/var/local/backups/latest/

Unchanged content won't be disturbed.  Changed or new content will
get transferred.

When that completes successfully, then do the second rsync, but
do *not* use --delete-excluded.  The second rsync should include
latest and the new YYYYMMDD-HHMM-ddd directory, and exclude all
others.  That should be nothing but hardlinks and should go very
quickly once the filesystem scan for the two hierarchies is done.
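Concretely, the two steps might look like this. A sketch only: the commands are left commented out, the dated name is the example from earlier in the thread, and giving both directories as sources is one way of "include these two, exclude the rest" without filter rules:

```shell
# Options mirror the thread's command, minus --delete-excluded, which
# must NOT be used in step 2 or the excluded older backups would be
# deleted on the target.
OPTS="--recursive --links --hard-links --perms --owner --group \
--devices --times --sparse --one-file-system --rsh=/usr/bin/ssh \
--delete --delete-after --max-delete=80 --relative --stats --numeric-ids"

# Step 1: sync only the working snapshot.
# rsync $OPTS /var/local/backups/latest 192.168.0.1:/var/local/backups/latest/

# Step 2: latest plus the new dated tree in one run, so --hard-links
# can pair their files up.  (Dated name is illustrative.)
# rsync $OPTS /var/local/backups/latest /var/local/backups/20040607-2128-mon \
#     192.168.0.1:/var/local/backups/
echo "$OPTS"
```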
-- 
John Van Essen  Univ of MN 

Re: Reliability and robustness problems

2004-06-07 Thread John
John Van Essen wrote:
(I see there's already been an exchange between you and Wayne, but
I'll still send this reply that I composed to your original email.)
 

I'm glad you did!

On Tue, 08 Jun 2004, John [EMAIL PROTECTED] wrote:
 

We maintain the contents of latest thus:
+ rsync --recursive --links --hard-links --perms --owner --group
--devices --times --sparse --one-file-system --rsh=/usr/bin/ssh --delete
--delete-excluded --delete-after --max-delete=80 --relative --stats
--numeric-ids --exclude-from=/etc/local/backup/system-backup.excludes
/boot/ / /home/ /var/ /var/local/backups/office//latest
   

Why the double slash before latest?
 

Just an accident of the way creation and substitution of variables 
worked. It doesn't matter to Linux (or *ix in general).

 

and create the backup-du-jour:
+ cp -rl /var/local/backups/office//latest
/var/local/backups/office//20040607-2128-mon
That part works well, and the rsync part generally takes about seven
minutes.
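As an aside, one can confirm locally that cp -rl really produced hard links rather than copies. A quick sketch on a throwaway file, using GNU stat and find:

```shell
# cp -l creates a second name for the same inode, which is exactly
# what the cp -rl snapshot step does file by file.
tmp=$(mktemp -d)
echo payload > "$tmp/a"
cp -l "$tmp/a" "$tmp/b"            # hard link, not a copy
links=$(stat -c %h "$tmp/a")       # GNU stat: link count, now 2
find "$tmp" -samefile "$tmp/a"     # lists both names
rm -rf "$tmp"
echo "link count: $links"
```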
To copy office to home we try this:
+ rsync --recursive --links --hard-links --perms --owner --group
--devices --times --sparse --one-file-system --rsh=/usr/bin/ssh --delete
--delete-excluded --delete-after --max-delete=80 --relative --stats
--numeric-ids /var/local/backups 192.168.0.1:/var/local/backups/
   

I can see where you will have a dreadful number of files to process
if you are also processing all the previous backups.
 

I will, at some time, implement some pruning of the backups. Presenting 
the full list ensures that any previous backup that didn't complete gets 
fixed. However well rsync works, I can't rule out power failures.
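For when that pruning does get implemented, a hypothetical sketch, keeping the newest KEEP dated snapshots. Demonstrated on a throwaway directory; point BACKUPS at /var/local/backups/office for real use, and swap the echo for rm -rf once satisfied. Because the snapshots are hard-linked, removing an old one only frees blocks that no newer snapshot still links:

```shell
# Build a throwaway backup root with three dated snapshot dirs.
BACKUPS=$(mktemp -d)
for d in 20040601-2128-tue 20040602-2128-wed 20040603-2128-thu; do
    mkdir "$BACKUPS/$d"
done

KEEP=2
# GNU head: negative count drops the last KEEP (i.e. newest) entries.
prune_list=$(ls -1d "$BACKUPS"/2* | sort | head -n -"$KEEP")
for dir in $prune_list; do          # mktemp paths contain no spaces
    echo "would remove $dir"        # replace with: rm -rf "$dir"
done
rm -rf "$BACKUPS"
```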

 

Problems we've had include
1. ADSL connexion at one end or the other dropping for a while. rsync
doesn't notice and mostly hangs. I have seen rsync at home still
running but with no relevant files open.
2. rsync uses an enormous amount of virtual memory, with the result that
the Linux kernel lashes out at lots of processes, mostly innocent, until it
lucks on rsync. This can cause rsync to terminate without a useful message.
2a. Sometimes the rsync that does this is at home.
I've alleviated this at office by allocating an unreasonable amount of
swap: unreasonable because if it gets used, performance will be truly
dreadful.
   

In neither this nor your previous post have you mentioned the
version of rsync or the OSes involved.
Versions of rsync prior to 2.6.2 (skipping 2.6.1) have non-optimized
hard-link processing that used twice as much memory (!) and sometimes
copied hard-linked files when there was already a match on the receiver.
If you are not using 2.6.2, install that on both ends and try it
again.
 

I have now. I will be upgrading: I've built 2.6.2 from Sarge, and am
mulling over what to do for RHL 7.3.  I ask myself, "Will Woody binaries
work? Do I need a RHL 7.3 development machine?"


3. rsync does not detect when its partner has vanished. I don't
understand why this should be so: it seems to me that, at office, it
should be able to detect this by the fact that {r,s}sh has terminated or
by timeout, and at home by timeout.
   

There are two timeouts - a relatively short internal socket I/O
timeout and a user-controlled client-server communications timeout.
If you are not using --timeout and the link goes down at the wrong
time, rsync can sit there forever waiting for the next item from the
other end.
Use --timeout set to some number of seconds that seems long enough
to get the job done.  If it times out, then either bump it or try
to solve the cause of the timeout.
 

This is consistent with what I see.
--timeout=500 will be in the next run.
3a. I'd like to see rsync have the ability to retry in the case where it
has initiated the transfer. It can take some time to collect the
information about what needs to be done: if I retry in a wrapper script,
this has to be redone whereas, I surmise, rsync doing the retry
would not need to.
   

You need to avoid the kinds of rsync runs where this becomes a major factor.
 

Well, yes. I'm using the latest rsync in the latest stable version of Debian.
4. I've already mentioned this, but as I've had no feedback I'll try again.
As you can see from the above, the source directories for the transfer
from office to home are chock-full of hard links. As best I can tell,
rsync is transferring each copy fresh instead of recognising the hard
link before the transfer and getting the destination rsync to make a new
hard link. It is so that it _can_ do this that I present the backup
directory as a whole and not the individual day's backup. That, and I
have hopes that today's unfinished work will be done tomorrow.
   

2.6.2 has fixes for unnecessary transfers.
 

Good.
 

btw the latest directory contains 1.5 Gbytes of data. The system is
still calculating that today's backup contains 1.5 Gbytes, so it seems
the startup costs are considerable.
   

It's not the size of the data that hurts, it's the number of files
and directories involved.
Here's what I suggest.
Since you have wisely made a static snapshot of