Re: rsync - using a --files-from list to cut out scanning. How to handle deletions? (fwd)

2013-01-21 Thread Robert Bell

Paul, Wayne, Kevin, Teodor and others,
Thanks for your contributions in response to my postings.

Paul: I was very imprecise if not plain wrong in my description.  :-(
Thanks for explaining what really happens.


"Rsync will not update an existing file in-place unless you use the
 --inplace option. So --whole-file is irrelevant for this.
 Rsync (without --inplace) will always create a new (temporary) file,
 using the existing data (without --whole-file) to enable the delta diff
 speedup algorithm. Once the temp file is successfully created, it's
 renamed to the original name, deleting the existing link. So any
 hardlinked data will remain untouched."



Since my posting to the rsync digest last week, I've needed to think a
lot about rsync behaviour with hard-links, and have been doing some
tests.

It's been good to have an outbreak of postings on these issues, with a
re-visiting of Bug 5644, and Wayne's postings about features in the
upcoming version 3.1.0.  (At our site, we use a patched version of rsync
which links a file from the link-dest directory rather than copying from
source when a file is identical in the source and link-dest directory,
but exists and is different in the destination.)
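A rough model of the per-file decision that patch makes, as a minimal Python sketch. It is not the patch itself: the function names are hypothetical, contents are compared byte for byte (real rsync uses its size/mtime quick check or checksums), and all paths are assumed to be on one filesystem so that os.link() can succeed.

```python
import os
import shutil


def same_content(a: str, b: str, chunk: int = 1 << 16) -> bool:
    """Byte-for-byte comparison (a stand-in for rsync's own checks)."""
    if os.path.getsize(a) != os.path.getsize(b):
        return False
    with open(a, "rb") as fa, open(b, "rb") as fb:
        while True:
            ba, bb = fa.read(chunk), fb.read(chunk)
            if ba != bb:
                return False
            if not ba:
                return True


def update_file(src: str, dest: str, link_dest: str) -> str:
    """Hypothetical model of the patched behaviour.

    Returns "unchanged", "linked" or "copied".
    """
    if os.path.exists(dest) and same_content(src, dest):
        return "unchanged"
    if os.path.exists(link_dest) and same_content(src, link_dest):
        # dest is missing or different, but link_dest matches src:
        # hard-link from link_dest instead of copying from the source.
        tmp = dest + ".tmp-link"
        os.link(link_dest, tmp)
        os.replace(tmp, dest)  # swap the new link in over any old dest
        return "linked"
    # No identical copy to link to: copy from the source via a temp name.
    tmp = dest + ".tmp-copy"
    shutil.copy2(src, tmp)
    os.replace(tmp, dest)
    return "copied"
```

Replacing dest through a temporary name means the old dest inode, and any other backups still linked to it, are left alone.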

I was not aware of the issue that arises when the unchanged_file() test
passes but the unchanged_attrs() test does not: the attributes can then
be overwritten not just on the destination file, but on every file
hard-linked to it.

This means that recycling directories, whose benefits Teodor Milkov noted:


 "Such a behaviour (unlink changed files and then hard link to dest dir)
 would be very handy, because rotating large directory trees (e.g. 10
 million files, 10k files changed) is sooo much more efficient than
 deleting them and then repopulating from scratch."


is an issue, as Wayne noted:


 "A pre-existing hard-linked copy of the files causes rsync to
 just change the attributes on the file in-place (without breaking the
 hard-link).  This can be a minor point for some people (if historical
 permissions/ACLs/xattrs don't need to be accurate), but could be a deal
 breaker for some."
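The root cause is that permissions, ownership and timestamps belong to the inode, not to the name, so every hard link sees the same attributes. A small Python sketch (POSIX only, hypothetical file names) contrasting an attribute change applied in place to a hard-linked file with the "fresh copy" alternative; os.chmod() stands in for rsync's attribute update.

```python
import os
import shutil
import stat
import tempfile


def mode_of(path: str) -> int:
    return stat.S_IMODE(os.stat(path).st_mode)


def attr_change(fresh_copy: bool) -> tuple:
    """Return (old_mode, new_mode) after changing permissions on the newest
    backup's copy of a file that is hard-linked to an older backup."""
    with tempfile.TemporaryDirectory() as root:
        old = os.path.join(root, "old_backup_file")
        new = os.path.join(root, "new_backup_file")
        with open(old, "w") as f:
            f.write("unchanged contents\n")
        os.chmod(old, 0o644)
        os.link(old, new)  # as --link-dest does for an unchanged file
        if fresh_copy:
            # Break the link first: copy to a temp name, rename over `new`.
            tmp = new + ".tmp"
            shutil.copy2(new, tmp)
            os.replace(tmp, new)
        os.chmod(new, 0o600)  # the attribute-only update
        return mode_of(old), mode_of(new)


print([oct(m) for m in attr_change(fresh_copy=False)])  # ['0o600', '0o600']
print([oct(m) for m in attr_change(fresh_copy=True)])   # ['0o644', '0o600']
```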


I can see the need for another rsync option here to let users request
a fresh copy of the file in this case.  That would restore the
behaviour I implicitly assumed we had, but didn't.

I've updated the documentation for our backups, and prepared a note for
users.  I'm also thinking about ways around this issue, none of which
are particularly appealing:
 - drop the recycling of old directories (parameterised in our set-up)
 - break the linking at regular intervals (parameterised in our set-up)
 - do a dry run to identify changed files, delete those on the
   destination, and then do a non-dry run (there are timing issues here,
   but there always will be for a non-quiet filesystem).

Thanks again

Regards
Rob. Bell  e-mail: robert.b...@csiro.au
--
Dr Robert C. Bell, BSc (Hons) PhD
Technical Services Manager
Advanced Scientific Computing
CSIRO IM&T

Phone: +61 3 9669 8102 | Mobile: +61 428 108 333 | CSIRO 93 3810
robert.b...@csiro.au | http://www.csiro.au/ | http://www.hpsc.csiro.au/
Addresses:
Street: CSIRO ASC Level 11, 700 Collins Street, Docklands Vic 3008, Australia
Postal: CSIRO ASC Level 11, GPO Box 1289, Melbourne Vic 3001, Australia

Please see earlier postings for the disclaimer.

--
Please use reply-all for most replies to avoid omitting the mailing list.
To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html


Re: rsync - using a --files-from list to cut out scanning. How to handle deletions? (fwd)

2013-01-18 Thread Paul Slootman
On Fri 18 Jan 2013, Robert Bell wrote:
> >
> >If a file exists in the target directory when using --link-dest rsync
> >modifies the link rather than replacing it which means you don't have
> >history for files that have been replaced rather than added or deleted.
> Thanks for your astute observation about updating hard-linked
> files: you had me worried for a while.
> 
> Fortunately, we are using the --whole-file option in our production
> backups, since the target of our backups is an HSM system (SGI's DMF),
> and we don't want rsync to start comparing files (and thus triggering a
> recall).  With this option, if a file is changed between the source and
> a target which contains a hard-linked version of the file, then the
> rsync update replaces the file in the target rather than overwriting it and all
> its hard-linked cousins. Whew!

Rsync will not update an existing file in-place unless you use the
--inplace option. So --whole-file is irrelevant for this.
Rsync (without --inplace) will always create a new (temporary) file,
using the existing data (without --whole-file) to enable the delta diff
speedup algorithm. Once the temp file is successfully created, it's
renamed to the original name, deleting the existing link. So any
hardlinked data will remain untouched.
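Paul's point can be checked with a few lines of Python (POSIX, hypothetical file names). replace_via_rename() models the default temp-file-then-rename update and overwrite_in_place() models --inplace; neither is rsync's own code.

```python
import os
import tempfile


def read(path: str) -> str:
    with open(path) as f:
        return f.read()


def replace_via_rename(path: str, data: str) -> None:
    """Default behaviour: write a temp file beside the target, then rename
    it over the target's name, which now refers to a new inode."""
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(path), prefix=".tmp-")
    with os.fdopen(fd, "w") as f:
        f.write(data)
    os.replace(tmp, path)


def overwrite_in_place(path: str, data: str) -> None:
    """--inplace behaviour: rewrite the existing inode's contents."""
    with open(path, "r+") as f:
        f.write(data)
        f.truncate()


def demo() -> tuple:
    with tempfile.TemporaryDirectory() as root:
        older = os.path.join(root, "older_backup_file")
        newer = os.path.join(root, "newer_backup_file")
        with open(older, "w") as f:
            f.write("v1\n")
        os.link(older, newer)
        replace_via_rename(newer, "v2\n")
        after_rename = (read(older), read(newer))
        os.unlink(newer)
        os.link(older, newer)  # re-create the hard link
        overwrite_in_place(newer, "v3\n")
        after_inplace = (read(older), read(newer))
        return after_rename, after_inplace


print(demo())  # (('v1\n', 'v2\n'), ('v3\n', 'v3\n'))
```

Only the in-place write reaches the older backup's copy, which is why --whole-file makes no difference here.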


Paul


Re: rsync - using a --files-from list to cut out scanning. How to handle deletions? (fwd)

2013-01-17 Thread Robert Bell

Kevin,

Thanks for your response.

Some observations are inter-lined below.

Rob.


Regards
Rob. Bell  e-mail: robert.b...@csiro.au

PLEASE NOTE

The information contained in this email may be confidential or
privileged. Any unauthorised use or disclosure is prohibited. If you
have received this email in error, please delete it immediately and
notify the sender by return email. Thank you. To the extent permitted
by law, CSIRO does not represent, warrant and/or guarantee that the
integrity of this communication has been maintained or that the
communication is free of errors, virus, interception or interference.

Please consider the environment before printing this email.

-- Forwarded message --
Date: Tue, 15 Jan 2013 09:25:05 -0500
From: Kevin Korb 
To: rsync@lists.samba.org
Subject: Re: rsync - using a --files-from list to cut out scanning. How to
handle deletions?



If you are going to do it this way please be aware of:
https://bugzilla.samba.org/show_bug.cgi?id=8712 and
https://bugzilla.samba.org/show_bug.cgi?id=5644




If a file exists in the target directory when using --link-dest, rsync
modifies the link rather than replacing it, which means you don't have
history for files that have been replaced rather than added or deleted.

Thanks for your astute observation about updating hard-linked
files: you had me worried for a while.

Fortunately, we are using the --whole-file option in our production
backups, since the target of our backups is an HSM system (SGI's DMF),
and we don't want rsync to start comparing files (and thus triggering a
recall).  With this option, if a file is changed between the source and
a target which contains a hard-linked version of the file, then the
rsync update replaces the file in the target rather than overwriting it and all
its hard-linked cousins. Whew!




If you are dealing with backing up many millions of files then I
suggest looking into a more advanced filesystem that can handle this
functionality internally rather than using --link-dest.  Currently
that is limited to ZFS or BTRFS (if you are brave).

Both of these filesystems have subvolumes and subvolume snapshot
capabilities.  This means you can do something similar to an lvm2
snapshot at the directory level instead of the whole filesystem.  You
can rsync with the same target directory each run and do a snapshot of
that target between runs.  The recycling concept is not needed because
deleting an old snapshot is much faster than doing an rm -rf on a huge
tree of hard links.  This is especially true on ZFS, which usually does
the job in <1 second regardless of size.  Unfortunately, BTRFS usually
completes the command quickly, but the space is then slowly reclaimed
by a kernel thread in the background.
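A minimal sketch of the run-then-snapshot cycle described above, in Python that only builds the command lines (paths, dataset and snapshot names are hypothetical). The btrfs and zfs subcommands are the standard ones. Passing --inplace to rsync is an assumption here, not something stated in this thread: because each snapshot is a copy-on-write copy rather than a hard link, in-place updates no longer clobber history, and they keep unchanged blocks shared with older snapshots.

```python
import subprocess
from typing import List

Cmd = List[str]


def btrfs_cycle(src: str, subvol: str, snap_dir: str, stamp: str) -> List[Cmd]:
    """rsync into the same subvolume every run, then take a read-only snapshot."""
    return [
        ["rsync", "-a", "--delete", "--inplace",
         src.rstrip("/") + "/", subvol.rstrip("/") + "/"],
        ["btrfs", "subvolume", "snapshot", "-r", subvol, f"{snap_dir}/{stamp}"],
    ]


def zfs_cycle(src: str, mountpoint: str, dataset: str, stamp: str) -> List[Cmd]:
    """Same idea on ZFS: the snapshot is named dataset@stamp."""
    return [
        ["rsync", "-a", "--delete", "--inplace",
         src.rstrip("/") + "/", mountpoint.rstrip("/") + "/"],
        ["zfs", "snapshot", f"{dataset}@{stamp}"],
    ]


def expire(fs: str, snapshot: str) -> Cmd:
    """Dropping an old backup replaces rm -rf on a tree of hard links."""
    if fs == "btrfs":
        return ["btrfs", "subvolume", "delete", snapshot]  # snapshot = path
    if fs == "zfs":
        return ["zfs", "destroy", snapshot]  # snapshot = dataset@stamp
    raise ValueError(f"unsupported filesystem: {fs}")


def run(cmds: List[Cmd]) -> None:
    for cmd in cmds:
        subprocess.run(cmd, check=True)  # no snapshot if rsync fails
```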

We are restricted in our use of filesystems to what is chosen for
particular hosts, so smarter backups using advanced filesystems are a
long way off.



Here is something I wrote up about it a while back:
http://sanitarium.net/golug/rsync+btrfs_backups_2011.html

Thanks - good stuff.  It parallels some of the work we have done - I
should have looked up your papers earlier.

Our recycling of old backup directories gets around the performance
issue of having to delete old backups - deletes can certainly take a
long time, and we do it only for old systems progressively over a year
or so until we finally remove the last backups.
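Directory recycling, as described above, amounts to renaming the oldest backup to become the next target rather than deleting it, so rsync only has to touch what changed. A minimal Python sketch, assuming hypothetical backup directory names that sort chronologically (for example ISO dates). The attribute caveat from earlier in this thread still applies to any files in the recycled tree that remain hard-linked to newer backups.

```python
import os


def prepare_target(backup_root: str, new_name: str, keep: int) -> str:
    """Return the directory to rsync into for the next backup.

    With `keep` or more backups present, the oldest is renamed (recycled);
    otherwise an empty directory is created.
    """
    existing = sorted(e.name for e in os.scandir(backup_root) if e.is_dir())
    target = os.path.join(backup_root, new_name)
    if len(existing) >= keep:
        # One rename instead of an rm -rf over millions of hard links.
        os.rename(os.path.join(backup_root, existing[0]), target)
    else:
        os.mkdir(target)
    return target
```

The newest remaining backup would then typically be given to rsync as the --link-dest directory.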

We have added Tower of Hanoi management of the backups - great for
automatically deciding which backups to keep in a rational way, and not
having to mess with dates.
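The implementation isn't described here, but the classic Tower of Hanoi rotation (familiar from tape backups) picks the backup set for run n from the number of trailing zero bits in n. A minimal Python sketch with hypothetical names:

```python
def hanoi_slot(run: int, slots: int) -> int:
    """Backup set to overwrite on run number `run` (1-based) when `slots`
    sets are kept, under the classic Tower of Hanoi rotation."""
    if run < 1 or slots < 1:
        raise ValueError("run and slots must be >= 1")
    trailing_zeros = (run & -run).bit_length() - 1
    return min(trailing_zeros, slots - 1)


print([hanoi_slot(n, 4) for n in range(1, 17)])
# [0, 1, 0, 2, 0, 1, 0, 3, 0, 1, 0, 2, 0, 1, 0, 3]
```

Set k (for k < slots - 1) is reused every 2^(k+1) runs and the last set every 2^(slots-1) runs, so the sets cover progressively longer stretches of history without any date arithmetic.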

Rob.




It is a little out of date now, and since I wrote it for a LUG it only
covers BTRFS.  A FreeBSD 9 system with at least 8GB of RAM running ZFS
will outperform pretty much any Linux system running BTRFS (currently),
which in turn will outperform any Linux system running ext4 and --link-dest.

--
~*-,._.,-*~'`^`'~*-,._.,-*~'`^`'~*-,._.,-*~'`^`'~*-,._.,-*~'`^`'~*-,._.,-*~
Kevin Korb  Phone:(407) 252-6853
Systems Administrator   Internet:
FutureQuest, Inc.   ke...@futurequest.net  (work)
Orlando, Florida    k...@sanitarium.net (personal)
Web page:   http://www.sanitarium.net/
PGP public key available on web site.
~*-,._.,-*~'`^`'~*-,._.,-*~'`^`'~*-,._.,-*~'`^`'~*-,._.,-*~'`^`'~*-,._.,-*~
