Re: rsync - using a --files-from list to cut out scanning. How to handle deletions? (fwd)
Paul, Wayne, Kevin, Teodor and others,

Thanks for your contributions in response to my postings.

Paul: I was very imprecise, if not plain wrong, in my description. :-( Thanks for explaining what really happens:

> Rsync will not update an existing file in-place unless you use the
> --inplace option. So --whole-file is irrelevant for this. Rsync
> (without --inplace) will always create a new (temporary) file, using
> the existing data (without --whole-file) to enable the delta diff
> speedup algorithm. Once the temp file is successfully created, it's
> renamed to the original name, deleting the existing link. So any
> hardlinked data will remain untouched.

Since my posting to the rsync digest last week, I've needed to think a lot about rsync behaviour with hard links, and have been doing some tests. It's been good to have an outbreak of postings on these issues, with a re-visiting of Bug 5644, and Wayne's postings about features in the upcoming version 3.1.0. (At our site, we use a patched version of rsync which links a file from the link-dest directory rather than copying from the source when a file is identical in the source and link-dest directories, but exists and is different in the destination.)

I was not aware of the issue in the case where the unchanged_file() test is passed but the unchanged_attrs() test is not, and the potential for over-writing the attributes of not just the destination file, but of all its hard-linked copies. This means that recycling directories, which, as Teodor Milkov noted:

> Such a behaviour (unlink changed files and then hard link to dest
> dir) would be very handy, because rotating large directory trees
> (e.g. 10 milion files, 10k files changed) is sooo much more efficient
> than deleting them and then repopulating from scratch.

is an issue, as Wayne noted:

> A pre-existing hard-linked copy of the files causes rsync to just
> change the attributes on the file in-place (without breaking the
> hard-link). This can be a minor point for some people (if historical
> permissions/ACLs/xattrs don't need to be accurate), but could be a
> deal breaker for some.

I can see the need for another rsync option here to allow users to select making a fresh copy of the file in this case. That would restore the behaviour I implicitly assumed we had, but didn't. I've updated the documentation for our backups, and prepared a note for users.

I'm also thinking about ways around this issue, none of which is particularly appealing:
- drop the recycling of old directories (parameterised in our set-up)
- break the linking at regular intervals (parameterised in our set-up)
- do a dry run to identify changed files, delete those on the destination, and then do a non-dry run (there are timing issues here, but there always will be for a non-quiet filesystem).

Thanks again.

Regards,
Rob. Bell   e-mail: robert.b...@csiro.au
--
Dr Robert C. Bell, BSc (Hons) PhD
Technical Services Manager, Advanced Scientific Computing, CSIRO IM&T
Phone: +61 3 9669 8102 | Mobile: +61 428 108 333 | CSIRO 93 3810
robert.b...@csiro.au | http://www.csiro.au/ | http://www.hpsc.csiro.au/
Addresses:
Street: CSIRO ASC Level 11, 700 Collins Street, Docklands Vic 3008, Australia
Postal: CSIRO ASC Level 11, GPO Box 1289, Melbourne Vic 3001, Australia
Please see earlier postings for the disclaimer.
--
Please use reply-all for most replies to avoid omitting the mailing list.
To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html
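[The third workaround above could be sketched roughly as below. This is a hypothetical illustration, not the production script from this thread: the function name, arguments and the `changed.list` temp file are all made up, and the parsing of rsync's itemized output (">f" lines are regular-file transfers) breaks on filenames containing whitespace.]

```shell
# Sketch of "dry run, delete changed files, then real run" so that
# changed files become fresh copies instead of attribute-updates on
# hard-linked inodes.  Wrapped in a function; adapt before use.
refresh_changed() {
    src=$1 dest=$2 linkdest=$3

    # 1. Dry run: itemize what would change; lines whose first field
    #    starts with ">f" are regular files that would be transferred.
    rsync -ain --link-dest="$linkdest" "$src" "$dest" |
        awk '$1 ~ /^>f/ { print $2 }' > changed.list

    # 2. Delete those names from the destination, so the real run
    #    creates new files rather than touching hard-linked inodes.
    (cd "$dest" && xargs -r rm -f --) < changed.list

    # 3. Real run: the deleted names come back as brand-new files.
    rsync -a --link-dest="$linkdest" "$src" "$dest"
}
```

[As noted in the posting, there is a race window between the dry run and the real run on a non-quiet filesystem.]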
Re: rsync - using a --files-from list to cut out scanning. How to handle deletions? (fwd)
On Fri 18 Jan 2013, Robert Bell wrote:
> > If a file exists in the target directory when using --link-dest rsync
> > modifies the link rather than replacing it which means you don't have
> > history for files that have been replaced rather than added or deleted.
>
> Thanks for your astute observation about updating hard-linked
> files: you had me worried for a while.
>
> Fortunately, we are using the --whole-file option in our production
> backups, since the target of our backups is an HSM system (SGI's DMF),
> and we don't want rsync to start comparing files (and thus triggering a
> recall). With this option, if a file is changed between the source and
> a target which contains a hard-linked version of the file, then the
> rsync update replaces the file in the target, not overwrites it and all
> its hard-linked cousins. Whew!

Rsync will not update an existing file in-place unless you use the --inplace option. So --whole-file is irrelevant for this. Rsync (without --inplace) will always create a new (temporary) file, using the existing data (without --whole-file) to enable the delta diff speedup algorithm. Once the temp file is successfully created, it's renamed to the original name, deleting the existing link. So any hardlinked data will remain untouched.

Paul
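[Paul's tempfile-and-rename point can be demonstrated with plain coreutils, no rsync involved; the file names below are made up. The rename replaces only the one directory entry, so the other hard link keeps the old inode and old data, with its link count dropping back to 1. Note `stat -c %h` is GNU coreutils syntax.]

```shell
# Simulate rsync's non---inplace update of one name in a hardlink pair.
tmp=$(mktemp -d) && cd "$tmp"
echo "old data" > backup_a
ln backup_a backup_b             # backup_b shares backup_a's inode
echo "new data" > .backup_a.tmp  # rsync writes the new version to a temp file
mv .backup_a.tmp backup_a        # then renames it over the old name
cat backup_a                     # -> new data
cat backup_b                     # -> old data (other link untouched)
stat -c %h backup_b              # -> 1 (the hardlink pair is broken)
```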
Re: rsync - using a --files-from list to cut out scanning. How to handle deletions? (fwd)
Kevin,

Thanks for your response. Some observations are inter-lined below.

Regards,
Rob. Bell   e-mail: robert.b...@csiro.au

-- Forwarded message --
Date: Tue, 15 Jan 2013 09:25:05 -0500
From: Kevin Korb
To: rsync@lists.samba.org
Subject: Re: rsync - using a --files-from list to cut out scanning. How to handle deletions?

> If you are going to do it this way please be aware of:
> https://bugzilla.samba.org/show_bug.cgi?id=8712
> and
> https://bugzilla.samba.org/show_bug.cgi?id=5644
> If a file exists in the target directory when using --link-dest rsync
> modifies the link rather than replacing it which means you don't have
> history for files that have been replaced rather than added or deleted.

Thanks for your astute observation about updating hard-linked files: you had me worried for a while.

Fortunately, we are using the --whole-file option in our production backups, since the target of our backups is an HSM system (SGI's DMF), and we don't want rsync to start comparing files (and thus triggering a recall).
With this option, if a file is changed between the source and a target which contains a hard-linked version of the file, then the rsync update replaces the file in the target, rather than overwriting it and all its hard-linked cousins. Whew!

> If you are dealing with backing up many millions of files then I
> suggest looking into a more advanced filesystem that can handle this
> functionality internally rather than using --link-dest. Currently
> that is limited to ZFS or BTRFS (if you are brave). Both of these
> filesystems have subvolumes and subvolume snapshot capabilities.
> This means you can do something similar to an lvm2 snapshot at the
> directory level instead of the whole filesystem. You can rsync with
> the same target directory each run and do a snapshot of that target
> between runs. The recycling concept is not needed because deleting
> an old snapshot is much faster than doing an rm -rf on a huge tree of
> hard links. This is especially true on ZFS which usually does the
> job in <1 second regardless of size. Unfortunately BTRFS usually
> completes the command quickly but the space is then slowly reclaimed
> by a kernel thread in the background.

We are restricted in our use of filesystems to what is chosen for particular hosts, so smarter backups using advanced filesystems are a long way off.

> Here is something I wrote up about it a while back:
> http://sanitarium.net/golug/rsync+btrfs_backups_2011.html

Thanks - good stuff. It parallels some of the work we have done - I should have looked up your papers earlier.

Our recycling of old backup directories gets around the performance issue of having to delete old backups - deletes can certainly take a long time, and we do it only for old systems, progressively over a year or so, until we finally remove the last backups.

We have added Tower of Hanoi management of the backups - great for automatically deciding which backups to keep in a rational way, and not having to mess with dates.

Rob.
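[Kevin's snapshot-rotation idea might look roughly like the sketch below. The pool path, function names and snapshot naming are all assumptions; the functions are defined but deliberately not invoked here, since they need a real BTRFS mount.]

```shell
# Rough sketch of rsync-into-a-subvolume with BTRFS snapshots standing
# in for --link-dest recycling.  Adapt paths before use.
POOL=/mnt/backups   # assumed: a BTRFS mount with a "current" subvolume

rotate_backup() {
    # Same rsync target every run; history comes from snapshots,
    # not from hard-link trees.
    rsync -a --delete /data/ "$POOL/current/"
    stamp=$(date +%Y-%m-%d_%H%M)
    # Read-only snapshot of the target, named by timestamp.
    btrfs subvolume snapshot -r "$POOL/current" "$POOL/snap-$stamp"
}

expire_backup() {
    # Dropping a snapshot replaces rm -rf on a huge hard-link tree.
    btrfs subvolume delete "$POOL/snap-$1"
}
```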
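[The thread does not show CSIRO's actual Tower of Hanoi scheme, but the textbook formulation is simple: on run n, the backup goes to level L, where 2^L is the largest power of two dividing n (the "ruler sequence"), and each level's slot is overwritten the next time that level comes up. A minimal sketch:]

```shell
# Tower of Hanoi rotation level for run n: the exponent of the largest
# power of two dividing n (0 for odd n, 1 for n=2,6,10..., etc.).
hanoi_level() {
    n=$1 level=0
    while [ "$n" -gt 0 ] && [ $((n % 2)) -eq 0 ]; do
        n=$((n / 2))
        level=$((level + 1))
    done
    echo "$level"
}

hanoi_level 1   # -> 0
hanoi_level 6   # -> 1
hanoi_level 8   # -> 3
```

[With k levels this retains exponentially spaced history - recent runs densely, older runs sparsely - without any date arithmetic, which matches the "not having to mess with dates" point above.]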
> It is a little out of date now and since I wrote it for a LUG it only
> covers BTRFS. A FreeBSD 9 system with at least 8GB of RAM running
> ZFS will outperform pretty much any Linux system running BTRFS
> (currently) which will outperform any Linux system running ext4 and
> --link-dest.

--
Kevin Korb                    Phone:    (407) 252-6853
Systems Administrator         Internet:
FutureQuest, Inc.             ke...@futurequest.net (work)
Orlando, Florida              k...@sanitarium.net (personal)
Web page:                     http://www.sanitarium.net/
PGP public key available on web site.