Re: rsync --link-dest and --files-from lead by a change list from some file system audit tool (Was: Re: cut-off time for rsync ?)

2015-07-16 Thread Andrew Gideon
On Tue, 14 Jul 2015 08:59:25 +0200, Paul Slootman wrote:

 btrfs has support for this: you make a backup, then create a btrfs
 snapshot of the filesystem (or directory), then the next time you make a
 new backup with rsync, use --inplace so that just changed parts of the
 file are written to the same blocks and btrfs will take care of the
 copy-on-write part.

That's interesting.  I'd considered doing something similar with LVM 
snapshots.  I chose not to do so because of a particular failure mode: if 
the space allocated to a snapshot filled (as a result of changes to the 
live data), the snapshot would fail.  For my purposes, I'd want the new 
write to fail instead.  Destroying snapshots holding backup data didn't 
seem a reasonable choice.

How does btrfs deal with such issues?

- Andrew

-- 
Please use reply-all for most replies to avoid omitting the mailing list.
To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html


Re: rsync --link-dest and --files-from lead by a change list from some file system audit tool (Was: Re: cut-off time for rsync ?)

2015-07-16 Thread Simon Hobson
Andrew Gideon c182driv...@gideon.org wrote:

 btrfs has support for this: you make a backup, then create a btrfs
 snapshot of the filesystem (or directory), then the next time you make a
 new backup with rsync, use --inplace so that just changed parts of the
 file are written to the same blocks and btrfs will take care of the
 copy-on-write part.
 
 That's interesting.  I'd considered doing something similar with LVM 
 snapshots.  I chose not to do so because of a particular failure mode: if 
 the space allocated to a snapshot filled (as a result of changes to the 
 live data), the snapshot would fail.  For my purposes, I'd want the new 
 write to fail instead.  Destroying snapshots holding backup data didn't 
 seem a reasonable choice.
 
 How does btrfs deal with such issues?

I'd have expected the live write to fail. The snapshot doesn't take any space 
(well, only a little for filesystem metadata) at the point of making the snapshot.

Once the snapshot is made, then any further changes just don't change the 
snapshotted data. If you overwrite the file, then new blocks are allocated to 
it from the free pool, and the metadata updated to point to it. I believe ZFS 
works in the same way.
The only difference in fact is that without the snapshot, after the new file 
has been written, the old version is freed and the space returned to the free 
pool.


Andrew Gideon c182driv...@gideon.org wrote:

 Is there a way to save cycles by offering zfs a hint as to where a 
 previous copy of a file's blocks may be found?

My assumption (and note that it is an assumption) is that rsync will only 
write the blocks it needs to. It checksums the file chunk by chunk and 
transfers only the changed chunks, and I assume that with the --inplace 
option it shouldn't need to rewrite the whole file.

So say you have a file with 5 blocks, stored in blocks ABCDE on the disk. You 
snapshot the volume and update the third block of the file: you should now have 
a snapshot file in blocks ABCDE and a live file in blocks ABFDE, with blocks 
A, B, D and E shared.

With the caveat that I've not really studied this (I've only read a little and 
listened to presentations), I would really hope that both filesystems work that 
way.




Re: rsync --link-dest and --files-from lead by a change list from some file system audit tool (Was: Re: cut-off time for rsync ?)

2015-07-14 Thread Paul Slootman
On Mon 13 Jul 2015, Andrew Gideon wrote:
 
 On the other hand, I do confess that I am sometimes miffed at the waste 
 involved in a small change to a very large file.  Rsync is smart about 
 moving minimal data, but it still stores an entire new copy of the file.
 
 What's needed is a file system that can do what hard links do, but at the 
 file page level.  I imagine that this would work using the same Copy On 
 Write logic used in managing memory pages after a fork().

btrfs has support for this: you make a backup, then create a btrfs
snapshot of the filesystem (or directory), then the next time you make a
new backup with rsync, use --inplace so that just changed parts of the
file are written to the same blocks and btrfs will take care of the
copy-on-write part.


Paul



Re: rsync --link-dest and --files-from lead by a change list from some file system audit tool (Was: Re: cut-off time for rsync ?)

2015-07-14 Thread Ken Chase
And what's performance like? I've heard that many COW systems' performance
drops through the floor when there are many snapshots.

/kc


On Tue, Jul 14, 2015 at 08:59:25AM +0200, Paul Slootman said:
  On Mon 13 Jul 2015, Andrew Gideon wrote:
   
   On the other hand, I do confess that I am sometimes miffed at the waste 
   involved in a small change to a very large file.  Rsync is smart about 
   moving minimal data, but it still stores an entire new copy of the file.
   
   What's needed is a file system that can do what hard links do, but at the 
   file page level.  I imagine that this would work using the same Copy On 
   Write logic used in managing memory pages after a fork().
  
  btrfs has support for this: you make a backup, then create a btrfs
  snapshot of the filesystem (or directory), then the next time you make a
  new backup with rsync, use --inplace so that just changed parts of the
  file are written to the same blocks and btrfs will take care of the
  copy-on-write part.
  
  
  Paul
  

-- 
Ken Chase - k...@heavycomputing.ca skype:kenchase23 +1 416 897 6284 Toronto 
Canada
Heavy Computing - Clued bandwidth, colocation and managed linux VPS @151 Front 
St. W.



Re: rsync --link-dest and --files-from lead by a change list from some file system audit tool (Was: Re: cut-off time for rsync ?)

2015-07-14 Thread Simon Hobson
Ken Chase rsync-list-m...@sizone.org wrote:

 And what's performance like? I've heard lots of COW systems performance
 drops through the floor when there's many snapshots.

For BTRFS I'd suspect the performance penalty is fairly small. Snapshots can 
be done in different ways, and the way BTRFS and (I think) ZFS do it is 
actually quite elegant.

Some systems keep a current state plus separate files for the snapshots 
(effectively a list of the differences from the current version). The 
performance hit comes when you update the current state: before a chunk is 
overwritten, the previous current version of that chunk must be read and added 
to the snapshot(s) that include it.

I believe the way BTRFS and ZFS do it is far more elegant. When you write a 
file out, you stuff the data into a number of disk blocks and write an entry 
into the filesystem structures to say where that data is stored.
In BTRFS, when you take a snapshot, the filesystem just notes that you've done 
it, and at that point very little happens.
When you then modify a file, instead of writing the data to the same blocks on 
disk, it's written to empty space, the old version is left in place, and the 
filesystem structures are updated to account for there now being two versions. 
If you only write some blocks of the file, I'd assume that only those new 
blocks get the COW treatment.
So the only overhead is in allocating new space to the file and keeping two 
versions of the file allocation map.
When you delete a snapshot, all it does is delete the snapshotted version of 
the filesystem state data and mark any freed space as free.

The only downside I see of the BTRFS approach is that you'll get more file 
fragmentation. But to be honest, does fragmentation really make that much 
difference on most real systems these days?




Re: rsync --link-dest and --files-from lead by a change list from some file system audit tool (Was: Re: cut-off time for rsync ?)

2015-07-13 Thread Simon Hobson
Andrew Gideon c182driv...@gideon.org wrote:

 These both bring me to the idea of using some file system auditing 
 mechanism to drive - perhaps with an --include-from or --files-from - 
 what rsync moves.
 
 Where I get stuck is that I cannot envision how I can provide rsync with 
 a limited list of files to move that doesn't deny the benefit of --link-
 dest: a complete snapshot of the old file system via [hard] links into a 
 prior snapshot for those files that are unchanged.

The thing here is that you are into the territory of dedicated backup tools 
rather than the general-purpose tool that rsync is intended to be.

storebackup does some elements of what you describe, in that it keeps a 
catalogue of existing files in the backup with a hash/checksum for each. I'm 
not sure how it goes about picking changed files - I suspect it uses 
time+size as a primary filter - but I know for a fact that you can touch a 
file and that change won't appear in the destination*.
But for remote backups, the primary server can generate a changes list which is 
then copied to the remote server which then adds the new/changed files and 
hard-links the unchanged ones according to the list it's been given.
If you turn off the file splitting and compression options, the backup is a 
series of hard-linked directories which you can look into and pull files 
directly.
* But if you do alter the timestamp on a file without changing the contents, 
that will not appear in the file structure in the backup - later copies of 
the file retain the earlier timestamp. It does keep this information, and if 
you use the corresponding restore tool then you get back the correct timestamp.


In a completely different setup, I also use Retrospect. Recent versions have an 
option (Instant Scan) that lets the client keep an audit of changes, avoiding 
the full client scan and massive compare that's needed with that option 
turned off.




Re: rsync --link-dest and --files-from lead by a change list from some file system audit tool (Was: Re: cut-off time for rsync ?)

2015-07-13 Thread Ken Chase
inotifywatch or equiv, there's FSM stuff (filesystem monitor) as well.

ConstantData had a product we used years ago - a kernel module that dumped
a list of any changed files out of some /proc or /dev/* device, and they
had a whole toolset that ate the list (into some db) and played it out,
constantly trying to keep up with replication to a target (kind of like
DRBD, but async). They got eaten by some large backup company and the product
was later priced at 5x what we had paid for it (in the mid $x000s/y).

This 2003-4 technology is certainly available in some form now.

If you only copy the changes, you're likely saving a lot of time.
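One way to wire such a change feed into rsync: in real use the raw list would 
come from a watcher such as `inotifywait -m -r -e close_write --format '%w%f' 
/live/data >> changed.raw` (requires the inotify-tools package). The sketch 
below simulates a few events so the post-processing is runnable anywhere; all 
paths are illustrative:

```shell
set -e
# Simulated output of a filesystem watcher (duplicates are normal,
# since a file touched twice produces two events).
raw=$(mktemp)
printf '%s\n' /live/data/etc/motd /live/data/etc/motd \
              /live/data/home/ken/notes > "$raw"

# Deduplicate and strip the source prefix so rsync gets relative paths.
sort -u "$raw" | sed 's|^/live/data/||' > /tmp/changed.list
cat /tmp/changed.list

# The list then drives the transfer, e.g.:
#   rsync -a --files-from=/tmp/changed.list /live/data/ backup:/snapshots/new/
```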

/kc


On Mon, Jul 13, 2015 at 01:53:43PM +, Andrew Gideon said:
  On Mon, 13 Jul 2015 02:19:23 +, Andrew Gideon wrote:
  
   Look at tools like inotifywait, auditd, or kfsmd to see what's easily
   available to you and what best fits your needs.
   
   [Though I'd also be surprised if nobody has fed audit information into
   rsync before; your need doesn't seem all that unusual given ever-growing
   disk storage.]
  
  I wanted to take this a bit further.  I've thought, on and off, about 
  this for a while and I always get stuck.
  
  I use rsync with --link-dest as a backup tool.  For various reasons, this 
  is not something I want to give up.  But, esp. for some very large file 
  systems, doing something that avoids the scan would be desirable.
  
  I should also add that I mistrust time-stamp, and even time-stamp+file-
  size, mechanisms for detecting changes.  Checksums, on the other hand, are 
  prohibitively expensive for backups of large file systems.
  
  These both bring me to the idea of using some file system auditing 
  mechanism to drive - perhaps with an --include-from or --files-from - 
  what rsync moves.
  
  Where I get stuck is that I cannot envision how I can provide rsync with 
  a limited list of files to move that doesn't deny the benefit of --link-
  dest: a complete snapshot of the old file system via [hard] links into a 
  prior snapshot for those files that are unchanged.
  
  Has anyone done something of this sort?  I'd thought of preceding the 
  rsync with a cp -Rl on the destination from the old snapshot to the new 
  snapshot, but I still think that this will break in the face of hard 
  links (to a file not in the --files-from list) or a change to file 
  attributes (i.e. a chmod would affect the copy of the file in the old 
  snapshot).
  
  Thanks...
  
   Andrew
  

-- 
Ken Chase - k...@heavycomputing.ca skype:kenchase23 +1 416 897 6284 Toronto 
Canada
Heavy Computing - Clued bandwidth, colocation and managed linux VPS @151 Front 
St. W.



Re: rsync --link-dest and --files-from lead by a change list from some file system audit tool (Was: Re: cut-off time for rsync ?)

2015-07-13 Thread Simon Hobson
Andrew Gideon c182driv...@gideon.org wrote:

 However, you've made me a little 
 apprehensive about storebackup.  I like the lack of a need for a restore 
 tool.  This permits all the standard UNIX tools to be applied to 
 whatever I might want to do with the backup, which is often *very* 
 convenient.

Well, if you don't use the file splitting and compression options, you can 
still do that with storebackup - just be aware that some files may have 
different timestamps (but not contents) from the original. Specifically, 
consider this sequence:
- Create a file, perform a backup.
- touch the file to change its modification timestamp, perform another backup.
rsync will (I think) see the new timestamp and create a new file rather than 
linking to the old one.
storebackup will link the files (so taking almost zero extra space) - but the 
second backup will show the file with the timestamp from the first backup. If 
you just cp -p the file then it'll have the earlier timestamp; if you restore 
it with the storebackup tools then it'll come out with the later timestamp.

 On the other hand, I do confess that I am sometimes miffed at the waste 
 involved in a small change to a very large file.  Rsync is smart about 
 moving minimal data, but it still stores an entire new copy of the file.

I'm not sure, as I've not used it, but storebackup has the option of splitting 
large files (with a user-definable threshold). You'd need to look and see 
whether it compares file parts (hard-linking unchanged parts) or the whole 
file (creating all new parts).

 What's needed is a file system that can do what hard links do, but at the 
 file page level.  I imagine that this would work using the same Copy On 
 Write logic used in managing memory pages after a fork().

Well, some (all?) enterprise-grade storage boxes support de-duplication - 
usually at the block level. So it does exist, at a price!

