Re: Questions on incremental backups

2014-07-20 Thread Sam Bull
Thanks everyone for the responses. I'll start setting up my backup
strategy in 2 or 3 weeks. I'll give the diff and unionFS tips a go, and
report back on any progress.




Re: Questions on incremental backups

2014-07-18 Thread Bob Williams

On 18/07/14 05:35, Russell Coker wrote:
 Daily snapshots work well with kernel 3.14 and above (I had
 problems with 3.13 and previous). I have snapshots every 15 mins on
 some subvols.
 
 Very large numbers of snapshots can cause performance problems. I
 suggest keeping below 1000 snapshots at this time.
 
 You can use send/recv functionality for remote backups. So far I've
 used rsync, it works well and send/recv has some limitations about
 filesystem structure etc. Rsync can transfer to an ext4 or ZFS
 filesystem if you wish.
 
 Ignoring directories in send/recv is done by subvol. Even if you
 use rsync it's a good idea to have different subvols for directory
 trees with different backup requirements.
 
 Displaying backups is an issue of backup software. It is above the
 level that BTRFS development touches. While people here can
 probably offer generic advice on backup software it's not the topic
 of the list.
 
 I use date based snapshots on my backup BTRFS filesystems and I can
 easily delete snapshots in the middle of the list.
 
I also back up to an externally attached drive using rsync followed by a
snapshot. I have written a small Python script that does this,
followed by deleting snapshots older than 90 days.
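
The pruning step described above could look roughly like the sketch below. It is not the script mentioned in this message: the date-based naming scheme (YYYY-MM-DD), the function names and the paths are assumptions made for illustration. It only works out which snapshots are past the retention period and builds the `btrfs subvolume delete` commands, without running them.

```python
import os
from datetime import date, datetime, timedelta


def expired_snapshots(names, today, keep_days=90, fmt="%Y-%m-%d"):
    """Return the date-named snapshots taken more than keep_days before today."""
    cutoff = today - timedelta(days=keep_days)
    expired = []
    for name in names:
        try:
            taken = datetime.strptime(name, fmt).date()
        except ValueError:
            continue  # not named by date (assumed scheme), so leave it alone
        if taken < cutoff:
            expired.append(name)
    return sorted(expired)


def delete_commands(snapshot_dir, names):
    """Build one 'btrfs subvolume delete' command per expired snapshot."""
    return [["btrfs", "subvolume", "delete", os.path.join(snapshot_dir, name)]
            for name in names]


if __name__ == "__main__":
    # Hypothetical snapshot names; a real script would read them with os.listdir().
    names = ["2014-01-01", "2014-04-18", "2014-04-19", "2014-07-17", "notes"]
    old = expired_snapshots(names, today=date(2014, 7, 18))
    for cmd in delete_commands("/mnt/backup/snapshots", old):
        print(" ".join(cmd))
```

Printing the commands first and only later passing them to `subprocess.run` makes it easy to check the selection before anything is deleted.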

Restoring backed up data is done using a file manager.

I'm happy to share my script.

Bob
-- 
Bob Williams
System:  Linux 3.11.10-17-desktop
Distro:  openSUSE 13.1 (x86_64) with KDE Development Platform: 4.13.3
Uptime:  06:00am up 1 day 22:01, 4 users, load average: 0.87, 0.50, 0.24
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Questions on incremental backups

2014-07-18 Thread Duncan
Russell Coker posted on Fri, 18 Jul 2014 14:35:20 +1000 as excerpted:

 Daily snapshots work well with kernel 3.14 and above (I had problems
 with 3.13 and previous). I have snapshots every 15 mins on some subvols.
 
 Very large numbers of snapshots can cause performance problems. I
 suggest keeping below 1000 snapshots at this time.

The other caveat with btrfs snapshots is how they deal with NOCOW files, 
the usual workaround recommended for large (Gig-ish-plus) internal-
rewrite-pattern files such as databases and VM images.

I'll avoid a detailed discussion here since I don't know whether it 
applies to the OP's use-case and the problem and workarounds are well 
discussed in other threads, but this is a heads-up for the OP to do a bit 
of research on the topic if he /does/ deal with gig-plus sized VM images 
or databases.  Very briefly, putting such files on their own subvolume 
and using more traditional backup techniques instead of snapshotting is 
recommended.  Another alternative is partitioning and choosing a 
filesystem other than btrfs for those files, while still considering 
btrfs for other files.

-- 
Duncan - List replies preferred.   No HTML msgs.
Every nonfree program has a lord, a master --
and if you use the program, he is your master.  Richard Stallman



Re: Questions on incremental backups

2014-07-18 Thread Roman Mamedov
On Fri, 18 Jul 2014 10:45:37 +0000 (UTC)
Duncan 1i5t5.dun...@cox.net wrote:

 Russell Coker posted on Fri, 18 Jul 2014 14:35:20 +1000 as excerpted:
 
  Daily snapshots work well with kernel 3.14 and above (I had problems
  with 3.13 and previous). I have snapshots every 15 mins on some subvols.
  
  Very large numbers of snapshots can cause performance problems. I
  suggest keeping below 1000 snapshots at this time.
 
 The other caveat with btrfs snapshots is how they deal with NOCOW files, 
 the usual workaround recommended for large (Gig-ish-plus) internal-
 rewrite-pattern files such as databases and VM images.

And how do they deal with them? To the best of my knowledge there is no
caveat whatsoever, NOCOW and snapshots interact perfectly, exactly as it
should be (snapshotted and then changed bits get COW'ed, but only once).

-- 
With respect,
Roman




Re: Questions on incremental backups

2014-07-18 Thread Sam Bull
Thanks for the replies, I think that's most of the questions answered.
I'll not bother backing up any VMs, as they won't contain anything worth
backing up. Can anybody answer the last couple of remaining questions?

On ven, 2014-07-18 at 14:35 +1000, Russell Coker wrote:
 Ignoring directories in send/recv is done by subvol. Even if you use
 rsync it's a good idea to have different subvols for directory trees
 with different backup requirements.

So, an inner subvol won't be backed up? If I wanted a full backup, I
would presumably get snapshots of each subvol separately, right?

 Displaying backups is an issue of backup software. It is above the
 level that BTRFS development touches. While people here can probably
 offer generic advice on backup software it's not the topic of the
 list.

As said, I don't mind developing the software. But, is the required
information easily available? Is there a way to get a diff, something
like a list of changed/added/removed files between snapshots?

If I want to create a backup view, I could start with just a file view
of the most recent snapshot, but is there a way I can quickly get a list
of additional files in the other snapshots that are not present in the
most recent one (files that have been deleted)?



And, finally, nobody has mentioned the possibility of merging
multiple snapshots into a single snapshot. Would this be possible, to
create a snapshot that contains the most recent version of each file
present across all of the snapshots (including files which may be
present in only one of the snapshots)?




Re: Questions on incremental backups

2014-07-18 Thread Roman Mamedov
On Fri, 18 Jul 2014 05:34:22 -0700
Duncan 1i5t5.dun...@cox.net wrote:

 Effectively, admins can choose NOCOW XOR frequent-snapshotting, although
 the fact that snapshots stop at subvolume borders can be used as a
 partial workaround, by putting NOCOW files on a dedicated subvolume and
 not snapshotting it, exactly as I mentioned.

You can't back up running VM images and datafiles of an active database using
traditional backup techniques such as file copy or rsync. By the time you reach
the tail of the file, it will be inconsistent with the overall state, or with
the head of the file as it was when you started copying. Snapshots, on the
other hand, are atomic, and can very much be used to create a static copy of
the files for the purposes of compressing/copying away somewhere. At worst, a
VM or DB restored from such a backup will be in the same state as if it had
just had a power loss. Journalling filesystems and databases can deal with
that with no major problems.

So just exercise moderation, snapshot e.g. once an hour or even a day, the
result will still be better than not using NOCOW, and will deliver most of the
benefits you get by snapshotting.

Another option is to snapshot, back up, and then delete the snapshot.

-- 
With respect,
Roman




Re: Questions on incremental backups

2014-07-18 Thread Russell Coker
On Fri, 18 Jul 2014 13:56:58 Sam Bull wrote:
 On ven, 2014-07-18 at 14:35 +1000, Russell Coker wrote:
  Ignoring directories in send/recv is done by subvol. Even if you use
  rsync it's a good idea to have different subvols for directory trees
  with different backup requirements.
 
 So, an inner subvol won't be backed up? If I wanted a full backup, I
 would presumably get snapshots of each subvol separately, right?

If you use btrfs send/recv then it won't get the inner subvol.  If you use 
rsync then by default it goes through the entire directory tree unless you use 
the -x option.
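
As a rough illustration of what staying on one filesystem means, the sketch below walks a tree and refuses to descend into any directory whose device number differs from the root's. As far as I know, each btrfs subvolume reports its own device number, which is why rsync -x stops at subvolume boundaries. The function name is made up for this example, and unlike rsync (which still creates the mount-point directory itself) it simply skips the foreign directory.

```python
import os


def walk_one_filesystem(root):
    """Yield directories under root, without crossing onto another device."""
    root_dev = os.lstat(root).st_dev
    for dirpath, dirnames, _filenames in os.walk(root):
        # Prune in place so os.walk does not descend into other filesystems
        # (or, on btrfs, into nested subvolumes).
        dirnames[:] = [
            d for d in dirnames
            if os.lstat(os.path.join(dirpath, d)).st_dev == root_dev
        ]
        yield dirpath


if __name__ == "__main__":
    for directory in walk_one_filesystem("."):
        print(directory)
```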

  Displaying backups is an issue of backup software. It is above the
  level that BTRFS development touches. While people here can probably
  offer generic advice on backup software it's not the topic of the
  list.
 
 As said, I don't mind developing the software. But, is the required
 information easily available? Is there a way to get a diff, something
 like a list of changed/added/removed files between snapshots?

Your usual diff utility will do it.  I guess you could parse the output of 
btrfs send.
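
If the snapshots are mounted as ordinary directories, the "usual diff" approach can be sketched in a few lines of Python. This is only an illustration: the function names are invented, it reads file contents in full (so it will be slow on large snapshots), and it assumes regular files rather than devices or dangling symlinks.

```python
import filecmp
import os


def list_files(root):
    """Return the set of file paths under root, relative to root."""
    found = set()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            found.add(os.path.relpath(os.path.join(dirpath, name), root))
    return found


def snapshot_diff(old_root, new_root):
    """Classify files as added, removed or changed between two snapshot trees."""
    old, new = list_files(old_root), list_files(new_root)
    changed = sorted(
        path for path in old & new
        # shallow=False compares contents, not just size and mtime.
        if not filecmp.cmp(os.path.join(old_root, path),
                           os.path.join(new_root, path), shallow=False)
    )
    return {"added": sorted(new - old),
            "removed": sorted(old - new),
            "changed": changed}
```

The "removed" list is what a backup viewer would grey out; calling this for each pair of consecutive snapshots gives the per-file version history asked about.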

 And, finally, nobody has mentioned the possibility of merging
 multiple snapshots into a single snapshot. Would this be possible, to
 create a snapshot that contains the most recent version of each file
 present across all of the snapshots (including files which may be
 present in only one of the snapshots)?

There is no btrfs functionality for that.  But I'm sure you could do something 
with standard Unix utilities and copying files around.
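
One way to approach the "copying files around" part is to first work out which snapshot holds the newest copy of each path. The sketch below does only that planning step; it assumes the snapshots are plain directories passed in oldest-to-newest order, and the function name is hypothetical.

```python
import os


def merge_plan(snapshot_roots):
    """Map each relative file path to the newest snapshot that contains it.

    snapshot_roots must be ordered from oldest to newest.
    """
    plan = {}
    for root in snapshot_roots:
        for dirpath, _dirnames, filenames in os.walk(root):
            for name in filenames:
                rel = os.path.relpath(os.path.join(dirpath, name), root)
                plan[rel] = root  # a newer snapshot overrides an older one
    return plan
```

Files deleted in later snapshots stay in the plan, pointing at the last snapshot that still had them, which matches the merged view asked about.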

-- 
My Main Blog http://etbe.coker.com.au/
My Documents Blog http://doc.coker.com.au/



Re: Questions on incremental backups

2014-07-18 Thread Duncan
On Fri, 18 Jul 2014 16:55:26 +0600
Roman Mamedov r...@romanrm.net wrote:

 On Fri, 18 Jul 2014 10:45:37 +0000 (UTC)
 Duncan 1i5t5.dun...@cox.net wrote:
 
  Russell Coker posted on Fri, 18 Jul 2014 14:35:20 +1000 as
  excerpted:
  
   Daily snapshots work well with kernel 3.14 and above (I had
   problems with 3.13 and previous). I have snapshots every 15 mins
   on some subvols.
   
   Very large numbers of snapshots can cause performance problems. I
   suggest keeping below 1000 snapshots at this time.
  
  The other caveat with btrfs snapshots is how they deal with NOCOW
  files, the usual workaround recommended for large (Gig-ish-plus)
  internal-rewrite-pattern files such as databases and VM images.
 
 And how do they deal with them? To the best of my knowledge there
 is no caveat whatsoever, NOCOW and snapshots interact perfectly,
 exactly as it should be (snapshotted and then changed bits get
 COW'ed, but only once).

Yes, but the fact that NOCOW files must nevertheless be COWed on the
first write to a block after a snapshot isn't exactly intuitive to many
admins, and wasn't even to many list regulars until relatively
recently.

For some time the recommendation for active large database files and
VM images (the ones I mentioned) was to make them NOCOW in order to
avoid extreme fragmentation, and people were still reporting extreme
fragmentation and the related performance issues even when the files
were properly NOCOWed at creation. Turned out the reason was that they
had scripted auto-snapshotting enabled, sometimes snapshotting the
files as often as once a minute!  With an active VM writing data more
or less randomly to its image equally often, NOCOW lost its
effectiveness as the snapshotting was forcing COW writes most of the
time anyway!

In the context of frequent snapshots, NOCOW is in practice broken
and doesn't do what the label would indicate, thus the caveat.
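
The effect can be illustrated with a toy model. It does not touch btrfs at all; the event and block representation is invented for this example. A write to a block that is still shared with a snapshot is counted as a forced COW, and later writes to the same block go in place until the next snapshot.

```python
def count_cow_writes(events):
    """Count writes to a NOCOW file that are forced to COW by snapshots.

    events is a sequence of ("write", block_number) or ("snapshot",) tuples.
    """
    existing = set()  # blocks the file currently has
    shared = set()    # blocks still referenced by the latest snapshot
    cow_writes = 0
    for event in events:
        if event[0] == "snapshot":
            shared = set(existing)  # every existing block is now shared
        else:
            block = event[1]
            if block in shared:
                cow_writes += 1      # first write after a snapshot: COW once
                shared.discard(block)
            existing.add(block)      # later writes to this block are in place
    return cow_writes


if __name__ == "__main__":
    setup = [("write", 0), ("write", 1)]
    one_snapshot = setup + [("snapshot",), ("write", 0), ("write", 0), ("write", 1)]
    snapshot_each_time = setup + [e for b in (0, 1, 0)
                                  for e in (("snapshot",), ("write", b))]
    print(count_cow_writes(one_snapshot), "of 3 rewrites were COWed")
    print(count_cow_writes(snapshot_each_time), "of 3 rewrites were COWed")
```

With a snapshot before every rewrite, every rewrite in this model is a COW write, which is the situation described above.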

Effectively, admins can choose NOCOW XOR frequent-snapshotting, although
the fact that snapshots stop at subvolume borders can be used as a
partial workaround, by putting NOCOW files on a dedicated subvolume and
not snapshotting it, exactly as I mentioned.

-- 
Duncan - No HTML messages please, as they are filtered as spam.
Every nonfree program has a lord, a master --
and if you use the program, he is your master.  Richard Stallman


Re: Questions on incremental backups

2014-07-18 Thread Mike Hartman
 And, finally, nobody has mentioned the possibility of merging
 multiple snapshots into a single snapshot. Would this be possible, to
 create a snapshot that contains the most recent version of each file
 present across all of the snapshots (including files which may be
 present in only one of the snapshots)?

 There is no btrfs functionality for that.  But I'm sure you could do something
 with standard Unix utilities and copying files around.

You could probably use UnionFS or one of the alternatives to get a
merged view of a group of snapshots. You could then copy that merged
view and delete the original snapshots. I'm not sure if there's
anything special you have to do metadata-wise to turn that merged view
into something btrfs would still recognize as a snapshot though.


Re: Questions on incremental backups

2014-07-18 Thread Imran Geriskovan
It's not about snapshots, but here is another incremental
backup recipe for optical media like DVDs and Blu-rays:

Base Backup:
1) Create encrypted loopback devices of DVD or Blu-ray sizes.
2) Create a compressed multi-device Btrfs spanning these
loopback devices. (To save space, you may use single
metadata if this is not your only backup.)
3) Rsync your data into this fs.
4) Unmount it and make it a SEED fs (btrfstune -S 1..)
5) Burn the loopback device files to DVDs or Blu-rays.

Incremental Part:
a) Before your next backup, create additional encrypted
   loopback devices as needed.
b) Mount your base backup. (It will mount read-only.)
c) Add the devices created at (a) to your base backup fs.
d) Rsync into your fs. (Note that incremental data
   will only go into the devices at (a).)
e) Unmount all.
f) Burn only the devices at (a) to DVDs or Blu-rays. These
   are your incremental disks.
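
Steps (b) to (e) might translate into commands along the lines of the sketch below. The device names, mount point and function name are placeholders, the encrypted loopback setup from step (a) is left out, and the remount to read-write is an assumption taken from the usual btrfs seed-device procedure rather than from the recipe itself. The function only builds the command lists; nothing is executed.

```python
def incremental_commands(base_device, new_devices, source, mountpoint):
    """Build the commands for one incremental run on top of a seed filesystem."""
    commands = [["mount", base_device, mountpoint]]  # (b) a seed fs mounts read-only
    for device in new_devices:
        # (c) new writes can only land on the devices added here
        commands.append(["btrfs", "device", "add", device, mountpoint])
    # Assumed extra step: make the filesystem writable once a device was added.
    commands.append(["mount", "-o", "remount,rw", mountpoint])
    # (d) the trailing slash makes rsync copy the contents of source
    commands.append(["rsync", "-a", source.rstrip("/") + "/", mountpoint])
    commands.append(["umount", mountpoint])  # (e)
    return commands


if __name__ == "__main__":
    # Hypothetical device and path names.
    for command in incremental_commands("/dev/mapper/base0", ["/dev/mapper/incr0"],
                                        "/home/user", "/mnt/backup"):
        print(" ".join(command))
```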

Regards,
Imran


Re: Questions on incremental backups

2014-07-18 Thread Daniel Mizyrycki


On 07/18/14 06:40, Russell Coker wrote:

Displaying backups is an issue of backup software. It is above the
level that BTRFS development touches. While people here can probably
offer generic advice on backup software it's not the topic of the
list.


As said, I don't mind developing the software. But, is the required
information easily available? Is there a way to get a diff, something
like a list of changed/added/removed files between snapshots?


Your usual diff utility will do it.  I guess you could parse the output of
btrfs send.

Following this thought, one step closer to getting a text diff is to
use fardump. It takes a btrfs send binary stream and outputs the send
instructions in plaintext
(https://kernel.googlesource.com/pub/scm/linux/kernel/git/arne/far-progs).
It would certainly be awesome if btrfs-progs had an extra
parameter to just generate the list of changed/added/removed files
between snapshots, as all the needed infrastructure is already in place.

And, finally, nobody has mentioned the possibility of merging
multiple snapshots into a single snapshot. Would this be possible, to
create a snapshot that contains the most recent version of each file
present across all of the snapshots (including files which may be
present in only one of the snapshots)?


There is no btrfs functionality for that.  But I'm sure you could do something
with standard Unix utilities and copying files around.

Sure, but the management of data deduplication is left to the user 
(presumably using cp --reflink) which is not trivial.
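
A sketch of what that manual approach might involve: given a mapping from relative path to the snapshot directory holding the copy to keep (a hypothetical structure, as are the function and path names), build `cp --reflink=auto` commands so that data is shared rather than duplicated where the filesystem allows it. With `auto`, cp falls back to an ordinary copy when a reflink is not possible. The commands are only built here, not run.

```python
import os


def reflink_copy_commands(plan, destination):
    """Build mkdir/cp commands that reflink-copy each planned file into destination.

    plan maps a relative file path to the snapshot root holding the wanted copy.
    """
    commands = []
    for rel, root in sorted(plan.items()):
        target = os.path.join(destination, rel)
        commands.append(["mkdir", "-p", os.path.dirname(target)])
        # -p preserves mode, ownership and timestamps of the source file.
        commands.append(["cp", "--reflink=auto", "-p",
                         os.path.join(root, rel), target])
    return commands
```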

Does anybody know how safe it is to use duperemove or bedup?
Any recommendations on how to effectively deduplicate btrfs at this point?


Questions on incremental backups

2014-07-17 Thread Sam Bull
I've a couple of questions on incremental backups. I've read the wiki
page, and would like to confirm my understanding of some features, and
also see if other features are possible that are not mentioned. I'm
looking to replace my existing backup solution, and hoping to match the
features I currently use, and go a little beyond.

=== Daily snapshot ===

So, if I understand correctly, I can make a daily snapshot of my
filesystem with very little overhead. Then these can later be synced
efficiently to another system (only syncing the differences), so I can
backup regularly over the internet to my server, and also to an external
HDD. After syncing, I can delete the snapshots (other than the trailing
one needed for the next backup).

In this way I can keep a constant stream of daily backups even when
offline, and simply sync them next time I am online before deleting them
locally.
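
For reference, the workflow described above maps onto btrfs commands roughly as in this sketch: a read-only snapshot, then `btrfs send -p` with the previous snapshot as parent so only the differences are transferred. The paths and function names are placeholders, and the remote case (piping through ssh) is left out.

```python
import subprocess


def snapshot_command(subvolume, snapshot_path):
    """Read-only snapshot; btrfs send requires the snapshot to be read-only."""
    return ["btrfs", "subvolume", "snapshot", "-r", subvolume, snapshot_path]


def send_command(snapshot_path, parent=None):
    """Full send, or an incremental send relative to parent when given."""
    command = ["btrfs", "send"]
    if parent is not None:
        command += ["-p", parent]
    return command + [snapshot_path]


def receive_command(destination_dir):
    return ["btrfs", "receive", destination_dir]


def run_pipeline(send_cmd, receive_cmd):
    """Run 'send | receive' and raise if either side fails."""
    sender = subprocess.Popen(send_cmd, stdout=subprocess.PIPE)
    receiver = subprocess.Popen(receive_cmd, stdin=sender.stdout)
    sender.stdout.close()  # sender sees a broken pipe if the receiver exits early
    receiver_status = receiver.wait()
    sender_status = sender.wait()
    if sender_status != 0 or receiver_status != 0:
        raise RuntimeError("btrfs send/receive failed")


if __name__ == "__main__":
    # Hypothetical paths; printed rather than executed.
    print(snapshot_command("/home", "/snapshots/home-2014-07-18"))
    print(send_command("/snapshots/home-2014-07-18",
                       parent="/snapshots/home-2014-07-17"))
    print(receive_command("/mnt/external/snapshots"))
```

The parent snapshot has to exist on both sides, which is why the most recent snapshot is kept locally after each sync.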

=== Ignore directories ===

Due to storage limitations on my server, is it possible to ignore
certain directories? For example, ignoring the folder that stores all my
games, as this could be rather large, and the contents can easily be
re-downloaded. The instructions involve subvolumes, so maybe it's
possible to ignore a subvolume when syncing?

If that is possible, then is it also possible to have a separate backup
that does include the ignored directory? For example, having the smaller
sync to the storage-limited server, but having a full sync to an
external HDD.

=== Display backups ===

Is it possible to view the contents of all backups? So, the expected
interface would be something like a tree of all files from across all
snapshots. Any files that are not present in the latest snapshot would
be greyed out to show they have been deleted. Selecting a file would
show a list of versions of the file, with one version for each snapshot
the file has been modified in.

As long as I can get access to this information, maybe some kind of diff
between snapshots, I'm willing to write the actual software to display
this interface. (I suppose even if it's not supported, I could crawl
through the filesystems and generate some kind of database, but that
sounds like a painful process.)

=== Merge snapshots down ===

Is there some way to merge snapshots down? So, I could merge the last
week of daily snapshots into a single weekly snapshot. The new snapshot
should include all files across all the snapshots (even if deleted in
some of the snapshots), and include just the latest version of each
file.

This way, I'd like to maintain daily snapshots, which can be regularly
merged down into weekly snapshots, and then into monthly snapshots, and
then finally into yearly snapshots.


And, finally, there's no problem in deleting old snapshots? I'm assuming
any data from these snapshots used by other snapshots will still be
referenced by the other snapshots, and thus be retained, so nothing will
break?




Re: Questions on incremental backups

2014-07-17 Thread Russell Coker
Daily snapshots work well with kernel 3.14 and above (I had problems with 3.13 
and previous). I have snapshots every 15 mins on some subvols.

Very large numbers of snapshots can cause performance problems. I suggest 
keeping below 1000 snapshots at this time.

You can use send/recv functionality for remote backups. So far I've used rsync; 
it works well, and send/recv has some limitations about filesystem structure 
etc. Rsync can transfer to an ext4 or ZFS filesystem if you wish.

Ignoring directories in send/recv is done by subvol. Even if you use rsync it's 
a good idea to have different subvols for directory trees with different backup 
requirements.

Displaying backups is an issue of backup software. It is above the level that 
BTRFS development touches. While people here can probably offer generic advice 
on backup software it's not the topic of the list.

I use date based snapshots on my backup BTRFS filesystems and I can easily 
delete snapshots in the middle of the list.
-- 
Sent from my Samsung Galaxy Note 2 with K-9 Mail.