Re: Questions on incremental backups
Thanks everyone for the responses. I'll start setting up my backup strategy in 2 or 3 weeks. I'll give the diff and unionFS tips a go, and report back on any progress.
Re: Questions on incremental backups
On 18/07/14 05:35, Russell Coker wrote:
> Daily snapshots work well with kernel 3.14 and above (I had problems with 3.13 and previous). I have snapshots every 15 mins on some subvols.
>
> Very large numbers of snapshots can cause performance problems. I suggest keeping below 1000 snapshots at this time.
>
> You can use send/recv functionality for remote backups. So far I've used rsync; it works well, and send/recv has some limitations about filesystem structure etc. Rsync can transfer to an ext4 or ZFS filesystem if you wish.
>
> Ignoring directories in send/recv is done by subvol. Even if you use rsync it's a good idea to have different subvols for directory trees with different backup requirements.
>
> Displaying backups is an issue of backup software. It is above the level that BTRFS development touches. While people here can probably offer generic advice on backup software it's not the topic of the list.
>
> I use date based snapshots on my backup BTRFS filesystems and I can easily delete snapshots in the middle of the list.

I also back up to an externally attached drive using rsync followed by a snapshot. I have written a small Python script that does this and then deletes snapshots older than 90 days. Restoring backed-up data is done using a file manager.

I'm happy to share my script.

Bob

--
Bob Williams
System: Linux 3.11.10-17-desktop
Distro: openSUSE 13.1 (x86_64) with KDE Development Platform: 4.13.3
Uptime: 06:00am up 1 day 22:01, 4 users, load average: 0.87, 0.50, 0.24

--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
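Bob's script is not shown in the thread, so the following is only a minimal sketch of the pruning step he describes (deleting snapshots older than 90 days). It assumes snapshots are named by date as YYYY-MM-DD, which is a guess at a "date based" naming scheme, and the function name expired_snapshots is hypothetical. It only selects candidates; actually removing one would be a separate call to btrfs subvolume delete.

```python
from datetime import date, datetime, timedelta


def expired_snapshots(names, today, keep_days=90):
    """Return the date-named snapshots taken more than keep_days before today.

    Assumption: snapshot names look like "2014-07-18". Names that do not
    parse as a date are skipped, so manually named snapshots are never
    selected for deletion.
    """
    cutoff = today - timedelta(days=keep_days)
    expired = []
    for name in names:
        try:
            taken = datetime.strptime(name, "%Y-%m-%d").date()
        except ValueError:
            continue  # not a date-named snapshot; leave it alone
        if taken < cutoff:
            expired.append(name)
    return sorted(expired)
```

With today set to date(2014, 7, 18) and the default of 90 days, the cutoff is 2014-04-19, so a snapshot named 2014-04-18 is selected while 2014-04-19 is kept.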
Re: Questions on incremental backups
Russell Coker posted on Fri, 18 Jul 2014 14:35:20 +1000 as excerpted:

> Daily snapshots work well with kernel 3.14 and above (I had problems with 3.13 and previous). I have snapshots every 15 mins on some subvols.
>
> Very large numbers of snapshots can cause performance problems. I suggest keeping below 1000 snapshots at this time.

The other caveat with btrfs snapshots is how they deal with NOCOW files, the usual workaround recommended for large (gig-ish-plus) internal-rewrite-pattern files such as databases and VM images. I'll avoid a detailed discussion here since I don't know whether it applies to the OP's use-case and the problem and workarounds are well discussed in other threads, but this is a heads-up for the OP to do a bit of research on the topic if he /does/ deal with gig-plus sized VM images or databases.

Very briefly, putting such files on their own subvolume and using more traditional backup techniques instead of snapshotting is recommended. Another alternative is partitioning and choosing a filesystem other than btrfs for those files, while still considering btrfs for other files.

--
Duncan - List replies preferred. No HTML msgs.
Every nonfree program has a lord, a master -- and if you use the program, he is your master. Richard Stallman
Re: Questions on incremental backups
On Fri, 18 Jul 2014 10:45:37 + (UTC) Duncan 1i5t5.dun...@cox.net wrote:

> Russell Coker posted on Fri, 18 Jul 2014 14:35:20 +1000 as excerpted:
>
>> Daily snapshots work well with kernel 3.14 and above (I had problems with 3.13 and previous). I have snapshots every 15 mins on some subvols.
>>
>> Very large numbers of snapshots can cause performance problems. I suggest keeping below 1000 snapshots at this time.
>
> The other caveat with btrfs snapshots is how they deal with NOCOW files, the usual workaround recommended for large (gig-ish-plus) internal-rewrite-pattern files such as databases and VM images.

And how do they deal with them? To the best of my knowledge there is no caveat whatsoever: NOCOW and snapshots interact perfectly, exactly as they should (snapshotted and then changed bits get COW'ed, but only once).

--
With respect,
Roman
Re: Questions on incremental backups
Thanks for the replies, I think that's most of the questions answered. I'll not bother backing up any VMs, as they won't contain anything worth backing up.

Can anybody answer the last couple of remaining questions?

On Fri, 2014-07-18 at 14:35 +1000, Russell Coker wrote:
> Ignoring directories in send/recv is done by subvol. Even if you use rsync it's a good idea to have different subvols for directory trees with different backup requirements.

So, an inner subvol won't be backed up? If I wanted a full backup, I would presumably get snapshots of each subvol separately, right?

> Displaying backups is an issue of backup software. It is above the level that BTRFS development touches. While people here can probably offer generic advice on backup software it's not the topic of the list.

As said, I don't mind developing the software. But is the required information easily available? Is there a way to get a diff, something like a list of changed/added/removed files between snapshots?

If I want to create a backup view, I could start with just a file view of the most recent snapshot, but is there a way I can quickly get a list of additional files in the other snapshots that are not present in the most recent one (files that have been deleted)?

And, finally, nobody has commented on the possibility of merging multiple snapshots into a single snapshot. Would this be possible, to create a snapshot that contains the most recent version of each file present across all of the snapshots (including files which may be present in only one of the snapshots)?
Re: Questions on incremental backups
On Fri, 18 Jul 2014 05:34:22 -0700 Duncan 1i5t5.dun...@cox.net wrote:

> Effectively, admins can choose NOCOW XOR frequent-snapshotting, altho the fact that snapshots stop at subvolume borders can be used as a partial workaround, by putting NOCOW files on a dedicated partition and not snapshotting it, exactly as I mentioned.

You can't back up running VM images and the data files of an active database using traditional backup techniques such as file copy or rsync. By the time the copy reaches the tail of the file, it will be long inconsistent with the overall state, or with the head of the file as it was when you started copying.

Snapshots on the other hand are atomic, and can very much be used to create a static copy of the files for the purposes of compressing them or copying them away somewhere.

And at worst, the state of a VM or DB restored from such a backup will be equivalent to it having just had a power loss. Journalling FSes and databases can deal with that with no major problems.

So just exercise moderation: snapshot, e.g., once an hour or even once a day. The result will still be better than not using NOCOW, and will deliver most of the benefits you get from snapshotting. Another option is to snapshot, back up, then delete the snapshot.

--
With respect,
Roman
Re: Questions on incremental backups
On Fri, 18 Jul 2014 13:56:58 Sam Bull wrote:
> On Fri, 2014-07-18 at 14:35 +1000, Russell Coker wrote:
>> Ignoring directories in send/recv is done by subvol. Even if you use rsync it's a good idea to have different subvols for directory trees with different backup requirements.
>
> So, an inner subvol won't be backed up? If I wanted a full backup, I would presumably get snapshots of each subvol separately, right?

If you use btrfs send/recv then it won't get the inner subvol. If you use rsync then by default it goes through the entire directory tree unless you use the -x option.

>> Displaying backups is an issue of backup software. It is above the level that BTRFS development touches. While people here can probably offer generic advice on backup software it's not the topic of the list.
>
> As said, I don't mind developing the software. But is the required information easily available? Is there a way to get a diff, something like a list of changed/added/removed files between snapshots?

Your usual diff utility will do it. I guess you could parse the output of btrfs send.

> And, finally, nobody has commented on the possibility of merging multiple snapshots into a single snapshot. Would this be possible, to create a snapshot that contains the most recent version of each file present across all of the snapshots (including files which may be present in only one of the snapshots)?

There is no btrfs functionality for that. But I'm sure you could do something with standard Unix utilities and copying files around.

--
My Main Blog http://etbe.coker.com.au/
My Documents Blog http://doc.coker.com.au/
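As a rough illustration of the "usual diff utility" approach, here is a minimal Python sketch that walks two mounted snapshot directories and reports added, removed and changed files. It is not a btrfs feature and does not parse the send stream; the function names list_files and snapshot_diff are hypothetical, and it compares file contents byte by byte, so it will be slow on large snapshots. Symlinks and special files are not handled specially (an assumption that keeps the sketch short).

```python
import filecmp
import os


def list_files(root):
    """Return the set of file paths under root, relative to root."""
    found = set()
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            full = os.path.join(dirpath, filename)
            found.add(os.path.relpath(full, root))
    return found


def snapshot_diff(old_root, new_root):
    """Compare two snapshot directories.

    Returns a dict with sorted lists of relative paths under the keys
    "added", "removed" and "changed" (changed = present in both, but
    with different contents).
    """
    old_files = list_files(old_root)
    new_files = list_files(new_root)
    changed = sorted(
        path
        for path in old_files & new_files
        if not filecmp.cmp(
            os.path.join(old_root, path),
            os.path.join(new_root, path),
            shallow=False,  # compare contents, not just size and mtime
        )
    )
    return {
        "added": sorted(new_files - old_files),
        "removed": sorted(old_files - new_files),
        "changed": changed,
    }
```

Calling snapshot_diff on yesterday's and today's snapshot directories gives the three lists a backup viewer would need; the "removed" list is what would be shown greyed out.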
Re: Questions on incremental backups
On Fri, 18 Jul 2014 16:55:26 +0600 Roman Mamedov r...@romanrm.net wrote:

> On Fri, 18 Jul 2014 10:45:37 + (UTC) Duncan 1i5t5.dun...@cox.net wrote:
>
>> Russell Coker posted on Fri, 18 Jul 2014 14:35:20 +1000 as excerpted:
>>
>>> Daily snapshots work well with kernel 3.14 and above (I had problems with 3.13 and previous). I have snapshots every 15 mins on some subvols.
>>>
>>> Very large numbers of snapshots can cause performance problems. I suggest keeping below 1000 snapshots at this time.
>>
>> The other caveat with btrfs snapshots is how they deal with NOCOW files, the usual workaround recommended for large (gig-ish-plus) internal-rewrite-pattern files such as databases and VM images.
>
> And how do they deal with them? To the best of my knowledge there is no caveat whatsoever: NOCOW and snapshots interact perfectly, exactly as they should (snapshotted and then changed bits get COW'ed, but only once).

Yes, but the fact that NOCOW files must nevertheless be COWed on the first write to a block after a snapshot isn't exactly intuitive to many admins, and wasn't even to many list regulars until relatively recently.

For some time the recommendation for active large database files and VM images (the ones I mentioned) was to make them NOCOW in order to avoid extreme fragmentation, and people were still reporting extreme fragmentation and the related performance issues even when the files were properly NOCOWed at creation.

It turned out the reason was that they had scripted auto-snapshotting enabled, sometimes snapshotting the files as often as once a minute! With an active VM writing data more or less randomly to its image equally often, NOCOW lost its effectiveness, as the snapshotting was forcing COW writes most of the time anyway!

In the context of frequent snapshots, NOCOW is in practice broken and doesn't do what the label would indicate, thus the caveat.

Effectively, admins can choose NOCOW XOR frequent-snapshotting, although the fact that snapshots stop at subvolume borders can be used as a partial workaround, by putting NOCOW files on a dedicated partition and not snapshotting it, exactly as I mentioned.

--
Duncan - No HTML messages please, as they are filtered as spam.
Every nonfree program has a lord, a master -- and if you use the program, he is your master. Richard Stallman
Re: Questions on incremental backups
>> And, finally, nobody has commented on the possibility of merging multiple snapshots into a single snapshot. Would this be possible, to create a snapshot that contains the most recent version of each file present across all of the snapshots (including files which may be present in only one of the snapshots)?
>
> There is no btrfs functionality for that. But I'm sure you could do something with standard Unix utilities and copying files around.

You could probably use UnionFS or one of the alternatives to get a merged view of a group of snapshots. You could then copy that merged view and delete the original snapshots. I'm not sure if there's anything special you have to do metadata-wise to turn that merged view into something btrfs would still recognize as a snapshot though.
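To make the idea of a merged view concrete without UnionFS, here is a minimal Python sketch that computes, for every file seen in any snapshot, which snapshot holds its most recent version. It assumes the snapshots are ordinary mounted directories passed in oldest-to-newest order, and that "most recent version" simply means "the copy in the newest snapshot that contains the file". The name merged_view is hypothetical, and the sketch only builds the mapping; it does not copy anything.

```python
import os


def merged_view(snapshot_roots):
    """Map each relative file path to the newest snapshot that contains it.

    snapshot_roots must be ordered oldest to newest. Files deleted in
    later snapshots still appear, pointing at the last snapshot in
    which they existed.
    """
    view = {}
    for root in snapshot_roots:
        for dirpath, _dirnames, filenames in os.walk(root):
            for filename in filenames:
                full = os.path.join(dirpath, filename)
                rel = os.path.relpath(full, root)
                view[rel] = root  # newer snapshots overwrite older entries
    return view
```

A copy step could then iterate over the mapping and copy each file from its chosen snapshot into a fresh subvolume, which is the "copy that merged view" step described above.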
Re: Questions on incremental backups
It's not about snapshots, but here is another incremental backup recipe for optical media such as DVDs and Blu-rays:

Base Backup:
1) Create encrypted loopback devices of DVD or Blu-ray size.
2) Create a compressed multi-device Btrfs spanning these loopback devices. (To save space, you may use single metadata if this is not your only backup.)
3) Rsync your data into this fs.
4) Unmount it and make it a SEED fs (btrfstune -S 1..)
5) Burn the loopback device files to DVDs or Blu-rays.

Incremental Part:
a) Before your next backup, create additional encrypted loopback devices as needed.
b) Mount your base backup. (It will mount as read-only.)
c) Add the devices created at (a) to your base backup fs.
d) Rsync into your fs. (Note that incremental data will only go into the devices at (a).)
e) Unmount all.
f) Only burn the devices at (a) to DVDs or Blu-rays. These are your incremental disks.

Regards,
Imran
Re: Questions on incremental backups
On 07/18/14 06:40, Russell Coker wrote:
>>> Displaying backups is an issue of backup software. It is above the level that BTRFS development touches. While people here can probably offer generic advice on backup software it's not the topic of the list.
>>
>> As said, I don't mind developing the software. But is the required information easily available? Is there a way to get a diff, something like a list of changed/added/removed files between snapshots?
>
> Your usual diff utility will do it. I guess you could parse the output of btrfs send.

Following this thought, one step closer to getting a text diff is to use fardump (https://kernel.googlesource.com/pub/scm/linux/kernel/git/arne/far-progs). It takes a btrfs send binary stream and outputs the send instructions in plaintext.

It certainly would be awesome if btrfs-progs had an extra parameter to just generate the list of changed/added/removed files between snapshots, as all the needed infrastructure is already in place.

>> And, finally, nobody has commented on the possibility of merging multiple snapshots into a single snapshot. Would this be possible, to create a snapshot that contains the most recent version of each file present across all of the snapshots (including files which may be present in only one of the snapshots)?
>
> There is no btrfs functionality for that. But I'm sure you could do something with standard Unix utilities and copying files around.

Sure, but the management of data deduplication is then left to the user (presumably using cp --reflink), which is not trivial.

Does anybody know how safe it is to use duperemove or bedup? Any recommendations on how to effectively deduplicate btrfs at this point?
Questions on incremental backups
I've a couple of questions on incremental backups. I've read the wiki page, and would like to confirm my understanding of some features, and also see if other features are possible that are not mentioned.

I'm looking to replace my existing backup solution, and hoping to match the features I currently use, and go a little beyond.

=== Daily snapshot ===

So, if I understand correctly, I can make a daily snapshot of my filesystem with very little overhead. Then these can later be synced efficiently to another system (only syncing the differences), so I can back up regularly over the internet to my server, and also to an external HDD.

After syncing, I can delete the snapshots (other than the trailing one needed for the next backup). In this way I can keep a constant stream of daily backups even when offline, and simply sync them next time I am online before deleting them locally.

=== Ignore directories ===

Due to storage limitations on my server, is it possible to ignore certain directories? For example, ignoring the folder that stores all my games, as this could be rather large, and the contents can easily be re-downloaded. The instructions involve subvolumes, so maybe it's possible to ignore a subvolume when syncing?

If that is possible, then is it also possible to have a separate backup that does include the ignored directory? For example, having the smaller sync to the storage-limited server, but having a full sync to an external HDD.

=== Display backups ===

Is it possible to view the contents of all backups? So, the expected interface would be something like a tree of all files from across all snapshots. Any files that are not present in the latest snapshot would be greyed out to show they have been deleted. Selecting a file would show a list of versions of the file, with one version for each snapshot the file has been modified in.

As long as I can get access to this information, maybe some kind of diff between snapshots, I'm willing to write the actual software to display this interface. (I suppose even if it's not supported, I could crawl through the filesystems and generate some kind of database, but that sounds like a painful process.)

=== Merge snapshots down ===

Is there some way to merge snapshots down? So, I could merge the last week of daily snapshots into a single weekly snapshot. The new snapshot should include all files across all the snapshots (even if deleted in some of the snapshots), and include just the latest version of each file.

This way, I'd like to maintain daily snapshots, which can be regularly merged down into weekly snapshots, and then into monthly snapshots, and then finally into yearly snapshots.

And, finally, there's no problem in deleting old snapshots? I'm assuming any data from these snapshots used by other snapshots will still be referenced by the other snapshots, and thus be retained, so nothing will break?
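Any tool that merges dailies into weeklies first has to decide which daily snapshots belong to the same week. The following is a minimal sketch of just that grouping step, assuming snapshots are named by date as YYYY-MM-DD and that "week" means ISO week (Monday to Sunday). Both are assumptions, and the name group_by_week is hypothetical; the merge itself is not something btrfs provides.

```python
from collections import defaultdict
from datetime import datetime


def group_by_week(names):
    """Group date-named snapshots by (ISO year, ISO week number).

    Assumption: every name parses as YYYY-MM-DD. Each group is sorted
    oldest to newest, so the last entry is the latest daily of that week.
    """
    weeks = defaultdict(list)
    for name in sorted(names):
        taken = datetime.strptime(name, "%Y-%m-%d").date()
        year, week, _weekday = taken.isocalendar()
        weeks[(year, week)].append(name)
    return dict(weeks)
```

For instance, 2014-07-13 (a Sunday) falls in ISO week 28 of 2014, while 2014-07-14 and 2014-07-18 both fall in week 29, so they end up in different groups.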
Re: Questions on incremental backups
Daily snapshots work well with kernel 3.14 and above (I had problems with 3.13 and previous). I have snapshots every 15 mins on some subvols.

Very large numbers of snapshots can cause performance problems. I suggest keeping below 1000 snapshots at this time.

You can use send/recv functionality for remote backups. So far I've used rsync; it works well, and send/recv has some limitations about filesystem structure etc. Rsync can transfer to an ext4 or ZFS filesystem if you wish.

Ignoring directories in send/recv is done by subvol. Even if you use rsync it's a good idea to have different subvols for directory trees with different backup requirements.

Displaying backups is an issue of backup software. It is above the level that BTRFS development touches. While people here can probably offer generic advice on backup software it's not the topic of the list.

I use date based snapshots on my backup BTRFS filesystems and I can easily delete snapshots in the middle of the list.
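For the send/recv route, the daily cycle can be sketched as three commands: take a read-only snapshot, send it (incrementally, relative to the previous snapshot, when one exists), and receive it on the backup filesystem. The Python below only builds the argument lists and shows one way to pipe send into receive; the helper names and all paths are hypothetical, and nothing here is taken from anyone's actual setup. It relies on btrfs subvolume snapshot -r, btrfs send -p and btrfs receive, and assumes the parent snapshot already exists on both sides.

```python
import subprocess


def snapshot_cmd(subvol, snap_path):
    """Command to create a read-only snapshot (send needs read-only sources)."""
    return ["btrfs", "subvolume", "snapshot", "-r", subvol, snap_path]


def send_cmd(snapshot, parent=None):
    """Command to send a snapshot, incrementally if a parent is given."""
    cmd = ["btrfs", "send"]
    if parent is not None:
        cmd += ["-p", parent]  # only differences relative to parent are sent
    return cmd + [snapshot]


def receive_cmd(dest_dir):
    """Command to receive a stream into a directory on a btrfs filesystem."""
    return ["btrfs", "receive", dest_dir]


def transfer(send_argv, receive_argv):
    """Pipe the send stream into receive and fail if either side fails."""
    sender = subprocess.Popen(send_argv, stdout=subprocess.PIPE)
    receiver = subprocess.Popen(receive_argv, stdin=sender.stdout)
    sender.stdout.close()  # so the sender sees a broken pipe if receive exits
    receiver_status = receiver.wait()
    sender_status = sender.wait()
    if sender_status != 0 or receiver_status != 0:
        raise RuntimeError("btrfs send/receive failed")
```

For a remote server, the receive side could be wrapped in ssh (for example by prefixing the receive argument list with ["ssh", host], where host is a placeholder), which keeps the rest of the sketch unchanged.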