Re: Understanding btrfs and backups = automatic snapshot script

2014-03-21 Thread Duncan
Marc MERLIN posted on Thu, 20 Mar 2014 22:57:33 -0700 as excerpted:

 On Sun, Mar 16, 2014 at 10:42:24PM -0700, Marc MERLIN wrote:
 On Thu, Mar 06, 2014 at 09:33:24PM +, Duncan wrote:
  However, best snapshot management practice does progressive snapshot
  thinning, so you never have more than a few hundred snapshots to
  manage at once.
 
 I'm happy to share my script with others if that helps:
 http://marc.merlins.org/linux/scripts/btrfs-snaps
 
 Now added to
 http://marc.merlins.org/perso/btrfs/post_2014-03-21_Btrfs-Tips_-How-To-Setup-Netapp-Style-Snapshots.html

Hmm... I hadn't actually looked that closely at scripted snapshotting.  
Now that I did, and see how easy it is to manage both snapshotting and 
thinning, I just might.

But I recently switched to systemd, including replacing my cron jobs with 
timer-unit scripts (which I set up like cron.hourly.d, daily.d, etc.; I 
only had those two to worry about, so I didn't set up weekly or beyond).  
I've not actually unmerged cron yet, but I probably will one of these 
days.  Anyway, I might well find myself setting up 
weekly/quarterly/whatever too, with your script or something like it 
modified for systemd-timer usage.  It'd give me an excuse to practice my 
unit-file setup skills some more. =:^)


-- 
Duncan - List replies preferred.   No HTML msgs.
Every nonfree program has a lord, a master --
and if you use the program, he is your master.  Richard Stallman

--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Understanding btrfs and backups = automatic snapshot script

2014-03-20 Thread Marc MERLIN
On Sun, Mar 16, 2014 at 10:42:24PM -0700, Marc MERLIN wrote:
 On Thu, Mar 06, 2014 at 09:33:24PM +, Duncan wrote:
  However, best snapshot management practice does progressive snapshot 
  thinning, so you never have more than a few hundred snapshots to manage 
  at once.  Think of it this way.  If you realize you deleted something you 
  needed yesterday, you might well remember about when you deleted it and 
  can thus pick the correct snapshot to mount and copy it back from.  But 
  if you don't realize you need it until a year later, say when you're 
  doing your taxes, how likely are you to remember the specific hour, or 
  even the specific day, you deleted it?  A year later, getting a copy from 
  the correct week, or perhaps the correct month, will probably suffice, 
  and even if you DID still have every single hour's snapshots a year 
  later, how would you ever know which one to pick?  So while a day out, 
  hourly snapshots are nice, a year out, they're just noise.
 
 I'm happy to share my script with others if that helps:
 http://marc.merlins.org/linux/scripts/btrfs-snaps

Now added to
http://marc.merlins.org/perso/btrfs/post_2014-03-21_Btrfs-Tips_-How-To-Setup-Netapp-Style-Snapshots.html
(mostly to seed google and the archives)

Marc
-- 
A mouse is a device used to point at the xterm you want to type in - A.S.R.
Microsoft is to operating systems 
   what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/  


Re: Understanding btrfs and backups

2014-03-13 Thread Chris Samuel
On Sun, 9 Mar 2014 03:30:44 PM Duncan wrote:

 While I realize that was in reference to the "up in flames" comment, and 
 presumably if there's a need to worry about that, offsite backup /is/ of 
 some value, for some people offsite backup really isn't that valuable.

Actually I missed that comment altogether, it was really just an illustration 
of why people should think about it - and then come to a decision about 
whether or not it makes sense for them.

In your case maybe not, but for me (and my wife) it certainly does.

All the best,
Chris
-- 
 Chris Samuel  :  http://www.csamuel.org/  :  Melbourne, VIC





Re: Understanding btrfs and backups

2014-03-13 Thread Chris Murphy

On Mar 7, 2014, at 7:03 AM, Eric Mesa ericsbinarywo...@gmail.com wrote:
 
 Duncan - thanks for this comprehensive explanation. For a huge portion of
 your reply...I was all wondering why you and others were saying snapshots
 aren't backups. They certainly SEEMED like backups. But now I see that the
 problem is one of precise terminology vs colloquialisms. In other words,
 snapshots are not backups in and of themselves. They are like Mac's Time
 Machine. BUT if you take these snapshots and then put them on another medium
 - whether that's local or not - THEN you have backups. Am I right, or am I
 still missing something subtle?

Hmm, yes, because snapshots on a mirrored drive are on another medium, but 
that's still not considered a backup. I think what makes a backup is a separate 
device and a separate file system. That's because the top vectors for data loss 
are user error, device failure, and file system corruption. These are 
substantially mitigated by having backup files located on both a separate file 
system and a separate device.

Also, Time Machine qualifies as a backup because it copies files to a separate 
device with a separate file system. (There is a feature in recent OS X versions 
that stores hourly incremental backups on the local drive when the usual target 
device isn't available - these are arguably not backups but rather snapshots 
that are pending backups. Once the target device is available, the snapshots 
are copied over to it.)

If you have data you feel is really important, my suggestion is that you have a 
completely different backup/restore method than what you're talking about. It 
needs to be bulletproof and well tested. And consider all the Btrfs send/receive 
work you're doing as testing/work-in-progress. There are still cases on the 
list where people have had problems with send/receive, and both the send and 
receive code have a lot of churn, so I don't know that anyone can definitively 
tell you that a backup based only on btrfs send/receive is going to reliably 
restore in one month, let alone three years. Should it? Yes, of course. Will it?


Chris Murphy



Re: Understanding btrfs and backups

2014-03-09 Thread Duncan
Chris Samuel posted on Sun, 09 Mar 2014 15:13:42 +1100 as excerpted:

 On Fri, 7 Mar 2014 04:14:16 PM Sander wrote:
 
 But if the filesystem or underlying disk goes up in flames, the
 snapshots are toast as well. So you need additional backups, preferably
 not on the same hardware, for real protection against data loss.
 
 ...and don't forget to think about off-site backups too.
 
 http://www.flickr.com/photos/94482242@N00/7746409996/

While I realize that was in reference to the "up in flames" comment, and 
presumably if there's a need to worry about that, offsite backup /is/ of 
some value, for some people offsite backup really isn't that valuable.

I figure if something like that happens here, I'll have FAR more pressing 
things to worry about for a while than restoring my computer.  And by the 
time life does get somewhat back to normal and I can think about the data 
that was on the computer, I might as well do over from scratch, like I 
will have done with much of the rest of my life by that point.  The real 
valuable data is backed up where it counts -- to my head -- and if I lose 
that, well, I won't be very worried about it any more, will I?

Of course if I were a bush doctor like the guy who owned the computer in 
that photo apparently was, then there'd be other people's medical records 
and the like to worry about too, and having offsite backups of that 
/would/ be important!

And of course the same would apply if I had a bunch of family pictures on 
the computer to worry about, but for that I'd need a family first...

-- 
Duncan - List replies preferred.   No HTML msgs.
Every nonfree program has a lord, a master --
and if you use the program, he is your master.  Richard Stallman



Re: Understanding btrfs and backups

2014-03-09 Thread Duncan
Wolfgang Mader posted on Fri, 07 Mar 2014 11:13:51 +0100 as excerpted:

 Duncan, thank you for this comprehensive post. Really helpful as always!
 
 [...]
 
 As for restoring, since a snapshot is a copy of the filesystem as it
 existed at that point, and the method btrfs exposes for accessing them
 is to mount that specific snapshot, to restore an individual file from
 a snapshot, you simply mount the snapshot you want somewhere and copy
 the file as it existed in that snapshot over top of your current
 version
 
 Please, how do I list mounted snapshots only?
 
 [...]

I personally don't use snapshots a whole lot (tho I like the concept) as 
they don't really fit my use-case.  So in general I won't try to answer 
usage-detail questions such as that.

That said, see the Managing snapshots section on the sysadmin guide 
page on the wiki, for some general snapshot management hints.

https://btrfs.wiki.kernel.org/index.php/SysadminGuide#Managing_snapshots

The main point from there is to leave the top level of the filesystem 
empty but for the subvolumes/snapshots (see the tree diagrams) and to set 
a default subvolume that will be your normal subvolume-mount if you don't 
specify one.  Then you can mount the root subvolume (subvolid=0, see the 
fstab line for /media/btrfs) when you want to manage snapshots.

But the example there is full snapshot rollback.  To restore an 
individual file instead of that, you'd just mount the root subvolume and 
the snapshots would all appear as subdirs, such that you could browse 
them as you would a normal filesystem, diving into the snapshot and its 
subdirs until you find the file you want to restore, and then copying it 
over to the working copy/snapshot.

That doesn't directly answer how to list mounted snapshots only, but 
given the above tree layout, I don't really see that you'd /need/ to list 
mounted snapshots only, since presumably you'd have only the default 
mounted, plus the root subvolume, where you could browse into all the 
snapshots just as if they were normal directories.
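
As a rough illustration of that single-file restore, here is a minimal 
Python sketch.  It assumes the top-level subvolume is already mounted 
somewhere and that snapshots and the working subvolume appear as plain 
directories beneath it; the function name and every path are hypothetical, 
not taken from the wiki.

```python
import shutil
from pathlib import Path


def restore_file(top_level: Path, snapshot: str, working: str, relpath: str) -> Path:
    """Copy one file from a snapshot directory back into the working subvolume.

    top_level is the mount point of the top-level subvolume; snapshot and
    working are directory names directly beneath it (all hypothetical).
    """
    src = top_level / snapshot / relpath
    dst = top_level / working / relpath
    if not src.is_file():
        raise FileNotFoundError(f"{relpath} does not exist in snapshot {snapshot}")
    dst.parent.mkdir(parents=True, exist_ok=True)
    # copy2 keeps timestamps and permission bits; ownership is not restored
    # here and would need separate handling.
    shutil.copy2(src, dst)
    return dst
```

Because the snapshot is only being read from, this works the same whether 
the snapshot is read-only or writable.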

Also see the subvolumes and snapshots section of the FAQ:

https://btrfs.wiki.kernel.org/index.php/FAQ#Subvolumes

 Since a snapshot is an image of the filesystem as it was at that
 particular point in time, and btrfs by nature copies blocks elsewhere
 when they are modified, all (well, not all as there's metadata like
 file owner, permissions and group, too, but that's handled the same
 way) the snapshot does is map what blocks composed each file at the
 time the snapshot was taken.
 
 Is it correct, that e.g. ownership is recorded separately from the data
 itself, so if I would change the owner of all my files, the respective
 snapshot would only store the old owner information?

Yes.  If you change the owner of the files in your current subvolume, 
the previous snapshots will retain their old ownership.  Owner/
permissions/etc are metadata, stored separately from the actual data, 
with both data and metadata being snapshotted.
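
A toy model may help picture this.  The Python sketch below is not how 
btrfs is implemented (real btrfs shares metadata tree nodes between 
snapshots and copies them only when they change); it just treats a 
subvolume as a map from file name to an owner plus a list of block ids, 
which is an assumption made purely for illustration.

```python
import copy
import itertools

_next_id = itertools.count()


def new_block(store, data):
    """Write data to a fresh block and return its id (blocks are never overwritten)."""
    block_id = next(_next_id)
    store[block_id] = data
    return block_id


def snapshot(subvol):
    """Copy the maps (owner and block lists) only; no block data is duplicated."""
    return copy.deepcopy(subvol)


def overwrite_block(store, subvol, name, index, data):
    """Copy-on-write: the new data goes to a new block, the old block stays put."""
    subvol[name]["blocks"][index] = new_block(store, data)


def blocks_in_use(*subvols):
    """Set of block ids referenced by at least one of the given subvolumes."""
    return {b for sv in subvols for rec in sv.values() for b in rec["blocks"]}


store = {}
live = {"notes.txt": {"owner": "eric",
                      "blocks": [new_block(store, b"aaaa"), new_block(store, b"bbbb")]}}
snap = snapshot(live)

live["notes.txt"]["owner"] = "root"                  # chown in the live subvolume
overwrite_block(store, live, "notes.txt", 1, b"BBBB")  # modify the second block

print(snap["notes.txt"]["owner"])        # the snapshot still has the old owner
print(len(blocks_in_use(live, snap)))    # three blocks stored, not four
```

The unchanged first block is referenced by both the live subvolume and the 
snapshot, so only the modified block costs extra space.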


[ on btrfs send/receive ]
 
 Is the receiving side a complete file system in its own right?

Normally, yes.

However, send normally serializes its output to STDOUT and that output 
can be sent to a specific file on some other filesystem (like ext4), or 
to tape or whatever, instead.  In this case you can read back from that 
file using cat (or netcat if it's over the network, or whatever), 
directing its output to btrfs receive, to turn that data back into a 
filesystem.  Used like this, you can think of the original send as a full 
backup (to tape or whatever), and child sends as incremental backups.  
Obviously, if stored in this form, in order to restore the incrementals 
you'd need the full backup they were based upon, just as you would if 
doing the same thing using conventional backup to tape or whatever.
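
The dependency between stored streams can be sketched in a few lines of 
Python.  This assumes you keep your own record of which snapshot each 
stream file was made against (None for a full send); that bookkeeping and 
the names used are hypothetical, not something btrfs maintains for you.

```python
def replay_order(streams, target):
    """Return the snapshots to feed to btrfs receive, full stream first.

    streams maps a snapshot name to the name of the parent it was sent
    against, or None if it was a full send.
    """
    chain = []
    seen = set()
    name = target
    while name is not None:
        if name not in streams:
            raise KeyError(f"missing stream for {name!r}; chain cannot be restored")
        if name in seen:
            raise ValueError("cycle in parent links")
        seen.add(name)
        chain.append(name)
        name = streams[name]
    chain.reverse()  # oldest (the full send) first
    return chain
```

For example, with streams of {"home-0301": None, "home-0302": "home-0301", 
"home-0303": "home-0302"}, restoring home-0303 requires all three files, 
received in that order.  Losing any one file in the middle makes everything 
after it unrestorable, which is the same weakness as any full-plus-
incremental scheme.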

 If so, I only need to maintain one common reference in order to apply
 the received snapshot, right? If I would in any way get the send and
 receive side out of sync, such that they do not share a common
 reference any more, only the send/receive would fail, but I still would
 have the complete filesystem on the receiving side, and could copy it
 all over (cp, rsync) to the send side in case of a disaster on the
 send side. Is this correct?

In the normal case (not stored as a file or serialized data stream as 
described above), yes.
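
To illustrate the "one common reference" idea, here is a small Python 
sketch.  It assumes snapshot names sort chronologically (for example a 
date suffix) and that the same name on both sides means the same snapshot.  
That is a simplification: btrfs itself matches the reference by subvolume 
UUID, not by name.  The function names are made up for this example.

```python
def pick_parent(send_side, receive_side):
    """Newest snapshot present on both sides, or None if a full send is needed."""
    common = set(send_side) & set(receive_side)
    return max(common) if common else None


def deletable_on_sender(send_side, parent):
    """Send-side snapshots older than the common reference.

    These are not needed for the next incremental send; whether you still
    want them for local restores is a separate question.
    """
    if parent is None:
        return []
    return sorted(name for name in send_side if name < parent)
```

If pick_parent returns None the two sides have drifted apart, and the next 
send has to be a full one; the data already on the receive side is 
unaffected.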

Meanwhile, given that we're talking of btrfs send/receive in the context 
of backups, it's worth explicitly making note of the current on-list 
reports and bugfixes in the area of send/receive.  In general, we're talking 
about an in-principle feature that should eventually be reliable enough 
to use as backup in the way discussed.  However, at present, if it's data 
you'd really miss were it to disappear, please back it up using another 
method (say rsync or conventional backups) as well.  To my knowledge, if 
the send and receive both occur without error, it should be a faithful 
copy of the data just as reliable as the 

Re: Understanding btrfs and backups

2014-03-09 Thread Duncan
Eric Mesa posted on Fri, 07 Mar 2014 14:03:44 + as excerpted:

 Duncan - thanks for this comprehensive explanation. For a huge portion
 of your reply...I was all wondering why you and others were saying
 snapshots aren't backups. They certainly SEEMED like backups. But now I
 see that the problem is one of precise terminology vs colloquialisms. In
 other words, snapshots are not backups in and of themselves. They are
 like Mac's Time Machine. BUT if you take these snapshots and then put
 them on another medium - whether that's local or not - THEN you have
 backups. Am I right, or am I still missing something subtle?

You got it. =:^)

Tho as I just mentioned in a reply on a different subthread, it's worth 
noting that btrfs send/receive is still a bit buggy at present and is 
giving people with corner-cases some errors.  To my knowledge, if both 
the send and receive sides complete without error, it's a perfectly 
reliable backup.  The problem is, they aren't always completing without 
errors at present, and I'd hate to have to actually need a current backup 
shortly after those send/receives started triggering errors, before I had 
a chance to put a different solution in place.  So at this point I'd 
recommend having that other solution in place from the beginning, just in 
case.

IOW, it's fine to play with send/receive right now, but don't depend on 
it with your life, or the life of your data!  In a year or even six 
months, hopefully those bugs should be worked out and it'll be as 
reliable as the sunrise, but I wouldn't count on that for my own data 
ATM, and I'd recommend you don't either.

Tho as I said, to the best of my knowledge, if both sides complete 
without error, it's as reliable as btrfs itself is ATM.  (Tho while 
kernel 3.13 did tone down the might-eat-your-babies warning on the 
kernel's btrfs config option, it's still what I'd classify as semi-
stable, so keep those backups updated and tested, and run current 
kernels since older kernels do still mean known bugs that are fixed in 
current!)

 I think the most important thing you said was at the end and I'd like a
 little clarification on that if it's OK with you.
 
 As with local snapshots, old ones can
 be deleted on both the send and receive ends, as long as at least one
 common reference snapshot is maintained on both ends, so diffs taken
 against the send side reference can be applied to an appropriately
 identical receive side reference, thereby updating the receive side to
 match the new read-only snapshot on the send side.
 
 So, let's say I have everything set up. This means I created the
 read-only shot on my home btrfs volume and sent it to the backup drive.
 I'm making hourly snapshots and after each snapshot is made, it's sent
 to the backup drive. So, obviously the backup drive needs to be at least
 as big as the home drive so it can store what's on home plus the
 snapshot-diffs. Now let's be extreme and say that in the course of a
 year I touch and somehow change every single file on the home drive.
 That means if I only had one snapshot I'd need home drive x 2 space.
 (for used space, not unused space, naturally)

Well, not strictly as you said.  If you changed every BLOCK of every file 
over that year, THEN you'd need 2X the space.  But if a lot of those 
files are say half-gig-plus ISOs and you only changed say one word of one 
file on each ISO, then no, it wouldn't be the whole files changed, only a 
single individual (btrfs size, 4 KiB AFAIK) block within the file, and 4 
KiB out of half a gig is under 1/10 of 1 percent, so you wouldn't need 2X 
the space in a scenario like that.
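
The arithmetic is easy to check.  The short Python sketch below uses the 
4 KiB block size and half-gigabyte file size from the paragraph above and 
takes "half a gig" to mean 512 MiB; the function name is just for 
illustration.

```python
BLOCK = 4 * 1024         # 4 KiB block, as assumed in the text
ISO = 512 * 1024 ** 2    # "half a gig", taken here as 512 MiB


def extra_fraction(changed_blocks, file_size, block_size=BLOCK):
    """Extra space a snapshot pins, as a fraction of the file's size."""
    return changed_blocks * block_size / file_size


one_block = extra_fraction(1, ISO)
every_block = extra_fraction(ISO // BLOCK, ISO)

print(f"one block changed:   {one_block:.6%} extra")
print(f"every block changed: {every_block:.0%} extra")
```

Only in the second case, where every block is rewritten, does the old 
snapshot hold on to a full extra copy and push usage to 2X.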

 So I might want my backups to have last year's data, but wouldn't want
 to need to upgrade the size of my actual home drive. So I would want to
 maintain fewer snapshots on my home drive than my backup drive. (It's
 possible I'm missing something here...something subtle that makes this
 not necessary) So do I only need to make sure I have the latest snapshot
 or maybe latest plus n-1 on the home drive while the backup drive can
 have all snapshots since the beginning? I THINK that can be the case
 based on reading your sentence, but I just want to make sure.

In general, yes.  Tho if you're doing hourly snapshots I'd probably keep 
say a day's worth locally, plus one a day for a week, and 1 weekly 
snapshot before that, just to cover the case of my needing to recover 
a backup and finding that the remote backup just keeled over 12 hours 
ago.  Unless you're writing/erasing heavily, snapshots take up very 
nearly zero space, so keeping a few extra around isn't going to hurt a 
whole lot.
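
A retention policy along those lines could be sketched as follows in 
Python.  This is one reading of the suggestion, not anyone's actual 
script: it keeps everything from the last day, the newest snapshot of each 
calendar day for the week before that, and the newest snapshot of each ISO 
week for anything older.  The exact cut-offs are assumptions.

```python
from datetime import datetime, timedelta


def thin(snapshots, now):
    """Return the snapshot timestamps to keep, oldest first."""
    keep = set()
    seen_days = set()
    seen_weeks = set()
    for ts in sorted(snapshots, reverse=True):  # newest first
        age = now - ts
        if age <= timedelta(days=1):
            keep.add(ts)                        # keep every recent snapshot
        elif age <= timedelta(days=8):
            day = ts.date()
            if day not in seen_days:            # newest snapshot of that day
                seen_days.add(day)
                keep.add(ts)
        else:
            week = tuple(ts.isocalendar())[:2]  # (ISO year, ISO week)
            if week not in seen_weeks:          # newest snapshot of that week
                seen_weeks.add(week)
                keep.add(ts)
    return sorted(keep)
```

Everything not returned by thin() would be a candidate for deletion.  With 
hourly snapshots this keeps the total count small no matter how far back 
the history goes within a year or so.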

Meanwhile, however, I'd suggest a reasonable thinning down script on the 
remote backup as well, because at least at present, there are overhead 
issues once you get over several hundred snapshots.  But realistically, 
if you decide you need a file 11 months old, are you really going to care 
or even know exactly what hour it was, eleven months 

Re: Understanding btrfs and backups

2014-03-08 Thread Chris Samuel
On Fri, 7 Mar 2014 04:14:16 PM Sander wrote:

 But if the filesystem or underlying disk goes up in flames, the
 snapshots are toast as well. So you need additional backups,
 preferably not on the same hardware, for real protection against
 data loss.

...and don't forget to think about off-site backups too.

http://www.flickr.com/photos/94482242@N00/7746409996/

cheers,
Chris
-- 
 Chris Samuel  :  http://www.csamuel.org/  :  Melbourne, VIC




Re: Understanding btrfs and backups

2014-03-07 Thread Wolfgang Mader
Duncan, thank you for this comprehensive post. Really helpful as always!

[...]

 As for restoring, since a snapshot is a copy of the filesystem as it
 existed at that point, and the method btrfs exposes for accessing them is
 to mount that specific snapshot, to restore an individual file from a
 snapshot, you simply mount the snapshot you want somewhere and copy the
 file as it existed in that snapshot over top of your current version
 (which will have presumably already been mounted elsewhere, before you
 mounted the snapshot to retrieve the file from), then unmount the
 snapshot and go about your day. =:^)

Please, how do I list mounted snapshots only?

[...]

 
 Since a snapshot is an image of the filesystem as it was at that
 particular point in time, and btrfs by nature copies blocks elsewhere
 when they are modified, all (well, not all as there's metadata like
 file owner, permissions and group, too, but that's handled the same way)
 the snapshot does is map what blocks composed each file at the time the
 snapshot was taken.

Is it correct, that e.g. ownership is recorded separately from the data 
itself, so if I would change the owner of all my files, the respective 
snapshot would only store the old owner information?

[...]

 
 The first time you do this, there's no existing copy at the other end, so
 btrfs send sends a full copy and btrfs receive writes it out.  After
 that, the receive side has a snapshot identical to the one created on the
 send side and further btrfs send/receives to the same set simply
 duplicate the differences between the reference and the new snapshot from
 the send end to the receive end.  As with local snapshots, old ones can
 be deleted on both the send and receive ends, as long as at least one
 common reference snapshot is maintained on both ends, so diffs taken
 against the send side reference can be applied to an appropriately
 identical receive side reference, thereby updating the receive side to
 match the new read-only snapshot on the send side.

Is the receiving side a complete file system in its own right? If so, I only 
need to maintain one common reference in order to apply the received snapshot, 
right? If I would in any way get the send and receive side out of sync, such 
that they do not share a common reference any more, only the send/receive 
would fail, but I still would have the complete filesystem on the receiving 
side, and could copy it all over (cp, rsync) to the send side in case of a 
disaster on the send side. Is this correct?

Thank you!
Best,
Wolfgang

-- 
Wolfgang Mader
wolfgang.ma...@fdm.uni-freiburg.de
Telefon: +49 (761) 203-7710
Institute of Physics
Hermann-Herder Str. 3, 79104 Freiburg, Germany
Office: 207


Re: Understanding btrfs and backups

2014-03-07 Thread Eric Mesa
Duncan 1i5t5.duncan at cox.net writes:
 *But*, btrfs snapshots by themselves remain on the existing btrfs 
 filesystem, and thus are subject to many of the same risks as the 
 filesystem itself.  As you mentioned raid is redundancy not backup, 
 snapshots aren't backup either; snapshots are multiple logical copies 
 thus protecting you from accidental deletion or bad editing, but pointed 
 at the same data blocks without redundancy, and if those data blocks or 
 the entire physical media go bad...
 
 Which is where real backups, separate copies on separate physical media, 
 come in, which is where btrfs send/receive, as the ars-technica article 
 was describing, comes in.
 
 The idea is to make a read-only snapshot on the local filesystem, read-
 only so it can't change while it's being sent, and then use btrfs send to 
 send that snapshot to be stored on some other media, which can optionally 
 be over the network to a machine and media at a different site, altho it 
 can be to a different device on the same machine, as well.
 
 The first time you do this, there's no existing copy at the other end, so 
 btrfs send sends a full copy and btrfs receive writes it out.  After 
 that, the receive side has a snapshot identical to the one created on the 
 send side and further btrfs send/receives to the same set simply 
 duplicate the differences between the reference and the new snapshot from 
 the send end to the receive end.  As with local snapshots, old ones can 
 be deleted on both the send and receive ends, as long as at least one 
 common reference snapshot is maintained on both ends, so diffs taken 
 against the send side reference can be applied to an appropriately 
 identical receive side reference, thereby updating the receive side to 
 match the new read-only snapshot on the send side.
 
 Hopefully that's clearer now. =:^)
 


Duncan - thanks for this comprehensive explanation. For a huge portion of
your reply...I was all wondering why you and others were saying snapshots
aren't backups. They certainly SEEMED like backups. But now I see that the
problem is one of precise terminology vs colloquialisms. In other words,
snapshots are not backups in and of themselves. They are like Mac's Time
Machine. BUT if you take these snapshots and then put them on another medium
- whether that's local or not - THEN you have backups. Am I right, or am I
still missing something subtle? 

I think the most important thing you said was at the end and I'd like a
little clarification on that if it's OK with you. 

As with local snapshots, old ones can 
 be deleted on both the send and receive ends, as long as at least one 
 common reference snapshot is maintained on both ends, so diffs taken 
 against the send side reference can be applied to an appropriately 
 identical receive side reference, thereby updating the receive side to 
 match the new read-only snapshot on the send side.

So, let's say I have everything set up. This means I created the read-only
shot on my home btrfs volume and sent it to the backup drive. I'm making
hourly snapshots and after each snapshot is made, it's sent to the backup
drive. So, obviously the backup drive needs to be at least as big as the
home drive so it can store what's on home plus the snapshot-diffs. Now let's
be extreme and say that in the course of a year I touch and somehow change
every single file on the home drive. That means if I only had one snapshot
I'd need home drive x 2 space. (for used space, not unused space, naturally)
So I might want my backups to have last year's data, but wouldn't want to
need to upgrade the size of my actual home drive. So I would want to
maintain fewer snapshots on my home drive than my backup drive. (It's
possible I'm missing something here...something subtle that makes this not
necessary) So do I only need to make sure I have the latest snapshot or
maybe latest plus n-1 on the home drive while the backup drive can have all
snapshots since the beginning? I THINK that can be the case based on reading
your sentence, but I just want to make sure. 

In case you were wondering, this is based on what's happened to me with Back
in Time. I had to reduce the number of backups I was keeping because my home
drive wasn't at 100%, but the backupdrive was at 100% because I'd added and
deleted some VMs and other large files (video files I think). And Back in
Time intelligently does not remove the oldest backup off the top until it
knows it has made a new backup - which it couldn't do because it was at
100%. So I had to delete the top 1 or 2 backups and then tell it to keep
fewer backups. Your description of snapshots makes it seem much less likely
that this would be an issue. Although Back in Time is an incremental backup,
it takes up more space. If I may venture to see if I've learned something
from your response, is it because when I change a file Back in Time stores
the entire changed file while btrfs only stores the bits that have changed?
Also, does it 

Re: Understanding btrfs and backups

2014-03-07 Thread Sander
Eric Mesa wrote (ao):
 Duncan - thanks for this comprehensive explanation. For a huge portion of
 your reply...I was all wondering why you and others were saying snapshots
 aren't backups. They certainly SEEMED like backups. But now I see that the
 problem is one of precise terminology vs colloquialisms. In other words,
 snapshots are not backups in and of themselves. They are like Mac's Time
 Machine. BUT if you take these snapshots and then put them on another medium
 - whether that's local or not - THEN you have backups. Am I right, or am I
 still missing something subtle? 

Snapshots are backups, but only protect you against a limited number of
disasters. Snapshots are very convenient to quickly go back in time for
some or all files and directories. But if the filesystem or underlying
disk goes up in flames, the snapshots are toast as well. So you need
additional backups, preferably not on the same hardware, for real
protection against data loss.

The convenience of snapshots is that you can (almost) make them as often
as you want, fully automated, with (almost) no impact on performance,
without the need for extra hardware, and a restore is no more than a
simple copy.

Sander


Understanding btrfs and backups

2014-03-06 Thread Eric Mesa
apologies if this is a resend - it appeared to me that it was rejected
because of something in how Gmail was formatting the message. I can't find
it in the Gmane archives which leads me to believe it was never delivered.

I was hoping to gain some clarification on btrfs snapshots and how they
function as backups.

I did a bit of Googling and found lots of examples of bash commands, but no
one seemed to explain what was going on to a level that would satisfy me for
my data needs.

I read this Ars Technica article today
http://arstechnica.com/information-technology/2014/01/bitrot-and-atomic-cows-inside-next-gen-filesystems/

First of all, the btrfs-raid1 sounds awesome. Because it helps protect
against one of RAID1's failings - bit rot issues. But raid1 is not backup,
it's just redundancy.

Second, the article mentions using snapshots as a backup method. Page 3
section: Using the features.

He makes a snapshot and sends that. Then he sends what changed the second
time. He mentions that because btrfs knows what's changed it's a quick process.

Right now on my Linux computer I use Back in Time which, I think, is just an
rsync frontend. It takes a long time to complete the backup for my 1 TB
/home drive. The copy part is nice and quick, but the comparison part takes
a long time and hammers the CPU. I have it set up to run at night because if
it runs while I'm using the computer, things can crawl.

So I was wondering if btrfs snapshots are a substitute for this. Right now
if I realize I deleted a file 5 days ago, I can go into Back in Time (the
gui) or just navigate to it on the backup drive and restore that one file.
From what I've read about btrfs, I'd have to restore the entire home drive,
right? Which means I'd lose all the changes from the past five days. If
that's the case, it wouldn't really solve my problem - although maybe I'm
just not thinking creatively.

Also, if I first do the big snapshot backup and then the increments, how do
I delete the older snapshots? In other words, the way I'm picturing things
working is that I have the main snapshot and every snapshot after that is
just a description of what's changed since then. So wouldn't the entire
chain be necessary to reconstruct where I'm at now?

On a somewhat separate note, I have noticed that many people/utilities for
btrfs mention making snapshots every hour. Are the snapshots generally that
small that such a thing wouldn't quickly fill a hard drive?

Thanks for reading my questions, I appreciate the help. When all is said and
done I'd certainly like to publish a how-to from my point of understanding.


--
Eric Mesa



Re: Understanding btrfs and backups

2014-03-06 Thread Eric Mesa
Brian Wong wrote: a snapshot is different than a backup, with a snapshot
you're still accessing a read-only version of the live filesystem.  i don't
know the specifics of btrfs but if you take daily snapshots, you should be
able to restore a single file from the five-days-ago snapshot by browsing
that snapshot's directory tree and then copying the file to the live version
of the filesystem, if that makes sense.

in the snapshot case the live filesystem serves the same function as the
full backup would if you did full backups then incrementals.  the snapshots
are the incrementals of the live filesystem, only going backwards in time
whereas with backup you would take a full backup then go forward in time
with incrementals.  the filesystem takes care of making sure every snapshot
is complete.

in the snapshot case redundancy is then more important because you may not
have a bunch of full backups (i.e. full copies) lying around.  so full
backups still are useful.
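
The single-file restore described above amounts to an ordinary file copy out
of the snapshot's directory tree. A minimal sketch in Python, assuming the
snapshot is reachable as a normal directory; the function name and all paths
are hypothetical and only illustrate the idea:

```python
import shutil
from pathlib import Path


def restore_from_snapshot(snapshot_root, live_root, relative_path):
    """Copy one file from a snapshot's directory tree back into the live tree."""
    source = Path(snapshot_root) / relative_path
    target = Path(live_root) / relative_path
    if not source.is_file():
        raise FileNotFoundError(f"{source} is not in this snapshot")
    # Recreate the parent directory in case it was deleted along with the file.
    target.parent.mkdir(parents=True, exist_ok=True)
    # copy2 also tries to preserve timestamps and permission bits.
    shutil.copy2(source, target)
    return target
```

With hypothetical paths, restoring a file deleted five days ago could look
like restore_from_snapshot("/home/.snapshots/2014-03-01", "/home",
"eric/notes.txt"). Nothing else in the live filesystem is touched.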

--

OK, I THINK I understand things a bit better. So from the point of view of
restoring a single file, that functionality is there. Excellent. And I guess
you're saying that because the snapshots are diffs off the live system,
I'd need a backup of the live system - i.e. snapshots alone wouldn't be
enough. But what if my first snapshot is a clone of the system at that point
(as it seems from the article) and I back that up to a separate drive? Let me
illustrate exactly what I plan to do.

Three hard drives: A, B, and C.

Hard drives A and B - btrfs RAID-1 so that if one drive dies I can keep
using my system until the replacement for the raid arrives.

Hard drive C - gets (hourly/daily/weekly/or some combination of the above)
snapshots from the RAID, starting with the initial-state snapshot. At each
timepoint another snapshot is copied to hard drive C.

So in the case of a file disappearing on me or being overwritten or whatever - I
reach into the directory of the snapshot that contains the file just as I
would now with the backup.

So if that's what I'm doing, do snapshots become a way to do backups?

Thanks



Re: Understanding btrfs and backups

2014-03-06 Thread Brendan Hide

On 2014/03/06 09:27 PM, Eric Mesa wrote:

Brian Wong wrote: a snapshot is different than a backup
[snip]

...

Three hard drives: A, B, and C.

Hard drives A and B - btrfs RAID-1 so that if one drive dies I can keep
using my system until the replacement for the raid arrives.

Hard drive C - gets (hourly/daily/weekly/or some combination of the above)
snapshots from the RAID. (Starting with the initial state snapshot) Each
timepoint another snapshot is copied to hard drive C.

[snip]...

So if that's what I'm doing, do snapshots become a way to do backups?

An important distinction for anyone joining the conversation is that
snapshots are *not* backups, in much the same way that, as you mentioned,
RAID is not a backup. If a hard drive implodes, its snapshots go with it.


Snapshots can (and should) be used as part of a backup methodology - and 
your example is almost exactly the same as previous good backup 
examples. I think most of the time there's mention of an external 
backup server keeping the backups, which is the only major difference 
compared to the process you're looking at. Btrfs send/receive with
snapshots can make the process far more efficient than rsync.
Rsync keeps no record of what has changed, so it has to scan and compare
everything on each run (causing heavy I/O). Btrfs already knows what
changed between two snapshots and can skip straight to sending the data.


I do something similar to what you have described on my Arch Linux
desktop - however I haven't updated my (very old) backup script to take 
advantage of btrfs' send/receive functionality. I'm still using rsync. :-/

/ and /home are on btrfs-raid1 on two smallish disks
/mnt/btrfs-backup is on btrfs single/dup on a single larger disk

See https://btrfs.wiki.kernel.org/index.php/Incremental_Backup for a 
basic incremental methodology using btrfs send/receive
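
As a rough illustration of that send/receive methodology, the sketch below
only builds the command lines for one backup run and executes nothing. It
relies on the btrfs subcommands "subvolume snapshot -r", "send -p" and
"receive"; the function name, the paths and the date-based snapshot names are
assumptions made up for the example, and the "sync" step is included as a
precaution rather than something stated in this thread.

```python
import shlex


def plan_incremental_backup(subvolume, snapshot_dir, backup_mount,
                            new_name, parent_name=None):
    """Return shell command strings for one backup run; nothing is executed."""
    q = shlex.quote
    new_snap = f"{snapshot_dir}/{new_name}"
    # btrfs send needs a read-only snapshot, hence -r.
    commands = [f"btrfs subvolume snapshot -r {q(subvolume)} {q(new_snap)}"]
    # Assumed precaution: flush the new snapshot to disk before sending it.
    commands.append("sync")
    if parent_name is None:
        # First run: the whole snapshot is sent.
        send = f"btrfs send {q(new_snap)}"
    else:
        # Later runs: only the differences against the parent are sent.
        parent = f"{snapshot_dir}/{parent_name}"
        send = f"btrfs send -p {q(parent)} {q(new_snap)}"
    commands.append(f"{send} | btrfs receive {q(backup_mount)}")
    return commands
```

For an incremental run the parent snapshot has to exist on both the source
and the backup filesystem, so the previous snapshot should not be deleted on
either side until the next one has been received.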


--
__
Brendan Hide
http://swiftspirit.co.za/
http://www.webafrica.co.za/?AFF1E97



Re: Understanding btrfs and backups

2014-03-06 Thread Duncan
Eric Mesa posted on Thu, 06 Mar 2014 18:18:15 + as excerpted:

 apologies if this is a resend - it appeared to me that it was rejected
 because of something in how Gmail was formatting the message. I can't
 find it in the Gmane archives which leads me to believe it was never
 delivered.

Probably HTML-formatted.  AFAIK vger.kernel.org (the list-serv for many 
kernel lists) is set to reject that.  Too bad more list-servs don't do 
likewise. =:^(

 I was hoping to gain some clarification on btrfs snapshots and how they
 function as backups.

Looking at the below it does indeed appear you are confused, but this is 
the place to post the questions necessary to get unconfused. =:^)

 I did a bit of Googling and found lots of examples of bash commands, but
 no one seemed to explain what was going on to a level that would satisfy
 me for my data needs.

You don't mention whether you've seen/read the btrfs wiki or not.  That's 
the most direct and authoritative place to look... and to bookmark. =:^)

https://btrfs.wiki.kernel.org

 I read this Ars Technica article today
 http://arstechnica.com/information-technology/2014/01/bitrot-and-atomic-cows-inside-next-gen-filesystems/
 
 First of all, the btrfs-raid1 sounds awesome. Because it helps protect
 against one of RAID1's failings - bit rot issues. But raid1 is not
 backup, it's just redundancy.
 
 Second, the article mentions using snapshots as a backup method.

Well, this is where you start to be confused.  Snapshots are not backups 
either, altho they're sort of opposite raid in that while raid is 
redundancy-only, snapshots are rollback-only, without the redundancy 
(I'll explain...).

 Page 3 section: Using the features.
 
 He makes a snapshot and sends that. Then he sends what changed the
 second time. He mentions that because btrfs knows what's changed it's a
 quick process.

OK, what that is discussing is btrfs send/receive, with snapshots simply 
part of the process of doing that.  Think rsync in effect, but btrfs-
specific and much more efficient.  Btrfs send/receive does use snapshots 
but only as part of making the send/receive process more reliable and 
efficient.  I'll discuss snapshots (and COW) first, below, then bring in 
btrfs send/receive at the end.

 Right now on my Linux computer I use Back in Time which, I think, is
 just an rsync frontend. It takes a long time to complete the backup for
 my 1 TB /home drive. The copy part is nice and quick, but the comparison
 part takes a long time and hammers the CPU. I have it setup to run at
 night because if it runs while I'm using the computer, things can crawl.
 
 So I was wondering if btrfs snapshots are a substitute for this. Right
 now if I realize I deleted a file 5 days ago, I can go into Back in Time
 (the gui) or just navigate to it on the backup drive and restore that
 one file.

 From what I've read about btrfs, I'd have to restore the entire home
 drive, right? Which means I'd lose all the changes from the past five
 days. If that's the case, it wouldn't really solve my problem -
 although maybe I'm just not thinking creatively.

No, in snapshot terms you don't restore the entire drive.  Rather, the 
snapshots are taken on the local filesystem, storing (like one still 
frame in a series that makes a movie, thus the term snapshot) the state 
of the filesystem at the point the snapshot was taken.  Files can be 
created/deleted/moved/altered after the snapshot, and only the 
differences between snapshots, and between the last snapshot and the 
current state, are actually stored.

The fact that btrfs is a copy-on-write (COW) filesystem makes 
snapshotting very easy... trivial... since it's a byproduct of the COW 
nature of the filesystem and thus comes very nearly for free.  All that's 
necessary in order to get snapshotting is hooking up some way to access 
specific bits of functionality that are already there.

A copy-on-write illustration (please view with a monospace font for 
proper alignment):

Suppose each letter of the following string represents a block of a 
particular size (say 4KiB) of a file, with the corresponding block 
addresses noted as well:

0000000001111111
1234567890123456

abcdefgxijklmnop

It's the first bit of the alphabet, but notice the x where h belongs.  
Now someone notices and edits the file, correcting the problem:

abcdefghijklmnop

Except when they save the file, a COW-based filesystem will make the 
change like this:

0000000501111111
1234567390123456
||||||| ||||||||
abcdefg ijklmnop
       |
       h
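
The same remapping can be modelled in a few lines of Python. This is only a
toy sketch of the idea, not how btrfs is implemented: the CowFile class, its
method names and the dict standing in for the disk are all invented for the
illustration, and the new address 53 is simply passed in to match the diagram.

```python
class CowFile:
    """Toy model: a file is a list of block addresses into shared storage."""

    def __init__(self, storage, addresses):
        self.storage = storage            # dict: address -> block contents
        self.addresses = list(addresses)  # this file's own block map

    def read(self):
        return "".join(self.storage[a] for a in self.addresses)

    def write_block(self, index, data, new_address):
        # Copy-on-write: the old block is left untouched; the new data goes
        # to a fresh address and only this file's block map is updated.
        self.storage[new_address] = data
        self.addresses[index] = new_address

    def snapshot(self):
        # A snapshot copies the block map, not the blocks themselves.
        return CowFile(self.storage, self.addresses)


# Blocks 01..16 hold the original (wrong) contents.
storage = {addr: ch for addr, ch in enumerate("abcdefgxijklmnop", start=1)}
live = CowFile(storage, range(1, 17))
snap = live.snapshot()

# Fix the eighth block (index 7): the new data lands at address 53.
live.write_block(7, "h", new_address=53)

print(live.read())   # abcdefghijklmnop
print(snap.read())   # abcdefgxijklmnop
```

The snapshot still reads the old block at address 08 while the live file
reads 53, and the other fifteen blocks are shared between the two.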

The unchanged blocks of the file all remain in place.  The only change is 
to the one block, which unlike normal filesystems, isn't edited in-place, 
but rather, is written into a new location, and the filesystem simply 
notes that the new location (53) should be used to read that file block 
now, instead of the old location (08).  Of course as illustrated here, 
the addresses each take up two characters while the data block only takes 
up one, but each 

btrfs and backups

2012-03-26 Thread James Courtier-Dutton
Hi,

I have a local btrfs file system with various sub-volumes that have
had snapshots done on them.

Is there some tool like rsync with which I could copy all the data and
snapshots to a backup system, but still only use the same amount of
space as the source filesystem?
One problem I see is getting a consistent and steady state during the rsync.
I was thinking that I might be able to do this with LVM snapshots, but
that would require something along these lines:
1) pause the btrfs filesystem into a consistent state that can be
mounted cleanly
2) Do LVM snapshot on it.
3) un-pause btrfs filesystem.

I can then do a block level backup of the LVM snapshot and it should
be mountable on the backup server.
So, the snapshot is not a snapshot of the current filesystem, it is a
snapshot of all the snapshots and all the sub-volumes at a particular
time, that is in a stable state to be backed up.

I don't know whether step 1 is supported.
I suppose I am hoping that steps 1, 2 and 3 are already supported by some
special btrfs command.


Any ideas?

Kind Regards

James


Re: btrfs and backups

2012-03-26 Thread Fajar A. Nugraha
On Mon, Mar 26, 2012 at 3:56 PM, Felix Blanke felixbla...@gmail.com wrote:
 On 3/26/12 10:30 AM, James Courtier-Dutton wrote:
 Is there some tool like rsync that I could copy all the data and
 snapshots to a backup system, but still only use the same amount of
 space as the source filesystem.


 I'm not sure if I understand your problem right, but I would suggest:

 1) Snapshot the subvolume on the source
 2) rsync the snapshot to the destination
 3) Snapshot the destination

James did say "only use the same amount of space as the source
filesystem". Your approach would increase the usage when one or more
subvolumes share the same space (e.g. when one subvolume starts as a
snapshot).

AFAIK the (planned) way to do this is using btrfs send | receive,
which is not available yet.

-- 
Fajar


Re: btrfs and backups

2012-03-26 Thread Duncan
Fajar A. Nugraha posted on Mon, 26 Mar 2012 16:01:54 +0700 as excerpted:

 On Mon, Mar 26, 2012 at 3:56 PM, Felix Blanke felixbla...@gmail.com
 wrote:
 On 3/26/12 10:30 AM, James Courtier-Dutton wrote:
 Is there some tool like rsync that I could copy all the data and
 snapshots to a backup system, but still only use the same amount of
 space as the source filesystem.
 
 
 I'm not sure if I understand your problem right, but I would suggest:

 1) Snapshot the subvolume on the source
 2) rsync the snapshot to the destination
 3) Snapshot the destination
 
 James did say only use the same amount of space as the source
 filesystem. Your approach would increase the usage when one or more
 subvolume shares the same space (e.g. when one subvolume starts as
 snapshot).
 
 AFAIK the (planned) way to do this is using btrfs send | receive,
 which is not available yet.

What about rsyncing the snapshots, one at a time, snapshotting on the 
destination after each one?  I think that's what Felix' idea was.

On the source filesystem, if each of the snapshots is mounted in turn and 
rsynced across, then rsync should only catch the differences between the 
previous and currently rsyncing snapshots.

On the destination, after the first rsync, a snapshot could be taken and 
mounted, so the second rsync is cumulative.  Then a second snapshot can 
be taken and mounted for the next rsync.  Given COW, I'd think 
that'd work.

That's in contrast to an attempted rsync of the root filesystem, which 
would appear to rsync as if each snapshot was a separate directory tree, 
which would indeed kill the data sharing between them, thus taking up N 
times the space of one snapshot on the destination.

But if each snapshot is mounted in turn on both sides, destination of 
course trailing source by one snapshot, in theory at least, it should 
work, tho it depends on the rsync implementation being COW-friendly and 
I'm not positive it is but expect that it should be.

Here, I'd probably do it manually the first few snapshot generations, 
checking usage on the destination to see that it was working as intended 
as I went and ensuring I had the process down, then script parts of it, 
automating the parts, before ultimately combining the scripts into a full 
automation that, depending on the intent, could ideally be run from a cron 
job or the like.
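
A rough sketch of how that per-generation loop might be scripted, written in
Python and only generating the command lines rather than running them. It
assumes the destination working directory is already a btrfs subvolume, that
each source snapshot is reachable as a plain directory, and that rsync's
--inplace option is wanted so that unchanged blocks stay shared; the function
name and all paths are hypothetical.

```python
import shlex


def plan_rsync_generations(source_snapshots, dest_work, dest_snapshot_dir):
    """Return shell command strings, oldest snapshot first; nothing is executed."""
    q = shlex.quote
    work = dest_work.rstrip("/")
    commands = []
    for snap in source_snapshots:
        src = snap.rstrip("/")
        name = src.split("/")[-1]
        # Trailing slashes: copy the snapshot's contents, not the directory itself.
        commands.append(
            f"rsync -a --delete --inplace {q(src + '/')} {q(work + '/')}"
        )
        # Freeze the destination state under the same name as the source snapshot.
        commands.append(
            f"btrfs subvolume snapshot -r {q(work)} {q(dest_snapshot_dir + '/' + name)}"
        )
    return commands
```

Printing the plan and running the first few generations by hand, while
checking usage on the destination after each one, matches the cautious
approach described above before anything is put into a cron job.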

-- 
Duncan - List replies preferred.   No HTML msgs.
Every nonfree program has a lord, a master --
and if you use the program, he is your master.  Richard Stallman



Re: btrfs and backups

2012-03-26 Thread Alexander Block
On Mon, Mar 26, 2012 at 4:26 PM, Duncan 1i5t5.dun...@cox.net wrote:
 Fajar A. Nugraha posted on Mon, 26 Mar 2012 16:01:54 +0700 as excerpted:

 On Mon, Mar 26, 2012 at 3:56 PM, Felix Blanke felixbla...@gmail.com
 wrote:
 On 3/26/12 10:30 AM, James Courtier-Dutton wrote:
 Is there some tool like rsync that I could copy all the data and
 snapshots to a backup system, but still only use the same amount of
 space as the source filesystem.


 I'm not sure if I understand your problem right, but I would suggest:

 1) Snapshot the subvolume on the source
 2) rsync the snapshot to the destination
 3) Snapshot the destination

 James did say only use the same amount of space as the source
 filesystem. Your approach would increase the usage when one or more
 subvolume shares the same space (e.g. when one subvolume starts as
 snapshot).

 AFAIK the (planned) way to do this is using btrfs send | receive,
 which is not available yet.

 What about rsyncing the snapshots, one at a time, snapshotting on the
 destination after each one?  I think that's what Felix' idea was.

 On the source filesystem, if each of the snapshots is mounted in turn and
 rsynced across, then rsync should only catch the differences between the
 previous and currently rsyncing snapshots.

 On the destination, after the first rsync, a snapshot could be taken and
 mounted, so the second rsync is cumulative.  Then a second snapshot can
 be taken and mounted for the next rsync.  Given COW, I'd think
 that'd work.

 That's in contrast to an attempted rsync of the root filesystem, which
 would appear to rsync as if each snapshot was a separate directory tree,
 which would indeed kill the data sharing between them, thus taking up N
 times the space of one snapshot on the destination.

 But if each snapshot is mounted in turn on both sides, destination of
 course trailing source by one snapshot, in theory at least, it should
 work, tho it depends on the rsync implementation being COW-friendly and
 I'm not positive it is but expect that it should be.

 Here, I'd probably do it manually the first few snapshot generations,
 checking usage on the destination to see that it was working as intended
 as I went and ensuring I had the process down, then script parts of it,
 automating the parts, before ultimately combining the scripts into a full
 automation that, depending on the intent, could ideally be run from a cron
 job or the like.

I used this style of incremental backups for some time and it works well
if you ignore the low performance when backing up very large files
(e.g. VM images).
I'm not 100% sure about this, but I think it's important to use the --inplace
option of rsync if you want to preserve COW. Without --inplace, rsync
creates a new file every time and, after syncing, replaces the original. At
the moment this prevents COW sharing between snapshots.
Maybe this won't be required in the future when something like auto
deduplication is implemented, but I currently don't know about the plans
for this feature.
however,