Hi all,

As promised, here are the backup notes from my talk at SLUG on 28 Nov.

Materials related to this talk are at
http://users.puzzling.org/users/mary/Presentations/SLUG2008/ (including
a version of these notes).

A note about style: this is a set of recommendations purely based on the
fact that I have both backed my home data up AND recovered it. And
having some working backup regime is better than none. I don't claim
this is the One Best Way, merely One of the Adequate Ways That Isn't
Entirely Maddening.

A note about me: I am sadly short of time at the moment and will not
participate in the thread following this post (should there be one), and
I can't give one-on-one help to design your personal backup regime.
Sorry about that: hopefully slug@slug.org.au can help you out.

The talk was on backups for home users. It doesn't cover
mission-critical or business-grade backups.

--- The magical 10 second version ---

If you don't have backups, you should.

Here's how:

 1. Go out, right now, and buy an external hard drive as big as, or
    bigger than, your main hard drive. Yep, there is no free lunch.

 2. Install the program called rdiff-backup.

 3. Plug in the drive and run:

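    sudo rdiff-backup --exclude-other-filesystems / /media/disk

    (/media/disk is where Ubuntu mounts the external drive; substitute
    your own mount point as needed. --exclude-other-filesystems stops it
    from backing up the external drive into itself, along with /proc
    and the like.)

 4. Run that as often as you can.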

(Every so often, run "sudo rdiff-backup --force --remove-older-than 60D
/media/disk" or similar to delete backups older than 60 days.)
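If you'd rather have steps 3 and 4 and the clean-up as a single command, here is a minimal sketch as a shell function you could put in a script or your ~/.bashrc. The name backup_now and the 60D default are my own choices for illustration, not part of rdiff-backup. It also refuses to run when nothing is mounted at the destination, so a forgotten drive doesn't mean a "backup" written onto the very disk you're protecting.

```shell
# backup_now (hypothetical helper): mirror / onto the external drive with
# rdiff-backup, then prune increments older than a given age.
# Usage: backup_now [MOUNT-POINT] [MAX-AGE]
backup_now() {
    local dest=${1:-/media/disk}   # Ubuntu's mount point; adjust as needed
    local max_age=${2:-60D}        # rdiff-backup time format: 60D = 60 days

    # mountpoint -q succeeds only if a filesystem is mounted at $dest.
    if ! mountpoint -q "$dest"; then
        echo "backup_now: nothing is mounted at $dest" >&2
        return 1
    fi

    sudo rdiff-backup --exclude-other-filesystems / "$dest" || return 1

    # Without --force, rdiff-backup refuses to remove more than one
    # increment in a single run.
    sudo rdiff-backup --force --remove-older-than "$max_age" "$dest"
}
```

Plug the drive in and run backup_now, or backup_now /media/disk 90D to keep three months of history.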

See http://jwz.livejournal.com/801607.html for something similar (and the
partial inspiration for the talk), although his rsync approach doesn't
keep older versions of your data, which I definitely recommend doing.

--- More about rdiff-backup ---

Do check out the webpage and "man rdiff-backup" for full details.
http://www.nongnu.org/rdiff-backup/

In summary, it keeps 'reverse' increments, if you will. That is, you can
get the most recent backup just by looking at the filesystem under
/media/disk. Older versions are recovered by rdiff-backup applying older
and older changes incrementally to the files, like so:

sudo rdiff-backup -r 1D /media/disk/path-to-file [destination you'd like to restore to]

(-r is short for --restore-as-of; 1D means "as it was one day ago".)
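A small wrapper can make restores a little safer. This sketch (the name restore_as_of is hypothetical) refuses to write over an existing file, so you can't clobber today's version while fetching yesterday's. The time argument is passed straight to rdiff-backup, so anything it accepts, such as now, 1D or 3W, works.

```shell
# restore_as_of (hypothetical helper): restore a file or directory from
# the rdiff-backup repository as it was at TIME, never overwriting.
# Usage: restore_as_of TIME PATH-IN-BACKUP DESTINATION
restore_as_of() {
    local when=$1 src=$2 dest=$3
    if [ -e "$dest" ]; then
        echo "restore_as_of: $dest already exists; choose another name" >&2
        return 1
    fi
    # -r is short for --restore-as-of.
    sudo rdiff-backup -r "$when" "$src" "$dest"
}
```

For example, restore_as_of 1D /media/disk/home/you/notes.txt /tmp/notes-yesterday.txt (paths are examples). To see which backup times exist, run sudo rdiff-backup --list-increments /media/disk.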

--- Why you need backups ---

You may not want to protect against all of these things: some of them
are expensive or time-consuming to protect against. But consider these
risks when deciding on your backup regime.

1. Accidental deletion: very common. An on-site backup is good enough to
   recover from this.

2. Media failure (dead hard drive). This will happen to you, sooner or
   later. You may or may not get any warning. An on-site backup *on a
   different disk* is good enough to recover from this. Not a different
   partition, a different *disk*.

   This is also the only one of these that RAID helps with, and only
   provided that (a) you have a full mirror on the other disk(s) in the
   array and (b) you don't stuff it up somehow and set the new empty disk
   as the master. RAID is not a substitute for backups. *Not* a
   substitute for backups. It won't help with 1, 3, 4 or 5.

3. Software failure. Say some bit of code, from the drive firmware to the
   filesystem to the end user software (eg GIMP) has a bug in it and
   writes out your data incorrectly. In most cases this is rather like
   accidental deletion, but if the bug is very low level (kernel) it may
   affect the backup too.

   At the very least, make sure your backup drive is not the same
   manufacturer and model as your main drive. This makes them less
   likely to share the same bugs and less likely to fail at the same time.

4. Provider failure. You have uploaded your valuable data to Flickr,
   LiveJournal, WordPress.com etc etc. They go bust, and their creditors
   swoop in, turn their machines off and sell them for scrap parts. This
   really happens, see http://blogs.zdnet.com/digitalcameras/?p=362 for
   an example.

   Smaller examples are the occasional data loss that a lot of web
   services, up to and including those run by Google, have.

5. Massive local failure. Flood, fire, surge: we had victims of two of
   these at the meeting. And someone who had had all their computer
   equipment stolen in one go. To recover from these you need an(other)
   backup, as far away from your main data store as you can. At least in
   a different suburb. Another country is entirely possible these days,
   if you have broadband.

--- Media recommendations ---

For home users, get another hard drive and back up to that.

Optical media: no. CDs and DVDs are too small for most people now. You
will have to insert at least 5 of the things for a full backup cycle.
That's boring and dull, so you'll never do it. Also, they have a
short-ish lifespan and testing their backup goodness is even *more*
boring and dull, so you definitely won't ever do that.

Solid state media: no. Consumer grade SSDs are really really unreliable
right now. You need to back them up, not use them *for* your backups!
See http://valhenson.livejournal.com/25228.html for an extended take on
SSD maturity.

When either your main drive or your backup drive fails GO AND BUY
ANOTHER ONE RIGHT AWAY.

--- Testing backups ---

I tend to do this by needing to use them about once every three weeks
(accidental deletion). Otherwise, at least fsck them every so often for
basic integrity checks.

--- Emergency recovery tools ---

You shouldn't need these with good enough backups, but just in case...

=== Accidental deletion ===

STOP WRITING TO THE DRIVE RIGHT AWAY. Either re-mount it read-only, or
take an image of it if you have room:
dd if=/dev/drive of=/mnt/otherdrive/drive.image
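Spelled out as a sketch (the function name is hypothetical; device, mount point and image path are whatever applies to you). The image must go on a *different* drive, or writing it would overwrite the very blocks you're hoping to recover.

```shell
# freeze_and_image (hypothetical helper): stop all writes to the
# filesystem you deleted from, then copy it to an image on another drive.
# Usage: freeze_and_image DEVICE MOUNT-POINT IMAGE-FILE
freeze_and_image() {
    local dev=$1 mnt=$2 img=$3
    # This fails with "busy" if a program still has a file open for
    # writing there; close it, or boot a live CD and skip this step.
    sudo mount -o remount,ro "$mnt" || return 1
    # bs=4M copies in bigger chunks than dd's 512-byte default: faster.
    sudo dd if="$dev" of="$img" bs=4M
}
```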

Try photorec and foremost to find files on it:
http://www.cgsecurity.org/wiki/PhotoRec and
http://foremost.sourceforge.net/ (both are in the usual package
repositories; on Debian and Ubuntu, photorec comes in the testdisk package)

photorec is good for more than photos: deleted documents and video are
often found too. (It's called photorec because it was originally
designed to get files back when deleted from digicam memory cards.)
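A sketch of running both tools over the image rather than the drive itself (hypothetical function name; paths are examples). foremost's -i and -o options name the input and output; photorec is interactive, and its /d option only sets where recovered files go, so it will still ask you which partition and file types to search.

```shell
# carve_image (hypothetical helper): look for deleted files in a disk
# image with foremost, then photorec, keeping their results apart.
# Usage: carve_image IMAGE-FILE OUTPUT-DIR
carve_image() {
    local img=$1 out=$2
    mkdir -p "$out" || return 1
    # foremost wants an output directory that doesn't exist yet (or is
    # empty), so give it its own subdirectory.
    foremost -i "$img" -o "$out/foremost" || return 1
    # Interactive: choose the partition and file types when prompted.
    photorec /d "$out/photorec" "$img"
}
```

Put OUTPUT-DIR on a different drive from the one you're recovering, for the same reason as the image.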

=== Media failure ===

If it's only a partial failure (as in, you're still reading/writing sort
of OK, just with increasing numbers of failures), STOP WRITING TO THE
DRIVE RIGHT AWAY. FSCKS INCLUDED: that's the "errors have been found on
your drive, fix y/n?" thing. If these keep happening over and over, stop
trying to fix them, turn your machine off, get the disk out and get your
data off!

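You need to take an image of it onto another, bigger, drive. ddrescue is
good for this because it is especially designed for imaging damaged
drives. Confusingly, the installable *package* is called gddrescue, but
the commandline tool is ddrescue. Image the damaged partition (eg
/dev/sdb1) rather than the whole disk (/dev/sdb), so that the fsck and
mount steps below work directly on the image:
ddrescue /dev/drive /mnt/otherdrive/drive.image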

then you can fsck the image (since it's now on a good drive):
fsck /mnt/otherdrive/drive.image

then mount the image and see what you can dig out of it:
mount -o loop /mnt/otherdrive/drive.image /mnt/mountpoint
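For a badly damaged drive, GNU ddrescue works better with a third argument: a mapfile (older versions call it a logfile) recording which areas have been copied, so you can stop and restart it and make several passes. Here is a sketch of a two-pass run along the lines of the examples in the ddrescue manual, wrapped in a hypothetical function; device and paths are examples.

```shell
# rescue_partition (hypothetical helper): image a failing partition with
# GNU ddrescue in two passes, then check and mount the copy.
# Usage: rescue_partition DEVICE IMAGE MAPFILE MOUNT-POINT
rescue_partition() {
    local dev=$1 img=$2 map=$3 mnt=$4
    # Pass 1: -n skips the slow scraping of bad areas, so the easily read
    # data is saved first. Progress goes in the mapfile, so re-running
    # the same command after an interruption carries on where it left off.
    sudo ddrescue -n "$dev" "$img" "$map" || return 1
    # Pass 2: go back to the bad areas, with up to 3 retry passes.
    sudo ddrescue -r3 "$dev" "$img" "$map" || return 1
    # Check the copy, never the dying disk. fsck exits non-zero when it
    # fixes something, so its status is not treated as fatal here.
    sudo fsck -f "$img"
    # Read-only, so looking around can't change the rescued image.
    sudo mount -o loop,ro "$img" "$mnt"
}
```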

--- Remote backups ---

In addition to your spare hard drive, consider remote backup. There are
several forms:

1. Sneaker-net. Buy *another* external hard drive, bring it home once a
   week or so, back up to it, take it to your work and store it there (or
   at someone else's house or in a safety deposit or whatever). This is
   a nuisance, but for large amounts of data it is the cheapest.

2. S3. Amazon's S3 storage is US10c per GB to put data in, US15c per
   GB per month to keep data, and US17c per GB to get data out. This
   starts to add up when you're talking hundreds of GB (for 200GB it's
   about US$30 a month)
   http://aws.amazon.com/s3/

   There are lots of tools for putting data in S3; see
   http://jeremy.zawodny.com/blog/archives/007641.html
   I've heard good things about Duplicity
   (http://www.nongnu.org/duplicity/), but if you use one of the tools
   that makes S3 look like a filesystem, you could even use rdiff-backup.

3. Dreamhost personal backup. Unlike the webhost space in the CAUTION
   below, the webhost Dreamhost now has 50GB set aside specifically for
   personal backups (10c per GB beyond the first 50GB):
   http://wiki.dreamhost.com/V10.08_August_2008
   http://wiki.dreamhost.com/Personal_Backup
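Two sketches for the S3 option (item 2). s3_cost just redoes the per-GB arithmetic with the prices quoted there; check Amazon's page for current ones. s3_backup is a minimal duplicity run: it assumes an S3 bucket you've already created (the name is yours to choose), reads your AWS keys from the AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY environment variables, and, since duplicity encrypts with GnuPG by default, will ask for a passphrase (see CAUTION 4 about where to keep it). Both function names are hypothetical.

```shell
# s3_cost (hypothetical helper): rough S3 bill for a given number of GB,
# at the per-GB prices quoted above, in US dollars.
s3_cost() {
    awk -v gb="$1" 'BEGIN {
        printf "to upload: US$%.2f\n", gb * 0.10
        printf "per month to store: US$%.2f\n", gb * 0.15
        printf "to download it all: US$%.2f\n", gb * 0.17
    }'
}

# s3_backup (hypothetical helper): incremental, encrypted backup of a
# directory to an S3 bucket with duplicity.
# Usage: s3_backup SOURCE-DIR BUCKET
s3_backup() {
    local src=$1 bucket=$2
    if [ -z "${AWS_ACCESS_KEY_ID:-}" ] || [ -z "${AWS_SECRET_ACCESS_KEY:-}" ]; then
        echo "s3_backup: set AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY" >&2
        return 1
    fi
    # Incremental by default; a fresh full backup each month keeps the
    # chain of increments a restore has to replay short.
    duplicity --full-if-older-than 1M "$src" "s3+http://$bucket"
}
```

s3_cost 200 prints US$20.00 to upload, US$30.00 a month to store (the figure above) and US$34.00 to get it all back.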

CAUTION: There are lots of webhosts that advertise hundreds of GBs, or
terabytes, of disk space. DO NOT USE THESE FOR BACKUPS. They (almost?)
all have terms of use saying that you cannot use them for non-public
data such as backups; you're supposed to use them for web-accessible
data only. People have had their backups on these services deleted
without notice.

CAUTION 2: The remote backup space is more than a little crowded right
now. Keep an eye on your provider; some are undoubtedly headed for
failure.

CAUTION 3: Essentially all commercial remote backup providers require
that you not violate copyright by putting data on their servers that you
do not have the rights to copy.

CAUTION 4: You might be tempted to encrypt your remote backups. If so,
do think about where you are going to keep the key, so that you still
have it after your flood, fire or surge!

--- Data you store elsewhere ---

Try and get hold of your (g|e)mail, your calendars, your address book,
your photos, your blog, your social network ... all your meta-data. Some
of this is quite hard or impossible to get hold of: eg LiveJournal
doesn't allow exports of comments on your journal, and Facebook doesn't
allow much out at all.

Dump them to your local hard drive and they'll be picked up by the usual
backups.

--- Fancy stuff ---

The surest way to back up is automatically! Consider using cron to run
your backups. You can also try udev to back up whenever a particular drive is
plugged in (see http://www.cafuego.net/2007/11/11/time-machine-kinda for
ideas on this), and it's apparently possible to get Network Manager to
automatically run backups when you connect to a particular network,
useful for laptops.
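For the cron route, here is a sketch of adding a nightly entry without clobbering whatever is already in your crontab. The function name, the 1:30am time and the script path in the usage line are all my own examples. Note that cron runs commands with /bin/sh and a minimal environment, so the entry needs to call a standalone script (with full paths), not a function defined in your interactive shell.

```shell
# add_nightly_backup (hypothetical helper): append a crontab entry that
# runs COMMAND at 1:30am every day, keeping any existing entries.
# Usage: add_nightly_backup COMMAND
add_nightly_backup() {
    # crontab -l fails when there is no crontab yet; that's fine here.
    { crontab -l 2>/dev/null; echo "30 1 * * * $1"; } | crontab -
}
```

For example, add_nightly_backup /usr/local/bin/backup-now, run as the user the backup should run as (root, for a backup of /). The machine has to be on at 1:30am for it to fire.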

If you're backing up databases (eg, MySQL for your blog or whatever),
make sure to dump them before backing up: copies of a live database's
files seldom restore well, so you need dumps.
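For MySQL, here is a sketch of a dump step to run just before the backup (the function name and output path are examples; mysqldump reads your login details from ~/.my.cnf or the usual command-line options). --single-transaction gives a consistent snapshot of InnoDB tables without locking them; it doesn't help with MyISAM tables, which mysqldump's --lock-all-tables option covers at the cost of blocking writes during the dump.

```shell
# dump_mysql (hypothetical helper): dump every MySQL database to one
# SQL file for the normal file backup to pick up.
# Usage: dump_mysql [OUTPUT-FILE]
dump_mysql() {
    local out=${1:-/var/backups/mysql-all.sql}   # example location
    # Dump to a temporary name and only rename on success, so a failed
    # dump never replaces the last good one.
    mysqldump --all-databases --single-transaction > "$out.tmp" &&
        mv "$out.tmp" "$out"
}
```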

--- My own backups ---

Linode server: backed up nightly to a machine in my house via
rdiff-backup. That backup is in turn rsync-ed to a separate disk.
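A sketch of that kind of arrangement (hypothetical function name; host and paths are examples, not my actual setup). rdiff-backup's host::path syntax runs it over SSH, so rdiff-backup must be installed on both machines, ideally the same version, and you'll want key-based SSH login if cron is going to run it unattended.

```shell
# pull_server_backup (hypothetical helper): pull a server's files into a
# local rdiff-backup repository over SSH, then mirror that repository
# onto a second disk.
# Usage: pull_server_backup USER@HOST REPOSITORY SECOND-COPY
pull_server_backup() {
    local remote=$1 repo=$2 copy=$3
    # "$remote::/" means "/ on the remote machine" to rdiff-backup.
    rdiff-backup --exclude-other-filesystems "$remote::/" "$repo" || return 1
    # -a keeps permissions, ownership (as root) and timestamps; --delete
    # removes files that are gone from the repository, so the copy is
    # an exact mirror of it, increments and all.
    rsync -a --delete "$repo/" "$copy/"
}
```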

Home server (contains mail, photos, music and financial records):
there are two disks in it, one is an rdiff-backup of the other. I want
to implement a remote backup of everything except the music, but it's
still a lot of data.

Laptop: same as the Linode, but I trigger the backup manually. (If you
do this, try and reduce it to a single button press so you'll do it
fairly often.)

--- Misc info ---

Some people asked whether there is anything going on in the space of
pressuring web services to provide full backup (or general export)
solutions: see http://autonomo.us/ and http://www.dataportability.org/
for some movement in this space.

-Mary
-- 
SLUG - Sydney Linux User's Group Mailing List - http://slug.org.au/
Subscription info and FAQs: http://slug.org.au/faq/mailinglists.html