> On Fri, 18 Nov 2011, Robert Munro wrote:
> 
> >And weekly backups to a different place. If you don't stop your backups
> >right away upon realizing that something has been corrupted, that will
> >just propagate to your daily backups, rendering them useless to recover.
> 
On Fri, Nov 18, 2011 at 08:16:41PM -0800, Rich Shepard wrote:
>   Dirvish runs every night and records the changes to each file. While Keith
> keeps his backups forever, I keep daily backups for a month and the weekly
> backup (Sunday) for 6 months. In 20 years I've never needed a backup from an
> earlier time.

Robert may not know that dirvish/rsync makes huge nests of really
cheap hardlinks to apparently identical stuff; hence the whole
daily/monthly thing is a bit misleading.  An additional backup,
especially one that is only slightly different than yesterday's,
costs a small fraction of a percent of the image size.
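
For anyone who has not seen the hardlink trick in isolation, here is a
minimal Python sketch of why an unchanged file costs almost nothing in
the next image. The directory and file names are made up for the
illustration, and this is not dirvish's own code; as far as I know,
dirvish gets the behavior from rsync's --link-dest option.

```python
import os
import tempfile

# Hypothetical layout: two "daily images" holding the same unchanged file.
with tempfile.TemporaryDirectory() as root:
    day1 = os.path.join(root, "image-day1")
    day2 = os.path.join(root, "image-day2")
    os.mkdir(day1)
    os.mkdir(day2)

    original = os.path.join(day1, "data.bin")
    with open(original, "wb") as f:
        f.write(b"x" * 100_000)

    # Unchanged file: the second image gets a hardlink, not a second copy.
    linked = os.path.join(day2, "data.bin")
    os.link(original, linked)

    st1, st2 = os.stat(original), os.stat(linked)
    # Same (device, inode) pair means both names point at the same data.
    same_inode = (st1.st_dev, st1.st_ino) == (st2.st_dev, st2.st_ino)
    link_count = st1.st_nlink

    print("same inode:", same_inode)
    print("link count:", link_count)
```

Both names refer to one set of data blocks, so the second image only
pays for a directory entry.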

I may have 3000 daily backups, but I actually store only about 20
times as much information as what is on my live disks right now.
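
Those two figures can be checked against the "small fraction of a
percent" claim with some rough arithmetic. The sketch below assumes the
first image is a full copy and every later one is incremental, and it
treats today's live data as the unit of size even though the data has
grown over the years, so the result is only an order-of-magnitude
estimate.

```python
daily_images = 3000     # figure from the paragraph above
stored_multiple = 20    # total stored, in units of "one live copy"

# Assumption: one full image, then (daily_images - 1) incremental images.
incremental_total = stored_multiple - 1
per_image_fraction = incremental_total / (daily_images - 1)

print(f"average cost per extra image: {per_image_fraction:.2%} of live data")
```

That works out to roughly 0.6% of the live data per additional image.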

When a disk gets full of backups, I swap it out.  One 2TB disk
will store 300 or so daily images of all my machines.  I rotate
3 disks, sporadically (it used to be daily), so the 2TB disks I
am running now will be full in 2014 or so.  By that time, 5TB
disks will be $100, my accumulated data will be twice as big,
and those disks will last until 2018, when 10TB disks will be
$90 (inflation adjusted current dollars).

The main cost of backups is electricity to run the backup server;
when rsync is running, or dirvish is expiring old images, it
draws 100W more than when it is idle.  Expires also mean more
disk activity and wearout.  The electricity cost, and wearout,
are more important to me than getting another year's usage on
the drive.  So I fill the disks, then park them in the fireproof
for a few months, then move them to safe offsite storage.

Keeping daily backups has additional forensic value - if I
see a suspicious file on my machine (and being ignorant, I
see a LOT of suspicious files) I can look up its change history
by examining every backup of the file.  This is easy to do
by counting hardlinks, which can be used to determine the
exact days it changes (Metadata?  Trust but verify!).  I can
look at other activity (RPM updates, email, other projects
going on) to see what also changed that day.  To date, every
suspicious file has had an innocent explanation, but I haven't
stopped being suspicious.  Someday, I may find a suspicious
change, perhaps months old, that heralds a real threat, enemy
activity.  I may save not only my butt, but all of yours. 

Or not.
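
The change-history lookup described above can be sketched in Python.
This is a variant rather than a literal link count: it compares inode
numbers between consecutive images and reports the images where the file
stopped being a hardlink to the previous day's copy. The function name,
the image directory names, and the file name are hypothetical; a real
dirvish vault has its own layout, which the caller would need to supply
as a chronologically ordered list of image directories.

```python
import os
import tempfile


def change_history(image_dirs, relpath):
    """Return the images where relpath first appears, changes, or vanishes.

    image_dirs must be in chronological order. A change is detected when
    the (device, inode) pair differs from the previous image, meaning the
    file is no longer a hardlink to the previous copy.
    """
    changes = []
    previous = None
    for image in image_dirs:
        path = os.path.join(image, relpath)
        try:
            st = os.stat(path)
        except FileNotFoundError:
            current = None
        else:
            current = (st.st_dev, st.st_ino)
        if current != previous:
            changes.append(image)
        previous = current
    return changes


def _demo():
    # Build four fake images: the file changes once, on day3.
    with tempfile.TemporaryDirectory() as root:
        images = [os.path.join(root, f"day{n}") for n in range(1, 5)]
        for image in images:
            os.mkdir(image)
        files = [os.path.join(image, "suspect.txt") for image in images]
        with open(files[0], "w") as f:
            f.write("v1")
        os.link(files[0], files[1])   # day2: unchanged, hardlinked
        with open(files[2], "w") as f:
            f.write("v2")             # day3: changed, new inode
        os.link(files[2], files[3])   # day4: unchanged, hardlinked
        changed = change_history(images, "suspect.txt")
        return [os.path.basename(p) for p in changed]


demo_result = _demo()
print(demo_result)
```

Matching inodes only show that two images share the same stored copy.
In the "trust but verify" spirit, comparing file contents on the
reported days is a sensible follow-up.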

If I were running a much larger operation, I would do more to
optimize this, but it works OK now. 

Keith

P.S. - on disk prices:  they are spiking 3X right now because
of the Thai floods.  That is causing unaffected producers to
step up production.  The producers forced to rebuild factories
will build newer, bleeding-edge factories rather than duplicate
older equipment, so I expect this to accelerate improvements. 
In six months, there will be a glut, and hard drive $/TB will
fall below the pre-flood trendline.  I don't know how big the
fall will be, or if the slope will increase ($/TB has been halving
approximately every year for a LONG time), but a backup strategy
based on ever-cheaper storage has worked for a decade so far.  
When I switched from tape (anybody remember that stuff?), my
backup drives were 80GB and cost a lot more than scarce 2TB
drives cost now.  Then as now, the main cost of backups is
paying attention to them.

-- 
Keith Lofstrom          [email protected]         Voice (503)-520-1993
KLIC --- Keith Lofstrom Integrated Circuits --- "Your Ideas in Silicon"
Design Contracting in Bipolar and CMOS - Analog, Digital, and Scan ICs