> On Fri, 18 Nov 2011, Robert Munro wrote:
>
> >And weekly backups to a different place. If you don't stop your backups
> >right away upon realizing that something has been corrupted, that will
> >just propagate to your daily backups, rendering them useless to recover.
>
> On Fri, Nov 18, 2011 at 08:16:41PM -0800, Rich Shepard wrote:
> Dirvish runs every night and records the changes to each file. While Keith
> keeps his backups forever, I keep daily backups for a month and the weekly
> backup (Sunday) for 6 months. In 20 years I've never needed a backup from an
> earlier time.

Robert may not know that dirvish/rsync makes huge nests of really cheap
hardlinks to apparently identical stuff; hence the whole daily/monthly
thing is a bit misleading. An additional backup, especially one that is
only slightly different from yesterday's, costs a small fraction of a
percent of the image size. I may have 3000 daily backups, but I actually
store only about 20 times as much information as what is on my live
disks right now.

When a disk gets full of backups, I swap it out. One 2TB disk will store
300 or so daily images of all my machines. I rotate 3 disks, sporadically
(it used to be daily), so the 2TB disks I am running now will be full in
2014 or so. By that time, 5TB disks will be $100, my accumulated data
will be twice as big, and those disks will last until 2018, when 10TB
disks will be $90 (inflation adjusted current dollars).

The main cost of backups is electricity to run the backup server; when
rsync is running, or dirvish is expiring old images, it draws 100W more
than when it is idle. Expires also mean more disk activity and wearout.
The electricity cost, and wearout, are more important to me than getting
another year's usage on the drive. So I fill the disks, then park them
in the fireproof for a few months, then move them to safe offsite storage.

Keeping daily backups has additional forensic value - if I see a
suspicious file on my machine (and being ignorant, I see a LOT of
suspicious files) I can look at its change history by looking at every
backup of the file. This is easy to do by counting hardlinks, which can
be used to determine the exact days it changed (Metadata? Trust but
verify!). I can look at other activity (RPM updates, email, other
projects going on) to see what else changed that day. To date, every
suspicious file has had an innocent explanation, but I haven't stopped
being suspicious. Someday, I may find a suspicious change, perhaps
months old, that heralds a real threat, enemy activity.
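
A minimal sketch of that change-history lookup, in Python. It compares
inode numbers between consecutive images rather than counting hardlinks
directly, on the assumption that rsync's --link-dest (which dirvish uses)
gives an unchanged file the same inode in every image, and a new inode
whenever the content or a preserved attribute changes. The function name
and the (label, root) pairs are hypothetical; point each root at wherever
an image's copy of the filesystem lives on your backup disk.

```python
import os
from pathlib import Path


def change_history(snapshot_roots, relpath):
    """List the snapshots in which relpath differs from the previous one.

    snapshot_roots: (label, root_dir) pairs in chronological order
                    (hypothetical layout; adjust to your own images).
    relpath:        path of the file relative to each root.
    Returns a list of (label, event) with event one of
    "appeared", "changed", "vanished".
    """
    events = []
    prev = None  # (device, inode) in the previous snapshot; None if absent
    for label, root in snapshot_roots:
        try:
            st = os.lstat(Path(root) / relpath)
            ident = (st.st_dev, st.st_ino)
        except FileNotFoundError:
            ident = None
        if ident != prev:
            if prev is None:
                events.append((label, "appeared"))
            elif ident is None:
                events.append((label, "vanished"))
            else:
                # A different inode means this image holds a fresh copy,
                # not a hardlink to the previous image's copy.
                events.append((label, "changed"))
        prev = ident
    return events
```

Called with the images in date order, it returns only the days on which
the file appeared, changed, or vanished; every other image shares the
previous day's inode. A "changed" day may reflect a metadata change
only, so it is still worth diffing the two copies.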

I may save not only my butt, but all of yours. Or not.

If I was running a much larger operation, I would do more to optimize
this, but it works OK now.

Keith

P.S. - on disk prices; they are spiking 3X right now because of the Thai
floods. That is causing unaffected producers to step up production. The
producers forced to rebuild factories will build newer, bleeding-edge
factories rather than duplicate older equipment, so I expect this to
accelerate improvements. In six months, there will be a glut, and hard
drive $/TB will fall below the pre-flood trendline. I don't know how big
the fall will be, or if the slope will increase ($/TB has been halving
approximately every year for a LONG time), but a backup strategy based
on ever-cheaper storage has worked for a decade so far.

When I switched from tape (anybody remember that stuff?), my backup
drives were 80GB and cost a lot more than scarce 2TB drives cost now.
Then as now, the main cost of backups is paying attention to them.

--
Keith Lofstrom          [email protected]         Voice (503)-520-1993
KLIC --- Keith Lofstrom Integrated Circuits --- "Your Ideas in Silicon"
Design Contracting in Bipolar and CMOS - Analog, Digital, and Scan ICs
_______________________________________________
PLUG mailing list
[email protected]
http://lists.pdxlinux.org/mailman/listinfo/plug
