Craig Hartnett <cr...@1811.spamslip.com> wrote:
> Hi again,
>
> So if I delete my initial archive today, Tarsnap will realise that it
> has to upload pretty much everything -- not everything, but almost
> everything -- again, right?
>
> And what if I delete a file -- any file -- on my hard drive that has
> been backed up in the past? Of course Tarsnap won't upload a null file,
> but does that file continue to exist in the archives unless or until I
> delete the last archive that contains it? In other words, it's *my*
> responsibility to curate my archives, right? (I'm quite happy to curate
> my own stuff. Just want to make sure.)
It's your duty to prune your old archives, yes, but the rest of your assumptions are wrong. It's actually far simpler than you are making it (whilst at the same time being far more complex!). What I mean is that, from a coding point of view, the way files are actually stored is very efficient, with encryption, compression and de-duplication on a block level rather than a file level. However, you can ignore all of this complex black magic: just consider every backup you make to be a complete and independent full backup. They are not incremental backups! (Under the hood, the black magic only uploads the data that has actually changed, and duplicate data isn't stored twice server-side, but again, leave that to the black magic!)

So, from your point of view, every backup is a complete and independent full archive. You'll already know the answers to your questions now, but to confirm: if you delete your first archive, it won't be much different from deleting your second or your third archive. All data which existed only in that archive is deleted from the server; all the rest remains.

And if you delete a file client-side, future backups won't reference it at all (so yes, no null file will be uploaded; remember, think of your next backup as an independent full backup). But the data will still exist server-side, inside every other backup in which that file still exists. (Again, black magic: the file data isn't literally stored several times over, but from your point of view it is.) I.e. if you remove a file, its data will only be removed server-side once you've deleted all prior backups that refer to that file in that specific incarnation.

> And what if I want to delete a file from my hard drive *and* my
> back-ups? Since the archives are immutable, and this file was in my
> initial back-up, am I right that there is no way to delete that single
> file from the back-up archives without deleting the whole archive, and
> consequently re-uploading most of the original archive again?
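A toy model shows why deleting an archive doesn't force a re-upload. The bash sketch below (bash 4+, for associative arrays) is NOT how tarsnap is actually implemented: real tarsnap de-duplicates encrypted, compressed blocks of file data, whereas here each "block" is just a made-up ID string, and the functions `store_archive` and `delete_archive` are hypothetical. It only illustrates the bookkeeping idea: every archive lists all of its blocks (so each one behaves like a full backup), the "server" keeps each distinct block once with a reference count, and deleting an archive frees only the blocks that no other archive still needs.

```bash
#!/usr/bin/env bash
# Toy model of block-level de-duplication (NOT tarsnap's real code).
# Each archive records ALL of its block IDs, so it behaves like a full backup,
# but the "server" stores each distinct block once, with a reference count.
# Block IDs are hypothetical strings without spaces.  Needs bash 4+.

declare -A refcount=()   # block ID -> number of archive references
declare -A archives=()   # archive name -> space-separated block IDs

store_archive() {        # usage: store_archive NAME BLOCK_ID...
  local name=$1 b uploaded=0
  shift
  archives[$name]="$*"
  for b in "$@"; do
    if [[ -z ${refcount[$b]:-} ]]; then
      uploaded=$((uploaded + 1))   # server lacks this block: upload it
      refcount[$b]=0
    fi
    refcount[$b]=$(( ${refcount[$b]} + 1 ))
  done
  echo "$name: $# blocks referenced, $uploaded uploaded"
}

delete_archive() {       # usage: delete_archive NAME
  local name=$1 b freed=0
  # Unquoted on purpose: split the stored list back into block IDs.
  for b in ${archives[$name]}; do
    refcount[$b]=$(( ${refcount[$b]} - 1 ))
    if (( ${refcount[$b]} == 0 )); then
      unset "refcount[$b]"         # no archive needs this block any more
      freed=$((freed + 1))
    fi
  done
  unset "archives[$name]"
  echo "$name deleted, freed $freed block(s)"
}

store_archive monday    a b c d   # first backup: everything is uploaded
store_archive tuesday   a b c e   # only the changed block "e" is new
delete_archive monday             # frees only "d"; a, b, c still referenced
store_archive wednesday a b c e   # nothing to upload, even with monday gone
```

Running it, tuesday uploads just 1 of its 4 blocks, deleting monday frees only block `d`, and wednesday uploads 0 blocks even though monday (the "initial" archive) has been deleted.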
You delete the whole archive, and any other archive it exists in, yes. BUT black magic deals with the rest: no re-uploading will be done, because tarsnap intelligently keeps the bits that are still needed. In other words, deleting the whole archive won't make any difference to the time or data used by subsequent backups.

> Which leads me to the conclusion that I should pick a time frame -- say,
> 90 days -- or come up with some traditional, staggered rotation system,
> and start deleting archives older than that *except* the initial
> archive, right?

You'll know the answer by now... :-) Yep, you'll probably want some sort of traditional deletion schedule for old archives, but first, check this out:

# tarsnap --print-stats --humanize-numbers -f '*'

(It takes some time to run...) This will show you how much data each archive uses on its own, i.e. how much space will literally be freed up if you delete that archive. Unless you update files manically, you'll probably be surprised how little space each one takes up. So, bearing those stats in mind, choose an appropriate deletion schedule.

Personally, I don't bother with a fixed schedule; I just prune manually now and then. Maybe for recent backups I delete all but the first of the day, for older ones all but the first of the week, for older still all but the first of the month, and so on. But thanks to the efficient storage, I sometimes have years' worth (hundreds) of backups stored at a time until I get around to pruning.

What I will do manually and quickly is shown by the following example: I recently downloaded a zip file many GB in size. tarsnap duly backed it up (of course, I could have told it not to, but it was important). About a week later, I was still too busy to process it properly, but I tried unzipping it and re-archiving it as an xz-compressed tar file. This took up far less space, but because both the old and the new files are compressed, there is nothing in common between the two that tarsnap can hold on to.
So I deleted all the archives from the previous week that contained the old zip file, and let the new .tar.xz file be uploaded in its place, cursing myself for not doing this before the file was ever backed up, which would have saved the upload pennies.

This leads me to another point: you don't have to dump all your data into one archive. You may want to back up the system stuff separately from the user stuff. Then, for instance, if you completely upgrade your system and are happy with it, you can zap all the old system backups straight away, whilst still leaving user backups around a bit longer, in case you suddenly realise that the document you deleted a month ago is still required! In other words, you can make a customised rotation scheme not just per machine, but also per data set, keeping some longer than others.

As for needing to keep your original archive, you know the answer now: "Nope, delete the initial archive just like any other! It's nothing special!"

> Or am I completely out to lunch here? :)

I'm afraid so, but I am most of the time, and anyway, lunch is more enjoyable than computers!

> Thanks for any light you can shed on this, via links to documentation
> that covers it of course if I have missed it.

All of the black magic and everything else is documented on the www.tarsnap.com website... But beware, witches and goblins!

Cheers!
Jamie
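The tiered pruning schedule mentioned above (keep the first archive of each day for recent backups, the first of each week for older ones, the first of each month for older still) can be sketched as a dry-run bash function. Everything in it is an assumption for illustration: the archive naming scheme `<prefix>-YYYY-MM-DD_HHMM`, the 7- and 90-day cut-offs, the hypothetical function name `prune_candidates`, and the availability of GNU `date` (for `-d`). It only prints the names it would delete; nothing is removed.

```bash
#!/usr/bin/env bash
# Dry-run sketch of a tiered pruning policy (hypothetical thresholds):
#   age <= 7 days  : keep the first archive of each day
#   age <= 90 days : keep the first archive of each ISO week
#   older          : keep the first archive of each month
# Assumes archive names ending in YYYY-MM-DD_HHMM and GNU date (-d).
# Reads names on stdin; prints the names the policy would delete.

prune_candidates() {     # usage: prune_candidates NOW_EPOCH_SECONDS
  local now=$1 name day age bucket
  local re='([0-9]{4}-[0-9]{2}-[0-9]{2})_[0-9]{4}$'
  local -A seen=()       # bucket key -> already kept one archive
  # C-locale sort puts names with the same prefix in chronological order,
  # so the first archive seen in each bucket is the oldest one.
  LC_ALL=C sort | while IFS= read -r name; do
    [[ $name =~ $re ]] || continue        # unknown format: never delete
    day=${BASH_REMATCH[1]}
    age=$(( (now - $(date -u -d "$day" +%s)) / 86400 ))
    if (( age <= 7 )); then
      bucket="day:$day"
    elif (( age <= 90 )); then
      bucket="week:$(date -u -d "$day" +%G-%V)"
    else
      bucket="month:${day:0:7}"
    fi
    if [[ -n ${seen[$bucket]:-} ]]; then
      printf '%s\n' "$name"               # not the first in its bucket
    else
      seen[$bucket]=1
    fi
  done
}
```

For one data set you might review `tarsnap --list-archives | grep '^home-' | prune_candidates "$(date +%s)"` by eye first, and only then delete the listed archives by hand with `tarsnap -d -f NAME`.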
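The idea of backing up the system stuff separately from the user stuff could look something like the bash sketch below. The data-set names, paths and helper functions (`backup_all`, `delete_dataset`) are hypothetical examples; the tarsnap options used (`-c` to create, `-d` to delete, `-f` to name the archive, `--list-archives`) are standard ones, but check the man page for your version before pointing anything like this at real archives.

```bash
#!/usr/bin/env bash
# Sketch: one archive per data set, so each set can have its own rotation.
# Data-set names and paths are hypothetical examples.  Needs bash 4+.
declare -A DATASETS=(
  [system]="/etc /usr/local/etc"
  [user]="/home"
)

backup_all() {           # create one timestamped archive per data set
  local ds stamp
  stamp=$(date -u +%Y-%m-%d_%H%M)
  for ds in "${!DATASETS[@]}"; do
    # Path list is split on spaces on purpose (these paths contain none).
    tarsnap -c -f "$ds-$stamp" ${DATASETS[$ds]}
  done
}

delete_dataset() {       # delete every archive of one data set, e.g. "system"
  local ds=$1 name
  tarsnap --list-archives | while IFS= read -r name; do
    if [[ $name == "$ds-"* ]]; then
      tarsnap -d -f "$name"
    fi
  done
}
```

Run `backup_all` from cron; then, after a system upgrade you're happy with, `delete_dataset system` clears out the old `system-*` archives while leaving the `user-*` ones alone.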