Hi Craig,

If you'd like a better understanding of deduplication, I very much recommend:
http://www.tarsnap.com/deduplication-explanation.html
(new page from 6 weeks ago, so regular tarsnap users probably haven't seen it)

I made up an example of "wordification" (where we make multiple backups of a
short phrase that changes slightly) as an analogy to backing up a hard drive.
The whole point of that page is to answer questions like yours, so please let
me know if anything on it isn't clear!  :)


One specific thing to correct:
> Now, even though (technically speaking) FILE.EXT was not in the newest
> archive because it has not changed since the initial back-up,

FILE.EXT is definitely part of that archive!  You can test this for yourself by
listing all of its filenames with:

    tarsnap -t -f NEWEST-ARCHIVE

I very much agree with Jamie's email where he says "it's far more simple than
you think".  Let's pretend that you make a daily backup of one directory.  Then
each backup is "what did that directory contain on that day".

- Want to see it from 3 days ago?  Restore that archive.
- Want to see it from yesterday?  Restore that archive.
- Do you care about the contents from 2 days ago?  No?  Ok, delete that
  archive.

Your ability to see the version from 1 or 3 days ago is completely unaffected
by your deleting the 2-day-old version.
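
If it helps to see that in code, here is a toy sketch in Python.  To be clear,
this is NOT how Tarsnap is actually implemented: the ToyStore class and all of
its method names are made up for illustration, it treats each whole file as a
single block (Tarsnap splits data into smaller blocks), and there is no
encryption or networking.  It only shows the idea that every archive lists all
of its files, while identical data is stored once and kept for as long as any
archive still refers to it.

```python
import hashlib


class ToyStore:
    """Hypothetical toy model of a deduplicating backup store."""

    def __init__(self):
        self.chunks = {}    # block hash -> block data, stored only once
        self.archives = {}  # archive name -> {filename: block hash}

    def create(self, name, files):
        """Back up files (dict of filename -> bytes); return blocks uploaded."""
        index = {}
        uploaded = 0
        for filename, data in files.items():
            digest = hashlib.sha256(data).hexdigest()
            if digest not in self.chunks:
                # New data: this is the only case where anything is uploaded.
                self.chunks[digest] = data
                uploaded += 1
            # Unchanged files are still recorded in this archive's index.
            index[filename] = digest
        self.archives[name] = index
        return uploaded

    def list_files(self, name):
        """Rough analogue of listing the filenames in one archive."""
        return sorted(self.archives[name])

    def extract(self, name, filename):
        """Rough analogue of restoring one file from one archive."""
        return self.chunks[self.archives[name][filename]]

    def delete(self, name):
        """Delete an archive, then drop blocks no other archive refers to."""
        del self.archives[name]
        in_use = {d for index in self.archives.values() for d in index.values()}
        for digest in list(self.chunks):
            if digest not in in_use:
                del self.chunks[digest]


store = ToyStore()
uploaded = [
    store.create("day1", {"FILE.EXT": b"never changes", "notes.txt": b"v1"}),
    store.create("day2", {"FILE.EXT": b"never changes", "notes.txt": b"v2"}),
    store.create("day3", {"FILE.EXT": b"never changes", "notes.txt": b"v3"}),
]

# Delete the initial archive; the later archives are still complete.
store.delete("day1")
print(uploaded)
print(store.list_files("day3"))
print(store.extract("day3", "FILE.EXT"))
```

In this toy model, FILE.EXT is uploaded once on day 1 but is listed in every
archive, and it can still be restored from "day3" after "day1" is deleted.
Only the day-1 version of notes.txt goes away, because no remaining archive
refers to it.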

Cheers,
- Graham


On Sat, Nov 24, 2018 at 03:57:03AM -0800, Craig Hartnett wrote:
> Hi Niels,
> 
> Thanks for your reply. Yes, one of my questions -- about intentionally
> deleting a file and then wanting it gone from all back-ups too -- was
> hypothetical, but the knowledge of how to accomplish that (if necessary)
> is (of course) useful, both to more fully understand Tarsnap and (in
> case the need ever arises) to actually do it.
> 
> And since this is just a laptop, RAID is not a realistic option. An
> interesting one, yes, but not practical.
> 
> Further to my original email, in another thread I tried to restore a
> file that has not changed since I did my initial back-up, but I
> specified the most recent archive:
> 
>         tarsnap -x -f NEWEST-ARCHIVE media/USER/PATH/DIRECTORY/FILE.EXT
> 
> Now, even though (technically speaking) FILE.EXT was not in the newest
> archive because it has not changed since the initial back-up, the
> restore command still worked. I assume Tarsnap is just smart enough to
> know that I'm stupid and specified the "wrong" archive, and got the file
> for me from wherever it was residing. But I assume, going back to one of
> my original questions, that if I had deleted the initial archive, that
> file would not have been there for Tarsnap to find until after my next
> scheduled back-up.
> 
> 
> Craig
> 
> 
> 
> On Sat, 2018-11-24 at 08:05 +0100, Niels Kobschaetzki wrote:
> > That is one of the ideas of backups: protecting you from accidentally
> > deleting files (they also protect you from hardware failure, but
> > redundancy and RAID are a better choice there because you can keep using
> > the device during the “outage”).
> > Thus if you truly want to have a file deleted, you also need to delete it
> > from the backups. Most backup systems in my experience only know the
> > concept of volumes, which then need to be deleted. Thus a file is only gone
> > when all volumes that contain the file are gone. In that case you have
> > to wait until the file is rotated away or destroy all the volumes.
> > Rotation has the added benefit of saving on space, which means in the end
> > saving money (with any backup system, because with other systems you will
> > need more drives, tapes, or whatever over time).
> > 
> > Niels
> > 
> > 
> > > On 24. Nov 2018, at 04:04, Craig Hartnett <cr...@1811.spamslip.com> wrote:
> > > 
> > > Hi again,
> > > 
> > > OK, so I did read that I'm supposed to forget everything I know about
> > > back-ups, but frankly that wasn't much. :) Not that I know nothing, but
> > > it hasn't been something I've spent a *lot* of time thinking about.
> > > 
> > > But as I think about Tarsnap, deleted files, rotating/deleting archives,
> > > daily storage charges (increasing, of course, as the amount of data
> > > stored slowly increases), etc., I start wondering about what happens to
> > > files I intentionally delete from my hard drive. If I understand Tarsnap
> > > correctly, a file that I backed up in my initial back-up and that hasn't
> > > since changed only exists in that initial back-up archive because (a) it
> > > hasn't changed so there has been no need to re-upload any part of it and
> > > (b) archives are immutable. If I delete that initial archive I assume (I
> > > could be wrong, so this is part of my series of questions) that Tarsnap
> > > will realise that and back up those files again. Am I right?
> > > 
> > > So if I delete my initial archive today, Tarsnap will realise that it
> > > has to upload pretty much everything -- not everything, but almost
> > > everything -- again, right?
> > > 
> > > And what if I delete a file -- any file -- on my hard drive that has
> > > been backed up in the past? Of course Tarsnap won't upload a null file,
> > > but does that file continue to exist in the archives unless or until I
> > > delete the last archive that contains it? In other words, it's *my*
> > > responsibility to curate my archives, right? (I'm quite happy to curate
> > > my own stuff. Just want to make sure.)
> > > 
> > > And what if I want to delete a file from my hard drive *and* my
> > > back-ups? Since the archives are immutable, and this file was in my
> > > initial back-up, am I right that there is no way to delete that single
> > > file from the back-up archives without deleting the whole archive, and
> > > consequently re-uploading most of the original archive again?
> > > 
> > > Which leads me to the conclusion that I should pick a time frame -- say,
> > > 90 days -- or come up with some traditional, staggered rotation system,
> > > and start deleting archives older than that *except* the initial
> > > archive, right?
> > > 
> > > Or am I completely out to lunch here? :)
> > > 
> > > Thanks for any light you can shed on this, via links to documentation
> > > that covers it of course if I have missed it.
> > > 
> > > 
> > > Craig
> 
> 
> 
> 
