On 21/12/2025 04:07, Colin Percival wrote:
On 12/20/25 15:27, [email protected] wrote:
I used the `tarsnap --print-stats -f '*'` command from the docs but
the output confuses me: the 'All archives (unique data)' line lists
the compressed size as ~140GB, and this tallies with my monthly
spend. But the sum of all the individual archives' compressed unique
data only comes to ~41GB, which I wasn't expecting.
"Unique data" means "how much data is in this archive *and not any
others*".
Or from a different perspective: It tells you how much data will be
removed if you delete that *one* archive.
If you have two identical archives, they'll both show very close to zero
"unique data" (just a very small amount of non-deduplicated metadata).
So, of your ~140 GB of data, ~40 GB is blocks which are present in only
one archive and the other ~100 GB is blocks which appear in multiple
archives.
Some of those blocks might appear in two archives; some might be found in
every single one of your archives. (Tarsnap does actually know for each
block of data how many archives use it -- it needs this reference count
in order to know when it can be deleted -- but there's no interface to
that information.)
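
To make the arithmetic concrete, here is a minimal Python sketch of the
reference-count accounting described above. The archive names, block ids
and sizes are invented toy values chosen to mirror the ~140/~40/~100
split, and the function names are hypothetical: Tarsnap exposes no such
interface, and this is not a picture of how it stores blocks internally.

```python
from collections import Counter

# Hypothetical toy data: archive name -> {block id: compressed size in GB}.
# A block with the same id in two archives is stored only once.
archives = {
    "monday": {"a": 60, "b": 40, "c": 15},
    "tuesday": {"a": 60, "b": 40, "d": 15},
    "wednesday": {"a": 60, "e": 10},
}


def reference_counts(archives):
    """Count how many archives use each block."""
    counts = Counter()
    for blocks in archives.values():
        counts.update(blocks.keys())
    return counts


def total_size(archives):
    """Size of every distinct block, each counted once
    (the 'All archives (unique data)' figure in this toy model)."""
    sizes = {}
    for blocks in archives.values():
        sizes.update(blocks)
    return sum(sizes.values())


def per_archive_unique(archives):
    """For each archive, the size of blocks that no other archive uses."""
    counts = reference_counts(archives)
    return {
        name: sum(size for block, size in blocks.items() if counts[block] == 1)
        for name, blocks in archives.items()
    }


total = total_size(archives)
unique = per_archive_unique(archives)
shared = total - sum(unique.values())

print(total)                 # 140
print(unique)                # {'monday': 15, 'tuesday': 15, 'wednesday': 10}
print(sum(unique.values()))  # 40
print(shared)                # 100
```

The per-archive figures add up to 40 rather than 140 because blocks "a"
and "b" have a reference count above one, so no single archive gets
credited with them.
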
Thank you, that makes sense -- I'd come to the same conclusion having
slept on it! I think my previous mental model of unique data was
'originally unique at time of upload' with the other archives then being
incremental, but that number wouldn't actually help me in figuring out
which were the most beneficial archives to offload.
So in terms of outright minimisation, my best strategy is removing
archives that have the highest unique data. To figure out anything more
complicated in terms of storing what's most valuable to me, I'll maybe
have to get creative with dry run options....
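
One caveat worth sketching: per-archive unique data is not a fixed
quantity, because deleting an archive lowers the reference counts of the
blocks it shared. The toy model below uses hypothetical archive names,
block ids and sizes (it is not Tarsnap's actual data structure or API) to
show a block shared by two archives becoming unique to the survivor, so
the figures are worth re-reading after each deletion.

```python
# Hypothetical toy data: archive name -> {block id: compressed size in GB}.
archives = {
    "monday": {"a": 60, "b": 40, "c": 15},
    "tuesday": {"a": 60, "b": 40, "d": 15},
    "wednesday": {"a": 60, "e": 10},
}


def unique_sizes(archives):
    """For each archive, the size of blocks used by that archive alone."""
    counts = {}
    for blocks in archives.values():
        for block in blocks:
            counts[block] = counts.get(block, 0) + 1
    return {
        name: sum(size for block, size in blocks.items() if counts[block] == 1)
        for name, blocks in archives.items()
    }


def delete_archive(archives, name):
    """Return a copy of the archive set without the named archive."""
    return {k: v for k, v in archives.items() if k != name}


before = unique_sizes(archives)
after = unique_sizes(delete_archive(archives, "monday"))

print(before)  # {'monday': 15, 'tuesday': 15, 'wednesday': 10}
print(after)   # {'tuesday': 55, 'wednesday': 10}
```

Deleting "monday" frees only its own 15 GB, but block "b" is then
referenced by "tuesday" alone, so "tuesday" jumps from 15 GB to 55 GB of
unique data.
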
I don't know if this has come up before, but the most valuable version
of tarsnap for me would be a local one, i.e. ability to use an external
drive/remote fs of my choice as the store. I can't fault the software
itself and the encryption and deduplication aspects are excellent, but
the backup/restore process itself is so slow (I wouldn't have predicted
'days' on a fibre connection!) and comparatively expensive that the
total cost for the holistic quality of service ends up pretty high.
I would definitely pay to have the former without the latter!
Thanks again,
Tara