Hi Colin,

On Sat, Apr 08, 2017 at 07:52:54PM -0700, Colin Percival wrote:
> > If not, then I am planning to use an us-east-1 EC2 instance so that at
> > least the Tarsnap server<->client bandwidth is in one place. I can then
> > use that machine to deduplicate and then the download to my machine here
> > can at least be efficient. In this case, will I still end up being
> > billed by Tarsnap for the "Compressed size/All archives" figure?
>
> If you extract all of the archives, yes.
>
> How are you planning on storing your data after you extract all of the
> archives? Something like ZFS which provides filesystem level deduplication?
I intend to use my own tool, ddar[1], which deduplicates at the userspace
level.

This particular set of archives will never change, and I won't ever need to
add to it. So I'd like to take it offline, where storage is cheaper. I use
git-annex to keep multiple copies of static data, including off-site copies,
as needed.

Robie

[1] https://github.com/basak/ddar