On 08/22/15 15:42, Garance AE Drosehn wrote:
>                     Total size  Compressed size
> This archive          20468320          5974899
> New data                109878             7560
>
> The thing is that I'm pretty sure I haven't changed anything in that
> folder (although it is possible I did).

It's impossible to say for certain based on these statistics alone, but
the very high compression ratio for the new data makes me think that it's
probably mostly tar headers (which inherently have lots of zero bytes in
their sparse format). Archiving files from a different location (such that
the relative paths are different), having an updated mtime on one of the
files... there are lots of things which could cause a few bytes of tar
headers to change. Considering that you're only looking at 7560 new bytes
of compressed data, I wouldn't worry too much at this point. ;-)

> But one of the reasons I like the idea of doing dry-runs is to see
> if the amount of new data to backup seems "reasonable". I've been
> known to download or generate pretty huge files "temporarily", only
> to forget about those files for years. And there was one time that
> a new version of the Tivoli Storage Manager caused my config file
> to be handled differently, and I nearly backed up almost 1 TB of
> data which I really did *not* want to be backed up to TSM.

FYI, you can also use tarsnap's --maxbw option to tell it to stop
archiving when it hits a certain number of bytes of upload.

> Is there any way that tarsnap would tell me which files have new
> data, at least when I'm doing a dry-run? It would probably be
> nice to have two options: (1) a count of files which have new
> data, (2) a list of the specific filenames.

No, because the concept of "which files have new data" isn't really
well-defined; files turn into a stream of tar which gets chopped up into
pieces which then get deduplicated, so one new chunk of data could
correspond to one file, or several files, or even to no files at all (in
the case of a change which only happened in tar headers).

> And in a related question: Is there a way to do a dry-run and find
> out that there is *no* new data to back up?

Not right now.
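To make the "which files have new data" point concrete, here is a toy model in Python. It is a sketch under stated assumptions, not Tarsnap's code: it uses a hypothetical content-defined chunker (a polynomial rolling hash with made-up window, mask, and min/max chunk sizes) and plain SHA-256 deduplication, and it ignores Tarsnap's real chunking parameters, compression, and encryption. The helper names `make_tar`, `member_ranges`, `chunk`, and `new_chunks` are illustrative only. It archives three files, archives them again unchanged, then changes only one file's mtime and reports which tar members the new chunks overlap. Requires Python 3.9+ for `randbytes`.

```python
import hashlib
import io
import random
import tarfile

# Toy chunker parameters (assumptions for illustration, not Tarsnap's values).
WINDOW, BASE, MOD = 48, 257, (1 << 31) - 1
MASK = (1 << 10) - 1             # a boundary roughly every ~1 KiB of random data
MIN_CHUNK, MAX_CHUNK = 256, 4096
BLOCK = 512                      # tar header and data block size


def make_tar(files, mtimes):
    """Serialize {name: bytes} as an uncompressed ustar stream in memory."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.USTAR_FORMAT) as tar:
        for name, data in files.items():
            info = tarfile.TarInfo(name)
            info.size = len(data)
            info.mtime = mtimes[name]
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def member_ranges(files):
    """Byte range [start, end) of each member: 512-byte header + padded data."""
    ranges, pos = {}, 0
    for name, data in files.items():
        end = pos + BLOCK + -(-len(data) // BLOCK) * BLOCK
        ranges[name] = (pos, end)
        pos = end
    return ranges


def chunk(stream):
    """Cut where a rolling hash of the last WINDOW bytes matches MASK."""
    out, start, h = [], 0, 0
    drop = pow(BASE, WINDOW - 1, MOD)    # weight of the byte leaving the window
    for i, b in enumerate(stream):
        if i >= WINDOW:
            h = (h - stream[i - WINDOW] * drop) % MOD
        h = (h * BASE + b) % MOD
        size = i + 1 - start
        if size >= MAX_CHUNK or (size >= MIN_CHUNK and (h & MASK) == MASK):
            out.append((start, stream[start:i + 1]))
            start = i + 1
    if start < len(stream):
        out.append((start, stream[start:]))
    return out


def new_chunks(stream, store):
    """Deduplicate against `store` (a set of digests); return unseen chunks."""
    fresh = []
    for offset, data in chunk(stream):
        digest = hashlib.sha256(data).digest()
        if digest not in store:
            store.add(digest)
            fresh.append((offset, data))
    return fresh


rng = random.Random(0)
files = {f"file{i}.bin": rng.randbytes(20000) for i in range(3)}
mtimes = {name: 1440000000 for name in files}

store = set()
first = make_tar(files, mtimes)
new_chunks(first, store)                            # first archive: all data is new
unchanged = new_chunks(make_tar(files, mtimes), store)
mtimes["file1.bin"] += 60                           # "touch" one file, same contents
touched = new_chunks(make_tar(files, mtimes), store)

ranges = member_ranges(files)
for offset, data in touched:
    end = offset + len(data)
    owners = [n for n, (s, e) in ranges.items() if s < end and offset < e]
    print(f"new chunk at {offset}: {len(data)} bytes, overlaps {owners}")
print(f"{sum(len(d) for _, d in touched)} new bytes out of {len(first)}")
```

The unchanged re-archive yields no new chunks, while the mtime-only change yields a small amount of new data even though no file contents changed; the new chunk around the touched header may also cover the tail of the neighbouring file, so there is no single file to "blame" for it.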
Theoretically the archive-creation code could check whether any new
blocks were uploaded before it uploads the final non-deduplicated metadata
and abort if not, but I'd have to write new code for this.

> For some of the folders that I back up, I do not want to create
> another backup if nothing has changed.

Why not?

--
Colin Percival
Security Officer Emeritus, FreeBSD | The power to serve
Founder, Tarsnap | www.tarsnap.com | Online backups for the truly paranoid
