On Thu, Sep 21, 2017 at 06:58:43PM -0000, [email protected] wrote:
> Sorry for the delay. Here's the file type distribution (some are unknown;
> I just shared as powershell printed):

Thanks! This looks quite reasonable to me.

Just to keep the data together, in another email you wrote:
> I am backing up 1.45 GB and I see the usage are 1.07 GB.

Disregarding the blank extensions in your list, and only looking at
extensions which are more than 50 MB, we have:

> Extension Size (MB) Count
> --------- --------- -----
> jpg          391.85  4653
> pdf          180.03   456
> tcx          108.9    146
> mp4           80.2      7
> dcm           63.26    56

jpg files are typically compressed. This can be disabled in most image
programs, but I'm willing to bet that these are compressed. Tarsnap's
deduplication will be useful when you make additional archives, but
Tarsnap's compression stage will not save you a lot of space on those
jpgs. Same goes for mp4 and pdf -- they're already compressed.

I'm not certain about "tcx" and "dcm" files. Googling suggests that the
latter could be DICOM images (which include compression) or "DCM audio
module" (I'm not certain about the compression status there). "tcx"
could be TurboCAD in text form or "Training Center XML", both of which
*will* benefit from Tarsnap's compression.

Assuming that the above guesses are plausible, and only looking at the
small table of file extensions, there's about 824 MB of data in total,
of which we could expect to see a significant reduction in only about
172 MB (the tcx and dcm files), due to compression. This value depends
a huge amount on the actual content, so people are quite reluctant to
say anything like "assuming typical user data, DEFLATE will give you a
reduction of xy%".

That said, let's assume that compression removes 75% of that 172 MB.
That saves us about 129 MB. You saw a reduction of 1.45 GB to 1.07 GB,
or a saving of 380 MB. I was looking at roughly half of your data, so
let's double the estimated saving and get roughly 250 MB. So you're
seeing more compression than we expected.

There's a huge amount of quibble room here. Deduplication will probably
have /some/ kind of benefit to your initial data. Your jpg files were
probably compressed by a further 0.2% or 0.5% by Tarsnap.
Your data in the file extensions that total less than 50 MB each might
be easier to compress. And the figure of "75%" was a complete guess on
my part.

Still, in terms of a "back-of-the-napkin analysis", the answer is "yes,
your reduction of 1.45 GB to 1.07 GB for the initial archive, on the
data you provided, looks quite reasonable".

Cheers,
- Graham
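
For reference, the napkin estimate above can be reproduced with a short
Python sketch. The sizes are copied from the table in this thread. The
set of "compressible" extensions (tcx, dcm) and the 75% ratio are the
guesses from this email, not measurements, and all variable names are
purely illustrative. Exact sums may differ by a MB or so from rounded
figures.

```python
# Sizes in MB for extensions over 50 MB, copied from the table in this thread.
sizes_mb = {
    "jpg": 391.85,
    "pdf": 180.03,
    "tcx": 108.9,
    "mp4": 80.2,
    "dcm": 63.26,
}

# Assumption: only these formats are not already compressed.
compressible = {"tcx", "dcm"}

# Assumption: fraction of the compressible data that compression removes.
# This is a complete guess, not a property of DEFLATE or Tarsnap.
assumed_reduction = 0.75

table_total_mb = sum(sizes_mb.values())
compressible_mb = sum(
    size for ext, size in sizes_mb.items() if ext in compressible
)
table_saving_mb = compressible_mb * assumed_reduction

# The table covers roughly half of the 1.45 GB, so double the estimate.
estimated_saving_mb = 2 * table_saving_mb

# Observed saving: 1.45 GB of input stored as 1.07 GB.
observed_saving_mb = (1.45 - 1.07) * 1000

print(f"table total:      {table_total_mb:7.2f} MB")
print(f"compressible:     {compressible_mb:7.2f} MB")
print(f"estimated saving: {estimated_saving_mb:7.2f} MB")
print(f"observed saving:  {observed_saving_mb:7.2f} MB")
```

Changing assumed_reduction or the compressible set shows how sensitive
the estimate is to those guesses.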
