Colin Percival wrote:
> Yes, known issue as of about a year ago; as far as I know you're only the
> second person to trip over this.
>
> It's an awkward problem relating to the way the tar format works: Because tar
> is a streaming format, when we see data for the first time there's no way to
> know if that is hardlinked to a file which we will want to extract later --
> and when we come to the hardlink we want to extract later, trying to "rewind"
> the tape is problematic. (Normal tar utilities run into the same problems,
> incidentally.)
>
> Right now I'm looking at two ways of attacking this:
> 1. Include data in every archive entry, including hard links -- this would
> make archives larger, but tarsnap's deduplication should make that mostly
> irrelevant.
> 2. Make a note of hardlinks where we didn't extract the first copy of the
> data, and then add a second pass through the archive to recover those -- this
> would keep archives the same size, but is considerably more complicated and
> potentially bug-prone due to edge cases like extracting files into directories
> which are being created with read-only permissions.
>
> If anyone has comments on these options or suggestions for other approaches,
> please comment on the github issue I've opened for tracking this:
> https://github.com/Tarsnap/tarsnap/issues/18
Thanks for the explanation, which makes sense! I'll head on over to github to comment - I didn't even realise tarsnap was on there!

Cheers,
Jamie
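Option 2 from the quoted message (note hardlinks whose first copy was skipped, then make a second pass to recover the data) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not tarsnap's code: the archive is modelled as a hypothetical in-memory stream of `Entry` records, where a hardlink entry carries only a `link_target` and no data, and `extract_two_pass` is an invented name. It ignores the real-world edge cases Colin mentions, such as extracting into directories created read-only.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List, Optional


@dataclass
class Entry:
    """One entry in a hypothetical, simplified tar-like stream."""
    name: str
    data: Optional[bytes] = None       # contents, present only for regular files
    link_target: Optional[str] = None  # set for hardlink entries, which carry no data


def extract_two_pass(read_archive: Callable[[], Iterable[Entry]],
                     wanted: Callable[[str], bool]) -> Dict[str, bytes]:
    """Extract the wanted entries, re-reading the stream only if some
    hardlink pointed at data that was skipped on the first pass."""
    out: Dict[str, bytes] = {}
    # Skipped target name -> wanted link names that still need its data.
    pending: Dict[str, List[str]] = {}

    # Pass 1: stream through once, as an ordinary tar extractor would.
    for e in read_archive():
        if not wanted(e.name):
            continue
        if e.link_target is None:
            out[e.name] = e.data or b""
        elif e.link_target in out:
            # The first copy was extracted, so the link can be satisfied now.
            out[e.name] = out[e.link_target]
        else:
            # The first copy went past unextracted; we cannot "rewind" a stream.
            pending.setdefault(e.link_target, []).append(e.name)

    # Pass 2: only needed when some links were left without data.
    if pending:
        for e in read_archive():
            if e.link_target is None and e.name in pending:
                for link_name in pending.pop(e.name):
                    out[link_name] = e.data or b""
    return out
```

For example, with `archive = [Entry("a", b"hello"), Entry("b", link_target="a")]`, extracting only `"b"` via `extract_two_pass(lambda: iter(archive), lambda n: n == "b")` needs the second pass to fill in `b`'s contents. Under option 1, every entry would carry its own `data`, so pass 2 would never be needed; the trade-off is larger archives, which deduplication is expected to absorb.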
