On Thu, 22 Apr 2021 at 17:17, Dominic Raferd <domi...@timedicer.co.uk> wrote:
>
>
> On 22/04/2021 08:07, Dominic Raferd wrote:
> > On 22/04/2021 08:01, griffin tucker wrote:
> >> I've tried using deduplication, but only get about 6gb savings per 30gb.
> >>
> >> I intend on using squashfs on top of rdiff-backup, btrfs is just being
> >> used temporarily.
> >>
> >> On Thu, 22 Apr 2021 at 16:41, Dominic Raferd
> >> <domi...@timedicer.co.uk> wrote:
> >>> On 22/04/2021 07:03, griffin tucker wrote:
> >>>> i have a collection of the last 5 monthly dumps of various wikis from
> >>>> dumps.wikimedia.org
> >>>>
> >>>> each dump has numbered directories in the format 20210501, 20210401,
> >>>> 20210301, etc.
> >>>>
> >>>> all the filenames in these directories remain the same with each
> >>>> wiki's dump, with the exception of enwiki
> >>>>
> >>>> other than enwiki, these range from about 30gb to about 370gb
> >>>> uncompressed with each successive dump
> >>>>
> >>>> enwiki, the main english wikipedia, has mostly the same named files,
> >>>> but has the pages-meta-history.xml file split up into various 1-55gb
> >>>> compressed files (mostly 1-2gb) making a total of about 700gb
> >>>> compressed (disregarding redundant files)
> >>>>
> >>>> i'm not sure how big enwiki is uncompressed, but could be close to
> >>>> 25tb. i haven't figured out how i could make rdiff-backup more
> >>>> efficient with these files, aside from a script to merge each
> >>>> metahistory file into a single huge >100gb file and then running
> >>>> rdiff-backup, and then splitting the file back into their separate
> >>>> files with an index after restoring
> >>>>
> >>>> i'm using btrfs zstd:15 to store the files uncompressed, however i
> >>>> don't have enough storage to store enwiki uncompressed, zstd
> >>>> compression just isn't that good, even at maximum - i've used xz
> >>>> compression which attains much better rates of compression for other
> >>>> wikis but that isn't exactly seamless (experiments with fuse failed)
> >>>>
> >>>> so, to save space, i thought i would use rdiff-backup so that it would
> >>>> only store the differences between dumps, and it works very well in
> >>>> initial tests, however, if i run the reverse incremental backups one
> >>>> after the other today, they would be dated today, rather than
> >>>> 20210501, 20210401, etc. which isn't informative
> >>>>
> >>>> if i could add a comment next to each datetime stamp, this would be
> >>>> useful, otherwise i'll have to keep a separate index, which isn't a
> >>>> huge problem, i just thought i'd ask if i could change the datetime
> >>>> stamps before i write such a script
> >>>>
> >>>> On Thu, 22 Apr 2021 at 15:19, Eric Lavarde <e...@lavar.de> wrote:
> >>>>> Hi Griffin,
> >>>>>
> >>>>> On 22/04/2021 06:39, griffin tucker wrote:
> >>>>>> is there a way to change the timestamps of the backups?
> >>>>> no
> >>>>>
> >>>>>> or perhaps replace the timestamps with a unique name?
> >>>>> no
> >>>>>
> >>>>>> would this cause a faulty restore or a damaged backup?
> >>>>> yes, rdiff-backup makes a lot of date/time comparaisons so the
> >>>>> timestamp is meaningful.
> >>>>>
> >>>>> What are you trying to do?
> >>>>>
> >>>>> KR, Eric
> >>> Since you are already using btrfs, have you considered using
> >>> deduplication? Likely to work better if you store uncompressed.
> >>>
> > In your scenario I would expect deduplication to give big savings if
> > you store uncompressed. If not, YMMV. (I tried with rdiff-backup on
> > btrfs + deduplication a few years ago but found it all a bit scary and
> > retreated to ext4.)
> To clarify, I mean turning off compression within rdiff-backup, and
> instead using compression (+deduplication) at fs level.

well, i suppose i was using windows server's dedupe in that 6gb per 30gb savings, maybe i should try again with btrfs' dedupe

come to think of it, dedupe seems to be already enabled, which would explain <5 second copies for hundreds of gigabytes, but i can't get the dedupe status. when i run:

    btrfs dedupe status <mountpoint>

i get the error:

    btrfs: unknown token 'dedupe'

i'll investigate this further
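
in the meantime, a rough way to see how much dedupe could save at all, independent of any filesystem tooling, is to hash the dumps in fixed-size blocks and count how many blocks are unique. the sketch below is only an estimate under stated assumptions: the function name estimate_dedupe and the 128 KiB block size are my own choices for illustration, and this is not how btrfs itself tracks shared extents, so the real savings may differ.

```python
import hashlib
import os
import sys

# Assumed block size for the estimate; not a btrfs parameter.
BLOCK_SIZE = 128 * 1024


def estimate_dedupe(root, block_size=BLOCK_SIZE):
    """Walk root and return (total_bytes, unique_bytes).

    unique_bytes counts each distinct block (by SHA-256) only once, so
    total_bytes - unique_bytes approximates what block-level dedupe
    could reclaim. Only blocks aligned to block_size within each file
    are compared, so shifted duplicates are not detected.
    """
    seen = set()
    total = 0
    unique = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Skip symlinks and anything that is not a regular file.
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            with open(path, "rb") as fh:
                while True:
                    block = fh.read(block_size)
                    if not block:
                        break
                    total += len(block)
                    digest = hashlib.sha256(block).digest()
                    if digest not in seen:
                        seen.add(digest)
                        unique += len(block)
    return total, unique


if __name__ == "__main__" and len(sys.argv) > 1:
    total, unique = estimate_dedupe(sys.argv[1])
    saved = total - unique
    percent = (100.0 * saved / total) if total else 0.0
    print(f"total bytes:  {total}")
    print(f"unique bytes: {unique}")
    print(f"duplicate:    {saved} ({percent:.1f}%)")
```

run it against the directory holding the uncompressed dumps, e.g. python3 estimate.py <mountpoint>. it keeps one 32-byte digest per unique block in memory, so for multi-terabyte trees a larger block size is probably needed.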