I've tried using deduplication, but only get about 6 GB of savings per 30 GB. I intend to use squashfs on top of rdiff-backup; btrfs is only being used temporarily.

On Thu, 22 Apr 2021 at 16:41, Dominic Raferd <domi...@timedicer.co.uk> wrote:
>
> On 22/04/2021 07:03, griffin tucker wrote:
> > i have a collection of the last 5 monthly dumps of various wikis from
> > dumps.wikimedia.org
> >
> > each dump has numbered directories in the format 20210501, 20210401,
> > 20210301, etc.
> >
> > all the filenames in these directories remain the same with each
> > wiki's dump, with the exception of enwiki
> >
> > other than enwiki, these range from about 30gb to about 370gb
> > uncompressed with each successive dump
> >
> > enwiki, the main english wikipedia, has mostly the same named files,
> > but has the pages-meta-history.xml file split up into various 1-55gb
> > compressed files (mostly 1-2gb) making a total of about 700gb
> > compressed (disregarding redundant files)
> >
> > i'm not sure how big enwiki is uncompressed, but could be close to
> > 25tb. i haven't figured out how i could make rdiff-backup more
> > efficient with these files, aside from a script to merge each
> > metahistory file into a single huge >100gb file and then running
> > rdiff-backup, and then splitting the file back into their separate
> > files with an index after restoring
> >
> > i'm using btrfs zstd:15 to store the files uncompressed, however i
> > don't have enough storage to store enwiki uncompressed, zstd
> > compression just isn't that good, even at maximum - i've used xz
> > compression which attains much better rates of compression for other
> > wikis but that isn't exactly seamless (experiments with fuse failed)
> >
> > so, to save space, i thought i would use rdiff-backup so that it would
> > only store the differences between dumps, and it works very well in
> > initial tests, however, if i run the reverse incremental backups one
> > after the other today, they would be dated today, rather than
> > 20210501, 20210401, etc. which isn't informative
> >
> > if i could add a comment next to each datetime stamp, this would be
> > useful, otherwise i'll have to keep a separate index, which isn't a
> > huge problem, i just thought i'd ask if i could change the datetime
> > stamps before i write such a script
> >
> > On Thu, 22 Apr 2021 at 15:19, Eric Lavarde <e...@lavar.de> wrote:
> >> Hi Griffin,
> >>
> >> On 22/04/2021 06:39, griffin tucker wrote:
> >>> is there a way to change the timestamps of the backups?
> >> no
> >>
> >>> or perhaps replace the timestamps with a unique name?
> >> no
> >>
> >>> would this cause a faulty restore or a damaged backup?
> >> yes, rdiff-backup makes a lot of date/time comparaisons so the timestamp
> >> is meaningful.
> >>
> >> What are you trying to do?
> >>
> >> KR, Eric
> Since you are already using btrfs, have you considered using
> deduplication? Likely to work better if you store uncompressed.
>
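
For the separate index mentioned in the quoted message (mapping each backup's timestamp to the dump date it holds), here is a rough sketch in Python. It assumes record_backup is called just before each rdiff-backup run, so the recorded time is at or before the timestamp rdiff-backup gives that increment. The file name dump-index.json and both function names are made up for illustration, and nothing here calls rdiff-backup itself.

```python
import json
from datetime import datetime, timezone
from pathlib import Path

# Hypothetical index file; keep it next to (not inside) the backup repository.
INDEX_PATH = Path("dump-index.json")


def record_backup(index_path, dump_label, started=None):
    """Append an entry saying which dump (e.g. "20210401") a backup run holds.

    `started` defaults to the current UTC time and should be taken just
    before the backup run begins.
    """
    if started is None:
        started = datetime.now(timezone.utc)
    entries = json.loads(index_path.read_text()) if index_path.exists() else []
    entries.append({
        "started": started.isoformat(timespec="seconds"),
        "dump": dump_label,
    })
    index_path.write_text(json.dumps(entries, indent=2))
    return entries


def dump_for(index_path, increment_time):
    """Return the dump label of the latest entry recorded at or before
    `increment_time` (a timezone-aware datetime), or None if there is none."""
    entries = json.loads(index_path.read_text())
    best = None
    for entry in entries:
        started = datetime.fromisoformat(entry["started"])
        if started <= increment_time and (best is None or started > best[0]):
            best = (started, entry["dump"])
    return best[1] if best else None
```

Usage would be along the lines of calling record_backup(INDEX_PATH, "20210401") before backing up the 20210401 directory, and later passing an increment's timestamp to dump_for to see which dump it corresponds to.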