I've tried deduplication, but it only saves about 6gb per 30gb.

I intend to use squashfs on top of rdiff-backup; btrfs is just being
used temporarily.
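
For the separate index mentioned in my earlier message quoted below
(mapping each backup's datetime stamp to the dump directory it holds),
a minimal Python sketch could look like the following. The function
names, the JSON file and the example timestamp strings are all
hypothetical; nothing here is provided by rdiff-backup, and the
timestamps would have to be copied from whatever rdiff-backup reports
for each increment.

```python
import json
from pathlib import Path


def record_label(index_path, backup_time, dump_label):
    """Store 'backup_time -> dump_label' in a small JSON side file.

    backup_time is the datetime stamp rdiff-backup assigned to the
    increment (copied by hand or by a wrapper script); dump_label is
    the dump directory name, e.g. "20210401". Both are plain strings.
    """
    path = Path(index_path)
    # Load the existing index if there is one, otherwise start empty.
    index = json.loads(path.read_text()) if path.exists() else {}
    index[backup_time] = dump_label
    path.write_text(json.dumps(index, indent=2, sort_keys=True))
    return index


def label_for(index_path, backup_time):
    """Return the dump label recorded for backup_time, or None."""
    index = json.loads(Path(index_path).read_text())
    return index.get(backup_time)
```

Usage would be one record_label() call after each rdiff-backup run,
e.g. record_label("index.json", "2021-04-22T16:00:00", "20210301"),
and label_for() when deciding which increment to restore.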

On Thu, 22 Apr 2021 at 16:41, Dominic Raferd <domi...@timedicer.co.uk> wrote:
>
> On 22/04/2021 07:03, griffin tucker wrote:
> > i have a collection of the last 5 monthly dumps of various wikis from
> > dumps.wikimedia.org
> >
> > each dump has numbered directories in the format 20210501, 20210401,
> > 20210301, etc.
> >
> > all the filenames in these directories remain the same with each
> > wiki's dump, with the exception of enwiki
> >
> > other than enwiki, these range from about 30gb to about 370gb
> > uncompressed with each successive dump
> >
> > enwiki, the main english wikipedia, has mostly the same named files,
> > but has the pages-meta-history.xml file split up into various 1-55gb
> > compressed files (mostly 1-2gb) making a total of about 700gb
> > compressed (disregarding redundant files)
> >
> > i'm not sure how big enwiki is uncompressed, but could be close to
> > 25tb. i haven't figured out how i could make rdiff-backup more
> > efficient with these files, aside from a script to merge each
> > metahistory file into a single huge >100gb file and then running
> > rdiff-backup, and then splitting the file back into their separate
> > files with an index after restoring
> >
> > i'm using btrfs zstd:15 to store the files uncompressed, however i
> > don't have enough storage to store enwiki uncompressed, zstd
> > compression just isn't that good, even at maximum - i've used xz
> > compression which attains much better rates of compression for other
> > wikis but that isn't exactly seamless (experiments with fuse failed)
> >
> > so, to save space, i thought i would use rdiff-backup so that it would
> > only store the differences between dumps, and it works very well in
> > initial tests, however, if i run the reverse incremental backups one
> > after the other today, they would be dated today, rather than
> > 20210501, 20210401, etc. which isn't informative
> >
> > if i could add a comment next to each datetime stamp, this would be
> > useful, otherwise i'll have to keep a separate index, which isn't a
> > huge problem, i just thought i'd ask if i could change the datetime
> > stamps before i write such a script
> >
> > On Thu, 22 Apr 2021 at 15:19, Eric Lavarde <e...@lavar.de> wrote:
> >> Hi Griffin,
> >>
> >> On 22/04/2021 06:39, griffin tucker wrote:
> >>> is there a way to change the timestamps of the backups?
> >> no
> >>
> >>> or perhaps replace the timestamps with a unique name?
> >> no
> >>
> >>> would this cause a faulty restore or a damaged backup?
> >> yes, rdiff-backup makes a lot of date/time comparisons so the timestamp
> >> is meaningful.
> >>
> >> What are you trying to do?
> >>
> >> KR, Eric
> Since you are already using btrfs, have you considered using
> deduplication? Likely to work better if you store uncompressed.
>
