On 22/04/2021 07:03, griffin tucker wrote:
i have a collection of the last 5 monthly dumps of various wikis from
dumps.wikimedia.org

each dump has numbered directories in the format 20210501, 20210401,
20210301, etc.

all the filenames in these directories remain the same with each
wiki's dump, with the exception of enwiki

other than enwiki, these range from about 30gb to about 370gb
uncompressed with each successive dump

enwiki, the main english wikipedia, has mostly the same named files,
but has the pages-meta-history.xml file split up into various 1-55gb
compressed files (mostly 1-2gb) making a total of about 700gb
compressed (disregarding redundant files)

i'm not sure how big enwiki is uncompressed, but it could be close to
25tb. i haven't figured out how i could make rdiff-backup more
efficient with these files, aside from writing a script to merge the
metahistory files into a single huge >100gb file, running
rdiff-backup on that, and then splitting it back into the separate
files with an index after restoring
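
A minimal sketch of such a merge/split script, in Python, might look like the following. The function names, the JSON index format and the 1 MiB copy size are all made up for illustration; nothing here comes from rdiff-backup itself, and whether merging actually helps rdiff-backup find differences is untested.

```python
import json
from pathlib import Path

CHUNK = 1024 * 1024  # copy in 1 MiB pieces so large files never sit in memory


def merge_files(parts, merged_path, index_path):
    """Concatenate parts (in name order) and record each part's name and size."""
    index = []
    with open(merged_path, "wb") as out:
        for part in sorted(parts, key=str):
            part = Path(part)
            size = 0
            with open(part, "rb") as src:
                while chunk := src.read(CHUNK):
                    out.write(chunk)
                    size += len(chunk)
            index.append({"name": part.name, "size": size})
    Path(index_path).write_text(json.dumps(index, indent=2))
    return index


def split_file(merged_path, index_path, out_dir):
    """Recreate the original parts from the merged file using the index."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    index = json.loads(Path(index_path).read_text())
    with open(merged_path, "rb") as src:
        for entry in index:
            remaining = entry["size"]
            with open(out_dir / entry["name"], "wb") as dst:
                while remaining > 0:
                    chunk = src.read(min(CHUNK, remaining))
                    if not chunk:
                        raise ValueError("merged file is shorter than the index says")
                    dst.write(chunk)
                    remaining -= len(chunk)
```

The index would need to be backed up alongside the merged file, since the part boundaries cannot be recovered without it.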

i'm using btrfs with zstd:15 compression to store the uncompressed
files, however i don't have enough storage to store enwiki that way.
zstd compression just isn't that good, even at maximum - i've used xz
compression, which attains much better rates of compression for other
wikis, but that isn't exactly seamless (experiments with fuse failed)

so, to save space, i thought i would use rdiff-backup so that it would
only store the differences between dumps, and it works very well in
initial tests, however, if i run the reverse incremental backups one
after the other today, they would be dated today, rather than
20210501, 20210401, etc. which isn't informative

if i could add a comment next to each datetime stamp, this would be
useful, otherwise i'll have to keep a separate index, which isn't a
huge problem, i just thought i'd ask if i could change the datetime
stamps before i write such a script
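
A separate index of that kind could be as small as the Python sketch below. The file layout and the function names are hypothetical. It assumes the script is called right after each rdiff-backup run finishes, so the recorded time is only an approximation of the session time rdiff-backup stores itself.

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def record_session(index_path, dump_label, session_time=None):
    """Append one {dump label, backup time} record to a JSON index file."""
    index_path = Path(index_path)
    records = json.loads(index_path.read_text()) if index_path.exists() else []
    if session_time is None:
        # Assumption: "now" is close enough to the backup session time.
        session_time = datetime.now(timezone.utc)
    records.append({
        "dump": dump_label,
        "backup_time": session_time.isoformat(timespec="seconds"),
    })
    index_path.write_text(json.dumps(records, indent=2))
    return records


def find_backup_time(index_path, dump_label):
    """Return the recorded backup time for a dump label, or None if absent."""
    for record in json.loads(Path(index_path).read_text()):
        if record["dump"] == dump_label:
            return record["backup_time"]
    return None
```

For example, record_session("index.json", "20210501") after backing up the 20210501 dump, then find_backup_time("index.json", "20210501") later to look up which backup time to restore. The times are written as ISO 8601 strings with a UTC offset; whether that string can be handed to rdiff-backup unchanged is worth checking against its documentation of time formats.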

On Thu, 22 Apr 2021 at 15:19, Eric Lavarde <e...@lavar.de> wrote:
Hi Griffin,

On 22/04/2021 06:39, griffin tucker wrote:
is there a way to change the timestamps of the backups?
no

or perhaps replace the timestamps with a unique name?
no

would this cause a faulty restore or a damaged backup?
yes, rdiff-backup makes a lot of date/time comparisons so the timestamp
is meaningful.

What are you trying to do?

KR, Eric

Since you are already using btrfs, have you considered using deduplication? It is likely to work better if you store the files uncompressed.
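
Before committing to that, a rough estimate of how much two successive dumps have in common could be made with a Python sketch like the one below. It is only an approximation under stated assumptions: the 128 KiB block size and the function names are made up here, and it only finds identical blocks at aligned offsets, which is not necessarily how any particular btrfs deduplication tool works.

```python
import hashlib

BLOCK_SIZE = 128 * 1024  # assumed block size; real dedup tools may use another


def block_hashes(path, block_size=BLOCK_SIZE):
    """Yield a SHA-256 digest for each fixed-size block of the file."""
    with open(path, "rb") as f:
        while block := f.read(block_size):
            yield hashlib.sha256(block).digest()


def shared_block_ratio(old_path, new_path, block_size=BLOCK_SIZE):
    """Fraction of blocks in new_path that also occur somewhere in old_path."""
    known = set(block_hashes(old_path, block_size))
    total = shared = 0
    for digest in block_hashes(new_path, block_size):
        total += 1
        shared += digest in known
    return shared / total if total else 0.0
```

Note that a single byte inserted near the start of the newer file shifts every later block, so this estimate can drop to zero even when the two files are almost identical. A low ratio therefore does not by itself mean the dumps differ much.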
