my responses inline:

On Thu, 22 Apr 2021 at 22:05, Patrik Dufresne <pat...@ikus-soft.com> wrote:
>
> Hello griffin,
>
> I think rdiff-backup could be a good fit for you.
>
> 1. If you want rdiff-backup to store increments efficiently, make sure your
> data is not compressed. Compression scrambles the file contents, so
> increments computed on compressed data are not very efficient.

yep, definitely uncompressed (but btrfs does filesystem compression)
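for reference, per-directory compression on btrfs can approximate the ZFS per-dataset settings mentioned in point 2. a dry-run sketch (the /mnt/dumps/<date> paths are hypothetical examples; remove the `echo` to actually apply the property, and note that the zstd property value needs a reasonably recent kernel and btrfs-progs):

```shell
# Dry run: print the commands instead of executing them.
# 'btrfs property set <path> compression zstd' makes files written under
# <path> transparently zstd-compressed, per directory rather than
# per mount, loosely analogous to ZFS per-dataset compression.
for sub in /mnt/dumps/20210301 /mnt/dumps/20210401 /mnt/dumps/20210501; do
    echo btrfs property set "$sub" compression zstd
done
```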
> 2. If you are using ZFS you may configure the compression type you want for
> a particular data set: gzip, LZ4. You can probably do something similar
> with BTRFS.

yep, i can confirm btrfs can do this, but not quite as elegantly

> 3. I'm wondering what the "dump" file format is. If it's a single file,
> it's not optimal for rdiff-backup, since the increment will be computed on
> that one big file every day. Ideally, rdiff-backup works best with many
> smaller files, because it can detect very quickly whether each file changed
> and simply skip the unchanged ones.

they're .xml and .sql files, so they're mostly text, and i understand
it's slow with large files (some of them are 20gb+) - however, storage
has gone from 360gb down to 60gb for 5+ dumps, and that's worth the
wait. there are about 800 files in each dump, and half of them are
<1kb. in each rdiff-backup location, the latest backup doesn't seem to
be compressed at all, which is why i'll probably use squashfs on top of
rdiff-backup

> 4. Finally, if you want to force a particular timestamp to match your dump
> file numbering, you may enforce a date when running the backup. Take a look
> at `--current-time`. This way you can make the backup appear to have run in
> the past or future, according to your need.

i must have skipped over that in the man page, thanks! just what i needed!

> On Thu, Apr 22, 2021 at 3:45 AM griffin tucker
> <rdiffabkcuapbup9...@griffintucker.id.au> wrote:
>>
>> On Thu, 22 Apr 2021 at 17:38, Dominic Raferd <domi...@timedicer.co.uk> wrote:
>> >
>> > On 22/04/2021 08:31, griffin tucker wrote:
>> > > On Thu, 22 Apr 2021 at 17:17, Dominic Raferd <domi...@timedicer.co.uk>
>> > > wrote:
>> > >>
>> > >> On 22/04/2021 08:07, Dominic Raferd wrote:
>> > >>> On 22/04/2021 08:01, griffin tucker wrote:
>> > >>>> I've tried using deduplication, but only get about 6gb savings per
>> > >>>> 30gb.
>> > >>>>
>> > >>>> I intend on using squashfs on top of rdiff-backup; btrfs is just
>> > >>>> being used temporarily.
>> > >>>>
>> > >>>> On Thu, 22 Apr 2021 at 16:41, Dominic Raferd
>> > >>>> <domi...@timedicer.co.uk> wrote:
>> > >>>>> On 22/04/2021 07:03, griffin tucker wrote:
>> > >>>>>> i have a collection of the last 5 monthly dumps of various wikis
>> > >>>>>> from dumps.wikimedia.org
>> > >>>>>>
>> > >>>>>> each dump has numbered directories in the format 20210501,
>> > >>>>>> 20210401, 20210301, etc.
>> > >>>>>>
>> > >>>>>> all the filenames in these directories remain the same with each
>> > >>>>>> wiki's dump, with the exception of enwiki
>> > >>>>>>
>> > >>>>>> other than enwiki, these range from about 30gb to about 370gb
>> > >>>>>> uncompressed with each successive dump
>> > >>>>>>
>> > >>>>>> enwiki, the main english wikipedia, has mostly the same named
>> > >>>>>> files, but has the pages-meta-history.xml file split up into
>> > >>>>>> various 1-55gb compressed files (mostly 1-2gb), making a total of
>> > >>>>>> about 700gb compressed (disregarding redundant files)
>> > >>>>>>
>> > >>>>>> i'm not sure how big enwiki is uncompressed, but it could be close
>> > >>>>>> to 25tb.
>> > >>>>>> i haven't figured out how i could make rdiff-backup more
>> > >>>>>> efficient with these files, aside from writing a script to merge
>> > >>>>>> the metahistory files into a single huge >100gb file before
>> > >>>>>> running rdiff-backup, and then splitting that file back into the
>> > >>>>>> separate files with an index after restoring
>> > >>>>>>
>> > >>>>>> i'm using btrfs zstd:15 compression to store the
>> > >>>>>> otherwise-uncompressed files; however, i don't have enough storage
>> > >>>>>> to hold enwiki uncompressed - zstd compression just isn't that
>> > >>>>>> good, even at maximum. i've used xz compression, which attains
>> > >>>>>> much better compression ratios for other wikis, but that isn't
>> > >>>>>> exactly seamless (experiments with fuse failed)
>> > >>>>>>
>> > >>>>>> so, to save space, i thought i would use rdiff-backup so that it
>> > >>>>>> would only store the differences between dumps, and it works very
>> > >>>>>> well in initial tests. however, if i run the reverse incremental
>> > >>>>>> backups one after the other today, they would all be dated today,
>> > >>>>>> rather than 20210501, 20210401, etc., which isn't informative
>> > >>>>>>
>> > >>>>>> if i could add a comment next to each datetime stamp, this would
>> > >>>>>> be useful; otherwise i'll have to keep a separate index, which
>> > >>>>>> isn't a huge problem. i just thought i'd ask if i could change the
>> > >>>>>> datetime stamps before i write such a script
>> > >>>>>>
>> > >>>>>> On Thu, 22 Apr 2021 at 15:19, Eric Lavarde <e...@lavar.de> wrote:
>> > >>>>>>> Hi Griffin,
>> > >>>>>>>
>> > >>>>>>> On 22/04/2021 06:39, griffin tucker wrote:
>> > >>>>>>>> is there a way to change the timestamps of the backups?
>> > >>>>>>> no
>> > >>>>>>>
>> > >>>>>>>> or perhaps replace the timestamps with a unique name?
>> > >>>>>>> no
>> > >>>>>>>
>> > >>>>>>>> would this cause a faulty restore or a damaged backup?
>> > >>>>>>> yes, rdiff-backup makes a lot of date/time comparisons, so the
>> > >>>>>>> timestamp is meaningful.
>> > >>>>>>>
>> > >>>>>>> What are you trying to do?
>> > >>>>>>>
>> > >>>>>>> KR, Eric
>> > >>>>>
>> > >>>>> Since you are already using btrfs, have you considered using
>> > >>>>> deduplication? Likely to work better if you store uncompressed.
>> > >>>>>
>> > >>> In your scenario I would expect deduplication to give big savings if
>> > >>> you store uncompressed. If not, YMMV. (I tried rdiff-backup on btrfs
>> > >>> with deduplication a few years ago but found it all a bit scary and
>> > >>> retreated to ext4.)
>> > >>
>> > >> To clarify, I mean turning off compression within rdiff-backup, and
>> > >> instead using compression (+deduplication) at the fs level.
>> > >
>> > > well, i suppose i was using windows server's dedupe for that 6gb per
>> > > 30gb savings; maybe i should try again with btrfs' dedupe
>> > >
>> > > come to think of it, dedupe seems to be already enabled, which would
>> > > explain the <5 second copies for hundreds of gigabytes, but i can't
>> > > get the dedupe status when i run:
>> > >
>> > > btrfs dedupe status <mountpoint>
>> > >
>> > > which fails with the error:
>> > >
>> > > btrfs: unknown token 'dedupe'
>> > >
>> > > i'll investigate this further
>> >
>> > Another option is to use ZFS; Patrik wrote about it here:
>> > https://www.ikus-soft.com/en/blog/2020-07-22-configure-zfs-for-rdiff-backup/
>>
>> i'm reluctant to use zfs because linus torvalds said not to

>
> --
> IKUS Software inc.
> https://www.ikus-soft.com/
> 514-971-6442
> 130 rue Doris
> St-Colomban, QC J5K 1T9
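the `--current-time` option mentioned in point 4 takes seconds since the epoch, so the dump directory names (20210301, 20210401, ...) can drive the session timestamps directly. a minimal sketch, run oldest dump first; the dumps/<date> source layout and /backup/wiki destination are hypothetical, and the `echo` makes this a dry run:

```shell
# Stamp each rdiff-backup session with the dump's own date instead of
# today's, by converting the YYYYMMDD directory name to epoch seconds.
for dir in dumps/20210301 dumps/20210401 dumps/20210501; do
    stamp=$(basename "$dir")            # e.g. 20210401
    epoch=$(date -u -d "$stamp" +%s)    # GNU date parses YYYYMMDD
    # remove 'echo' to actually run the backup
    echo rdiff-backup --current-time "$epoch" "$dir" /backup/wiki
done
```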
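on the `btrfs: unknown token 'dedupe'` error above: the stock `btrfs` tool has no `dedupe` subcommand; out-of-band deduplication on btrfs is normally driven by an external tool such as `duperemove`. the near-instant "copies" of hundreds of gigabytes are more likely reflinks (shared extents created by `cp --reflink`) than deduplication. a small, filesystem-agnostic demo of a reflink-aware copy (`--reflink=auto` clones extents on btrfs and falls back to a plain copy elsewhere):

```shell
set -e
src=$(mktemp) && dst=$(mktemp)
printf 'pages-meta-history sample\n' > "$src"
# On btrfs this is a metadata-only clone that shares data extents with
# the source; on filesystems without reflink support it degrades to an
# ordinary copy, so the demo works anywhere.
cp --reflink=auto "$src" "$dst"
cmp -s "$src" "$dst" && echo "identical"   # prints: identical
```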