my responses inline:
On Thu, 22 Apr 2021 at 22:05, Patrik Dufresne <pat...@ikus-soft.com> wrote:
>
> Hello griffin,
>
> I think rdiff-backup could be a good fit for you.
>
> 1. If you want rdiff-backup to store increments efficiently, make sure your
> data is not compressed. Compression scrambles file contents, so the
> increments become much less efficient.
yep, definitely uncompressed (but btrfs does filesystem compression)
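
side note: since compression will happen at the filesystem layer anyway,
i could also turn off rdiff-backup's own gzip compression of the
increment files so the filesystem sees raw deltas. a sketch (the paths
are just examples):

    # store .diff/.snapshot increments uncompressed; let btrfs/ZFS
    # handle compression and dedupe at the filesystem level instead
    rdiff-backup --no-compression /dumps/20210501 /backups/wikidump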

> 2. If you are using ZFS, you can configure the compression type for a
> particular dataset, e.g. gzip or LZ4. You can probably do something similar
> with BTRFS.
yep, i can confirm btrfs can do this, but not quite as elegantly
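
for reference, a minimal sketch of both (pool, device and mount paths
are made up):

    # ZFS: per-dataset compression
    zfs set compression=lz4 tank/wikidumps
    # btrfs: per-path property (no level control) ...
    btrfs property set /mnt/wikidumps compression zstd
    # ... or a mount-wide option, which does take a level
    mount -o compress=zstd:15 /dev/sdb1 /mnt/wikidumps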

> 3. I'm wondering what the "dump" file format is. If it's a single file, it's
> not optimal for rdiff-backup, since an increment will be computed on that big
> file every day. rdiff-backup works best with many smaller files, because it
> can detect very quickly whether a file has changed and simply skip the
> increment.
they're .xml and .sql files, so they're mostly text, and i understand
it's slow with large files (some of them are 20gb+). however, the total
has gone from 360gb down to 60gb for 5+ dumps, and that's worth the
wait. there are about 800 files in each dump, and half of them are
<1kb. in each rdiff-backup location, the latest backup doesn't seem to
be compressed at all, which is why i'll probably use squashfs on top
of rdiff-backup
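
roughly what i have in mind (image name and paths are just examples):

    # pack the finished rdiff-backup repository into a read-only,
    # xz-compressed image; text dumps compress well under xz
    mksquashfs /backups/wikidump wikidump.sqfs -comp xz
    # mount it read-only to browse or restore from
    mount -t squashfs -o loop wikidump.sqfs /mnt/sqfs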

> 4. Finally, if you want to force a particular timestamp to match your dump
> file numbering, you can enforce a date when running the backup. Take a look
> at `--current-time`. This way you can make the backup appear to have run in
> the past or future, according to your needs.
i must have skipped over that in the man page, thanks! just what i needed!
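
for the archives, a sketch of how i might drive it (assumes GNU date;
the dump directory names and paths below are examples):

    # one backup session per dated dump, with the session timestamp
    # forced to match the dump's YYYYMMDD directory name
    for d in 20210301 20210401 20210501; do
        rdiff-backup --current-time "$(date -d "$d" +%s)" \
            "/dumps/$d" /backups/wikidump
    done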

>
> On Thu, Apr 22, 2021 at 3:45 AM griffin tucker 
> <rdiffabkcuapbup9...@griffintucker.id.au> wrote:
>>
>> On Thu, 22 Apr 2021 at 17:38, Dominic Raferd <domi...@timedicer.co.uk> wrote:
>> >
>> >
>> > On 22/04/2021 08:31, griffin tucker wrote:
>> > > On Thu, 22 Apr 2021 at 17:17, Dominic Raferd <domi...@timedicer.co.uk> 
>> > > wrote:
>> > >>
>> > >> On 22/04/2021 08:07, Dominic Raferd wrote:
>> > >>> On 22/04/2021 08:01, griffin tucker wrote:
>> > >>>> I've tried using deduplication, but only get about 6gb savings per 
>> > >>>> 30gb.
>> > >>>>
>> > >>>> I intend on using squashfs on top of rdiff-backup, btrfs is just being
>> > >>>> used temporarily.
>> > >>>>
>> > >>>> On Thu, 22 Apr 2021 at 16:41, Dominic Raferd
>> > >>>> <domi...@timedicer.co.uk> wrote:
>> > >>>>> On 22/04/2021 07:03, griffin tucker wrote:
>> > >>>>>> i have a collection of the last 5 monthly dumps of various wikis 
>> > >>>>>> from
>> > >>>>>> dumps.wikimedia.org
>> > >>>>>>
>> > >>>>>> each dump has numbered directories in the format 20210501, 20210401,
>> > >>>>>> 20210301, etc.
>> > >>>>>>
>> > >>>>>> all the filenames in these directories remain the same with each
>> > >>>>>> wiki's dump, with the exception of enwiki
>> > >>>>>>
>> > >>>>>> other than enwiki, these range from about 30gb to about 370gb
>> > >>>>>> uncompressed with each successive dump
>> > >>>>>>
>> > >>>>>> enwiki, the main english wikipedia, has mostly the same named files,
>> > >>>>>> but has the pages-meta-history.xml file split up into various 1-55gb
>> > >>>>>> compressed files (mostly 1-2gb) making a total of about 700gb
>> > >>>>>> compressed (disregarding redundant files)
>> > >>>>>>
>> > >>>>>> i'm not sure how big enwiki is uncompressed, but it could be close
>> > >>>>>> to 25tb. i haven't figured out how i could make rdiff-backup more
>> > >>>>>> efficient with these files, aside from a script to merge the
>> > >>>>>> metahistory files into a single huge >100gb file before running
>> > >>>>>> rdiff-backup, and then split it back into the separate
>> > >>>>>> files with an index after restoring
>> > >>>>>>
>> > >>>>>> i'm using btrfs zstd:15 to store the files uncompressed, however i
>> > >>>>>> don't have enough storage to store enwiki uncompressed, zstd
>> > >>>>>> compression just isn't that good, even at maximum - i've used xz
>> > >>>>>> compression which attains much better rates of compression for other
>> > >>>>>> wikis but that isn't exactly seamless (experiments with fuse failed)
>> > >>>>>>
>> > >>>>>> so, to save space, i thought i would use rdiff-backup so that it
>> > >>>>>> would only store the differences between dumps, and it works very
>> > >>>>>> well in initial tests. however, if i run the reverse incremental
>> > >>>>>> backups one after the other today, they would all be dated today,
>> > >>>>>> rather than 20210501, 20210401, etc., which isn't informative
>> > >>>>>>
>> > >>>>>> if i could add a comment next to each datetime stamp, this would be
>> > >>>>>> useful, otherwise i'll have to keep a separate index, which isn't a
>> > >>>>>> huge problem, i just thought i'd ask if i could change the datetime
>> > >>>>>> stamps before i write such a script
>> > >>>>>>
>> > >>>>>> On Thu, 22 Apr 2021 at 15:19, Eric Lavarde <e...@lavar.de> wrote:
>> > >>>>>>> Hi Griffin,
>> > >>>>>>>
>> > >>>>>>> On 22/04/2021 06:39, griffin tucker wrote:
>> > >>>>>>>> is there a way to change the timestamps of the backups?
>> > >>>>>>> no
>> > >>>>>>>
>> > >>>>>>>> or perhaps replace the timestamps with a unique name?
>> > >>>>>>> no
>> > >>>>>>>
>> > >>>>>>>> would this cause a faulty restore or a damaged backup?
>> > >>>>>>> yes, rdiff-backup makes a lot of date/time comparisons, so the
>> > >>>>>>> timestamp is meaningful.
>> > >>>>>>>
>> > >>>>>>> What are you trying to do?
>> > >>>>>>>
>> > >>>>>>> KR, Eric
>> > >>>>> Since you are already using btrfs, have you considered using
>> > >>>>> deduplication? Likely to work better if you store uncompressed.
>> > >>>>>
>> > >>> In your scenario I would expect deduplication to give big savings if
>> > >>> you store uncompressed. If not, YMMV. (I tried with rdiff-backup on
>> > >>> btrfs + deduplication a few years ago but found it all a bit scary and
>> > >>> retreated to ext4.)
>> > >> To clarify, I mean turning off compression within rdiff-backup, and
>> > >> instead using compression (+deduplication) at fs level.
>> > > well, i suppose i was using windows server's dedupe in that 6gb per
>> > > 30gb savings, maybe i should try again with btrfs' dedupe
>> > >
>> > > come to think of it, dedupe seems to already be enabled, which would
>> > > explain <5 second copies for hundreds of gigabytes, but i can't get
>> > > the dedupe status when i run:
>> > >
>> > > btrfs dedupe status <mountpoint>
>> > >
>> > > which returns an error:
>> > >
>> > > btrfs: unknown token 'dedupe'
>> > >
>> > > i'll investigate this further
>> > Another option is to use ZFS, Patrik wrote about it here:
>> > https://www.ikus-soft.com/en/blog/2020-07-22-configure-zfs-for-rdiff-backup/
>> i'm reluctant to use zfs because linus torvalds said not to
>>
>
>
> --
> IKUS Software inc.
> https://www.ikus-soft.com/
> 514-971-6442
> 130 rue Doris
> St-Colomban, QC J5K 1T9
