Vladimir,

* Vladimir Borodin (r...@simply.name) wrote:
> > On 20 Jan 2017, at 16:40, Stephen Frost <sfr...@snowman.net> wrote:
> >> Increments in pgbackrest are done on file level which is not really
> >> efficient. We have done parallelism, compression and page-level
> >> increments (9.3+) in a barman fork [1], but unfortunately the guys
> >> from 2ndquadrant-it are in no hurry to work on it.
> >
> > We're looking at page-level incremental backup in pgbackrest also. For
> > larger systems, we've not heard too much complaining about it being
> > file-based though, which is why it hasn't been a priority. Of course,
> > the OP is on 9.1 too, so.
>
> Well, we have forked barman and made everything from the above just
> because we needed ~2 PB of disk space for storing backups of our
> ~300 TB of data. (Our recovery window is 7 days.) And on a 5 TB
> database it took a lot of time to make/restore a backup.
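As a side note, the ~2 PB figure quoted above is consistent with retaining one uncompressed, non-incremental full copy per day of a ~300 TB cluster over a 7-day window. A quick back-of-the-envelope check (decimal units assumed):

```python
# Rough check of the storage math above: a 7-day recovery window with
# one full (non-incremental, uncompressed) backup per day of ~300 TB.
db_size_tb = 300       # approximate cluster size
retained_fulls = 7     # one full copy per day in the window

total_tb = db_size_tb * retained_fulls
total_pb = total_tb / 1000   # decimal units; binary (PiB) differs slightly

print(total_pb)  # 2.1
```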
Right, without incremental or compressed backups, you'd have to have room
for 7 full copies of your database. Have you looked at what your
incrementals would be like with file-level incrementals and compression?

Single-process backup/restore is definitely going to be slow. We've seen
pgbackrest doing as much as 3 TB/hr with 32 cores handling compression.
Of course, your I/O, network, et al, need to be able to handle it.

> > As for your fork, well, I can't say I really blame the barman folks
> > for being cautious; that's usually a good thing in your backup
> > software. :)
>
> The reason seems to be not caution but a lack of time to work on it.
> But yep, it took us half a year to deploy our fork everywhere. And it
> would have taken much more time if we didn't have a system for checking
> backup consistency.

How are you testing your backups..? Do you have page-level checksums
enabled on your database? pgbackrest recently added the ability to check
PG page-level checksums during a backup and report issues. We've also
been looking at how to use pgbackrest to do backup/restore+replay
page-level difference analysis, but there are still a number of things
which can cause differences, so it's a bit difficult to do. Of course,
doing a pgbackrest-restore-replay+pg_dump+pg_restore is pretty easy to do
and we do use that in some places to validate backups.

> > I'm curious how you're handling compressed page-level incremental
> > backups though. I looked through barman-incr and it wasn't obvious to
> > me what was going on wrt how the incrementals are stored; are they
> > ending up as sparse files, or are you actually copying/overwriting
> > the prior file in the backup repository?
>
> No, we store each file in the following way: at the beginning you write
> a map of the changed pages, and after that you write the changed pages
> themselves.
> The compression is streaming, so you don't need much memory for that,
> but the downside of this approach is that you read each datafile twice
> (we rely on the page cache here).

Ah, yes, I noticed that you passed over the file twice but wasn't quite
sure what functools.partial() was doing, and a quick read of the docs
made me think you were doing seeking there. All the pages are the same
size, so I'm surprised you didn't consider just having a format along
the lines of: magic+offset+page, magic+offset+page, magic+offset+page,
etc... I'd have to defer to David on this, but I think he was
considering having some kind of a bitmap to indicate which pages changed
instead of storing the full offset as, again, all the pages are the same
size.

> > Apologies, python isn't my first language, but the lack of any
> > comment anywhere in that file doesn't really help.
>
> Not a problem. Actually, it would be much easier to understand if it
> were a series of commits rather than one commit that we amend and
> force-push after each rebase on vanilla barman. We should add comments.

Both would make it easier to understand, though the comments would be
more helpful for me as I don't actually know the barman code all that
well.

Thanks!

Stephen
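For readers following along, the map-then-pages layout being discussed can be sketched roughly as below. This is an illustrative toy, not the barman-fork code: the on-disk header layout, function names, and the use of gzip are all assumptions here, and a real tool would also need LSN-based change detection, fsync, and error handling.

```python
import gzip
import struct

PAGE_SIZE = 8192  # PostgreSQL's default block size


def write_incremental(out_path, changed):
    """Write an incremental file: page count, page-number map, then pages.

    `changed` maps page number -> page bytes; compression is streamed,
    so memory use stays small regardless of how many pages changed.
    """
    page_nos = sorted(changed)
    with gzip.open(out_path, "wb") as out:
        # First the map: how many pages changed, then their numbers.
        out.write(struct.pack("<I", len(page_nos)))
        for n in page_nos:
            out.write(struct.pack("<I", n))
        # Then the page images themselves, in the same order.
        for n in page_nos:
            out.write(changed[n])


def apply_incremental(inc_path, base_path):
    """Overlay the changed pages onto the base file, in place."""
    with gzip.open(inc_path, "rb") as inc:
        (count,) = struct.unpack("<I", inc.read(4))
        page_nos = [struct.unpack("<I", inc.read(4))[0]
                    for _ in range(count)]
        with open(base_path, "r+b") as base:
            for n in page_nos:
                base.seek(n * PAGE_SIZE)
                base.write(inc.read(PAGE_SIZE))
```

Putting the whole map up front is what lets a restore seek straight to the right offsets in the base file while still reading the compressed incremental strictly sequentially.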
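On the bitmap-versus-explicit-offsets question: since every page is the same size, a bitmap costs one bit per page of the whole file, while a magic+offset+page stream pays a fixed per-changed-page header. A back-of-the-envelope comparison (the 8-byte per-entry header and the 10% change rate are made-up numbers for illustration):

```python
# Metadata overhead of two ways to record "these pages changed",
# for a 1 GB datafile segment with 10% of its 8 KB pages modified.
PAGE_SIZE = 8192
file_pages = (1 << 30) // PAGE_SIZE   # 131072 pages in a 1 GB segment
changed_pages = file_pages // 10      # assume 10% of pages changed

# Bitmap: one bit per page in the file, changed or not.
bitmap_bytes = file_pages // 8

# Explicit entries: assume 4 bytes of magic + a 4-byte page number
# per changed page (the exact header layout is hypothetical).
entry_bytes = changed_pages * (4 + 4)

print(bitmap_bytes)  # 16384
print(entry_bytes)   # 104856
```

The bitmap's cost is fixed by file size, so it wins once more than a small fraction of pages change, while per-entry headers win for very sparse changes; either way the metadata is tiny next to the page images themselves.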