Re: [HACKERS] Proposal: Incremental Backup

2014-08-13 Thread Claudio Freire
On Tue, Aug 12, 2014 at 8:26 PM, Stephen Frost sfr...@snowman.net wrote:
 * Claudio Freire (klaussfre...@gmail.com) wrote:
 I'm not talking about malicious attacks. With big enough data sets,
 checksum collisions are much more likely to happen than with smaller
 ones, and incremental backups are supposed to work for the big sets.

 This is an issue when you're talking about de-duplication, not when
 you're talking about testing if two files are the same or not for
 incremental backup purposes.  The size of the overall data set in this
 case is not relevant as you're only ever looking at the same (at most
 1G) specific file in the PostgreSQL data directory.  Were you able to
 actually produce a file with a colliding checksum as an existing PG
 file, the chance that you'd be able to construct one which *also* has
 a valid page layout sufficient that it wouldn't be obviously massively
 corrupted is very quickly approaching zero.

True, but only with a strong hash, not an adler32 or something like that.




Re: [HACKERS] Proposal: Incremental Backup

2014-08-12 Thread Marco Nenciarini
As I already stated, timestamps will only be used to detect changed
files early. To declare two files identical they must have the same size,
the same mtime and the same *checksum*.
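
As a rough sketch of that test (illustrative only; the manifest holding what
was recorded at the previous backup is hypothetical, and SHA-256 is just an
assumed choice of strong checksum):

import hashlib
import os

def file_checksum(path, bufsize=8 * 1024 * 1024):
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(bufsize), b""):
            h.update(chunk)
    return h.hexdigest()

def unchanged_since_backup(path, prev):
    # prev holds the size, mtime and checksum recorded at the previous backup.
    st = os.stat(path)
    if st.st_size != prev["size"] or st.st_mtime != prev["mtime"]:
        return False   # size/mtime only detect changes early and cheaply
    # The final word is the checksum, which requires reading the whole file.
    return file_checksum(path) == prev["sha256"]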

Regards,
Marco

-- 
Marco Nenciarini - 2ndQuadrant Italy
PostgreSQL Training, Services and Support
marco.nenciar...@2ndquadrant.it | www.2ndQuadrant.it





Re: [HACKERS] Proposal: Incremental Backup

2014-08-12 Thread Claudio Freire
On Tue, Aug 12, 2014 at 6:41 AM, Marco Nenciarini
marco.nenciar...@2ndquadrant.it wrote:
 To declare two files identical they must have the same size,
 the same mtime and the same *checksum*.

Still not safe. Checksum collisions do happen, especially in big data sets.




Re: [HACKERS] Proposal: Incremental Backup

2014-08-12 Thread Gabriele Bartolini
Hi Claudio,

2014-08-12 15:25 GMT+02:00 Claudio Freire klaussfre...@gmail.com:
 Still not safe. Checksum collisions do happen, especially in big data sets.

Can I ask you what you are currently using for backing up large data
sets with Postgres?

Thanks,
Gabriele




Re: [HACKERS] Proposal: Incremental Backup

2014-08-12 Thread Marco Nenciarini
On 12/08/14 15:25, Claudio Freire wrote:
 On Tue, Aug 12, 2014 at 6:41 AM, Marco Nenciarini
 marco.nenciar...@2ndquadrant.it wrote:
 To declare two files identical they must have the same size,
 the same mtime and the same *checksum*.
 
 Still not safe. Checksum collisions do happen, especially in big data sets.
 

IMHO it is still good enough. We are not trying to protect against a
malicious attack; we are using it to protect against some *casual* event.

Even cosmic rays have a non-zero probability of corrupting your database
in a way you won't notice. And you can probably notice that better with a
checksum than with an LSN :-)

Given that, I think that whatever solution we choose, we should include
 checksums in it.

Regards,
Marco

-- 
Marco Nenciarini - 2ndQuadrant Italy
PostgreSQL Training, Services and Support
marco.nenciar...@2ndquadrant.it | www.2ndQuadrant.it





Re: [HACKERS] Proposal: Incremental Backup

2014-08-12 Thread Andres Freund
On 2014-08-12 10:25:21 -0300, Claudio Freire wrote:
 On Tue, Aug 12, 2014 at 6:41 AM, Marco Nenciarini
 marco.nenciar...@2ndquadrant.it wrote:
  To declare two files identical they must have the same size,
  the same mtime and the same *checksum*.
 
 Still not safe. Checksum collisions do happen, especially in big data sets.

If you use an appropriate algorithm for appropriate amounts of data
that's not a relevant concern. You can easily do different checksums for
every 1GB segment of data. If you do it right the likelihood of
conflicts doing that is so low it doesn't matter at all.
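
For illustration only (this is not tied to any patch in this thread),
checksumming in 1GB units could look like the sketch below; with a 256-bit
digest per segment, the collision probability per comparison is negligible:

import hashlib

def segment_checksums(path, bufsize=8 * 1024 * 1024):
    SEGMENT = 1024 ** 3   # 1GB, the same granularity PostgreSQL uses for relation segments
    digests = []
    with open(path, "rb") as f:
        while True:
            h = hashlib.sha256()
            remaining = SEGMENT
            while remaining > 0:
                chunk = f.read(min(bufsize, remaining))
                if not chunk:
                    break
                h.update(chunk)
                remaining -= len(chunk)
            if remaining == SEGMENT:   # nothing read: past the end of the file
                break
            digests.append(h.hexdigest())
    return digests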

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Proposal: Incremental Backup

2014-08-12 Thread Claudio Freire
On Tue, Aug 12, 2014 at 11:17 AM, Gabriele Bartolini
gabriele.bartol...@2ndquadrant.it wrote:

 2014-08-12 15:25 GMT+02:00 Claudio Freire klaussfre...@gmail.com:
 Still not safe. Checksum collisions do happen, especially in big data sets.

 Can I ask you what you are currently using for backing up large data
 sets with Postgres?

Currently, a time-delayed WAL archive hot standby, pg_dump sparingly,
filesystem snapshots (incremental) of the standby more often, with the
standby down.

When I didn't have the standby, I did online filesystem snapshots of
the master with WAL archiving to prevent inconsistency due to
snapshots not being atomic.

On Tue, Aug 12, 2014 at 11:25 AM, Marco Nenciarini
marco.nenciar...@2ndquadrant.it wrote:
 On 12/08/14 15:25, Claudio Freire wrote:
 On Tue, Aug 12, 2014 at 6:41 AM, Marco Nenciarini
 marco.nenciar...@2ndquadrant.it wrote:
 To declare two files identical they must have the same size,
 the same mtime and the same *checksum*.

 Still not safe. Checksum collisions do happen, especially in big data sets.


 IMHO it is still good-enough. We are not trying to protect from a
 malicious attack, we are using it to protect against some *casual* event.

I'm not talking about malicious attacks. With big enough data sets,
checksum collisions are much more likely to happen than with smaller
ones, and incremental backups are supposed to work for the big sets.

You could use strong cryptographic checksums, but such strong
checksums still aren't perfect, and even if you accept the slim chance
of collision, they are quite expensive to compute, so it's bound to be
a bottleneck with good I/O subsystems. Checking the LSN is much
cheaper.

Still, do as you will. As everybody keeps saying it's better than
nothing, let's let usage have the final word.




Re: [HACKERS] Proposal: Incremental Backup

2014-08-12 Thread Robert Haas
On Tue, Aug 12, 2014 at 10:30 AM, Andres Freund and...@2ndquadrant.com wrote:
 Still not safe. Checksum collisions do happen, especially in big data sets.

 If you use an appropriate algorithm for appropriate amounts of data
 that's not a relevant concern. You can easily do different checksums for
 every 1GB segment of data. If you do it right the likelihood of
 conflicts doing that is so low it doesn't matter at all.

True, but if you use LSNs the likelihood is 0.  Comparing the LSN is
also most likely a heck of a lot faster than checksumming the entire
page.
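
To make the cost difference concrete, here is a minimal sketch (not from any
patch; it assumes direct access to raw 8kB pages, whose first 8 bytes hold
pd_lsn as two 32-bit integers in the server's native byte order, taken as
little-endian below):

import struct

def page_lsn(page: bytes) -> int:
    # pd_lsn is the first field of the page header: {xlogid, xrecoff}.
    xlogid, xrecoff = struct.unpack_from("<II", page, 0)
    return (xlogid << 32) | xrecoff

def page_changed_since(page: bytes, reference_lsn: int) -> bool:
    # One 8-byte read and an integer compare, versus hashing all 8192 bytes.
    return page_lsn(page) > reference_lsn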

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Proposal: Incremental Backup

2014-08-12 Thread Fujii Masao
On Wed, Aug 13, 2014 at 12:58 AM, Robert Haas robertmh...@gmail.com wrote:
 On Tue, Aug 12, 2014 at 10:30 AM, Andres Freund and...@2ndquadrant.com 
 wrote:
 Still not safe. Checksum collisions do happen, especially in big data sets.

 If you use an appropriate algorithm for appropriate amounts of data
 that's not a relevant concern. You can easily do different checksums for
 every 1GB segment of data. If you do it right the likelihood of
 conflicts doing that is so low it doesn't matter at all.

 True, but if you use LSNs the likelihood is 0.  Comparing the LSN is
 also most likely a heck of a lot faster than checksumming the entire
 page.

If we use LSNs, a strong safeguard seems to be required to prevent a user
from taking an incremental backup against the wrong instance. For example,
consider the case where a first full backup is taken, PITR is performed to
a certain past location, and then an incremental backup is taken between
that first full backup and the current database after PITR. PITR rewinds
the LSN, so such an incremental backup might be corrupted. If so, a
safeguard for those problematic cases is needed. Otherwise, I'm afraid that
a user could easily end up with a broken incremental backup.
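
One conceivable shape for such a safeguard, purely as a sketch and not a
proposal of the actual mechanism (the pg_controldata field labels and the
manifest format below are assumptions): record the cluster's system
identifier and timeline at full-backup time, and refuse to take an
incremental backup if either has changed, which would at least catch a
post-PITR timeline switch.

import subprocess

def control_fields(pgdata):
    out = subprocess.run(["pg_controldata", pgdata], capture_output=True,
                         text=True, check=True).stdout
    fields = {}
    for line in out.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields

def check_same_instance(pgdata, manifest):
    # manifest holds the values saved when the full backup was taken.
    f = control_fields(pgdata)
    if f.get("Database system identifier") != manifest["system_identifier"]:
        raise RuntimeError("incremental backup attempted against a different cluster")
    if f.get("Latest checkpoint's TimeLineID") != manifest["timeline_id"]:
        raise RuntimeError("timeline changed (e.g. after PITR); take a new full backup")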

Regards,

-- 
Fujii Masao




Re: [HACKERS] Proposal: Incremental Backup

2014-08-12 Thread Stephen Frost
Claudio,

* Claudio Freire (klaussfre...@gmail.com) wrote:
 I'm not talking about malicious attacks. With big enough data sets,
 checksum collisions are much more likely to happen than with smaller
 ones, and incremental backups are supposed to work for the big sets.

This is an issue when you're talking about de-duplication, not when
you're talking about testing if two files are the same or not for
incremental backup purposes.  The size of the overall data set in this
case is not relevant as you're only ever looking at the same (at most
1G) specific file in the PostgreSQL data directory.  Were you able to
actually produce a file with a colliding checksum as an existing PG
file, the chance that you'd be able to construct one which *also* has
a valid page layout sufficient that it wouldn't be obviously massively
corrupted is very quickly approaching zero.

 You could use strong cryptographic checksums, but such strong
 checksums still aren't perfect, and even if you accept the slim chance
 of collision, they are quite expensive to compute, so it's bound to be
 a bottleneck with good I/O subsystems. Checking the LSN is much
 cheaper.

For my 2c on this - I'm actually behind the idea of using the LSN (though
I have not followed this thread in any detail), but there are plenty of
existing incremental backup solutions (PG specific and not) which work
just fine by doing checksums.  If you truly feel that this is a real
concern, I'd suggest you review the rsync binary diff protocol, which is
used extensively around the world, and show reports of it failing in the
field.
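
For readers unfamiliar with it, the core of that protocol is a two-level
check, which the toy sketch below imitates (an illustration only, not
rsync's actual implementation, and it omits the rolling-window search): a
cheap weak checksum narrows down candidate blocks and a strong digest
confirms the match, so weak-checksum collisions only cost extra hashing
rather than producing a wrong result.

import hashlib
import zlib

BLOCK = 4096

def block_signatures(data: bytes):
    # Map weak checksum -> list of (offset, strong digest) for each block.
    sigs = {}
    for off in range(0, len(data), BLOCK):
        blk = data[off:off + BLOCK]
        sigs.setdefault(zlib.adler32(blk), []).append(
            (off, hashlib.md5(blk).digest()))   # MD5 stands in for the strong digest
    return sigs

def find_matching_block(candidate: bytes, sigs):
    # Return the offset of an identical block in the old file, or None.
    for off, strong in sigs.get(zlib.adler32(candidate), []):
        if hashlib.md5(candidate).digest() == strong:
            return off
    return None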

Thanks,

Stephen




Re: [HACKERS] Proposal: Incremental Backup

2014-08-11 Thread Robert Haas
On Tue, Aug 5, 2014 at 8:04 PM, Simon Riggs si...@2ndquadrant.com wrote:
 To decide whether we need to re-copy the file, you read the file until
 we find a block with a later LSN. If we read the whole file without
 finding a later LSN then we don't need to re-copy. That means we read
 each file twice, which is slower, but the file is at most 1GB in size,
 which we can assume will be mostly in memory for the second read.

That seems reasonable, although copying only the changed blocks
doesn't seem like it would be a whole lot harder.  Yes, you'd need a
tool to copy those blocks back into the places where they need to go,
but that's probably not a lot of work and the disk savings, in many
cases, would be enormous.

 As Marco says, that can be optimized using filesystem timestamps instead.

The idea of using filesystem timestamps gives me the creeps.  Those
aren't always very granular, and I don't know that (for example) they
are crash-safe.  Does every filesystem on every platform make sure
that the mtime update hits the disk before the data?  What about clock
changes made manually by users, or automatically by ntpd? I recognize
that there are people doing this today, because it's what we have, and
it must not suck too much, because people are still doing it ... but I
worry that if we do it this way, we'll end up with people saying
"PostgreSQL corrupted my data" and will have no way of tracking the
problem back to the filesystem or system clock event that was the true
cause of the problem, so they'll just blame the database.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Proposal: Incremental Backup

2014-08-11 Thread Claudio Freire
On Mon, Aug 11, 2014 at 12:27 PM, Robert Haas robertmh...@gmail.com wrote:

 As Marco says, that can be optimized using filesystem timestamps instead.

 The idea of using filesystem timestamps gives me the creeps.  Those
 aren't always very granular, and I don't know that (for example) they
 are crash-safe.  Does every filesystem on every platform make sure
 that the mtime update hits the disk before the data?  What about clock
 changes made manually by users, or automatically by ntpd? I recognize
 that there are people doing this today, because it's what we have, and
 it must not suck too much, because people are still doing it ... but I
 worry that if we do it this way, we'll end up with people saying
 "PostgreSQL corrupted my data" and will have no way of tracking the
 problem back to the filesystem or system clock event that was the true
 cause of the problem, so they'll just blame the database.

I have the same creeps. I only do it on a live system, after a first
full rsync, where mtime persistence is not an issue, and where I know
ntp updates have not happened.

I had a problem once where a differential rsync with timestamps didn't
work as expected, and corrupted a slave. It was a test system so I
didn't care much at the time, but if it were a backup, I'd be quite
pissed.

Basically, mtimes aren't trustworthy across reboots. Granted, this was
a very old system, Debian 5 when it was new, IIRC, so it may be better
now. But it does illustrate just how bad things can get when one
trusts timestamps. This case was an old out-of-sync slave on a test
setup that got de-synchronized, and I tried to re-synchronize it with
a delta rsync to avoid the hours it would take to actually compare
everything (about a day). One segment that was modified after the sync
loss was not transferred, causing trouble at the slave, so I was forced
to re-synchronize with a full rsync (delta, but without timestamps).
This was either before pg_basebackup or before I heard of it ;-), but
in any case, if it happened on a test system with little activity, you
can be certain it can happen on a production system.

So I now only trust mtime when there has been neither a reboot nor an
ntpd running since the last mtime-less rsync. In those cases, the
optimization works and helps a lot. But I doubt you'll take many
incremental backups matching those conditions.

Say what you will of anecdotal evidence, but the issue is quite clear
theoretically as well: modifications to file segments that aren't
reflected within mtime granularity. There are many reasons why mtime
could lose precision. Being an old filesystem with second-precision
timestamps is just one, but not the only one.




Re: [HACKERS] Proposal: Incremental Backup

2014-08-08 Thread Benedikt Grundmann
On Thu, Aug 7, 2014 at 6:29 PM, Gabriele Bartolini 
gabriele.bartol...@2ndquadrant.it wrote:

 Hi Marco,

 With the current full backup procedure they are backed up, so I think
 that having them backed up with an rsync-like algorithm is what a user
 would expect for an incremental backup.

 Exactly. I think a simple, flexible and robust method for file based
 incremental backup is all we need. I am confident it could be done for
 9.5.

 I would like to quote every single word Simon said. Block level
 incremental backup (with Robert's proposal) is definitely the ultimate
 goal for effective and efficient physical backups. I see file level
 incremental backup as a very good compromise, a sort of intermediate
 release which could nonetheless produce a lot of benefits to our user
 base, for years to come too.

 Thanks,
 Gabriele


I haven't been following this discussion closely at all. But at Janestreet
we have been using pg_start_backup together with rsync --link-dest (onto a
big NFS) to achieve incrementally stored backups.  In our experience this
works very well; it is, however, advisable to look into whatever is used to
serve the NFS, as we had to set some options to increase the maximum number
of hard links.
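
For anyone wanting to try the same approach, the core of it is roughly the
following (a simplified sketch with made-up paths; in practice it has to be
bracketed by pg_start_backup()/pg_stop_backup() and combined with WAL
archiving, as described above):

import subprocess

def incremental_snapshot(pgdata, previous_backup, new_backup):
    # Files unchanged since the previous backup become hard links into it,
    # so each snapshot is stored incrementally but restores like a full copy.
    subprocess.run(
        ["rsync", "-a", "--delete",
         "--link-dest=" + previous_backup,
         pgdata + "/", new_backup + "/"],
        check=True,
    )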

Cheers,

Bene






Re: [HACKERS] Proposal: Incremental Backup

2014-08-07 Thread Simon Riggs
On 6 August 2014 17:27, Bruce Momjian br...@momjian.us wrote:
 On Wed, Aug  6, 2014 at 01:15:32PM -0300, Claudio Freire wrote:
 On Wed, Aug 6, 2014 at 12:20 PM, Bruce Momjian br...@momjian.us wrote:
 
  Well, for file-level backups we have:
 
  1) use file modtime (possibly inaccurate)
  2) use file modtime and checksums (heavy read load)
 
  For block-level backups we have:
 
  3) accumulate block numbers as WAL is written
  4) read previous WAL at incremental backup time
  5) read data page LSNs (high read load)
 
  The question is which of these do we want to implement?  #1 is very easy
  to implement, but incremental _file_ backups are larger than block-level
  backups.  If we have #5, would we ever want #2?  If we have #3, would we
  ever want #4 or #5?

 You may want to implement both #3 and #2. #3 would need a config
 switch to enable updating the bitmap. That would make it optional to
 incur the I/O cost of updating the bitmap. When the bitmap isn't
 there, the backup would use #2. Slow, but effective. If slowness is a
 problem for you, you enable the bitmap and do #3.

 Sounds reasonable IMO, and it means you can start by implementing #2.

 Well, Robert Haas had the idea of a separate process that accumulates
 the changed WAL block numbers, making it low overhead.  I question
 whether we need #2 just to handle cases where they didn't enable #3
 accounting earlier.  If that is the case, just do a full backup and
 enable #3.

Well, there is a huge difference between file-level and block-level backup.

Designing, writing and verifying block-level backup to the point that
it is acceptable is a huge effort. (Plus, I don't think accumulating
block numbers as they are written will be low overhead. Perhaps
there was a misunderstanding there and what is being suggested is to
accumulate file names that change as they are written, since we
already do that in the checkpointer process, which would be an option
between 2 and 3 on the above list).

What is being proposed here is file-level incremental backup that
works in a general way for various backup management tools. It's the
80/20 first step on the road. We get most of the benefit, it can be
delivered in this release as robust, verifiable code. Plus, that is
all we have budget for, a fairly critical consideration.

Big features need to be designed incrementally across multiple
releases, delivering incremental benefit (or at least that is what I
have learned). Yes, working block-level backup would be wonderful, but
if we hold out for that as the first step then we'll get nothing
anytime soon.

I would also point out that the more specific we make our backup
solution the less likely it is to integrate with external backup
providers. Oracle's RMAN requires specific support in external
software. 10 years after Postgres PITR we still see many vendors
showing "PostgreSQL Backup Supported" as meaning pg_dump only.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Proposal: Incremental Backup

2014-08-07 Thread Fujii Masao
On Thu, Aug 7, 2014 at 12:20 AM, Bruce Momjian br...@momjian.us wrote:
 On Wed, Aug  6, 2014 at 06:48:55AM +0100, Simon Riggs wrote:
 On 6 August 2014 03:16, Bruce Momjian br...@momjian.us wrote:
  On Wed, Aug  6, 2014 at 09:17:35AM +0900, Michael Paquier wrote:
  On Wed, Aug 6, 2014 at 9:04 AM, Simon Riggs si...@2ndquadrant.com wrote:
  
   On 5 August 2014 22:38, Claudio Freire klaussfre...@gmail.com wrote:
   Thinking some more, there seems like this whole store-multiple-LSNs
   thing is too much. We can still do block-level incrementals just by
   using a single LSN as the reference point. We'd still need a complex
   file format and a complex file reconstruction program, so I think that
   is still next release. We can call that INCREMENTAL BLOCK LEVEL.
 
  Yes, that's the approach taken by pg_rman for its block-level
  incremental backup. Btw, I don't think that the CPU cost to scan all
  the relation files added to the one to rebuild the backups is worth
  doing it on large instances. File-level backup would cover most of the
 
  Well, if you scan the WAL files from the previous backup, that will tell
  you what pages that need incremental backup.

 That would require you to store that WAL, which is something we hope
 to avoid. Plus if you did store it, you'd need to retrieve it from
 long term storage, which is what we hope to avoid.

 Well, for file-level backups we have:

 1) use file modtime (possibly inaccurate)
 2) use file modtime and checksums (heavy read load)

 For block-level backups we have:

 3) accumulate block numbers as WAL is written
 4) read previous WAL at incremental backup time
 5) read data page LSNs (high read load)

 The question is which of these do we want to implement?

There is some data which doesn't have an LSN, for example postgresql.conf.
When such data has been modified since the last backup, does it also need to
be included in the incremental backup? Probably yes. So implementing only
block-level backup is not a complete solution. It needs file-level backup as
an infrastructure for such data. This makes me think that it's more reasonable
to implement file-level backup first.

Regards,

-- 
Fujii Masao




Re: [HACKERS] Proposal: Incremental Backup

2014-08-07 Thread Michael Paquier
On Thu, Aug 7, 2014 at 8:11 PM, Fujii Masao masao.fu...@gmail.com wrote:
 There are some data which don't have LSN, for example, postgresql.conf.
 When such data has been modified since last backup, they also need to
 be included in incremental backup? Probably yes.
Definitely yes. That's also the case for paths like pg_clog,
pg_subtrans and pg_twophase.
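
Just to spell out the split being discussed (a toy classification with
assumed path conventions, not an exhaustive or authoritative list): only
relation files carry page LSNs, so everything else would have to fall back
to file-level copying.

import os

# Top-level PGDATA directories whose files consist of LSN-bearing data pages.
LSN_TRACKED_DIRS = {"base", "global", "pg_tblspc"}

def backup_strategy(relative_path):
    # postgresql.conf, pg_clog, pg_subtrans, pg_twophase, etc. have no LSN and
    # can only be handled at file level (size/mtime/checksum, or always copied).
    top = relative_path.split(os.sep, 1)[0]
    return "page-lsn" if top in LSN_TRACKED_DIRS else "file-level"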
-- 
Michael




Re: [HACKERS] Proposal: Incremental Backup

2014-08-07 Thread Bruce Momjian
On Thu, Aug  7, 2014 at 08:35:53PM +0900, Michael Paquier wrote:
 On Thu, Aug 7, 2014 at 8:11 PM, Fujii Masao masao.fu...@gmail.com wrote:
  There are some data which don't have LSN, for example, postgresql.conf.
  When such data has been modified since last backup, they also need to
  be included in incremental backup? Probably yes.
 Definitely yes. That's as well the case of paths like pg_clog,
 pg_subtrans and pg_twophase.

I assumed these would be unconditionally backed up during an incremental
backup because they are relatively small and you don't want to make a mistake.

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +




Re: [HACKERS] Proposal: Incremental Backup

2014-08-07 Thread Bruce Momjian
On Thu, Aug  7, 2014 at 11:03:40AM +0100, Simon Riggs wrote:
 Well, there is a huge difference between file-level and block-level backup.
 
 Designing, writing and verifying block-level backup to the point that
 it is acceptable is a huge effort. (Plus, I don't think accumulating
 block numbers as they are written will be low overhead. Perhaps
 there was a misunderstanding there and what is being suggested is to
 accumulate file names that change as they are written, since we
 already do that in the checkpointer process, which would be an option
 between 2 and 3 on the above list).
 
 What is being proposed here is file-level incremental backup that
 works in a general way for various backup management tools. It's the
 80/20 first step on the road. We get most of the benefit, it can be
 delivered in this release as robust, verifiable code. Plus, that is
 all we have budget for, a fairly critical consideration.
 
 Big features need to be designed incrementally across multiple
 releases, delivering incremental benefit (or at least that is what I
 have learned). Yes, working block-level backup would be wonderful, but
 if we hold out for that as the first step then we'll get nothing
 anytime soon.

That is fine.  I just wanted to point out that as features are added,
file-level incremental backups might not be useful.  In fact, I think
there are a lot of users for which file-level incremental backups will
never be useful, i.e. you have to have a lot of frozen/static data for
file-level incremental backups to be useful.  

I am a little worried that many users will not realize this until they
try it and are disappointed, e.g. "Why is PG writing to my static data
so often?" --- then we get beaten up about our hint bits and freezing
behavior.  :-(

I am just trying to set realistic expectations.

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +




Re: [HACKERS] Proposal: Incremental Backup

2014-08-07 Thread Marco Nenciarini
On 07/08/14 17:29, Bruce Momjian wrote:
 I am a little worried that many users will not realize this until they
 try it and are disappointed, e.g. "Why is PG writing to my static data
 so often?" --- then we get beaten up about our hint bits and freezing
 behavior.  :-(
 
 I am just trying to set realistic expectations.
 

Our experience is that for big databases (size over about 50GB) the
file-level approach is often enough to halve the size of the backup.

Users who run Postgres as a data warehouse will surely benefit from it,
so we could present it as a DWH-oriented feature.

Regards,
Marco

-- 
Marco Nenciarini - 2ndQuadrant Italy
PostgreSQL Training, Services and Support
marco.nenciar...@2ndquadrant.it | www.2ndQuadrant.it





Re: [HACKERS] Proposal: Incremental Backup

2014-08-07 Thread Marco Nenciarini
On 07/08/14 17:25, Bruce Momjian wrote:
 On Thu, Aug  7, 2014 at 08:35:53PM +0900, Michael Paquier wrote:
 On Thu, Aug 7, 2014 at 8:11 PM, Fujii Masao masao.fu...@gmail.com wrote:
 There are some data which don't have LSN, for example, postgresql.conf.
 When such data has been modified since last backup, they also need to
 be included in incremental backup? Probably yes.
 Definitely yes. That's as well the case of paths like pg_clog,
 pg_subtrans and pg_twophase.
 
 I assumed these would be unconditionally backed up during an incremental
 backup because they are relatively small and you don't want to make a mistake.
 

You could decide to always copy files which don't have an LSN, but you
don't know what the user could put inside PGDATA. I would avoid any
assumptions about files which are not owned by Postgres.

With the current full backup procedure they are backed up, so I think
that having them backed up with an rsync-like algorithm is what a user
would expect for an incremental backup.

Regards,
Marco

-- 
Marco Nenciarini - 2ndQuadrant Italy
PostgreSQL Training, Services and Support
marco.nenciar...@2ndquadrant.it | www.2ndQuadrant.it





Re: [HACKERS] Proposal: Incremental Backup

2014-08-07 Thread Gabriele Bartolini
Hi Marco,

 With the current full backup procedure they are backed up, so I think
 that having them backed up with an rsync-like algorithm is what a user
 would expect for an incremental backup.

Exactly. I think a simple, flexible and robust method for file based
incremental backup is all we need. I am confident it could be done for
9.5.

I would like to quote every single word Simon said. Block level
incremental backup (with Robert's proposal) is definitely the ultimate
goal for effective and efficient physical backups. I see file level
incremental backup as a very good compromise, a sort of intermediate
release which could nonetheless produce a lot of benefits to our user
base, for years to come too.

Thanks,
Gabriele




Re: [HACKERS] Proposal: Incremental Backup

2014-08-06 Thread Bruce Momjian
On Wed, Aug  6, 2014 at 06:48:55AM +0100, Simon Riggs wrote:
 On 6 August 2014 03:16, Bruce Momjian br...@momjian.us wrote:
  On Wed, Aug  6, 2014 at 09:17:35AM +0900, Michael Paquier wrote:
  On Wed, Aug 6, 2014 at 9:04 AM, Simon Riggs si...@2ndquadrant.com wrote:
  
   On 5 August 2014 22:38, Claudio Freire klaussfre...@gmail.com wrote:
   Thinking some more, there seems like this whole store-multiple-LSNs
   thing is too much. We can still do block-level incrementals just by
   using a single LSN as the reference point. We'd still need a complex
   file format and a complex file reconstruction program, so I think that
   is still next release. We can call that INCREMENTAL BLOCK LEVEL.
 
  Yes, that's the approach taken by pg_rman for its block-level
  incremental backup. Btw, I don't think that the CPU cost to scan all
  the relation files added to the one to rebuild the backups is worth
  doing it on large instances. File-level backup would cover most of the
 
  Well, if you scan the WAL files from the previous backup, that will tell
  you what pages that need incremental backup.
 
 That would require you to store that WAL, which is something we hope
 to avoid. Plus if you did store it, you'd need to retrieve it from
 long term storage, which is what we hope to avoid.

Well, for file-level backups we have:

1) use file modtime (possibly inaccurate)
2) use file modtime and checksums (heavy read load)

For block-level backups we have:

3) accumulate block numbers as WAL is written
4) read previous WAL at incremental backup time
5) read data page LSNs (high read load)

The question is which of these do we want to implement?  #1 is very easy
to implement, but incremental _file_ backups are larger than block-level
backups.  If we have #5, would we ever want #2?  If we have #3, would we
ever want #4 or #5?

  I am thinking we need a wiki page to outline all these options.
 
 There is a Wiki page.

I would like to see that wiki page have a more open approach to
implementations.

I do think this is a very important topic for us.

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +




Re: [HACKERS] Proposal: Incremental Backup

2014-08-06 Thread Claudio Freire
On Wed, Aug 6, 2014 at 12:20 PM, Bruce Momjian br...@momjian.us wrote:

 Well, for file-level backups we have:

 1) use file modtime (possibly inaccurate)
 2) use file modtime and checksums (heavy read load)

 For block-level backups we have:

 3) accumulate block numbers as WAL is written
 4) read previous WAL at incremental backup time
 5) read data page LSNs (high read load)

 The question is which of these do we want to implement?  #1 is very easy
 to implement, but incremental _file_ backups are larger than block-level
 backups.  If we have #5, would we ever want #2?  If we have #3, would we
 ever want #4 or #5?

You may want to implement both #3 and #2. #3 would need a config
switch to enable updating the bitmap. That would make it optional to
incur the I/O cost of updating the bitmap. When the bitmap isn't
there, the backup would use #2. Slow, but effective. If slowness is a
problem for you, you enable the bitmap and do #3.

Sounds reasonable IMO, and it means you can start by implementing #2.




Re: [HACKERS] Proposal: Incremental Backup

2014-08-06 Thread Bruce Momjian
On Wed, Aug  6, 2014 at 01:15:32PM -0300, Claudio Freire wrote:
 On Wed, Aug 6, 2014 at 12:20 PM, Bruce Momjian br...@momjian.us wrote:
 
  Well, for file-level backups we have:
 
  1) use file modtime (possibly inaccurate)
  2) use file modtime and checksums (heavy read load)
 
  For block-level backups we have:
 
  3) accumulate block numbers as WAL is written
  4) read previous WAL at incremental backup time
  5) read data page LSNs (high read load)
 
  The question is which of these do we want to implement?  #1 is very easy
  to implement, but incremental _file_ backups are larger than block-level
  backups.  If we have #5, would we ever want #2?  If we have #3, would we
  ever want #4 or #5?
 
 You may want to implement both #3 and #2. #3 would need a config
 switch to enable updating the bitmap. That would make it optional to
 incur the I/O cost of updating the bitmap. When the bitmap isn't
 there, the backup would use #2. Slow, but effective. If slowness is a
 problem for you, you enable the bitmap and do #3.
 
 Sounds reasonable IMO, and it means you can start by implementing #2.

Well, Robert Haas had the idea of a separate process that accumulates
the changed WAL block numbers, making it low overhead.  I question
whether we need #2 just to handle cases where they didn't enable #3
accounting earlier.  If that is the case, just do a full backup and
enable #3.

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +




Re: [HACKERS] Proposal: Incremental Backup

2014-08-05 Thread Gabriele Bartolini
Hi Claudio,

  I think there has been a misunderstanding. I agree with you (and I
think also Marco) that LSN is definitely a component to consider in
this process. We will come up with an alternate proposal which
considers LSNs either today or tomorrow. ;)

Thanks,
Gabriele
--
 Gabriele Bartolini - 2ndQuadrant Italia
 PostgreSQL Training, Services and Support
 gabriele.bartol...@2ndquadrant.it | www.2ndQuadrant.it


2014-08-04 20:30 GMT+02:00 Claudio Freire klaussfre...@gmail.com:
 On Mon, Aug 4, 2014 at 5:15 AM, Gabriele Bartolini
 gabriele.bartol...@2ndquadrant.it wrote:
I really like the proposal of working on a block level incremental
 backup feature and the idea of considering LSN. However, I'd suggest
 to see block level as a second step and a goal to keep in mind while
 working on the first step. I believe that file-level incremental
 backup will bring a lot of benefits to our community and users anyway.

 Thing is, I don't see how the LSN method is that much harder than an
 on-disk bitmap. In-memory bitmap IMO is just a recipe for disaster.

 Keeping a last-updated-LSN for each segment (or group of blocks) is
 just as easy as keeping a bitmap, and far more flexible and robust.

 The complexity and cost of safely keeping the map up-to-date is what's
 in question here, but as was pointed out before, there's no really safe
 alternative. Neither modification times nor checksums (nor in-memory
 bitmaps IMV) are really safe enough for backups, so you really want to
 use something like the LSN. It's extra work, but opens up a world of
 possibilities.




Re: [HACKERS] Proposal: Incremental Backup

2014-08-05 Thread Simon Riggs
On 4 August 2014 19:30, Claudio Freire klaussfre...@gmail.com wrote:
 On Mon, Aug 4, 2014 at 5:15 AM, Gabriele Bartolini
 gabriele.bartol...@2ndquadrant.it wrote:
I really like the proposal of working on a block level incremental
 backup feature and the idea of considering LSN. However, I'd suggest
 to see block level as a second step and a goal to keep in mind while
 working on the first step. I believe that file-level incremental
 backup will bring a lot of benefits to our community and users anyway.

 Thing is, I don't see how the LSN method is that much harder than an
 on-disk bitmap. In-memory bitmap IMO is just a recipe for disaster.

 Keeping a last-updated-LSN for each segment (or group of blocks) is
 just as easy as keeping a bitmap, and far more flexible and robust.

 The complexity and cost of safely keeping the map up-to-date is what's
 in question here, but as was pointed out before, there's no really safe
 alternative. Neither modification times nor checksums (nor in-memory
 bitmaps IMV) are really safe enough for backups, so you really want to
 use something like the LSN. It's extra work, but opens up a world of
 possibilities.

OK, some comments on all of this.

* Wikipedia thinks the style of backup envisaged should be called Incremental
https://en.wikipedia.org/wiki/Differential_backup

* Base backups are worthless without WAL right up to the *last* LSN
seen during the backup, which is why pg_stop_backup() returns an LSN.
This is the LSN that is the effective viewpoint of the whole base
backup. So if we wish to get all changes since the last backup, we
must re-quote this LSN. (Or put another way - file level LSNs don't
make sense - we just need one LSN for the whole backup).

* When we take an incremental backup we need the WAL from the backup
start LSN through to the backup stop LSN. We do not need the WAL
between the last backup stop LSN and the new incremental start LSN.
That is a huge amount of WAL in many cases and we'd like to avoid
that, I would imagine. (So the space savings aren't just the delta
from the main data files, we should also look at WAL savings).

* For me, file based incremental is a useful and robust feature.
Block-level incremental is possible, but requires either significant
persistent metadata (1 MB per GB file) or access to the original
backup. One important objective here is to make sure we do NOT have to
re-read the last backup when taking the next backup; this helps us to
optimize the storage costs for backups. Plus, block-level recovery
requires us to have a program that correctly re-writes data into the
correct locations in a file, which seems likely to be a slow and bug
ridden process to me. Nice, safe, solid file-level incremental backup
first please. Fancy, bug prone, block-level stuff much later.

* One purpose of this could be to verify the backup. rsync provides a
checksum, pg_basebackup does not. However, checksums are frequently
prohibitively expensive, so perhaps asking for that is impractical and
maybe only a secondary objective.

* If we don't want/have file checksums, then we don't need a profile
file and using just the LSN seems fine. I don't think we should
specify that manually - the correct LSN is written to the backup_label
file in a base backup and we should read it back from there. We should
also write a backup_label file to incremental base backups, then we
can have additional lines saying what the source backups were. So full
base backup backup_labels remain as they are now, but we add one
additional line per increment, so we have the full set of increments,
much like a history file.

Normal backup_label files look like this

START WAL LOCATION: %X/%X
CHECKPOINT LOCATION: %X/%X
BACKUP METHOD: streamed
BACKUP FROM: standby
START TIME: 
LABEL: foo

so we would have a file that looks like this

START WAL LOCATION: %X/%X
CHECKPOINT LOCATION: %X/%X
BACKUP METHOD: streamed
BACKUP FROM: standby
START TIME: 
LABEL: foo
INCREMENTAL 1
START WAL LOCATION: %X/%X
CHECKPOINT LOCATION: %X/%X
BACKUP METHOD: streamed
BACKUP FROM: standby
START TIME: 
LABEL: foo incremental 1
INCREMENTAL 2
START WAL LOCATION: %X/%X
CHECKPOINT LOCATION: %X/%X
BACKUP METHOD: streamed
BACKUP FROM: standby
START TIME: 
LABEL: foo incremental 2
... etc ...

which we interpret as showing the original base backup, then the first
increment, then the second increment etc.. which allows us to recover
the backups in the correct sequence.
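
A sketch of how a restore tool might read that format back (illustrative
only; the INCREMENTAL markers and field names are taken from the example
above, and the %X/%X LSNs are parsed into comparable integers):

def parse_lsn(text):
    # LSNs are printed as two hex halves, e.g. "0/3000028"; ignore trailing text.
    hi, lo = text.split()[0].split("/")
    return (int(hi, 16) << 32) | int(lo, 16)

def parse_backup_label_chain(text):
    # Returns the base backup section followed by each increment, in order.
    sections = [{"name": "base", "fields": {}}]
    for line in text.splitlines():
        if line.startswith("INCREMENTAL"):
            sections.append({"name": line.strip(), "fields": {}})
            continue
        key, sep, value = line.partition(":")
        if sep:
            sections[-1]["fields"][key.strip()] = value.strip()
    for s in sections:
        if "START WAL LOCATION" in s["fields"]:
            s["start_lsn"] = parse_lsn(s["fields"]["START WAL LOCATION"])
    return sections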

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Proposal: Incremental Backup

2014-08-05 Thread Claudio Freire
On Tue, Aug 5, 2014 at 3:23 PM, Simon Riggs si...@2ndquadrant.com wrote:
 On 4 August 2014 19:30, Claudio Freire klaussfre...@gmail.com wrote:
 On Mon, Aug 4, 2014 at 5:15 AM, Gabriele Bartolini
 gabriele.bartol...@2ndquadrant.it wrote:
I really like the proposal of working on a block level incremental
 backup feature and the idea of considering LSN. However, I'd suggest
 to see block level as a second step and a goal to keep in mind while
 working on the first step. I believe that file-level incremental
 backup will bring a lot of benefits to our community and users anyway.

 Thing is, I don't see how the LSN method is that much harder than an
 on-disk bitmap. In-memory bitmap IMO is just a recipe for disaster.

 Keeping a last-updated-LSN for each segment (or group of blocks) is
 just as easy as keeping a bitmap, and far more flexible and robust.

 The complexity and cost of safely keeping the map up-to-date is what's
 in question here, but as was pointed out before, there's no really safe
 alternative. Neither modification times nor checksums (nor in-memory
 bitmaps IMV) are really safe enough for backups, so you really want to
 use something like the LSN. It's extra work, but opens up a world of
 possibilities.

 OK, some comments on all of this.

 * Wikipedia thinks the style of backup envisaged should be called 
 Incremental
 https://en.wikipedia.org/wiki/Differential_backup

 * Base backups are worthless without WAL right up to the *last* LSN
 seen during the backup, which is why pg_stop_backup() returns an LSN.
 This is the LSN that is the effective viewpoint of the whole base
 backup. So if we wish to get all changes since the last backup, we
 must re-quote this LSN. (Or put another way - file level LSNs don't
 make sense - we just need one LSN for the whole backup).

File-level LSNs are an optimization. When you want to back up all files
modified since the last base or incremental backup (yes, you need the
previous backup label at least), you check the file-level LSN range.
That tells you which changesets touched that file, so you know
whether to process it or not.

Block-level LSNs (or, rather, block-segment-level) are just a
refinement of that.
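
In other words, something like the following could sit on top of whatever
bookkeeping the server provides (the map format here is hypothetical;
maintaining it safely is the real work and is not shown):

def files_to_backup(file_lsn_map, previous_backup_stop_lsn):
    # file_lsn_map: {relative path: highest LSN that has touched the file}.
    # Anything whose last-update LSN is newer than the previous backup's
    # stop LSN has to be processed; everything else can be skipped.
    return sorted(path for path, last_lsn in file_lsn_map.items()
                  if last_lsn > previous_backup_stop_lsn)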

 * When we take an incremental backup we need the WAL from the backup
 start LSN through to the backup stop LSN. We do not need the WAL
 between the last backup stop LSN and the new incremental start LSN.
 That is a huge amount of WAL in many cases and we'd like to avoid
 that, I would imagine. (So the space savings aren't just the delta
 from the main data files, we should also look at WAL savings).

Yes, probably something along the lines of removing redundant FPW and
stuff like that.

 * For me, file based incremental is a useful and robust feature.
 Block-level incremental is possible, but requires either significant
 persistent metadata (1 MB per GB file) or access to the original
 backup. One important objective here is to make sure we do NOT have to
 re-read the last backup when taking the next backup; this helps us to
 optimize the storage costs for backups. Plus, block-level recovery
 requires us to have a program that correctly re-writes data into the
 correct locations in a file, which seems likely to be a slow and bug
 ridden process to me. Nice, safe, solid file-level incremental backup
 first please. Fancy, bug prone, block-level stuff much later.

Ok. You could do incremental first without any kind of optimization,
then file-level optimization by keeping a file-level LSN range, and
then extend that to block-segment-level LSN ranges. That sounds like a
plan to me.

But, I don't see how you'd do the one without optimization without
reading the previous backup for comparing deltas. Remember checksums
are deemed not trustworthy, not just by me, so that (which was the
original proposition) doesn't work.

 * If we don't want/have file checksums, then we don't need a profile
 file and using just the LSN seems fine. I don't think we should
 specify that manually - the correct LSN is written to the backup_label
 file in a base backup and we should read it back from there.

Agreed




Re: [HACKERS] Proposal: Incremental Backup

2014-08-05 Thread Simon Riggs
On 5 August 2014 22:38, Claudio Freire klaussfre...@gmail.com wrote:

 * When we take an incremental backup we need the WAL from the backup
 start LSN through to the backup stop LSN. We do not need the WAL
 between the last backup stop LSN and the new incremental start LSN.
 That is a huge amount of WAL in many cases and we'd like to avoid
 that, I would imagine. (So the space savings aren't just the delta
 from the main data files, we should also look at WAL savings).

 Yes, probably something along the lines of removing redundant FPW and
 stuff like that.

Not what I meant at all, sorry for the confusion.

Each backup has a start LSN and a stop LSN. You need all the WAL
between those two points (-X option)

But if you have an incremental backup (b2), it depends upon an earlier
backup (b1).

You don't need the WAL between b1.stop_lsn and b2.start_lsn.

In typical cases, start to stop will be a few hours or less, whereas
we'd be doing backups at most daily, which would mean we'd only need
to store at most 10% of the WAL files because we don't need WAL
between backups.

 * For me, file based incremental is a useful and robust feature.
 Block-level incremental is possible, but requires either significant
 persistent metadata (1 MB per GB file) or access to the original
 backup. One important objective here is to make sure we do NOT have to
 re-read the last backup when taking the next backup; this helps us to
 optimize the storage costs for backups. Plus, block-level recovery
 requires us to have a program that correctly re-writes data into the
 correct locations in a file, which seems likely to be a slow and bug
 ridden process to me. Nice, safe, solid file-level incremental backup
 first please. Fancy, bug prone, block-level stuff much later.

 Ok. You could do incremental first without any kind of optimization,

Yes, that is what makes sense to me. Fast, simple, robust and most of
the benefit.

We should call this INCREMENTAL FILE LEVEL

 then file-level optimization by keeping a file-level LSN range, and
 then extend that to block-segment-level LSN ranges. That sounds like a
 plan to me.

Thinking some more, it seems like this whole store-multiple-LSNs
thing is too much. We can still do block-level incrementals just by
using a single LSN as the reference point. We'd still need a complex
file format and a complex file reconstruction program, so I think that
is still next release. We can call that INCREMENTAL BLOCK LEVEL

 But, I don't see how you'd do the one without optimization without
 reading the previous backup for comparing deltas. Remember checksums
 are deemed not trustworthy, not just by me, so that (which was the
 original proposition) doesn't work.

Every incremental backup refers to an earlier backup as a reference
point, which may then refer to an earlier one, in a chain.

Each backup has a single LSN associated with it, as stored in the
backup_label. (So we don't need the profile stage now, AFAICS)

To decide whether we need to re-copy the file, you read the file until
we find a block with a later LSN. If we read the whole file without
finding a later LSN then we don't need to re-copy. That means we read
each file twice, which is slower, but the file is at most 1GB in size,
which we can assume will be mostly in memory for the second read.
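
Rendered literally (a sketch only, assuming relation segments made of whole
8kB pages whose first 8 bytes hold pd_lsn as two 32-bit integers in native,
here little-endian, byte order):

import struct

BLOCK_SIZE = 8192

def needs_recopy(path, backup_lsn):
    # Read the file a page at a time; stop at the first page newer than the
    # reference LSN. Reading the whole file means nothing changed.
    with open(path, "rb") as f:
        while True:
            page = f.read(BLOCK_SIZE)
            if not page:
                return False
            xlogid, xrecoff = struct.unpack_from("<II", page, 0)
            if ((xlogid << 32) | xrecoff) > backup_lsn:
                return True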

As Marco says, that can be optimized using filesystem timestamps instead.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Proposal: Incremental Backup

2014-08-05 Thread Michael Paquier
On Wed, Aug 6, 2014 at 9:04 AM, Simon Riggs si...@2ndquadrant.com wrote:

 On 5 August 2014 22:38, Claudio Freire klaussfre...@gmail.com wrote:
 Thinking some more, there seems like this whole store-multiple-LSNs
 thing is too much. We can still do block-level incrementals just by
 using a single LSN as the reference point. We'd still need a complex
 file format and a complex file reconstruction program, so I think that
 is still next release. We can call that INCREMENTAL BLOCK LEVEL.

Yes, that's the approach taken by pg_rman for its block-level
incremental backup. Btw, I don't think that the CPU cost of scanning all
the relation files, added to the cost of rebuilding the backups, is worth
it on large instances. File-level backup would cover most of the
use cases that people face, and simplify the footprint on core code. With
a single LSN as the reference position, of course, to determine if a file
needs to be backed up, i.e. if it has at least one block that has
been modified with an LSN newer than the reference point.

Regards,
-- 
Michael




Re: [HACKERS] Proposal: Incremental Backup

2014-08-05 Thread Claudio Freire
On Tue, Aug 5, 2014 at 9:17 PM, Michael Paquier
michael.paqu...@gmail.com wrote:
 On Wed, Aug 6, 2014 at 9:04 AM, Simon Riggs si...@2ndquadrant.com wrote:

 On 5 August 2014 22:38, Claudio Freire klaussfre...@gmail.com wrote:
 Thinking some more, there seems like this whole store-multiple-LSNs
 thing is too much. We can still do block-level incrementals just by
 using a single LSN as the reference point. We'd still need a complex
 file format and a complex file reconstruction program, so I think that
 is still next release. We can call that INCREMENTAL BLOCK LEVEL.

 Yes, that's the approach taken by pg_rman for its block-level
 incremental backup. Btw, I don't think that the CPU cost to scan all
 the relation files added to the one to rebuild the backups is worth
 doing it on large instances. File-level backup would cover most of the
 use cases that people face, and simplify footprint on core code. With
 a single LSN as reference position of course to determine if a file
 needs to be backup up of course, if it has at least one block that has
 been modified with a LSN newer than the reference point.


It's the finding of that block that begs optimizing IMO.




Re: [HACKERS] Proposal: Incremental Backup

2014-08-05 Thread Claudio Freire
On Tue, Aug 5, 2014 at 9:04 PM, Simon Riggs si...@2ndquadrant.com wrote:
 On 5 August 2014 22:38, Claudio Freire klaussfre...@gmail.com wrote:

 * When we take an incremental backup we need the WAL from the backup
 start LSN through to the backup stop LSN. We do not need the WAL
 between the last backup stop LSN and the new incremental start LSN.
 That is a huge amount of WAL in many cases and we'd like to avoid
 that, I would imagine. (So the space savings aren't just the delta
 from the main data files, we should also look at WAL savings).

 Yes, probably something along the lines of removing redundant FPW and
 stuff like that.

 Not what I mean at all, sorry for confusing.

 Each backup has a start LSN and a stop LSN. You need all the WAL
 between those two points (-X option)

 But if you have an incremental backup (b2), it depends upon an earlier
 backup (b1).

 You don't need the WAL between b1.stop_lsn and b2.start_lsn.

 In typical cases, start to stop will be a few hours or less, whereas
 we'd be doing backups at most daily. Which would mean we'd only need
 to store at most 10% of the WAL files because we don't need WAL
 between backups.

I was assuming you wouldn't store that WAL. You might not even have it.




Re: [HACKERS] Proposal: Incremental Backup

2014-08-05 Thread Bruce Momjian
On Wed, Aug  6, 2014 at 09:17:35AM +0900, Michael Paquier wrote:
 On Wed, Aug 6, 2014 at 9:04 AM, Simon Riggs si...@2ndquadrant.com wrote:
 
  On 5 August 2014 22:38, Claudio Freire klaussfre...@gmail.com wrote:
  Thinking some more, there seems like this whole store-multiple-LSNs
  thing is too much. We can still do block-level incrementals just by
  using a single LSN as the reference point. We'd still need a complex
  file format and a complex file reconstruction program, so I think that
  is still next release. We can call that INCREMENTAL BLOCK LEVEL.
 
 Yes, that's the approach taken by pg_rman for its block-level
 incremental backup. Btw, I don't think that the CPU cost to scan all
 the relation files added to the one to rebuild the backups is worth
 doing it on large instances. File-level backup would cover most of the

Well, if you scan the WAL files from the previous backup, that will tell
you which pages need an incremental backup.

I am thinking we need a wiki page to outline all these options.

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +




Re: [HACKERS] Proposal: Incremental Backup

2014-08-05 Thread Simon Riggs
On 6 August 2014 03:16, Bruce Momjian br...@momjian.us wrote:
 On Wed, Aug  6, 2014 at 09:17:35AM +0900, Michael Paquier wrote:
 On Wed, Aug 6, 2014 at 9:04 AM, Simon Riggs si...@2ndquadrant.com wrote:
 
  On 5 August 2014 22:38, Claudio Freire klaussfre...@gmail.com wrote:
  Thinking some more, there seems like this whole store-multiple-LSNs
  thing is too much. We can still do block-level incrementals just by
  using a single LSN as the reference point. We'd still need a complex
  file format and a complex file reconstruction program, so I think that
  is still next release. We can call that INCREMENTAL BLOCK LEVEL.

 Yes, that's the approach taken by pg_rman for its block-level
 incremental backup. Btw, I don't think that the CPU cost to scan all
 the relation files added to the one to rebuild the backups is worth
 doing it on large instances. File-level backup would cover most of the

 Well, if you scan the WAL files from the previous backup, that will tell
 you what pages that need incremental backup.

That would require you to store that WAL, which is something we hope
to avoid. Plus if you did store it, you'd need to retrieve it from
long term storage, which is what we hope to avoid.

 I am thinking we need a wiki page to outline all these options.

There is a Wiki page.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Proposal: Incremental Backup

2014-08-04 Thread Gabriele Bartolini
Hi guys,

  sorry if I jump into the middle of the conversation. I have been
reading with much interest all that's been said above. However, the
goal of this patch is to give users another option when performing
backups, especially when large databases are in use.

  I really like the proposal of working on a block-level incremental
backup feature and the idea of considering the LSN. However, I'd
suggest seeing block level as a second step and a goal to keep in mind
while working on the first step. I believe that file-level incremental
backup will bring a lot of benefits to our community and users anyway.

  I base this on our daily experience. We have the honour (and the
duty) of managing - probably - some of the largest Postgres databases
in the world. We currently rely on rsync to copy database pages.
Performing a full backup in 2 days instead of 9 completely changes the
disaster recovery policies of a company. Or even 2 hours instead of 6.

My 2 cents,
Gabriele
--
 Gabriele Bartolini - 2ndQuadrant Italia
 PostgreSQL Training, Services and Support
 gabriele.bartol...@2ndquadrant.it | www.2ndQuadrant.it


Re: [HACKERS] Proposal: Incremental Backup

2014-08-04 Thread Claudio Freire
On Mon, Aug 4, 2014 at 5:15 AM, Gabriele Bartolini
gabriele.bartol...@2ndquadrant.it wrote:
I really like the proposal of working on a block level incremental
 backup feature and the idea of considering LSN. However, I'd suggest
 to see block level as a second step and a goal to keep in mind while
 working on the first step. I believe that file-level incremental
 backup will bring a lot of benefits to our community and users anyway.

Thing is, I don't see how the LSN method is that much harder than an
on-disk bitmap. In-memory bitmap IMO is just a recipe for disaster.

Keeping a last-updated-LSN for each segment (or group of blocks) is
just as easy as keeping a bitmap, and far more flexible and robust.

The complexity and cost of safely keeping the map up to date is what's
in question here, but as was pointed out before, there's no really safe
alternative. Neither modification times nor checksums (nor in-memory
bitmaps IMV) are really safe enough for backups, so you really want to
use something like the LSN. It's extra work, but it opens up a world of
possibilities.
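
As a rough sketch of how such a per-segment map could be consumed by a
backup tool (how the server maintains the map is left aside; it is
simply assumed as input here):

    def parse_lsn(text):
        """Turn an LSN such as '16/B374D848' into a comparable integer."""
        hi, lo = text.split('/')
        return (int(hi, 16) << 32) | int(lo, 16)

    def segments_to_resend(segment_lsns, prev_backup_start_lsn):
        """segment_lsns: {'base/16384/16385.1': '16/B374D848', ...}
        Return the segments whose last-updated LSN is newer than the
        start LSN of the previous backup, i.e. the only ones worth
        sending again."""
        threshold = parse_lsn(prev_backup_start_lsn)
        return [seg for seg, lsn in segment_lsns.items()
                if parse_lsn(lsn) > threshold]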




Re: [HACKERS] Proposal: Incremental Backup

2014-08-01 Thread Claudio Freire
On Fri, Aug 1, 2014 at 12:35 AM, Amit Kapila amit.kapil...@gmail.com wrote:
 c) the map is not crash safe by design, because it needs only for
 incremental backup to track what blocks needs to be backuped, not for
 consistency or recovery of the whole cluster, so it's not an heavy cost for
 the whole cluster to maintain it. we could think an option (but it's heavy)
 to write it at every flush  on file to have crash-safe map, but I not think
 it's so usefull . I think it's acceptable, and probably it's better to force
 that, to say: if your db will crash, you need a fullbackup ,

 I am not sure if your this assumption is right/acceptable, how can
 we say that in such a case users will be okay to have a fullbackup?
 In general, taking fullbackup is very heavy operation and we should
 try to avoid such a situation.


Besides, the one taking the backup (ie: script) may not be aware of
the need to take a full one.

It's a bad design to allow broken backups at all, IMNSHO.




Re: [HACKERS] Proposal: Incremental Backup

2014-08-01 Thread desmodemone
2014-08-01 18:20 GMT+02:00 Claudio Freire klaussfre...@gmail.com:

 On Fri, Aug 1, 2014 at 12:35 AM, Amit Kapila amit.kapil...@gmail.com
 wrote:
  c) the map is not crash safe by design, because it needs only for
  incremental backup to track what blocks needs to be backuped, not for
  consistency or recovery of the whole cluster, so it's not an heavy cost
 for
  the whole cluster to maintain it. we could think an option (but it's
 heavy)
  to write it at every flush  on file to have crash-safe map, but I not
 think
  it's so usefull . I think it's acceptable, and probably it's better to
 force
  that, to say: if your db will crash, you need a fullbackup ,
 
  I am not sure if your this assumption is right/acceptable, how can
  we say that in such a case users will be okay to have a fullbackup?
  In general, taking fullbackup is very heavy operation and we should
  try to avoid such a situation.


 Besides, the one taking the backup (ie: script) may not be aware of
 the need to take a full one.

 It's a bad design to allow broken backups at all, IMNSHO.


Hi Claudio,
 thanks for your observation.
First: this case only arises after a database crash, which is not
something that happens every day or every week. It happens in rare
conditions, at least in my experience. If it happens very often, there
are probably other problems.
Second: to avoid the problem of knowing whether the db needs a full
backup to rebuild the map, we could write a backup reference into the
map header (with an id and an LSN reference, for example). That way, if
someone or something tries to take an incremental backup after a crash,
the map header will list no full backup [because it will be empty], and
it will automatically switch to a full one. I think that after a crash
it's good practice to take a full backup anyway, to see whether there
are problems with the files or filesystems, but if I am wrong I am
happy to know :).

Remember that I propose a map in RAM to reduce the impact on
performance, but we could create an option that leaves the choice to
the user: if you want a crash-safe map, a map file is also updated at
every flush; if not, the map stays in RAM.

Kind Regards


Mat


Re: [HACKERS] Proposal: Incremental Backup

2014-08-01 Thread Claudio Freire
On Fri, Aug 1, 2014 at 1:43 PM, desmodemone desmodem...@gmail.com wrote:



 2014-08-01 18:20 GMT+02:00 Claudio Freire klaussfre...@gmail.com:

 On Fri, Aug 1, 2014 at 12:35 AM, Amit Kapila amit.kapil...@gmail.com
 wrote:
  c) the map is not crash safe by design, because it needs only for
  incremental backup to track what blocks needs to be backuped, not for
  consistency or recovery of the whole cluster, so it's not an heavy cost
  for
  the whole cluster to maintain it. we could think an option (but it's
  heavy)
  to write it at every flush  on file to have crash-safe map, but I not
  think
  it's so usefull . I think it's acceptable, and probably it's better to
  force
  that, to say: if your db will crash, you need a fullbackup ,
 
  I am not sure if your this assumption is right/acceptable, how can
  we say that in such a case users will be okay to have a fullbackup?
  In general, taking fullbackup is very heavy operation and we should
  try to avoid such a situation.


 Besides, the one taking the backup (ie: script) may not be aware of
 the need to take a full one.

 It's a bad design to allow broken backups at all, IMNSHO.


 Hi Claudio,
  thanks for your observation
 First: the case it's after a crash of a database, and it's not something
 happens every day or every week. It's something that happens in rare
 conditions, or almost my experience is so. If it happens very often probably
 there are other problems.

Not so much. In this case, the design isn't even software-crash safe;
it's not just that it isn't hardware-crash safe.

What I mean, is that an in-memory bitmap will also be out of sync if
you kill -9 (or if one of the backends is killed by the OOM), or if it
runs out of disk space too.

Normally, a simple restart fixes it because pg will do crash recovery
just fine, but now the bitmap is out of sync, and further backups are
broken. It's not a situation I want to face unless there's a huge
reason to go for such design.

If you make it so that the commit includes flipping the bitmap, it can
be done cleverly enough to avoid too much overhead (though it will
have some), and you now have it so that any to-be-touched block is now
part of the backup. You just apply all the bitmap changes in batch
after a checkpoint, before syncing to disk, and before erasing the WAL
segments. Simple, relatively efficient, and far more robust than an
in-memory thing.
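
A toy sketch of that batching idea, with the file name, map size and
API invented purely for illustration: dirty-chunk numbers accumulate in
memory and are OR-ed into a durable on-disk bitmap once per checkpoint.

    import os

    CHUNKS_PER_MAP = 1 << 20   # a 1 Mbit map, as in the example in this thread

    class ChunkMap:
        """Illustrative only: pending dirty chunks live in memory and
        are merged into an on-disk bitmap at checkpoint time."""
        def __init__(self, path):
            self.path = path
            self.pending = set()   # chunk numbers dirtied since last flush

        def mark_dirty(self, chunk_no):
            self.pending.add(chunk_no)

        def flush_at_checkpoint(self):
            size = CHUNKS_PER_MAP // 8
            bitmap = bytearray(size)
            if os.path.exists(self.path):
                with open(self.path, 'rb') as f:
                    data = f.read(size)
                bitmap[:len(data)] = data
            for chunk in self.pending:
                bitmap[chunk // 8] |= 1 << (chunk % 8)
            tmp = self.path + '.tmp'
            with open(tmp, 'wb') as f:
                f.write(bitmap)
                f.flush()
                os.fsync(f.fileno())
            os.rename(tmp, self.path)  # atomic: a crash keeps the old map
            self.pending.clear()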

Still, it *can* double checkpoint I/O in the worst case, and that's not
an unfathomable case either.

 Second: to avoid the problem to know if the db needed to have a full backup
 to rebuild the map we could think to write in the map header the backup
 reference (with an id and LSN reference for example ) so  if the
 someone/something try to do an incremental backup after a crash, the map
 header will not have noone full backup listed [because it will be empty] ,
 and automaticcaly switch to a full one. I think after a crash it's a good
 practice to do a full backup, to see if there are some problems on files or
 on filesystems, but if I am wrong I am happy to know :) .

After a crash I do not do a backup, I do a verification of the data
(VACUUM and some data consistency checks usually), lest you have a
useless backup. The backup goes after that.

But then, I'm no DBA guru.

 Remember that I propose a map in ram to reduce the impact on performances,
 but we could create an option to leave the choose to the user, if you want a
 crash safe map, at every flush will be updated also a map file , if not, the
 map will be in ram.

I think the performance impact of a WAL-linked map isn't so big as to
prefer the possibility of broken backups. I wouldn't even allow it.

It's not free, making it crash safe, but it's not that expensive
either. If you want to support incremental backups, you really really
need to make sure those backups are correct and usable, and IMV
anything short of full crash safety will be too fragile for that
purpose. I don't want to be in a position of needing the backup and
finding out it's inconsistent after the fact, and I don't want to
encourage people to set themselves up for that by adding a faster
but unsafe backups flag.

I'd do it either safe, or not at all.




Re: [HACKERS] Proposal: Incremental Backup

2014-07-31 Thread Amit Kapila
On Wed, Jul 30, 2014 at 11:32 PM, Robert Haas robertmh...@gmail.com wrote:

 IMV, the way to eventually make this efficient is to have a background
 process that reads the WAL and figures out which data blocks have been
 modified, and tracks that someplace.

Nice idea; however, I think that to make this happen we need to ensure
that WAL doesn't get deleted/overwritten before this process reads
it (maybe by using some existing parameter or mechanism) and that
wal_level is archive or higher.

One more thing, what will happen for unlogged tables with such a
mechanism?


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] Proposal: Incremental Backup

2014-07-31 Thread Amit Kapila
On Wed, Jul 30, 2014 at 7:00 PM, desmodemone desmodem...@gmail.com wrote:
 Hello,
 I think it's very useful an incremental/differential backup
method, by the way
 the method has two drawbacks:
 1)  In a database normally, even if the percent of modify rows is small
compared to total rows, the probability to change only some files /tables
is small, because the rows are normally not ordered inside a tables and the
update are random. If some tables are static, probably they are lookup
tables or something like a registry, and  normally these  tables are small .
 2)  every time a file changed require every time to read all file. So if
the point A is true, probably you are reading a large part of the databases
and then send that part , instead of sending a small part.

 In my opinion to solve these problems we need a different implementation
of incremental backup.
 I will try to show my idea about it.

 I think we need a bitmap map in memory to track the changed chunks of
the file/s/table [ for chunk I mean an X number of tracked pages , to
divide the every  tracked files in chunks ], so we could send only the
changed blocks  from last incremental backup ( that could be a full for
incremental backup ).The map could have one submaps for every tracked
files, so it's more simple.

 So ,if we track with one bit a chunk of 8 page blocks ( 64KB) [ a chunk
of 8 block is only an example]  , If  we use one map of 1Mbit ( 1Mbit are
 125KB of memory ) we could track a table with a total size of 64Gb,
probably we could use a compression algorithm because the map is done by
1 and 0 . This is a very simple idea, but it shows that the map  does not
need too much memory if we track groups of blocks i.e. chunk, obviously
the problem is more complex, and probably there are better and more robust
solutions.
 Probably we need  more space for the header of map to track the
informations about file and the last backup and so on.

 I think the map must be updated by the bgwriter , i.e. when it flushes
the dirty buffers,

Not only the bgwriter, but the checkpointer and backends as well, as
those also flush buffers.  Also, there are some writes which are done
outside shared buffers; you need to track those separately.

Another point is that to track the changes due to hint bit modification,
you need to enable checksums or wal_log_hints, which will lead to
either more CPU usage or more I/O.

 fortunately  we don't  need this map for consistence of database, so we
could create and manage it in memory to limit the impact on performance.
 The drawback is that If the db crashes or someone closes it , the next
incremental backup will be full , we could think to flush the map to disk
if the PostgreSQL will receive a signal of closing process or something
similar.



 In this way we obtain :
 1) we read only small part of a database ( the probability of a changed
chunk are less the the changed of the whole file )
 2) we do not need to calculate the checksum, saving cpu
 3) we save i/o in reading and writing ( we will send only the changed
block from last incremental backup )
 4) we save network
 5) we save time during backup. if we read and write less data, we reduce
the time to do an incremental backup.
 6) I think the bitmap map in memory will not impact too much on the
performance of the bgwriter.

 What do you think about?

I think this method has 3 drawbacks compared to the proposed
method:
a.  it needs either checksums or wal_log_hints enabled, so it will
 incur extra I/O if you enable wal_log_hints
b.  backends also need to update the map, which is a small
 cost, but still ...
c.  the map is not crash safe, due to which a full backup
 is sometimes needed.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] Proposal: Incremental Backup

2014-07-31 Thread Michael Paquier
On Thu, Jul 31, 2014 at 3:00 PM, Amit Kapila amit.kapil...@gmail.com wrote:
 One more thing, what will happen for unlogged tables with such a
 mechanism?
I imagine that you can safely bypass them as they are not accessible
during recovery and will start with empty relation files once recovery
ends. The same applies to temporary relations. Also this bgworker will
need access to the catalogs to look at the relation relkind.
Regards,
-- 
Michael




Re: [HACKERS] Proposal: Incremental Backup

2014-07-31 Thread desmodemone
2014-07-31 8:26 GMT+02:00 Amit Kapila amit.kapil...@gmail.com:

 On Wed, Jul 30, 2014 at 7:00 PM, desmodemone desmodem...@gmail.com
 wrote:
  Hello,
  I think it's very useful an incremental/differential backup
 method, by the way
  the method has two drawbacks:
  1)  In a database normally, even if the percent of modify rows is small
 compared to total rows, the probability to change only some files /tables
 is small, because the rows are normally not ordered inside a tables and the
 update are random. If some tables are static, probably they are lookup
 tables or something like a registry, and  normally these  tables are small .
  2)  every time a file changed require every time to read all file. So if
 the point A is true, probably you are reading a large part of the databases
 and then send that part , instead of sending a small part.
 
  In my opinion to solve these problems we need a different implementation
 of incremental backup.
  I will try to show my idea about it.
 
  I think we need a bitmap map in memory to track the changed chunks of
 the file/s/table [ for chunk I mean an X number of tracked pages , to
 divide the every  tracked files in chunks ], so we could send only the
 changed blocks  from last incremental backup ( that could be a full for
 incremental backup ).The map could have one submaps for every tracked
 files, so it's more simple.
 
  So ,if we track with one bit a chunk of 8 page blocks ( 64KB) [ a chunk
 of 8 block is only an example]  , If  we use one map of 1Mbit ( 1Mbit are
  125KB of memory ) we could track a table with a total size of 64Gb,
 probably we could use a compression algorithm because the map is done by
 1 and 0 . This is a very simple idea, but it shows that the map  does not
 need too much memory if we track groups of blocks i.e. chunk, obviously
 the problem is more complex, and probably there are better and more robust
 solutions.
  Probably we need  more space for the header of map to track the
 informations about file and the last backup and so on.
 
  I think the map must be updated by the bgwriter , i.e. when it flushes
 the dirty buffers,

 Not only bgwriter, but checkpointer and backends as well, as
 those also flush buffers.  Also there are some writes which are
 done outside shared buffers, you need to track those separately.

 Another point is that to track the changes due to hint bit modification,
 you need to enable checksums or wal_log_hints which will either
 lead to more cpu or I/O.

  fortunately  we don't  need this map for consistence of database, so we
 could create and manage it in memory to limit the impact on performance.
  The drawback is that If the db crashes or someone closes it , the next
 incremental backup will be full , we could think to flush the map to disk
 if the PostgreSQL will receive a signal of closing process or something
 similar.
 
 
 
  In this way we obtain :
  1) we read only small part of a database ( the probability of a changed
 chunk are less the the changed of the whole file )
  2) we do not need to calculate the checksum, saving cpu
  3) we save i/o in reading and writing ( we will send only the changed
 block from last incremental backup )
  4) we save network
  5) we save time during backup. if we read and write less data, we reduce
 the time to do an incremental backup.
  6) I think the bitmap map in memory will not impact too much on the
 performance of the bgwriter.
 
  What do you think about?

 I think with this method has 3 drawbacks compare to method
 proposed
 a.  either enable checksum or wal_log_hints, so it will incur extra
  I/O if you enable wal_log_hints
 b.  backends also need to update the map which though a small
  cost, but still ...
 c.  map is not crash safe, due to which sometimes full back up
  is needed.

 With Regards,
 Amit Kapila.
 EnterpriseDB: http://www.enterprisedb.com



Hi Amit, thank you for your comments.
However, about the drawbacks:
a) It's not clear to me why the method needs checksums enabled. I mean,
if the bgwriter or another process flushes a dirty buffer, it only has
to signal in the map that the blocks have changed, by updating the
value from 0 to 1. It does not need to verify the checksum of the
block; we can assume that when a dirty buffer is flushed, the block has
changed [ or better, in my idea, the chunk of N blocks ].
We could think about an advanced setting that verifies the checksum,
but I think it would be heavier.
b) Yes, the backends need to update the map, but it's in memory and, as
I showed, it can be very small if we use chunks of blocks. If we don't
compress the map, I don't think it could be a bottleneck.
c) The map is not crash safe by design, because it is needed only by
incremental backup to track which blocks need to be backed up, not for
consistency or recovery of the whole cluster, so it's not a heavy cost
for the whole cluster to maintain it. We could think of an option (but
it's heavy) to write it to a file at every flush to have a crash-safe
map, but 

Re: [HACKERS] Proposal: Incremental Backup

2014-07-31 Thread Bruce Momjian
On Thu, Jul 31, 2014 at 11:30:52AM +0530, Amit Kapila wrote:
 On Wed, Jul 30, 2014 at 11:32 PM, Robert Haas robertmh...@gmail.com wrote:
 
  IMV, the way to eventually make this efficient is to have a background
  process that reads the WAL and figures out which data blocks have been
  modified, and tracks that someplace.
 
 Nice idea, however I think to make this happen we need to ensure
 that WAL doesn't get deleted/overwritten before this process reads
 it (may be by using some existing param or mechanism) and 
 wal_level has to be archive or more.

Well, you probably are going to have all the WAL files available because
you have not taken an incremental backup yet, and therefore you would
have no PITR backup at all.  Once the incremental backup is done, you
can delete the old WAL files if you don't need fine-grained restore
points.

Robert also suggested reading the block numbers from the WAL as they are
created and not needing them at incremental backup time.

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +




Re: [HACKERS] Proposal: Incremental Backup

2014-07-31 Thread Robert Haas
On Thu, Jul 31, 2014 at 2:00 AM, Amit Kapila amit.kapil...@gmail.com wrote:
 On Wed, Jul 30, 2014 at 11:32 PM, Robert Haas robertmh...@gmail.com wrote:
 IMV, the way to eventually make this efficient is to have a background
 process that reads the WAL and figures out which data blocks have been
 modified, and tracks that someplace.

 Nice idea, however I think to make this happen we need to ensure
 that WAL doesn't get deleted/overwritten before this process reads
 it (may be by using some existing param or mechanism) and
 wal_level has to be archive or more.

That shouldn't be a problem; logical decoding added a mechanism for
retaining WAL until decoding is done with it, and if it needs to be
extended a bit further, so be it.

 One more thing, what will happen for unlogged tables with such a
 mechanism?

As Michael Paquier points out, it doesn't matter, because that data
will be gone anyway.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Proposal: Incremental Backup

2014-07-31 Thread Claudio Freire
On Thu, Jul 31, 2014 at 5:26 AM, desmodemone desmodem...@gmail.com wrote:
 b) yes the backends need to update the map, but it's in memory, and as I
 show, could be very small if we you chunk of blocks.If we not compress the
 map, I not think could be a bottleneck.

If it's in memory, it's not crash-safe. For something aimed at
backups, I think crash safety is a requirement. So it's at least one
extra I/O per commit, maybe less if many can be coalesced at
checkpoints, but I wouldn't count on it too much, because worst cases
are easy to come by (sparse enough updates).

I think this could be pegged on WAL replay / checkpoint stuff alone,
so it would be very asynchronous, but not free.




Re: [HACKERS] Proposal: Incremental Backup

2014-07-31 Thread Amit Kapila
On Thu, Jul 31, 2014 at 1:56 PM, desmodemone desmodem...@gmail.com wrote:

 Hi Amit, thank you for your comments .
 However , about drawbacks:
 a) It's not clear to me why the method needs checksum enable, I mean, if
the bgwriter or another process flushes a dirty buffer, it's only have to
signal in the map that the blocks are changed with an update of the value
from 0 to 1.They not need to verify the checksum of the block, we could
assume that when a dirty buffers is flushed, the block is changed [ or
better in my idea, the chunk of N blocks ].
 We could think an advanced setting that verify the checksum, but I think
will be heavier.

I was thinking of enabling it for hint bit updates: if any operation
changes the page due to a hint bit, it will not mark the buffer
dirty unless wal_log_hints or checksums are enabled.  Now I think that
if we don't want to track page changes due to hint bit updates, then
this will not be required.


 b) yes the backends need to update the map, but it's in memory, and as I
show, could be very small if we you chunk of blocks.If we not compress the
map, I not think could be a bottleneck.

This map has to reside in shared memory, so how will you
estimate its size during startup?  And even if you
have some way to do that, I think you still need to detail
how your chunk scheme will work in case multiple
backends are trying to flush pages which are part of the same chunk.

Also, as I mentioned previously, there are some operations which
are done without using shared buffers, so you need to think about
how to track the changes done by those operations.
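
To put rough numbers on the sizing question, a back-of-the-envelope
sketch (8 kB pages, 1 GB segments, one bit per 8-page chunk and a fixed
cap on tracked segments chosen at startup are all assumptions made only
for illustration):

    BLCKSZ = 8192             # default page size
    SEGMENT_SIZE = 1 << 30    # default 1 GB relation segment
    PAGES_PER_CHUNK = 8       # one bit tracks a 64 kB chunk

    def map_bytes_per_segment():
        pages = SEGMENT_SIZE // BLCKSZ                     # 131072 pages
        chunks = (pages + PAGES_PER_CHUNK - 1) // PAGES_PER_CHUNK
        return (chunks + 7) // 8                           # bits -> bytes

    def shared_map_bytes(max_tracked_segments):
        # Upper bound fixed at startup; map headers and per-file
        # bookkeeping are ignored here.
        return max_tracked_segments * map_bytes_per_segment()

    # Tracking 10,000 one-gigabyte segments (roughly 10 TB of data) needs
    # shared_map_bytes(10000) == 20,480,000 bytes, i.e. about 20 MB.

The harder part is indeed the one raised above: several backends
dirtying chunks that share the same byte, which a sketch like this does
not address.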

 c) the map is not crash safe by design, because it needs only for
incremental backup to track what blocks needs to be backuped, not for
consistency or recovery of the whole cluster, so it's not an heavy cost for
the whole cluster to maintain it. we could think an option (but it's heavy)
to write it at every flush  on file to have crash-safe map, but I not think
it's so usefull . I think it's acceptable, and probably it's better to
force that, to say: if your db will crash, you need a fullbackup ,

I am not sure this assumption is right/acceptable; how can
we say that in such a case users will be okay with a full backup?
In general, taking a full backup is a very heavy operation and we
should try to avoid such a situation.


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] Proposal: Incremental Backup

2014-07-30 Thread desmodemone
2014-07-29 18:35 GMT+02:00 Marco Nenciarini marco.nenciar...@2ndquadrant.it
:

 Il 25/07/14 20:44, Robert Haas ha scritto:
  On Fri, Jul 25, 2014 at 2:21 PM, Claudio Freire klaussfre...@gmail.com
 wrote:
  On Fri, Jul 25, 2014 at 10:14 AM, Marco Nenciarini
  marco.nenciar...@2ndquadrant.it wrote:
  1. Proposal
  =
  Our proposal is to introduce the concept of a backup profile. The
 backup
  profile consists of a file with one line per file detailing tablespace,
  path, modification time, size and checksum.
  Using that file the BASE_BACKUP command can decide which file needs to
  be sent again and which is not changed. The algorithm should be very
  similar to rsync, but since our files are never bigger than 1 GB per
  file that is probably granular enough not to worry about copying parts
  of files, just whole files.
 
  That wouldn't nearly as useful as the LSN-based approach mentioned
 before.
 
  I've had my share of rsyncing live databases (when resizing
  filesystems, not for backup, but the anecdotal evidence applies
  anyhow) and with moderately write-heavy databases, even if you only
  modify a tiny portion of the records, you end up modifying a huge
  portion of the segments, because the free space choice is random.
 
  There have been patches going around to change the random nature of
  that choice, but none are very likely to make a huge difference for
  this application. In essence, file-level comparisons get you only a
  mild speed-up, and are not worth the effort.
 
  I'd go for the hybrid file+lsn method, or nothing. The hybrid avoids
  the I/O of inspecting the LSN of entire segments (necessary
  optimization for huge multi-TB databases) and backups only the
  portions modified when segments do contain changes, so it's the best
  of both worlds. Any partial implementation would either require lots
  of I/O (LSN only) or save very little (file only) unless it's an
  almost read-only database.
 
  I agree with much of that.  However, I'd question whether we can
  really seriously expect to rely on file modification times for
  critical data-integrity operations.  I wouldn't like it if somebody
  ran ntpdate to fix the time while the base backup was running, and it
  set the time backward, and the next differential backup consequently
  omitted some blocks that had been modified during the base backup.
 

 Our proposal doesn't rely on file modification times for data integrity.

 We are using the file mtime only as a fast indication that the file has
 changed, and transfer it again without performing the checksum.
 If timestamp and size match we rely on *checksums* to decide if it has
 to be sent.

 In SMART MODE we would use the file mtime to skip the checksum check
 in some cases, but it wouldn't be the default operation mode and it will
 have all the necessary warnings attached. However the SMART MODE isn't
 a core part of our proposal, and can be delayed until we agree on the
 safest way to bring it to the end user.

 Regards,
 Marco

 --
 Marco Nenciarini - 2ndQuadrant Italy
 PostgreSQL Training, Services and Support
 marco.nenciar...@2ndquadrant.it | www.2ndQuadrant.it



Hello,
I think an incremental/differential backup method is very useful; by
the way, the method has two drawbacks:
1)  In a database, normally, even if the percentage of modified rows is
small compared to the total rows, the probability of changing only some
files/tables is small, because rows are normally not ordered inside a
table and the updates are random. If some tables are static, they are
probably lookup tables or something like a registry, and normally these
tables are small.
2)  Every changed file requires reading the whole file every time. So
if point 1 is true, you are probably reading a large part of the
database and then sending that part, instead of sending a small part.

In my opinion, to solve these problems we need a different
implementation of incremental backup.
I will try to show my idea about it.

I think we need a bitmap map in memory to track the changed chunks of
the files/tables [ by chunk I mean a group of X tracked pages, dividing
every tracked file into chunks ], so we could send only the changed
chunks since the last incremental backup ( which could be a full backup
for the first incremental ). The map could have one submap for every
tracked file, so it's simpler.

So, if we track a chunk of 8 page blocks ( 64KB ) with one bit [ a
chunk of 8 blocks is only an example ], and we use one map of 1 Mbit
( 1 Mbit is about 125KB of memory ), we could track a table with a
total size of 64GB. We could probably use a compression algorithm,
because the map is made of 1s and 0s. This is a very simple idea, but
it shows that the map does not need too much memory if we track groups
of blocks, i.e. chunks; obviously the problem is more complex, and
probably there are better and more robust solutions.
Probably we need more space for the header of the map to track the

Re: [HACKERS] Proposal: Incremental Backup

2014-07-30 Thread Robert Haas
On Tue, Jul 29, 2014 at 12:35 PM, Marco Nenciarini
marco.nenciar...@2ndquadrant.it wrote:
 I agree with much of that.  However, I'd question whether we can
 really seriously expect to rely on file modification times for
 critical data-integrity operations.  I wouldn't like it if somebody
 ran ntpdate to fix the time while the base backup was running, and it
 set the time backward, and the next differential backup consequently
 omitted some blocks that had been modified during the base backup.

 Our proposal doesn't rely on file modification times for data integrity.

Good.

 We are using the file mtime only as a fast indication that the file has
 changed, and transfer it again without performing the checksum.
 If timestamp and size match we rely on *checksums* to decide if it has
 to be sent.

So an incremental backup reads every block in the database and
transfers only those that have changed?  (BTW, I'm just asking.
That's OK with me for a first version; we can make improve it, shall
we say, incrementally.)

Why checksums (which have an arbitrarily-small chance of indicating a
match that doesn't really exist) rather than LSNs (which have no
chance of making that mistake)?

 In SMART MODE we would use the file mtime to skip the checksum check
 in some cases, but it wouldn't be the default operation mode and it will
 have all the necessary warnings attached. However the SMART MODE isn't
 a core part of our proposal, and can be delayed until we agree on the
 safest way to bring it to the end user.

That's not a mode I'd feel comfortable calling smart.  More like
roulette mode.

IMV, the way to eventually make this efficient is to have a background
process that reads the WAL and figures out which data blocks have been
modified, and tracks that someplace.  Then we can send a precisely
accurate backup without relying on either modification times or
reading the full database.  If Heikki's patch to standardize the way
this kind of information is represented in WAL gets committed, this
should get a lot easier to implement.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Proposal: Incremental Backup

2014-07-29 Thread Marco Nenciarini
Il 25/07/14 16:15, Michael Paquier ha scritto:
 On Fri, Jul 25, 2014 at 10:14 PM, Marco Nenciarini
 marco.nenciar...@2ndquadrant.it wrote:
 0. Introduction:
 =
 This is a proposal for adding incremental backup support to streaming
 protocol and hence to pg_basebackup command.
 Not sure that incremental is a right word as the existing backup
 methods using WAL archives are already like that. I recall others
 calling that differential backup from some previous threads. Would
 that sound better?
 

differential backup is widely used to refer to a backup that is always
based on a full backup. An incremental backup can be based either on
a full backup or on a previous incremental backup. We picked that
name to emphasize this property.

 1. Proposal
 =
 Our proposal is to introduce the concept of a backup profile.
 Sounds good. Thanks for looking at that.
 
 The backup
 profile consists of a file with one line per file detailing tablespace,
 path, modification time, size and checksum.
 Using that file the BASE_BACKUP command can decide which file needs to
 be sent again and which is not changed. The algorithm should be very
 similar to rsync, but since our files are never bigger than 1 GB per
 file that is probably granular enough not to worry about copying parts
 of files, just whole files.
 There are actually two levels of differential backups: file-level,
 which is the approach you are taking, and block level. Block level
 backup makes necessary a scan of all the blocks of all the relations
 and take only the data from the blocks newer than the LSN given by the
 BASE_BACKUP command. In the case of file-level approach, you could
 already backup the relation file after finding at least one block
 already modified.

I like the idea of shortcutting the checksum when you find a block with
an LSN newer than the previous backup's START WAL LOCATION; however, I
see it as a further optimization. In any case, it is worth storing the
backup start LSN in the header section of the backup_profile together
with other useful information about the backup starting position.

As a first step we would have a simple and robust method to produce a
file-level incremental backup.

 Btw, the size of relation files depends on the size
 defined by --with-segsize when running configure. 1GB is the default
 though, and the value usually used. Differential backups can reduce
 the size of overall backups depending on the application, at the cost
 of some CPU to analyze the relation blocks that need to be included in
 the backup.

We tested the idea on several multi-terabyte installations using a
custom deduplication script which follows this approach. The result is
that it can reduce the backup size by more than 50%. Also, most
databases in the 50GB - 1TB range can benefit greatly from it.

 
 It could also be used in 'refresh' mode, by allowing the pg_basebackup
 command to 'refresh' an old backup directory with a new backup.
 I am not sure this is really helpful...

Could you please elaborate the last sentence?

 
 The final piece of this architecture is a new program called
 pg_restorebackup which is able to operate on a chain of incremental
 backups, allowing the user to build an usable PGDATA from them or
 executing maintenance operations like verify the checksums or estimate
 the final size of recovered PGDATA.
 Yes, right. Taking a differential backup is not difficult, but
 rebuilding a constant base backup with a full based backup and a set
 of differential ones is the tricky part, but you need to be sure that
 all the pieces of the puzzle are here.

If we limit it to be file-based, the recovery procedure is conceptually
simple. Read every involved manifest from the start and take the latest
available version of any file (or mark it for deletion, if its last
mention is in a backup_exceptions file). Keeping the algorithm as
simple as possible is, in our opinion, the best way to go.
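
A minimal sketch of that reconstruction rule, assuming each backup
directory carries a backup_profile that lists only the files it
physically contains (reduced to one relative path per line here; the
real profile carries more fields) plus an optional backup_exceptions
list:

    import os
    import shutil

    def rebuild(backup_dirs, target):
        """backup_dirs: the full backup first, then the incrementals,
        oldest to newest.  Keep the newest copy of each file; drop a
        file whose last mention is in a backup_exceptions list."""
        latest = {}                         # relative path -> backup dir
        for backup in backup_dirs:
            with open(os.path.join(backup, 'backup_profile')) as f:
                for line in f:
                    latest[line.strip()] = backup
            exceptions = os.path.join(backup, 'backup_exceptions')
            if os.path.exists(exceptions):
                with open(exceptions) as f:
                    for line in f:
                        latest.pop(line.strip(), None)   # deleted file
        for relpath, backup in latest.items():
            src = os.path.join(backup, relpath)
            dst = os.path.join(target, relpath)
            os.makedirs(os.path.dirname(dst), exist_ok=True)
            shutil.copy2(src, dst)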

 
 We created a wiki page with all implementation details at
 https://wiki.postgresql.org/wiki/Incremental_backup
 I had a look at that, and I think that you are missing the shot in the
 way differential backups should be taken. What would be necessary is
 to pass a WAL position (or LSN, logical sequence number like
 0/260) with a new clause called DIFFERENTIAL (INCREMENTAL in your
 first proposal) in the BASE BACKUP command, and then have the server
 report back to client all the files that contain blocks newer than the
 given LSN position given for file-level backup, or the blocks newer
 than the given LSN for the block-level differential backup.

In our proposal a file is skipped if, and only if, it has the same
size, the same mtime and *the same checksum* as the original file. We
intentionally want to keep it simple, so that it also easily supports
files that are stored in $PGDATA but don't follow any format known by
Postgres. However, even with more complex algorithms, all the 

Re: [HACKERS] Proposal: Incremental Backup

2014-07-29 Thread Marco Nenciarini
Il 25/07/14 20:21, Claudio Freire ha scritto:
 On Fri, Jul 25, 2014 at 10:14 AM, Marco Nenciarini
 marco.nenciar...@2ndquadrant.it wrote:
 1. Proposal
 =
 Our proposal is to introduce the concept of a backup profile. The backup
 profile consists of a file with one line per file detailing tablespace,
 path, modification time, size and checksum.
 Using that file the BASE_BACKUP command can decide which file needs to
 be sent again and which is not changed. The algorithm should be very
 similar to rsync, but since our files are never bigger than 1 GB per
 file that is probably granular enough not to worry about copying parts
 of files, just whole files.
 
 That wouldn't nearly as useful as the LSN-based approach mentioned before.
 
 I've had my share of rsyncing live databases (when resizing
 filesystems, not for backup, but the anecdotal evidence applies
 anyhow) and with moderately write-heavy databases, even if you only
 modify a tiny portion of the records, you end up modifying a huge
 portion of the segments, because the free space choice is random.
 
 There have been patches going around to change the random nature of
 that choice, but none are very likely to make a huge difference for
 this application. In essence, file-level comparisons get you only a
 mild speed-up, and are not worth the effort.
 
 I'd go for the hybrid file+lsn method, or nothing. The hybrid avoids
 the I/O of inspecting the LSN of entire segments (necessary
 optimization for huge multi-TB databases) and backups only the
 portions modified when segments do contain changes, so it's the best
 of both worlds. Any partial implementation would either require lots
 of I/O (LSN only) or save very little (file only) unless it's an
 almost read-only database.
 

From my experience, if a database is big enough and there is any kind
of historical data in it, the file-only approach works well.
Moreover, it has the advantage of being simple and easily verifiable.

Regards,
Marco

-- 
Marco Nenciarini - 2ndQuadrant Italy
PostgreSQL Training, Services and Support
marco.nenciar...@2ndquadrant.it | www.2ndQuadrant.it





Re: [HACKERS] Proposal: Incremental Backup

2014-07-29 Thread Claudio Freire
On Tue, Jul 29, 2014 at 1:24 PM, Marco Nenciarini
marco.nenciar...@2ndquadrant.it wrote:
 On Fri, Jul 25, 2014 at 10:14 AM, Marco Nenciarini
 marco.nenciar...@2ndquadrant.it wrote:
 1. Proposal
 =
 Our proposal is to introduce the concept of a backup profile. The backup
 profile consists of a file with one line per file detailing tablespace,
 path, modification time, size and checksum.
 Using that file the BASE_BACKUP command can decide which file needs to
 be sent again and which is not changed. The algorithm should be very
 similar to rsync, but since our files are never bigger than 1 GB per
 file that is probably granular enough not to worry about copying parts
 of files, just whole files.

 That wouldn't nearly as useful as the LSN-based approach mentioned before.

 I've had my share of rsyncing live databases (when resizing
 filesystems, not for backup, but the anecdotal evidence applies
 anyhow) and with moderately write-heavy databases, even if you only
 modify a tiny portion of the records, you end up modifying a huge
 portion of the segments, because the free space choice is random.

 There have been patches going around to change the random nature of
 that choice, but none are very likely to make a huge difference for
 this application. In essence, file-level comparisons get you only a
 mild speed-up, and are not worth the effort.

 I'd go for the hybrid file+lsn method, or nothing. The hybrid avoids
 the I/O of inspecting the LSN of entire segments (necessary
 optimization for huge multi-TB databases) and backups only the
 portions modified when segments do contain changes, so it's the best
 of both worlds. Any partial implementation would either require lots
 of I/O (LSN only) or save very little (file only) unless it's an
 almost read-only database.


 From my experience, if a database is big enough and there is any kind of
 historical data in the database, the file only approach works well.
 Moreover it has the advantage of being simple and easily verifiable.

I don't see how that would be true if it's not full of read-only or
append-only tables.

Furthermore, even in that case, you need to have the database locked
while performing the file-level backup, and computing all the
checksums means processing the whole thing. That's a huge amount of
time to be locked for multi-TB databases, so how is that good enough?




Re: [HACKERS] Proposal: Incremental Backup

2014-07-29 Thread Marco Nenciarini
Il 25/07/14 20:44, Robert Haas ha scritto:
 On Fri, Jul 25, 2014 at 2:21 PM, Claudio Freire klaussfre...@gmail.com 
 wrote:
 On Fri, Jul 25, 2014 at 10:14 AM, Marco Nenciarini
 marco.nenciar...@2ndquadrant.it wrote:
 1. Proposal
 =
 Our proposal is to introduce the concept of a backup profile. The backup
 profile consists of a file with one line per file detailing tablespace,
 path, modification time, size and checksum.
 Using that file the BASE_BACKUP command can decide which file needs to
 be sent again and which is not changed. The algorithm should be very
 similar to rsync, but since our files are never bigger than 1 GB per
 file that is probably granular enough not to worry about copying parts
 of files, just whole files.

 That wouldn't nearly as useful as the LSN-based approach mentioned before.

 I've had my share of rsyncing live databases (when resizing
 filesystems, not for backup, but the anecdotal evidence applies
 anyhow) and with moderately write-heavy databases, even if you only
 modify a tiny portion of the records, you end up modifying a huge
 portion of the segments, because the free space choice is random.

 There have been patches going around to change the random nature of
 that choice, but none are very likely to make a huge difference for
 this application. In essence, file-level comparisons get you only a
 mild speed-up, and are not worth the effort.

 I'd go for the hybrid file+lsn method, or nothing. The hybrid avoids
 the I/O of inspecting the LSN of entire segments (necessary
 optimization for huge multi-TB databases) and backups only the
 portions modified when segments do contain changes, so it's the best
 of both worlds. Any partial implementation would either require lots
 of I/O (LSN only) or save very little (file only) unless it's an
 almost read-only database.
 
 I agree with much of that.  However, I'd question whether we can
 really seriously expect to rely on file modification times for
 critical data-integrity operations.  I wouldn't like it if somebody
 ran ntpdate to fix the time while the base backup was running, and it
 set the time backward, and the next differential backup consequently
 omitted some blocks that had been modified during the base backup.
 

Our proposal doesn't rely on file modification times for data integrity.

We are using the file mtime only as a fast indication that the file
has changed, in which case we transfer it again without performing the
checksum. If the timestamp and size match, we rely on *checksums* to
decide whether it has to be sent.
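
A minimal sketch of that decision rule, with the previous profile entry
reduced to a plain dict and SHA-1 standing in for whatever checksum is
eventually chosen:

    import hashlib
    import os

    def file_checksum(path):
        h = hashlib.sha1()
        with open(path, 'rb') as f:
            for block in iter(lambda: f.read(1 << 20), b''):
                h.update(block)
        return h.hexdigest()

    def needs_resend(path, previous):
        """previous: {'size': int, 'mtime': float, 'checksum': str} from
        the profile of the last backup, or None if the file is new."""
        if previous is None:
            return True
        st = os.stat(path)
        if st.st_size != previous['size'] or st.st_mtime != previous['mtime']:
            return True            # size or timestamp changed: just resend
        # size and mtime match: fall back to the checksum before skipping
        return file_checksum(path) != previous['checksum']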

In SMART MODE we would use the file mtime to skip the checksum check
in some cases, but it wouldn't be the default operation mode and it
would have all the necessary warnings attached. However, SMART MODE
isn't a core part of our proposal, and can be delayed until we agree on
the safest way to bring it to the end user.

Regards,
Marco

-- 
Marco Nenciarini - 2ndQuadrant Italy
PostgreSQL Training, Services and Support
marco.nenciar...@2ndquadrant.it | www.2ndQuadrant.it





Re: [HACKERS] Proposal: Incremental Backup

2014-07-29 Thread Michael Paquier
On Wed, Jul 30, 2014 at 1:11 AM, Marco Nenciarini
marco.nenciar...@2ndquadrant.it wrote:
 differential backup is widely used to refer to a backup that is always
 based on a full backup. An incremental backup can be based either on
 a full backup or on a previous incremental backup. We picked that
 name to emphasize this property.

You can refer to this email:
http://www.postgresql.org/message-id/cabuevexz-2nh6jxb5sjs_dss7qbmof0noypeeyaybkbufkp...@mail.gmail.com

 As a first step we would have a simple and robust method to produce a
 file-level incremental backup.
An approach using Postgres internals, which we are sure we can rely
on, is more robust. An LSN is similar to a timestamp in pg internals,
as it refers to the point in time when a block was last modified.

 It could also be used in 'refresh' mode, by allowing the pg_basebackup
 command to 'refresh' an old backup directory with a new backup.
 I am not sure this is really helpful...

 Could you please elaborate the last sentence?
This overlaps with the features you are proposing with
pg_restorebackup, where a backup is rebuilt. Why implement two
interfaces for the same thing?
-- 
Michael




Re: [HACKERS] Proposal: Incremental Backup

2014-07-25 Thread Michael Paquier
On Fri, Jul 25, 2014 at 10:14 PM, Marco Nenciarini
marco.nenciar...@2ndquadrant.it wrote:
 0. Introduction:
 =
 This is a proposal for adding incremental backup support to streaming
 protocol and hence to pg_basebackup command.
Not sure that incremental is a right word as the existing backup
methods using WAL archives are already like that. I recall others
calling that differential backup from some previous threads. Would
that sound better?

 1. Proposal
 =
 Our proposal is to introduce the concept of a backup profile.
Sounds good. Thanks for looking at that.

 The backup
 profile consists of a file with one line per file detailing tablespace,
 path, modification time, size and checksum.
 Using that file the BASE_BACKUP command can decide which file needs to
 be sent again and which is not changed. The algorithm should be very
 similar to rsync, but since our files are never bigger than 1 GB per
 file that is probably granular enough not to worry about copying parts
 of files, just whole files.
There are actually two levels of differential backups: file-level,
which is the approach you are taking, and block-level. A block-level
backup requires a scan of all the blocks of all the relations, taking
only the data from the blocks newer than the LSN given by the
BASE_BACKUP command. In the case of the file-level approach, you could
already back up the relation file after finding at least one modified
block. Btw, the size of relation files depends on the size
defined by --with-segsize when running configure. 1GB is the default
though, and the value usually used. Differential backups can reduce
the size of overall backups depending on the application, at the cost
of some CPU to analyze the relation blocks that need to be included in
the backup.
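
To make the block-level variant concrete, here is a rough sketch that
scans one relation segment and keeps only the pages whose header LSN
(pd_lsn, the first eight bytes of the page header; little-endian layout
is assumed) is newer than the LSN given to BASE_BACKUP:

    import struct

    BLCKSZ = 8192   # default page size

    def page_lsn(page):
        """Extract pd_lsn from the page header."""
        xlogid, xrecoff = struct.unpack_from('<II', page, 0)
        return (xlogid << 32) | xrecoff

    def blocks_newer_than(segment_path, threshold_lsn):
        """Yield (block_number, page) for pages changed after threshold_lsn."""
        with open(segment_path, 'rb') as f:
            blkno = 0
            while True:
                page = f.read(BLCKSZ)
                if len(page) < BLCKSZ:
                    break
                if page_lsn(page) > threshold_lsn:
                    yield blkno, page
                blkno += 1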

 It could also be used in 'refresh' mode, by allowing the pg_basebackup
 command to 'refresh' an old backup directory with a new backup.
I am not sure this is really helpful...

 The final piece of this architecture is a new program called
 pg_restorebackup which is able to operate on a chain of incremental
 backups, allowing the user to build an usable PGDATA from them or
 executing maintenance operations like verify the checksums or estimate
 the final size of recovered PGDATA.
Yes, right. Taking a differential backup is not difficult, but
rebuilding a consistent base backup from a full backup and a set
of differential ones is the tricky part; you need to be sure that
all the pieces of the puzzle are there.

 We created a wiki page with all implementation details at
 https://wiki.postgresql.org/wiki/Incremental_backup
I had a look at that, and I think that you are missing the mark on the
way differential backups should be taken. What would be necessary is
to pass a WAL position (or LSN, log sequence number, like
0/260) with a new clause called DIFFERENTIAL (INCREMENTAL in your
first proposal) in the BASE_BACKUP command, and then have the server
report back to the client all the files that contain blocks newer than
the given LSN position for a file-level backup, or the blocks newer
than the given LSN for a block-level differential backup.
Note that we would need a way to identify the type of the backup taken
in backup_label, with the LSN position sent with the DIFFERENTIAL
clause of BASE_BACKUP, by adding a new field to it.

When taking a differential backup, the LSN position necessary would be
simply the value of START WAL LOCATION of the last differential or
full backup taken. This would also translate into a new option for
pg_basebackup of the type --differential='0/260' to directly take
a differential backup.

Then, for the utility pg_restorebackup, what you would need to do is
simply to pass a list of backups to it, then validate if they can
build a consistent backup, and build it.

Btw, the file-based method would be simpler to implement, especially
for rebuilding the backups.

Regards,
-- 
Michael




Re: [HACKERS] Proposal: Incremental Backup

2014-07-25 Thread Claudio Freire
On Fri, Jul 25, 2014 at 10:14 AM, Marco Nenciarini
marco.nenciar...@2ndquadrant.it wrote:
 1. Proposal
 =
 Our proposal is to introduce the concept of a backup profile. The backup
 profile consists of a file with one line per file detailing tablespace,
 path, modification time, size and checksum.
 Using that file the BASE_BACKUP command can decide which file needs to
 be sent again and which is not changed. The algorithm should be very
 similar to rsync, but since our files are never bigger than 1 GB per
 file that is probably granular enough not to worry about copying parts
 of files, just whole files.

That wouldn't be nearly as useful as the LSN-based approach mentioned before.

I've had my share of rsyncing live databases (when resizing
filesystems, not for backup, but the anecdotal evidence applies
anyhow) and with moderately write-heavy databases, even if you only
modify a tiny portion of the records, you end up modifying a huge
portion of the segments, because the free space choice is random.

There have been patches going around to change the random nature of
that choice, but none are very likely to make a huge difference for
this application. In essence, file-level comparisons get you only a
mild speed-up, and are not worth the effort.

I'd go for the hybrid file+lsn method, or nothing. The hybrid avoids
the I/O of inspecting the LSNs of entire segments (a necessary
optimization for huge multi-TB databases) and backs up only the
modified portions when segments do contain changes, so it's the best
of both worlds. Any partial implementation would either require lots
of I/O (LSN only) or save very little (file only), unless it's an
almost read-only database.
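
To spell out what I mean by hybrid, a sketch in Python (illustrative names;
it assumes a profile mapping each file to the (size, mtime) recorded by the
previous backup and an integer base LSN): cheap file-level metadata decides
which segments are opened at all, and only those are scanned page by page.

import os
import struct

BLCKSZ = 8192

def changed_blocks(path, threshold_lsn):
    # Same per-page LSN test sketched earlier in the thread
    changed = []
    with open(path, "rb") as f:
        blkno = 0
        while True:
            page = f.read(BLCKSZ)
            if len(page) < BLCKSZ:
                break
            xlogid, xrecoff = struct.unpack("<II", page[:8])
            if ((xlogid << 32) | xrecoff) > threshold_lsn:
                changed.append(blkno)
            blkno += 1
    return changed

def hybrid_backup_plan(pgdata_files, profile, threshold_lsn):
    # profile: path -> (size, mtime) from the previous backup
    plan = {}
    for path in pgdata_files:
        st = os.stat(path)
        if profile.get(path) == (st.st_size, int(st.st_mtime)):
            continue  # file-level metadata says untouched: skip the page scan
        blocks = changed_blocks(path, threshold_lsn)
        if blocks:
            plan[path] = blocks
    return plan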




Re: [HACKERS] Proposal: Incremental Backup

2014-07-25 Thread Robert Haas
On Fri, Jul 25, 2014 at 2:21 PM, Claudio Freire klaussfre...@gmail.com wrote:
 On Fri, Jul 25, 2014 at 10:14 AM, Marco Nenciarini
 marco.nenciar...@2ndquadrant.it wrote:
 1. Proposal
 =
 Our proposal is to introduce the concept of a backup profile. The backup
 profile consists of a file with one line per file detailing tablespace,
 path, modification time, size and checksum.
 Using that file the BASE_BACKUP command can decide which file needs to
 be sent again and which is not changed. The algorithm should be very
 similar to rsync, but since our files are never bigger than 1 GB per
 file that is probably granular enough not to worry about copying parts
 of files, just whole files.

 That wouldn't be nearly as useful as the LSN-based approach mentioned before.

 I've had my share of rsyncing live databases (when resizing
 filesystems, not for backup, but the anecdotal evidence applies
 anyhow) and with moderately write-heavy databases, even if you only
 modify a tiny portion of the records, you end up modifying a huge
 portion of the segments, because the free space choice is random.

 There have been patches going around to change the random nature of
 that choice, but none are very likely to make a huge difference for
 this application. In essence, file-level comparisons get you only a
 mild speed-up, and are not worth the effort.

 I'd go for the hybrid file+lsn method, or nothing. The hybrid avoids
 the I/O of inspecting the LSNs of entire segments (a necessary
 optimization for huge multi-TB databases) and backs up only the
 modified portions when segments do contain changes, so it's the best
 of both worlds. Any partial implementation would either require lots
 of I/O (LSN only) or save very little (file only), unless it's an
 almost read-only database.

I agree with much of that.  However, I'd question whether we can
really seriously expect to rely on file modification times for
critical data-integrity operations.  I wouldn't like it if somebody
ran ntpdate to fix the time while the base backup was running, and it
set the time backward, and the next differential backup consequently
omitted some blocks that had been modified during the base backup.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Proposal: Incremental Backup

2014-07-25 Thread Claudio Freire
On Fri, Jul 25, 2014 at 3:44 PM, Robert Haas robertmh...@gmail.com wrote:
 On Fri, Jul 25, 2014 at 2:21 PM, Claudio Freire klaussfre...@gmail.com 
 wrote:
 On Fri, Jul 25, 2014 at 10:14 AM, Marco Nenciarini
 marco.nenciar...@2ndquadrant.it wrote:
 1. Proposal
 =
 Our proposal is to introduce the concept of a backup profile. The backup
 profile consists of a file with one line per file detailing tablespace,
 path, modification time, size and checksum.
 Using that file the BASE_BACKUP command can decide which file needs to
 be sent again and which is not changed. The algorithm should be very
 similar to rsync, but since our files are never bigger than 1 GB per
 file that is probably granular enough not to worry about copying parts
 of files, just whole files.

 That wouldn't be nearly as useful as the LSN-based approach mentioned before.

 I've had my share of rsyncing live databases (when resizing
 filesystems, not for backup, but the anecdotal evidence applies
 anyhow) and with moderately write-heavy databases, even if you only
 modify a tiny portion of the records, you end up modifying a huge
 portion of the segments, because the free space choice is random.

 There have been patches going around to change the random nature of
 that choice, but none are very likely to make a huge difference for
 this application. In essence, file-level comparisons get you only a
 mild speed-up, and are not worth the effort.

 I'd go for the hybrid file+lsn method, or nothing. The hybrid avoids
 the I/O of inspecting the LSNs of entire segments (a necessary
 optimization for huge multi-TB databases) and backs up only the
 modified portions when segments do contain changes, so it's the best
 of both worlds. Any partial implementation would either require lots
 of I/O (LSN only) or save very little (file only), unless it's an
 almost read-only database.

 I agree with much of that.  However, I'd question whether we can
 really seriously expect to rely on file modification times for
 critical data-integrity operations.  I wouldn't like it if somebody
 ran ntpdate to fix the time while the base backup was running, and it
 set the time backward, and the next differential backup consequently
 omitted some blocks that had been modified during the base backup.

I was thinking the same. But that timestamp could be saved in the file
itself, or in some other catalog, like trusted metadata implemented
by pg itself, and it could be an LSN range instead of a timestamp,
really.
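
Purely as an illustration of that last point (field names invented for the
sketch, not an agreed-on format), a per-file profile entry could carry an
LSN range instead of an mtime, so the "has this file changed since the base
backup?" test never depends on the system clock:

from dataclasses import dataclass

def fmt_lsn(lsn):
    # Render an integer LSN in the usual X/X text form
    return "%X/%X" % (lsn >> 32, lsn & 0xFFFFFFFF)

@dataclass
class ProfileEntry:
    tablespace: str
    path: str
    size: int
    lsn_low: int    # lowest page LSN seen when the file was last backed up
    lsn_high: int   # highest page LSN seen when the file was last backed up
    checksum: str

    def line(self):
        return "\t".join([self.tablespace, self.path, str(self.size),
                          fmt_lsn(self.lsn_low), fmt_lsn(self.lsn_high),
                          self.checksum])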




Re: [HACKERS] Proposal: Incremental Backup

2014-07-25 Thread Josh Berkus
On 07/25/2014 11:49 AM, Claudio Freire wrote:
 I agree with much of that.  However, I'd question whether we can
  really seriously expect to rely on file modification times for
  critical data-integrity operations.  I wouldn't like it if somebody
  ran ntpdate to fix the time while the base backup was running, and it
  set the time backward, and the next differential backup consequently
  omitted some blocks that had been modified during the base backup.
 I was thinking the same. But that timestamp could be saved in the file
 itself, or in some other catalog, like trusted metadata implemented
 by pg itself, and it could be an LSN range instead of a timestamp,
 really.

What about requiring checksums to be on instead, and checking the
file-level checksums?   Hmmm, wait, do we have file-level checksums?  Or
just page-level?

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] Proposal: Incremental Backup

2014-07-25 Thread Claudio Freire
On Fri, Jul 25, 2014 at 7:38 PM, Josh Berkus j...@agliodbs.com wrote:
 On 07/25/2014 11:49 AM, Claudio Freire wrote:
 I agree with much of that.  However, I'd question whether we can
  really seriously expect to rely on file modification times for
  critical data-integrity operations.  I wouldn't like it if somebody
  ran ntpdate to fix the time while the base backup was running, and it
  set the time backward, and the next differential backup consequently
  omitted some blocks that had been modified during the base backup.
 I was thinking the same. But that timestamp could be saved in the file
 itself, or in some other catalog, like trusted metadata implemented
 by pg itself, and it could be an LSN range instead of a timestamp,
 really.

 What about requiring checksums to be on instead, and checking the
 file-level checksums?   Hmmm, wait, do we have file-level checksums?  Or
 just page-level?

It would be very computationally expensive to have up-to-date
file-level checksums, so I highly doubt it.

